4.6 Accessing data
Please refer to this section of the emLab SOP for a description of the data directory structure for our emLab GRIT data storage space.
All data in the emLab GRIT data storage space can be directly accessed
on each of the servers (sequoia and quebracho) without any changes to
the directory paths. All data in the emlab/data and
emlab/projects/current-projects directories physically lives on
high-speed hard drives attached to sequoia, so if you need to work on
data in these directories, you will have the best computing performance
when using sequoia. Please refer to this section of the emLab SOP for a
code snippet that can be used to directly access data on the server in R.
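Because performance on these directories is best on sequoia, it can help to confirm which server a terminal session is running on before starting a heavy job. The following is a minimal sketch, not an official emLab utility: the function name is_sequoia is hypothetical, and it assumes the server's host name begins with "sequoia", which you should verify on the server itself.

```shell
# Hypothetical helper: succeed if a host name looks like sequoia.
# Assumption: the server's host name starts with "sequoia".
# With no argument, the current machine's name (uname -n) is checked.
is_sequoia() {
  case "${1:-$(uname -n)}" in
    sequoia*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, running `is_sequoia && echo "on sequoia" || echo "not on sequoia"` in the terminal prints which case applies to the current session.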
In addition to having access to our emLab GRIT data storage space, which
is shared across all members of our team, all individual users also have
a private user-specific storage space. All GRIT users get a free 50GB
personal storage space by default. As a general best practice, we
recommend storing all data on the emLab data storage space, and only
storing cloned GitHub repos and user-specific R packages and settings in
your personal user space. For example, you should store all
project-specific data in the appropriate directory under the emLab data
storage space, but you should store all of your cloned GitHub repos in
your personal storage space. By default, when you clone repos from
GitHub they are stored in your personal storage space, along with any of
your user-specific R packages and configurations. If your personal
storage space exceeds 50GB, it will stop working, so you should always
keep a safe buffer below that limit. However, we envision that
if users only keep cloned GitHub repos and R packages in their personal
user space, they should not need to worry about hitting the 50GB limit.
You can check your current personal storage by typing df -h in the
terminal and then looking for your username.
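The check described above can be sketched as two small shell helpers. This is an illustration under stated assumptions rather than an official emLab tool: the function names storage_for_user and personal_usage are hypothetical, and the sketch assumes that your personal storage space is your home directory ($HOME) and that its mount point contains your username.

```shell
# Hypothetical helper: print the df -h header plus every line that
# mentions the given username (default: the current user).
# Assumption: the personal storage mount path contains the username.
storage_for_user() {
  df -h | awk -v user="${1:-$(id -un)}" 'NR == 1 || index($0, user) > 0'
}

# Hypothetical helper: print the total size of a directory in
# human-readable units (default: $HOME, assumed to be the personal space).
# Errors from unreadable subdirectories are suppressed.
personal_usage() {
  du -sh "${1:-$HOME}" 2>/dev/null | cut -f1
}
```

Calling storage_for_user with no arguments narrows the df -h output to your own line, and personal_usage reports how much space your home directory currently occupies, which you can compare against the 50GB limit.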