4.6 Accessing data

Please refer to this section of the emLab SOP for a description of the data directory structure for our emLab GRIT data storage space.

All data in the emLab GRIT data storage space can be accessed directly from either server (sequoia or quebracho) using the same directory paths. Data in the emlab/data and emlab/projects/current-projects directories physically lives on high-speed hard drives attached to sequoia, so you will get the best computing performance on sequoia when working with data in these directories. Please refer to this section of the emLab SOP for a code snippet that can be used to directly access data on the server in R.


In addition to the emLab GRIT data storage space, which is shared across all members of our team, each user has a private, user-specific storage space. All GRIT users get 50GB of free personal storage by default.

As a general best practice, we recommend storing all data on the emLab data storage space and keeping only cloned GitHub repos and user-specific R packages and settings in your personal space. For example, project-specific data belongs in the appropriate directory under the emLab data storage space, while your cloned GitHub repos belong in your personal storage space. By default, repos you clone from GitHub are stored in your personal storage space, along with your user-specific R packages and configurations.

If your personal storage space exceeds 50GB for any reason, it will stop working, so you should always keep a safe buffer. If you keep only cloned GitHub repos and R packages in your personal space, we do not expect you to hit the 50GB limit. You can check your current personal storage usage by typing df -h in the terminal and looking for your username in the output.
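
The df -h check can be narrowed to a single filesystem and turned into a simple warning. The following is a minimal shell sketch, not an official emLab tool. It assumes that your personal storage space is the filesystem holding your home directory ($HOME); if it is mounted elsewhere on the GRIT servers, pass that path instead. The function name storage_used_pct and the 80% threshold are hypothetical choices made for illustration.

```shell
# Hypothetical helper: print how full the filesystem holding a path is,
# as a whole-number percentage. Defaults to $HOME (assumed here to be
# your personal storage space).
storage_used_pct() {
  # -P forces POSIX output: one header line, then one line per filesystem,
  # with the capacity (for example "42%") in the fifth column.
  df -P "${1:-$HOME}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Human-readable summary for the filesystem holding your home directory.
df -h "$HOME"

# Print a warning once usage passes an illustrative 80% threshold.
used="$(storage_used_pct)"
if [ "${used:-0}" -ge 80 ]; then
  echo "Personal storage is ${used}% full; consider cleaning up."
fi
```

If the percentage is higher than expected, running du -sh on individual directories in your personal space shows which ones take up the most room.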