4 High Performance Computing
Certain analysis use cases require high performance computing resources:
- big data
- parallel computing
- lengthy computation times
- restricted-use data
For analyses involving big data or models that take a long time to estimate, a single laptop or desktop computer is often not powerful enough or becomes inconvenient to use. Additionally, for analyses involving restricted-use data, such as datasets containing personally identifiable information, data use agreements typically stipulate that the data should be stored and analyzed in a secure manner.
In these cases, you should use the high performance computing resources available to emLab. emLab currently has two high performance computing servers that are managed by UCSB’s General Research IT (GRIT). These servers are named sequoia and quebracho. This section of the manual describes how to use these two servers for high performance computing.
For now, please use sequoia for general emLab computing for most projects. Quebracho is currently restricted to land use projects (e.g., land-based-solutions and projects starting with cel), so please only use quebracho if you have already been doing so and have already discussed this with Kathy or Robert. If you have any doubts about which server to use, please use sequoia. Note that sequoia does not have a GPU, but quebracho does. If you need access to a GPU and are not already using quebracho, please contact Robert and Kathy to talk about using quebracho.
4.1 Available resources
| Server | Cores | RAM | GPU | Use |
|---|---|---|---|---|
| quebracho | 64 | 1 TB | Yes | Only land-use projects (please check before using) |
| sequoia | 192 | 1.5 TB | No | All other emLab research |
| Knot | 1,500 | 48 GB - 1 TB | Yes | UCSB shared resource |
| Pod | ~2,600 | 190 GB - 1.5 TB | Yes | UCSB shared resource |
| Braid2 | ~2,200 | 192 - 368 GB | Yes | UCSB condo cluster (PI must buy node) |
The emLab SOP will focus on using quebracho and sequoia. For further information on using other UCSB campus resources, you can refer to our specific guide on that topic. However, please note that this guide is several years out of date, and you may find better and more current information directly on a UCSB website. Additionally, now that we have our own HPC servers, we no longer recommend using Google Compute Engine, which is a pay-as-you-go cloud computing service. It can be quite expensive and has setup challenges compared to our own servers. However, if you need to use GCE for whatever reason, emLab alumnus Grant McDermott wrote a very helpful tutorial on using R Studio Server on GCE.
4.2 Available software
Both quebracho and sequoia currently have R Studio Server, Jupyter Notebook, and VS Code installed. Positron is available on sequoia through a remote desktop Linux server. GRIT manages these installations and updates for us.
Both quebracho and sequoia also leverage SLURM queueing systems. All computational activity is forced to go through SLURM, even interactive R Studio and Jupyter Notebook sessions (see section on Open OnDemand below). Scrontab can be used to manage SLURM crontab files.
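For example, scrontab lets you schedule recurring SLURM jobs much like a regular crontab. The sketch below shows a hypothetical scrontab entry (all paths and resource values are placeholders); on the server you would add it by running `scrontab -e`. `#SCRON` lines set SLURM options for the cron line that follows them:

```shell
# A hypothetical scrontab entry: run a script daily at 3:00 AM as a
# SLURM job with 1 core, 4 GB of RAM, and a 2-hour time limit.
# The partition name and script path are placeholders, not real values.
scron_example='#SCRON --partition=emlab_nodes
#SCRON --cpus-per-task=1
#SCRON --mem=4G
#SCRON --time=02:00:00
0 3 * * * ~/projects/my-project/run_daily_update.sh'

# Print the example entry (on the server, you would paste it into
# the editor opened by `scrontab -e` instead)
printf '%s\n' "$scron_example"
```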
If we wish to install additional software, we will need to decide on these as a group and have GRIT install them for us. When considering new software to install, we should consider whether or not it is already available on other campus servers; what it will cost; and how many people in emLab would use it. Generally speaking, if a specific piece of software is expensive (e.g., Stata or Matlab), will not be used by many emLab folks, and is already available on other campus servers, we should rely on those other campus servers and not install it on our own. Users interested in Matlab should first try Pod, which has the necessary licenses and is available for free.
If users wish to use python, we recommend installing Visual Studio Code (VS Code), available for free from Microsoft. With VS Code installed, users can add the Remote - SSH extension and access sequoia via an SSH tunnel. Further instructions can be found in the VS Code documentation. After accessing sequoia via SSH, users may install their preferred python distribution. Miniconda is a good starting point, though other options are available. This Medium article is a good place for further installation guidance. Finally, we recommend creating a custom python environment for each project. Note that all SSH sessions go into the SLURM queue.
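As a minimal sketch of the per-project environment pattern, the commands below use Python's built-in venv module (the same idea applies if you install Miniconda and use conda environments instead); the environment name and path are hypothetical:

```shell
# Create a per-project python environment
# ("my-project" and the path are placeholders)
python3 -m venv "$HOME/envs/my-project"

# Activate it for the current shell session
. "$HOME/envs/my-project/bin/activate"

# The environment's own interpreter is now first on PATH
python --version
```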
For users that need Stata, it is already available on UCSB’s Knot cluster. More details for using Stata on Knot can be found here. We will not be installing Stata on quebracho or sequoia.
For users of Matlab, it is already available on all campus clusters. More details can be found at https://csc.cnsi.ucsb.edu/docs/using-matlab. We will not be installing Matlab on quebracho or sequoia.
4.3 Installing packages
You can install regular user-level R packages just like you would normally using R Studio on your local machine. We recommend using the renv R package to manage package dependencies for each project (i.e., GitHub repo) you work in. Please refer to the emLab SOP section on reproducibility for more information on renv.
Additionally, GRIT installs and updates many commonly used R packages on the servers, which are accessible in a “site-library” for each server. They update these once or twice a year. To add the GRIT R package library to your library paths, you can run this line of code: .libPaths(c("/usr/local/lib/R/site-library/", .libPaths()))
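To avoid running that line by hand in every session, one option is to append it to your `~/.Rprofile`, which R sources at startup. A minimal sketch, using the site-library path given above:

```shell
# Append the GRIT site-library to the library paths R searches at startup
echo '.libPaths(c("/usr/local/lib/R/site-library/", .libPaths()))' >> "$HOME/.Rprofile"

# Confirm the line was added
tail -n 1 "$HOME/.Rprofile"
```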
For system-level packages that you would normally need to install through the terminal on your local machine (e.g., packages like gdal or libproj), we will need to have GRIT install and manage these for us. We have already had GRIT install many commonly-needed system-level packages, which they will update once or twice a year. If you need a particular package that is not yet installed, please start a help ticket directly with GRIT: help@grit.ucsb.edu .
4.4 Setting up a GRIT account
To use emLab’s HPC servers, you must have a GRIT account. Please refer to the emLab Manual section on setting up and managing an account with GRIT.
4.5 Using sequoia
Sequoia leverages Open OnDemand (OOD), a system managed by GRIT to connect our HPC resources to commonly used software. OOD is accessed via an online dashboard in your web browser: https://hpc.grit.ucsb.edu
Once at the website, log in with your GRIT user ID and password. The system does not require you to be on campus or use a VPN.
Through OOD, you can use the following software:
- R Studio Server
- Jupyter Notebook
- VS Code Server
- Positron (via a Linux remote desktop server)
To use any of these, click “Interactive Apps”, then click on your software of choice (to access Positron, click “Desktop”). Each time you launch an interactive app, you need to specify the following up-front before launching the job:
- Partition name: Type `emlab_nodes` to use emLab’s private sequoia resources (this should be used in most cases). Alternatively, you can type `grit_nodes` to use GRIT’s campus-wide shared resources.
- Job duration (up to 168 hours): Your interactive session will run for this amount of time and then shut down (you can cancel jobs before the ending time if you desire).
- Number of cores: How many CPU cores you want to use for your interactive session. Currently, we have this set to a maximum of 24 cores per session. Since sequoia is a shared resource, please be considerate of others when requesting cores. If you are unsure of how many cores to use, we have another section of the SOP below to help you figure this out.
- RAM: How much RAM you want to use for your interactive session. Currently, we have this set to a maximum of 256GB per session. Since sequoia is a shared resource, please be considerate of others when requesting RAM. If you are unsure of how much RAM to use, we have another section of the SOP below to help you figure this out.
- “Use GPU” option: Please do not enable this option unless you have already discussed it with Kathy or Robert (a GPU is only available on quebracho, for land use projects, at this time).
Note that for convenience, your last used settings will be saved for the next time you launch an interactive app.
Once you’ve configured your session, click “Launch”. It may take a few minutes for your session to start up. Once it is ready, you will see a link to open R Studio Server (or whichever app you selected). Click the link to open R Studio Server in a new browser tab.
Note that you may use OOD to launch multiple interactive sessions at the same time! They are each still managed through SLURM.
More resources for OOD are available on GRIT’s bookstack.
4.6 Best practices
Here we outline our best practices for using shared computational resources. These are meant to be living guidelines that will be adapted by our team as needed:
- Sharing is caring! Common courtesy can go a long way. As much as possible, try to use only the resources you need.
- Leverage the tools below to monitor how many cores and how much RAM you and others are currently using.
- In general on sequoia, feel free to run analyses that use up to 24 cores and 256GB of RAM. We will likely adaptively manage these specific numbers once we start using sequoia and get a better understanding of how many resources we are using. And if you don’t need that much, please request less so that others can use the resources they need.
- For larger analyses that require lots of cores or RAM, coordinate with others over the server Slack channel (#hpc-core-dination) to ensure that workflows are not disrupted and that everyone has reasonable access to computational resources.
- Generally, we recommend piloting your code using a small subset of your data and/or just a single core, either on your local computer or on one of our HPC servers. Then, once you know it works and have a sense of how much memory it will use and how long it will take to execute, you can go ahead and run the full analysis on the server. If it looks like the full analysis will require resources beyond the standard recommended 24 cores and 256GB, coordinate with the team on the Slack channel #hpc-core-dination.
4.7 Resource allocation
It is up to each researcher to: 1) decide what computational resources they need prior to starting each job (i.e., how many cores and how much RAM the job will need); and 2) monitor their resource usage during each job, both to ensure they are not exceeding their allocated resources and to build personal awareness of how many resources certain jobs require. It will generally take each person some time to develop an intuition for how many resources each job will need; we always recommend piloting code locally on your own machine to start getting a sense for this. Additionally, as you use the server and the tools below more, you will further develop this intuition.
Available tools:
Zabbix dashboard: GRIT manages a dashboard for us which provides a system-level view of how many resources are being used on each of our servers. This is a good place to start to get a sense of overall resource usage across the whole team, and a good place to check before starting a new job to see how busy the servers are. It also shows the current status of each server and whether or not there are currently any problems.
Job Resource Utilization Analyzer: If using sequoia via OOD, you can navigate to the OOD dashboard, click “Apps”, then click “Job Resource Utilization Analyzer”. For each job that is currently running on a GRIT server, this will show you how many cores and how much RAM was requested, as well as the actual peak core and RAM usage. It further provides some information on whether or not you over-requested core and/or RAM resources, and provides some recommendations if any adjustments should be made in the future. This is a nice way to monitor live usage.
Post-job email: If using sequoia via OOD, at the end of each job you will receive an email summary of your job resource usage. This email provides similar information to the Job Resource Utilization Analyzer, but is sent to you automatically at the end of each job. It also provides information on whether or not you over-requested core and/or RAM resources, and provides some recommendations if any adjustments should be made in the future. This is a nice way to monitor your resource usage after the fact. GRIT provides some helpful rules of thumb:
- “If Max RSS is much lower than requested memory, you may be over-requesting RAM.”
- “If the job failed with OUT_OF_MEMORY and Max RSS is close to the request, you likely need to request more memory.”
- “If Total CPU << (Elapsed time * AllocCPUS), your job may be I/O bound or under-utilizing its cores.”
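That last rule of thumb is easy to turn into a quick back-of-the-envelope check using the fields reported in the post-job email. The numbers below are hypothetical (all in seconds), not values from a real job:

```shell
# Hypothetical values from a job summary (seconds)
total_cpu=3600    # Total CPU time actually used
elapsed=1800      # Wall-clock elapsed time
alloc_cpus=8      # Cores allocated to the job

# Efficiency = TotalCPU / (Elapsed * AllocCPUS). A value well below
# 100% suggests the job was I/O bound or under-utilizing its cores.
eff=$(awk -v t="$total_cpu" -v e="$elapsed" -v c="$alloc_cpus" \
  'BEGIN { printf "%.0f", 100 * t / (e * c) }')
echo "CPU efficiency: ${eff}%"   # prints "CPU efficiency: 25%"
```

Here the job used only a quarter of the CPU time it reserved, so requesting fewer cores next time would free resources for others.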
htop: This terminal tool is installed on each server and provides a real-time view of resource usage on each server. It is a great tool to use during an interactive session to see how many cores and how much RAM your job is using, and also how many cores and how much RAM are currently being used by others. You can customize the htop display to make things easier to see. For example:
- After entering htop, press F2 to enter setup. You can also click directly on setup to enter it.
- Once you enter setup, if you have trouble seeing the setup options, you can try reducing your browser’s text size temporarily in order to see the setup options.
- Sequoia has 192 cores, so the default view with 4 columns makes for a pretty large display. In the Meters setup, you can change the left column to be `CPUs (1-4/8) [Bar]` and the right column to be `CPUs (5-8/8) [Bar]`. This will condense the output and force 8 columns.
- I also like to add disk IO to the left column below memory.
- In the “Display options” setup you can select some options that will clean up the process information below the resources monitor. I like to make sure to select:
  - Tree view
  - Tree view sorted by PID
  - Shadow other users’ processes (makes it easier to see your own)
  - Count CPUs from 1
  - Enable the mouse
- Press F10 when done
4.8 Using quebracho
On your local machine, connect to the UCSB Campus VPN. You can do so by downloading a VPN client for your operating system, such as Ivanti Pulse Secure. More details for connecting to the UCSB VPN and installation instructions are provided here. Note: Even if you are on the UCSB campus you will still need to connect to the campus VPN.
Once you’ve connected to the VPN, you are ready to access the server. You can access quebracho via SSH if you wish to use something like VS Code. To access R Studio Server, you can simply navigate to the following link. Once there, you will be prompted to enter your GRIT user ID and password. Once you’ve done this, you are ready to use R Studio Server!
- Quebracho: https://quebracho.geog.ucsb.edu/rstudio/
4.9 Accessing data
Please refer to this section of the emLab SOP for a description of the data directory structure for our emLab GRIT data storage space.
All data in the emLab GRIT data storage space can be directly accessed on each of the servers (sequoia and quebracho) without any changes to the directory paths. All data in the emlab/data and emlab/projects/current-projects directory physically lives on high-speed hard drives attached to sequoia, so if you need to work on data in these directories, you will have the best computing performance when using sequoia. Please refer to this section of the emLab SOP for a code snippet that can be used to directly access data on the server in R.
In addition to having access to our emLab GRIT data storage space, which is shared across all members of our team, all individual users also have a private user-specific storage space. All GRIT users get a free 50GB personal storage space by default. As a general best practice, we recommend storing all data in the emLab data storage space, and only storing cloned GitHub repos and user-specific R packages and settings in your personal user space. For example, you should store all project-specific data in the appropriate directory under the emLab data storage space, but you should store all of your cloned GitHub repos in your personal storage space. By default, when you clone repos from GitHub they are stored in your personal storage space, along with any of your user-specific R packages and configurations. If for whatever reason your personal storage space exceeds 50GB, it will stop working, so you should always keep a safe buffer. However, we envision that if users only keep cloned GitHub repos and R packages in their personal user space, they should not need to worry about hitting the 50GB limit. You can check your current personal storage by typing df -h in the terminal and then looking for your username.
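In addition to `df -h`, you can total your home directory directly with `du`, which may give a more direct per-user measure of how much of the 50GB quota you are using:

```shell
# Sum the size of everything under your home directory
# (-s gives a single total, -h prints a human-readable size)
du -sh "$HOME"
```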
4.10 Accessing code
You can work with projects and GitHub repositories on RStudio Server exactly like you can on your personal machine. One major difference is how to set up GitHub authentication, which works a little differently on the servers than it would on your personal machine, so we provide explicit instructions for that below.
Please refer to this section of the emLab SOP for directions on how to set up and manage git and GitHub for your new server workspace. One important difference between your personal laptop and using a server is that file permissions may be such that other users can see and sometimes read or write files in your directories. Ideally, any confidential information such as your git credentials should be secured differently from your personal computer. Step 6 of the Git and Github section of the emLab manual is therefore not recommended in a multi-user server environment because your token may end up viewable to other users as plain text.
Instead of storing your Personal Access Token (PAT) as plain text, it is recommended to use one of the following options. Using either of these two approaches will also mean that credentials are stored between sessions, which should make the user experience a bit easier. The first approach, using an SSH key, is recommended.
Use an SSH key instead of a PAT
Set up your SSH key on the GRIT server.
If you are not using R Studio Server, or prefer to use the terminal, follow these instructions:
You can generate a new SSH key with the terminal command
- ssh-keygen -t ed25519 -C "email@example.com"
You are prompted to select a location (hit enter for the default location)
You are prompted to set a password (hit enter to not require one)
Start your SSH agent in the background with
- eval "$(ssh-agent -s)"
Add your private key to the SSH agent with
- ssh-add ~/.ssh/id_ed25519
Copy your public key with
- cat ~/.ssh/id_ed25519.pub | xclip -selection clipboard
If you are using R Studio Server and you prefer to not use the terminal approach, the instructions are a bit more streamlined. Follow the instructions in this link.
- If after going through these instructions you prefer to not use a password, you can remove it using the instructions provided in this link.
Add your public key to your GitHub account
GitHub > Settings > SSH and GPG keys > New SSH key
Paste in your key (either from Step 6 in the terminal option above, or copied from R Studio Server in the R Studio option above)
Now you can clone your repository onto the GRIT server. This means that you need to either: 1) use the SSH url rather than the HTTPS url when cloning the repository for the first time; or 2) if you’ve already cloned the repository, set your repository URL to the SSH version.
If cloning a repo for the first time using R Studio Server, you can simply click “File > New Project > Version Control > Git”, and then enter your repo’s SSH link
- git@github.com:username/example_repo.git (for example, this might look like git@github.com:emlab-ucsb/ocean-ghg.git)
Alternatively, in the terminal you can manually set a specific repo’s URL to SSH with:
- git remote set-url origin git@github.com:username/example_repo.git (for example, this might look like git@github.com:emlab-ucsb/ocean-ghg.git)
Caching your PAT temporarily
Create a PAT on GitHub
Add credential cache timeout instructions to your git config file
git config --global credential.helper 'cache --timeout=3600'
Adjust the timeout length (in seconds) as needed
Push changes to GitHub
When prompted enter your username
For the password enter your PAT
- Future pushes will not require you to enter credentials within the timeout window
This is not a good long term solution because you will need to re-enter your credentials anytime the server restarts or when your cache timeout ends
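The caching setup above can be sketched in two commands (the one-hour timeout is the same value shown in the steps; adjust it as needed):

```shell
# Cache HTTPS credentials in memory for one hour (3600 seconds)
git config --global credential.helper 'cache --timeout=3600'

# Confirm the helper is set; should print: cache --timeout=3600
git config --global credential.helper
```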