High-performance computing at emLab
Group learning and brainstorming session
October 12, 2022
Road map for today
- Why high-performance / cloud computing?
- Group discussion: What systems do folks currently use, and what use-cases do folks foresee?
- Overview of Google VMs
- Overview of UCSB Center for Scientific Computing (CSC) servers
- Comparing the two approaches
- Live walk-throughs of Google VMs and UCSB servers
- Group discussion: Where does emLab go from here?
Questions for the group
What high-performance computing options do you currently use?
What use cases do folks foresee for these types of high-performance systems?
Google virtual machines (VMs) using Compute Engine
- Built on Google’s Compute Engine, part of the Google Cloud Platform ecosystem of products
- Spin up VMs, specifying the number of cores and the amount of RAM you want
- Up to 224 cores and 896 GB RAM!
- GPUs are also available (up to 16!)
- Pricing is straightforward and charged per hour
- 8 cores and 32 GB of RAM: $0.27/hour
- 16 cores and 64 GB of RAM: $0.54/hour
- 32 cores and 128 GB of RAM: $1.08/hour
- 224 cores and 896 GB of RAM: $7.57/hour
- GPU pricing is listed separately on Google Cloud's pricing pages
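The per-hour rates above make it easy to budget a job before starting it. The following is a minimal Python sketch of that arithmetic, not an official pricing tool: the function names `estimate_cost` and `price_per_core_hour` are hypothetical, the rates are copied from this slide (October 2022), and current Google Cloud prices may differ.

```python
# Hourly rates copied from the slide, keyed by (cores, RAM in GB), in USD.
# Assumption: these were the rates in October 2022; check current pricing before relying on them.
PRICES_PER_HOUR = {
    (8, 32): 0.27,
    (16, 64): 0.54,
    (32, 128): 1.08,
    (224, 896): 7.57,
}


def estimate_cost(cores, ram_gb, hours):
    """Estimate the cost in USD of running one VM of the given size for `hours` hours."""
    try:
        rate = PRICES_PER_HOUR[(cores, ram_gb)]
    except KeyError:
        raise ValueError(f"No listed price for {cores} cores / {ram_gb} GB RAM")
    return round(rate * hours, 2)


def price_per_core_hour(cores, ram_gb):
    """Return the listed hourly rate divided by the number of cores."""
    return PRICES_PER_HOUR[(cores, ram_gb)] / cores


if __name__ == "__main__":
    # Example: a 32-core VM left running for a 10-hour model fit.
    print(estimate_cost(32, 128, 10))
    # The listed sizes all work out to roughly the same rate per core.
    for size in PRICES_PER_HOUR:
        print(size, round(price_per_core_hour(*size), 4))
```

Because the listed rates scale almost linearly with core count, doubling the cores roughly doubles the hourly bill, so a larger VM only saves money if the job finishes proportionally faster.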
- Three ways to spin up VMs:
- Can install RStudio, allowing you to run it interactively through your web browser
- RStudio gives you all the power of R, Python, SQL, Stan, etc.
Working with data
- Manually move data
- For Google VMs: Interactively upload/download data through RStudio in your browser
- For UCSB servers: Applications like FileZilla can help transfer data between your local machine and the remote servers
- For Google VMs: Can also use Google Cloud Storage with the googleCloudStorageR package, or set up a “persistent RStudio server”
- For Google VMs: Mount Google Drive as a disk (not currently working on Google VMs, but hopefully will be soon!)
- For both approaches: The googledrive R package can read data from the emLab Team Drive (not recommended, as it’s very clunky!)
Comparing the two approaches
- Cost
- Computing power
- Working with data
- Ease of use
Our recommendation (for most applications)
Google VMs using Compute Engine
Discussion
- How many people foresee using these types of tools?
- What do folks think of our recommendation to generally use Google VMs with Compute Engine?
- Should we update the emLab SOP?
- Should we think about a standard emLab VM Docker image that includes our commonly used packages?
- Should we make an emLab R package to help automate some of this?
- We may need to think about how to better monitor and manage billing across projects