High-performance computing at emLab

Group learning and brainstorming session

October 12, 2022

Road map for today

  • Why high-performance / cloud computing?
  • Group discussion: What systems do folks currently use, and what use cases do folks foresee?
  • Overview of Google VMs
  • Overview of UCSB Center for Scientific Computing (CSC) servers
  • Comparing the two approaches
  • Live walk-throughs of Google VMs and UCSB servers
  • Group discussion: Where does emLab go from here?

Why high-performance / cloud computing?

  • Running analyses in parallel
    • (where more cores are good!)
  • Running analyses with data that are too big for your personal computer
    • (where more RAM is good!)
  • Access to more GPUs for certain types of analyses

  • Freeing up your personal computer to do other things

  • The ability to run software and packages on specific operating systems or environments
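As a toy illustration of the first point (parallelism), independent tasks can be fanned out across cores. This sketch uses `xargs -P` as a stand-in for launching real analysis runs; the "tasks" are placeholders, not part of any emLab workflow:

```shell
# Run four independent "analyses" concurrently, up to 4 processes at once.
# Each echoed task is a stand-in for a real model run or simulation;
# on a machine with more cores, -P can be raised accordingly.
printf '%s\n' 1 2 3 4 | xargs -P 4 -I {} sh -c 'echo "task {} done"'
```

Because the tasks run concurrently, the completion order of the printed lines is not guaranteed.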

Questions for the group

  1. What high-performance computing options do you currently use?

  2. What use cases do folks foresee for these types of high-performance systems?

Google virtual machines (VMs) using Compute Engine

  • Uses Google’s Compute Engine, part of the Google Cloud Platform ecosystem of products
  • Spin up virtual machines (VMs) while specifying the number of cores you want and how much RAM you want
    • Up to 224 cores and 896 GB RAM!
  • GPUs are also available (up to 16!)
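As a rough sketch of what spinning up such a VM looks like from the gcloud CLI: `--custom-cpu` and `--custom-memory` request a custom machine shape (8 cores and 32 GB RAM here). The instance name and zone are placeholders, and since the command needs an authenticated gcloud setup with a billing project, it is echoed below rather than executed:

```shell
# Placeholder instance name and zone; adjust --custom-cpu/--custom-memory
# to the shape you need. Echoed (not run) because it requires gcloud auth
# and an active billing project.
echo gcloud compute instances create my-analysis-vm \
  --zone=us-west1-b \
  --custom-cpu=8 \
  --custom-memory=32GB
```

Dropping the leading `echo` would actually create the instance in whichever project `gcloud` is currently configured for.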

Google virtual machines (VMs) using Compute Engine

  • Pricing is straightforward and on a per-hour basis
    • 8 cores and 32 GB of RAM costs $0.27/hour
    • 16 cores and 64 GB of RAM costs $0.54/hour
    • 32 cores and 128 GB of RAM costs $1.08/hour
    • 224 cores and 896 GB RAM costs $7.57/hour
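With per-hour rates like these, per-job costs are easy to estimate. For example, a hypothetical 24-hour run on the 32-core / 128 GB machine at the $1.08/hour rate listed above:

```shell
# Back-of-the-envelope cost: 24 hours x $1.08/hour (32 cores, 128 GB RAM)
awk 'BEGIN { printf "$%.2f\n", 24 * 1.08 }'
```

This prints `$25.92`; the same one-liner works for any hours-times-rate estimate.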
  • GPU pricing can be found here

UCSB high performance computing clusters

  • Uses the Pod, a high-performance computing cluster available through the UCSB Center for Scientific Computing (CSC)
  • The Pod has 71 nodes:
    • 64 regular nodes, each with 40 cores
    • 4 large-memory nodes with more than 1 TB of RAM
    • 3 Graphics Processing Unit (GPU) nodes with four 32 GB V100s connected via NVLink (for image and video processing and some types of machine learning)

UCSB high performance computing clusters

  • Pricing: It’s free!
    • The clusters are funded through an NSF grant and are free to use for UCSB faculty, with proper citation

UCSB high performance computing clusters

  • Getting started on the Pod:
    • Register for an account
    • Access via the terminal
    • With X2GO client
    • With Visual Studio Code remote connections
    • Off-campus: can use any of the above methods but have to connect to the campus VPN first

Working with data

  • Manually move data
    • For Google VMs: interactively upload/download data through RStudio in your browser
    • For UCSB servers: applications like FileZilla can help transfer data between your local machine and the remote servers
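For command-line transfers to the UCSB servers, `scp` does the same job as FileZilla. The username, login host, and paths below are placeholders (check the CSC documentation for the actual Pod login host), and the command is echoed rather than run since it needs a real account:

```shell
CSC_USER="your_csc_username"       # placeholder: your CSC account name
CSC_HOST="pod-login.example.edu"   # placeholder: see CSC docs for the real host
# Copy a local file to a data/ directory in your remote home directory.
# Echoed (not run) because it requires a real account and network access.
echo scp my_data.csv "${CSC_USER}@${CSC_HOST}:~/data/"
```

Remove the leading `echo` (and substitute real values) to perform the transfer; `scp -r` handles whole directories.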

Working with data

  • For both approaches: the googledrive R package can read data from the emLab Team Drive (but it’s not recommended; it’s very clunky!)

Comparing the two approaches

  • Cost
  • Computing power
  • Working with data
  • Ease of use

Our recommendation (for most applications)

Google VMs using Compute Engine

Live walk-through time!

  • r/google_compute_engine_setup.R

  • r/google_compute_engine_analysis.R

  • r/ucsb_server_demo.R

Discussion

  • How many people foresee using these types of tools?
  • What do folks think of our recommendation to generally use Google VMs with Compute Engine?
  • Should we update the emLab SOP?
  • Should we think about a standard emLab VM Docker image that includes our commonly used packages?
  • Should we make an emLab R package to help automate some of this?
  • We may need to think about how to better monitor and manage billing across projects