class: inverse, center, middle

<br>

# A Gentle Introduction to Parallel Computing for emLab

<br>

### Danielle Ferraro and Gavin McDonald

<br>

<img src="img/emlab_logo_cutout_w.png" width="20%" style="display: block; margin: auto;" />

### emLab Study Club<br>9 Jan 2024

---
class: top

# Overview

.pull-left[
#### Presentation

* [What is parallel processing?](#what-is)
* [When to use it](#when-to)
* [When *not* to use it](#when-not)
* [Caveats and things to keep in mind](#caveats)
* [R packages for parallelization](#r-packages)
  - `furrr`
]

.pull-right[
#### Demo

* Accessing RStudio Server (quebracho)
* Parallelizing R code with `furrr`
* Using `htop` to observe core/memory usage
* Parallel computing within `targets`
* Parallel computing for machine learning using `tidymodels`
]

--

.center[
<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg><br>
These slides are available at:<br>
[https://emlab-ucsb.github.io/emlab-parallel-computing](https://emlab-ucsb.github.io/emlab-parallel-computing)<br><br>
GitHub repo is available at:<br>
[https://github.com/emlab-ucsb/emlab-parallel-computing](https://github.com/emlab-ucsb/emlab-parallel-computing)
]

---
# Disclaimer

.center[
<img src="img/googling.png" width="80%" />
]

---
# Recommended resources

Resources we used:

- [Parallel Computing in R, James Henderson, University of Michigan](https://jbhender.github.io/Stats506/F20/Parallel_Computing_R.html)
- [Introduction to Parallel Computing Tutorial, Lawrence Livermore National Laboratory](https://hpc.llnl.gov/training/tutorials/introduction-parallel-computing-tutorial)
- [Parallel programming lecture, Grant McDermott, University of Oregon](https://raw.githack.com/uo-ec607/lectures/master/12-parallel/12-parallel.html)
- [tidymodels and tune: Optimizations and parallel processing](https://tune.tidymodels.org/articles/extras/optimizations.html)

Other helpful reference material:

- [CRAN Task View: High-Performance and Parallel Computing with R](https://CRAN.R-project.org/view=HighPerformanceComputing)

---
class: center, middle, inverse
name: what-is

# What is parallel processing?

---
# What is parallel processing?
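---
# What is parallel processing?

### Some jargon

To see these numbers for your own machine, you can ask R directly. A minimal sketch (an addition to these slides), using base R's `parallel` package and the `future` package:

```r
# Number of cores the hardware reports
parallel::detectCores()

# A more conservative count that also respects system and
# scheduler settings, useful on shared machines
future::availableCores()
```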
--

A method where a process is broken up into smaller parts that can then be carried out **simultaneously**, i.e., **in parallel**

---
# What is parallel processing?

--

Traditionally, software is written for **serial processing**

--

.center[
<img src="img/serial1.png" width="100%" />
]

---
# What is parallel processing?

Traditionally, software is written for **serial processing**

.center[
<img src="img/serial2.png" width="100%" />
]

---
# What is parallel processing?

Traditionally, software is written for **serial processing**

.center[
<img src="img/serial3.png" width="100%" />
]

---
# What is parallel processing?

Traditionally, software is written for **serial processing**

.center[
<img src="img/serial4.png" width="100%" />
]

---
# What is parallel processing?

Running tasks **in parallel** enables us to use multiple computing resources simultaneously to solve a computational problem

--

.center[
<img src="img/serial1.png" width="100%" />
]

---
# What is parallel processing?

Running tasks **in parallel** enables us to use multiple computing resources simultaneously to solve a computational problem

.center[
<img src="img/parallel1.png" width="100%" />
]

---
# What is parallel processing?

Running tasks **in parallel** enables us to use multiple computing resources simultaneously to solve a computational problem

.center[
<img src="img/parallel2.png" width="100%" />
]

---
# What is parallel processing?

Running tasks **in parallel** enables us to use multiple computing resources simultaneously to solve a computational problem

.center[
<img src="img/parallel3.png" width="100%" />
]

---
# What is parallel processing?

Running tasks **in parallel** enables us to use multiple computing resources simultaneously to solve a computational problem

.center[
<img src="img/parallel4.png" width="100%" />
]

---
# What is parallel processing?

### Benefit: task speedup

More of your computer's resources are used -> multiple computations run at the same time -> less overall time is needed

---
# What is parallel processing?

--

### Some jargon

- A **core** is the part of your computer's processor that performs computations
  - Most modern computers have >1 core

--

- A **process** is a single running task or program (like R)
  - Each core runs one process at a time

--

- A **cluster** typically refers to a network of computers that work together (each with many cores), but it can also mean the collection of cores on your personal computer

---
class: center, middle, inverse
name: when-to

# When to use parallel processing

---
# When to use parallel processing

When tasks are **pleasingly parallel**, i.e., if your analysis can easily be separated into many identical but separate tasks that do not rely on one another

Examples:

- Bootstrapping
- Monte Carlo simulations
- Machine learning (e.g., cross-validation, model selection, hyperparameter tuning, models like random forests that can run in parallel)
- Computations by group

--

Parallel processing is not effective with something like:

```r
function(a) {
  b <- a + 1
  c <- b + 2
  d <- c + 3
  return(d)
}
```

where each step depends on the one before it, so the steps cannot be completed at the same time.
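---
# When to use parallel processing

By contrast, here is what a pleasingly parallel task can look like in code: a minimal bootstrap sketch using `furrr` (an illustration added to these slides; the data `x` and the 1,000 replicates are made-up placeholders):

```r
library(furrr) # Also attaches {future}
plan(multisession, workers = 4) # Start 4 parallel workers

x <- rnorm(1e5) # Hypothetical data

# Each bootstrap replicate is independent of the others,
# so the replicates can run on different cores
boot_mean <- function(i, data) mean(sample(data, replace = TRUE))

boot_means <- future_map_dbl(
  1:1000, boot_mean, data = x,
  .options = furrr_options(seed = TRUE) # Parallel-safe random numbers
)

plan(sequential) # Return to sequential processing
```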
---
# When to use parallel processing

When tasks are **pleasingly parallel**, i.e., if your analysis can easily be separated into many identical but separate tasks that do not rely on one another

Examples:

- Bootstrapping
- Monte Carlo simulations
- Machine learning (e.g., cross-validation, model selection, hyperparameter tuning, models like random forests that can run in parallel)
- Computations by group

--

**AND / OR**

When tasks are computationally intensive and take time to complete

---
class: center, middle, inverse
name: when-not

# When *not* to use parallel processing

---
# When *not* to use parallel processing

.center[
<img src="img/traffic_jam.jpg" width="60%" />
]

--

.center[
### Just because we can, doesn't mean we should
]

---
# When *not* to use parallel processing

.center[
### It is not magic!
]

--

.center[
<img src="img/distracted_boyfriend.png" width="60%" />
]

---
# When *not* to use parallel processing

### As a first step, always **profile** your code.

--

> 1. Find the biggest bottleneck (the slowest part of your code).
> 2. Try to eliminate it (you may not succeed, but that’s ok).
> 3. Repeat until your code is “fast enough.”

.right[
-[Advanced R, Hadley Wickham](http://adv-r.had.co.nz/Profiling.html)
]

--

### A few suggestions:

- Embrace vectorization
- Use functional programming: the `apply` (base R) or `map` (`purrr` package) families of functions
- Test out different packages: the `data.table` ecosystem is often significantly faster than the `tidyverse`

---
# When *not* to use parallel processing

### It may not be the right tool for the job.

--

.center[Look familiar? You may be memory limited<sup>1</sup>
<img src="img/computer_on_fire.png" width="60%" />
]

.footnote[
<sup>1</sup> Try chunking or simplifying your data
]

---
# When *not* to use parallel processing

### It may not be efficient.

.pull-left[
.center[(A) There is computational overhead for setting up, maintaining, and terminating the additional worker processes

For small tasks, the overhead cost may be greater than the speedup!
]
]

--

.pull-right[
.center[(B) Efficiency gains are limited by the proportion of your code that can be parallelized - see [Amdahl's Law](https://en.wikipedia.org/wiki/Amdahl%27s_law)

<img src="img/amdahls_law_graph.png" width="100%" /><img src="img/amdahls_law_eq.png" width="100%" />
]
]
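---
# When *not* to use parallel processing

### It may not be efficient.

To make Amdahl's Law concrete, here is a small helper (an addition to these slides) that computes the theoretical speedup, where `p` is the proportion of the code that can be parallelized and `s` is the number of parallel workers:

```r
# Theoretical speedup from Amdahl's Law: 1 / ((1 - p) + p / s)
amdahl_speedup <- function(p, s) 1 / ((1 - p) + p / s)

# A job that is only 50% parallelizable speeds up by less than 2x,
# no matter how many workers you throw at it
amdahl_speedup(p = 0.50, s = 64) # ~1.97
amdahl_speedup(p = 0.95, s = 64) # ~15.4
```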
---
class: center, middle, inverse
name: caveats

# Caveats and things to keep in mind

---
# Caveats and things to keep in mind

- Implementation varies across operating systems
  - Two ways that code can be parallelized: **forking** vs. **sockets**
  - **Forking** is faster, but doesn't work on Windows and may cause problems in IDEs like RStudio
  - **Sockets** are a bit slower, but work across operating systems and in IDEs

--

- Choose an **optimal number of cores** for the task at hand (consider the cost of overhead vs. the speedup, the number of cores available, the number of cores needed for other tasks, whether you are on a shared system, etc.)
  - On a personal machine: `future::availableCores(omit = 1)`
  - On a shared system, like the emLab server: best practice TBD!

--

- **Debugging** can be tricky, so try to get your code running sequentially first

--

- Be careful when generating **random numbers** across processes (R has some [strategies](https://www.r-bloggers.com/2020/09/future-1-19-1-making-sure-proper-random-numbers-are-produced-in-parallel-processing/) to deal with this; see the sketch on the next slide)

--

- When finished, [explicitly close the parallel workers](https://cran.r-project.org/web/packages/future/vignettes/future-7-for-package-developers.html) and return to the default sequential processing
  - `future::plan("sequential")`
  - `parallel::stopCluster(cl)`
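---
# Caveats and things to keep in mind

Putting the last two caveats into practice, a minimal sketch (illustrative only) of parallel-safe random number generation and explicit cleanup, here with `future.apply`:

```r
library(future.apply) # Also attaches {future}
plan(multisession, workers = 4)

# future.seed = TRUE gives each worker a statistically sound,
# reproducible random number stream
draws <- future_lapply(1:8, function(i) rnorm(5), future.seed = TRUE)

# When finished, explicitly close the workers and
# return to the default sequential processing
plan(sequential)
```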
---
class: center, middle, inverse
name: r-packages

# R packages for parallelization

---
# R packages for parallelization

### There are numerous approaches for parallelizing code in R.

--

1. The `parallel` package is a built-in base R package
2. The `future` package is newer and a bit friendlier (**recommended**)

Many different R packages for parallel processing exist, but they leverage either `parallel` or `future` as a *backend*, or both (and let us decide which to use).

---
# R packages for parallelization

Today's demo will use `furrr` (which uses `future`). `furrr` is the parallel version of `purrr` and contains near drop-in replacements for `purrr`'s family of `map()` and `walk()` functions.

|purrr       |furrr              |
|:-----------|:------------------|
|`map()`     |`future_map()`     |
|`map_dfr()` |`future_map_dfr()` |
|`map_chr()` |`future_map_chr()` |
|`map2()`    |`future_map2()`    |
|`pmap()`    |`future_pmap()`    |
|`walk()`    |`future_walk()`    |
|`walk2()`   |`future_walk2()`   |

.center[...and so on and so forth]

---
# R packages for parallelization

See slides at the end of this presentation for brief comparisons of serial vs. parallel code within the following R packages:

- `parallel`
- `foreach` (via `%dopar%`; can use `parallel` or `future`)
- `furrr` (uses `future`)
- `future.apply` (uses `future`)

...but there are many others

- Check out [multidplyr](https://github.com/tidyverse/multidplyr)
- See the [CRAN Task View: High-Performance and Parallel Computing with R](https://CRAN.R-project.org/view=HighPerformanceComputing) for a very extensive list

# Demo time!

---
class: center, middle, inverse

# Appendix

<br>

### Serial vs. parallel code in common R packages for parallel computation

---
# The [`parallel`](https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf) package

**Purpose:** parallel versions of the apply functions.<sup>1</sup> This package is part of core R, so you do not need to install it. Note that it is operating system dependent.

Unix/Linux (forking):

```r
library(parallel)

# Serial
lapply(1:10, sqrt)

# Parallel
mclapply(1:10, sqrt, mc.cores = 4) # Specify 4 cores
```

--

Windows (sockets):

```r
# Serial
lapply(1:10, sqrt)

# Parallel
cluster <- makePSOCKcluster(4) # Specify 4 cores
parLapply(cluster, 1:10, sqrt)
stopCluster(cluster)
```

.footnote[
<sup>1</sup> Disclaimer: The code on this and the following slides would never be parallelized in practice - they are just examples to highlight package syntax.
]

---
# The [`foreach`](https://github.com/RevolutionAnalytics/foreach) package

**Purpose:** allows you to run `for` loops in parallel and choose your backend (e.g., `parallel` vs. `future`)

```r
library(foreach)

# Serial
for(i in 1:10) print(sqrt(i))

# Parallel, using the {parallel} backend
cluster <- parallel::makeCluster(4) # Specify 4 cores
doParallel::registerDoParallel(cluster)
foreach(i = 1:10) %dopar% { sqrt(i) }
parallel::stopCluster(cluster)

# Parallel, using the {future} backend
doFuture::registerDoFuture()
future::plan(future::multisession, workers = 4) # Specify 4 cores
foreach(i = 1:10) %dopar% { sqrt(i) }
```

---
# The [`furrr`](https://github.com/DavisVaughan/furrr) package

**Purpose:** parallel versions of `purrr::map()` functions using futures

```r
library(purrr)
library(furrr) # Also attaches {future}

# Serial
map_dbl(1:10, sqrt)

# Parallel
plan(multisession)
future_map_dbl(1:10, sqrt)
```

---
# The [`future.apply`](https://github.com/HenrikBengtsson/future.apply) package

**Purpose:** parallel versions of `apply()` functions using futures

```r
library(future.apply) # Also attaches {future}

# Serial
lapply(1:10, sqrt)

# Parallel
plan(multisession)
future_lapply(1:10, sqrt)
```

---
name: samplecode
class: center, middle

# Sample code

For short examples of how to use these R packages for parallel computing, see [sample_code.md](https://github.com/danielleferraro/rladies-sb-parallel/blob/main/sample_code/sample_code.md)<br>
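---
# Parallel processing with `tidymodels`

One more appendix example, since the demo covers machine learning with `tidymodels`: a minimal sketch of parallelized hyperparameter tuning, following the backend-registration pattern described in the linked [tune article](https://tune.tidymodels.org/articles/extras/optimizations.html). The model, resamples, and grid size here are arbitrary placeholders:

```r
library(tidymodels)
doParallel::registerDoParallel(cores = 4) # Register a parallel backend

folds <- vfold_cv(mtcars, v = 5)

rf_spec <- rand_forest(mtry = tune(), trees = 500) %>%
  set_engine("ranger") %>%
  set_mode("regression")

# tune_grid() distributes the resample x parameter combinations
# across the registered workers
rf_results <- tune_grid(rf_spec, mpg ~ ., resamples = folds, grid = 5)
```

Note that newer versions of `tune` are moving toward the `future` framework (e.g., `future::plan(multisession)`), so check the linked article for the current recommendation.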