Package and language version management
Reproducible research requires not just the same code, but the same computational environment. To ensure that our analyses can be replicated years later or shared across different machines today, emLab adheres to a tiered strategy for managing software versions. This “future-proofing” approach prevents the common frustration where code breaks because a package was updated or a system dependency changed.
R
For R projects, we manage three distinct layers of the environment: the R version itself, the project-specific library of R packages, and the source of the packages. By managing each of these layers, we ensure that an update to your global R installation doesn’t inadvertently break a legacy project.
R version management
Before managing packages, we must manage R itself. Different versions of R can change default behavior or the internals that compiled packages build against, and they can require different package binaries. Using the “latest” version of R is often fine for new work, but reproducing an analysis from three years ago often requires the exact R version used at that time.
Using Positron (recommended)
If you use Positron, managing multiple R versions is built in. Positron detects all R installations on your machine and lets you switch between them per-session from the interpreter selector in the top-right corner of the IDE — no command-line tools required. To install a new R version, download and install it from CRAN and Positron will pick it up automatically.
Using rig (non-Positron users)
If you are not using Positron, we recommend rig (The R Installation Manager) to handle multiple R versions on a single machine seamlessly.
- Why use rig? It allows you to quickly switch between R versions without manual uninstalls. Crucially, it installs R in a way that doesn’t require sudo privileges for library paths, preventing permissions issues that often plague multi-user servers. This ensures that renv is always drawing from the correct, isolated R binary.
- Workflow:
  - Installation: Install the version specified in the project documentation (e.g., rig add 4.3.2).
  - Context Switching: Use rig default 4.3.2 to set the system-wide default, or use rig run <version> to start a specific session.
Documentation
Regardless of how you manage R versions, always document the R version used in the project’s README.md and ensure it matches the R field in the renv.lock file.
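The README-versus-lockfile check described above can be automated. The following is a minimal sketch, assuming the lockfile keeps its R version under the top-level "R" object in a "Version" field; the function names are hypothetical helpers, not part of renv.

```python
import json
from pathlib import Path


def lockfile_r_version(lockfile: Path) -> str:
    """Return the R version recorded in an renv lockfile."""
    data = json.loads(lockfile.read_text(encoding="utf-8"))
    return data["R"]["Version"]


def versions_match(lockfile: Path, documented: str) -> bool:
    """Compare the lockfile's R version with the one stated in the README."""
    return lockfile_r_version(lockfile) == documented.strip()


if __name__ == "__main__":
    path = Path("renv.lock")
    if path.exists():
        print("renv.lock records R", lockfile_r_version(path))
```

Run from the project root, this prints the version to copy into README.md; versions_match could be called from a pre-commit check if the lab wants one.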
Package version management with renv
Every emLab project should be “hermetic”—meaning its packages are isolated from the rest of your system. We use renv to create a private library of packages for each project.
- Initialization: Run renv::init() at the start of a project. This creates a local .Rprofile that tells R to use a project-specific library instead of your global one.
- Daily Workflow:
  - renv::snapshot(): Save the state of your library to the renv.lock file. This JSON file contains the exact version and source (CRAN, GitHub, Bioconductor) of every package. Commit this file to GitHub.
  - renv::status(): Frequently check if your local library is in sync with your lockfile.
  - renv::restore(): When pulling changes from GitHub, run this to sync your local library with the project’s lockfile. This is the “magic button” that rebuilds the environment on a colleague’s computer.
- Efficiency: renv uses a global cache. This means that if ten different projects use ggplot2 version 3.4.0, it is only stored on your hard drive once, but linked into each project’s library.
- Collaboration: Never include the renv/library folder in your git commits (it should be in .gitignore). Only commit the renv.lock, .Rprofile, and renv/activate.R files.
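To make the lockfile's contents concrete, here is a small sketch that counts packages by their recorded source. It assumes each entry under "Packages" carries "Version" and "Source" fields; the sample lockfile fragment is made up for illustration and is not taken from an emLab project.

```python
import json
from collections import Counter


def packages_by_source(lockfile_text: str) -> Counter:
    """Count lockfile packages per recorded source (e.g. Repository, GitHub)."""
    packages = json.loads(lockfile_text).get("Packages", {})
    return Counter(entry.get("Source", "unknown") for entry in packages.values())


# Illustrative lockfile fragment; the package records are invented for this example.
sample = """
{
  "R": {"Version": "4.3.2"},
  "Packages": {
    "ggplot2": {"Package": "ggplot2", "Version": "3.4.0", "Source": "Repository"},
    "sf": {"Package": "sf", "Version": "1.0-14", "Source": "Repository"},
    "mypkg": {"Package": "mypkg", "Version": "0.1.0", "Source": "GitHub"}
  }
}
"""
print(packages_by_source(sample))
```

A quick count like this can flag packages installed from GitHub, which are worth a second look because they do not come from a frozen repository snapshot.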
Posit Package Manager (PPM)
To ensure that renv::restore() always finds the exact same package versions and to speed up installation via pre-compiled binaries, we use Posit Package Manager.
- Fixed Snapshots: Instead of pointing to the “latest” CRAN, which changes daily, we point to a specific “frozen” date. This eliminates the risk of “dependency hell” where Package A updates and is no longer compatible with Package B.
- Binary Advantages: PPM provides binaries for specific Linux distributions (like Ubuntu Jammy). This means that instead of waiting 20 minutes to compile a complex spatial package like sf or terra from source, you can download a pre-built version in seconds.
- Configuration: You can configure your IDE (RStudio or Positron) to use PPM by default for all new work so you don’t have to manually call the options every time.
The universal R profile (recommended)
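In your R console, run the following to edit your .Rprofile:
usethis::edit_r_profile()
Paste the following line and save:
options(repos = c(PPM = "https://packagemanager.posit.co/cran/__linux__/jammy/latest"))
This URL serves Linux binaries for Ubuntu Jammy. On another operating system or distribution, use the URL that the PPM “Setup” page generates for your platform.
IDE-specific options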
- RStudio: Go to Tools > Global Options… > Packages and paste the PPM URL into the “Primary CRAN repository” field.
- Positron: Open Settings (Cmd + ,), search for r.defaultRepositories, and add the CRAN/PPM URL to your settings.json.
IMPORTANT Global vs. Project Settings: Global defaults are for your convenience. However, you must still include the specific repository URL in each project’s .Rprofile to ensure reproducibility for your collaborators.
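A project-level repository URL is most useful when it points at a frozen date rather than at latest. The sketch below builds such a URL, assuming the dated form simply replaces latest with a YYYY-MM-DD date; confirm the exact URL with the PPM “Setup” page. The helper names and the example date are hypothetical.

```python
from datetime import date
from typing import Optional

PPM_BASE = "https://packagemanager.posit.co/cran"


def ppm_snapshot_url(snapshot: date, linux_distro: Optional[str] = None) -> str:
    """Build a PPM CRAN URL frozen at `snapshot`.

    With `linux_distro` (e.g. "jammy") the URL targets Linux binaries;
    without it, the plain dated repository path is returned.
    """
    stamp = snapshot.isoformat()  # YYYY-MM-DD
    if linux_distro:
        return f"{PPM_BASE}/__linux__/{linux_distro}/{stamp}"
    return f"{PPM_BASE}/{stamp}"


def rprofile_line(url: str) -> str:
    """Format the options() call to paste into a project's .Rprofile."""
    return f'options(repos = c(PPM = "{url}"))'


# The date below is an arbitrary example, not an emLab-mandated snapshot.
print(rprofile_line(ppm_snapshot_url(date(2024, 1, 15), "jammy")))
```

The printed line can be pasted into the project’s .Rprofile so that every collaborator resolves packages against the same snapshot.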
Common pitfalls & troubleshooting (R)
Using rig and renv together is robust, but keep an eye out for these common issues that can stall a project:
- The R Version Mismatch: You might open a project and find renv complaining that it was initialized with R 4.2.0 while you are currently running 4.4.1.
  - The Consequence: renv may try to install package versions recorded under the older R into the newer one, where matching binaries may not exist, causing “binary not found” errors.
  - The Fix: Use rig run 4.2.0 (or set it as default) before opening the project. Always check your current R version with version or R.version.string if things feel “off.”
- Stale Lockfiles: If you install a new package via install.packages() but forget to run renv::snapshot(), your renv.lock file won’t reflect the change.
  - The Consequence: When a collaborator tries to renv::restore(), they won’t get the new package, and their code will break with “package not found.”
  - The Fix: Make renv::status() a habit before you commit and push to GitHub. It will tell you if your lockfile and library are out of sync.
- Missing System Dependencies: renv manages R packages, but it does not manage system-level software like GDAL, GEOS, or PROJ (essential for emLab spatial work).
  - The Consequence: Package installation fails during the “compilation” phase with cryptic errors about missing headers or .so files.
  - The Fix: You must install the system library on your OS (e.g., via brew install gdal on Mac or sudo apt install libgdal-dev on Linux) before renv can successfully build the R package.
- Committing the Library: Occasionally, users accidentally add the renv/library folder to Git.
  - The Consequence: This bloats the repository size by hundreds of megabytes and causes errors for others because those binaries are specific to your operating system and R version.
  - The Fix: Ensure your .gitignore includes renv/library/. If you’ve already committed it, you’ll need to use git rm -r --cached renv/library to remove it from tracking.
Python
For Python, we follow the same isolation philosophy as for R to avoid “dependency drift.”
Python version management
Specifying the Python version your project uses is just as important as pinning package versions. New Python releases can introduce breaking syntax changes, and your collaborators may have different versions installed.
Using Positron (recommended)
If you use Positron, it automatically detects all Python interpreters on your machine — including those managed by pyenv, Conda, or system installs — and lets you select the interpreter per-project from the interpreter selector in the top-right corner of the IDE. This means you can switch Python versions without touching the command line.
Using pyenv (non-Positron users)
If you are not using Positron, pyenv is the standard tool for managing multiple Python versions on a single machine.
- Installation: Follow the pyenv installer instructions for your OS.
- Install a version: pyenv install 3.11.9
- Set a project version: Run pyenv local 3.11.9 in your project directory. This creates a .python-version file that pyenv reads automatically whenever you enter that directory. Commit this file to GitHub so collaborators use the same version.
- Set a global default: pyenv global 3.11.9
Documentation
Regardless of how you manage Python versions, document the required version in the project README.md and specify it explicitly in your environment.yml (for Conda/Mamba projects) or .python-version file.
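A script can also verify at startup that it is running under the documented version. This is a minimal sketch, assuming .python-version holds a single plain numeric version such as 3.11.9 (pyenv also accepts other forms, which this does not handle); the function names are hypothetical.

```python
import sys
from pathlib import Path
from typing import Optional


def parse_version(text: str) -> tuple:
    """Turn '3.11.9' into (3, 11, 9); ignores surrounding whitespace."""
    return tuple(int(part) for part in text.strip().split("."))


def interpreter_matches(required: str, running: Optional[tuple] = None) -> bool:
    """True when the running interpreter matches every component in `required`."""
    running = tuple(sys.version_info[:3]) if running is None else running
    wanted = parse_version(required)
    # Compare only as many components as the requirement specifies.
    return running[: len(wanted)] == wanted


if __name__ == "__main__":
    version_file = Path(".python-version")
    if version_file.exists():
        required = version_file.read_text(encoding="utf-8")
        if not interpreter_matches(required):
            print(f"Expected Python {required.strip()}, running {sys.version.split()[0]}")
```

Because only the stated components are compared, a requirement of 3.11 accepts any 3.11.x patch release, while 3.11.9 demands an exact match.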
Virtual environments (venv)
Avoid installing packages to your “Global” Python, as this can break system-level tools. Always create a virtual environment within your project directory to keep dependencies contained.
- Convention: We use the name .venv for our environment folders.
- Creation: python -m venv .venv
- Activation:
  - Windows: .venv\Scripts\activate
  - Mac/Linux: source .venv/bin/activate
  - Once activated, your terminal prompt will usually change to show (.venv), indicating that any pip install commands will only affect this project.
- Git: Ensure .venv/ is added to your .gitignore to prevent committing thousands of small library files.
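The prompt prefix is only a hint, so it can help to ask the interpreter directly whether an environment is active. The sketch below uses the standard library’s sys.prefix and sys.base_prefix; it covers environments created with venv and does not detect Conda or Mamba environments.

```python
import sys


def in_virtual_environment() -> bool:
    """True when the interpreter was started from a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the installation it was created from.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("virtual environment active:", in_virtual_environment())
```

If the second line prints False, activate .venv before running any pip install command.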
Dependency tracking
To replicate a Python environment, we use requirements.txt for standard pip-based projects or environment.yml for more complex stacks.
- The Importance of Pinning: Simply listing pandas in a file is not enough; we should list pandas==2.1.0. This prevents “silent failures” where code runs but produces different numerical results due to underlying algorithm changes in newer package versions.
- Exporting: Use pip freeze > requirements.txt to capture every sub-dependency and its exact version.
- Installing: When joining a project, run pip install -r requirements.txt to recreate the environment instantly.
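Hand-edited requirements files tend to collect unpinned entries. The following sketch flags them; it is a deliberately naive line-based check (it does not parse extras, environment markers, or URLs), and the function name is hypothetical.

```python
def unpinned_requirements(text: str) -> list:
    """Return requirement lines that lack an exact '==' pin."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        if "==" not in line:
            loose.append(line)
    return loose


# Illustrative file contents; only the pandas pin comes from the text above.
sample = """\
pandas==2.1.0
numpy>=1.24   # a range, not a pin
requests
"""
print(unpinned_requirements(sample))
```

An empty list means every listed requirement carries an exact version; anything returned should be pinned or regenerated with pip freeze.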
Conda and Mamba
For projects with complex non-Python dependencies—specifically spatial libraries like GDAL, GEOS, or PROJ—standard pip often fails. In these cases, we recommend Mamba.
- Why Mamba? Conda’s classic solver can take a very long time, sometimes hours, to “solve” a complex environment (finding a set of versions that all work together). Mamba is a C++ reimplementation that typically does this in seconds to minutes. Recent Conda releases (23.10 and later) use the same libmamba solver by default, which narrows the gap.
- Environment Files: Use an environment.yml file to define both the Python version and the required packages from the conda-forge channel.
- Documentation: Document the creation command clearly: mamba env create -f environment.yml
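As a rough illustration of what such an environment file contains, the sketch below renders a minimal environment.yml from a dictionary of pins. The project name and package versions are illustrative and have not been checked for mutual compatibility, and the helper is hypothetical, not part of Conda or Mamba.

```python
def render_environment_yml(name: str, python_version: str, packages: dict) -> str:
    """Render a minimal conda environment file with exact version pins."""
    lines = [
        f"name: {name}",
        "channels:",
        "  - conda-forge",
        "dependencies:",
        f"  - python={python_version}",
    ]
    # Conda match specs pin with a single '=', unlike pip's '=='.
    lines += [f"  - {pkg}={version}" for pkg, version in sorted(packages.items())]
    return "\n".join(lines) + "\n"


# Hypothetical project name and illustrative pins.
print(render_environment_yml("emlab-spatial", "3.11", {"geopandas": "0.14.4", "gdal": "3.8.5"}))
```

Saving the output as environment.yml gives a file that mamba env create -f environment.yml can consume, with the Python version stated alongside the packages.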
Common pitfalls & troubleshooting (Python)
Python environment management is notoriously “leaky.” Even with virtual environments, it is easy to accidentally run code in the wrong context. Watch out for these common emLab hurdles:
- The “Shadow” Global Install: You run pip install without realizing your virtual environment isn’t active.
  - The Consequence: The package is installed to your system’s global Python. Your code runs fine on your machine, but when you share the requirements.txt or environment.yml file, the package is missing, and your collaborator’s code fails.
  - The Fix: Always check your terminal prompt for the (.venv) prefix. When in doubt, run which python (Mac/Linux) or where python (Windows) to ensure it points to your project folder, not /usr/bin/python.
- Mixing Pip and Conda: You use pip install inside a Mamba/Conda environment for a package that has complex C-dependencies.
  - The Consequence: This is a common cause of “Environment Inconsistency” errors. Pip and Conda do not track each other’s installs; pip might overwrite a library that Conda relies on, leading to a broken environment that won’t update.
  - The Fix: If you are using Mamba, always try mamba install package_name first. Only use pip install if the package is unavailable on conda-forge.
- The GDAL/Spatial Nightmare: Trying to install spatial libraries like geopandas, fiona, or rasterio via pip.
  - The Consequence: These packages require specific versions of system libraries (GDAL, PROJ, GEOS). When no pre-built wheel matches your platform, pip tries to compile them from source, which often fails on standard laptops.
  - The Fix: Use Mamba and an environment.yml file for any project involving spatial data. Mamba handles the system-level binaries so you don’t have to.
- Stale Requirements Files: You’ve been working for weeks, installing new packages, but haven’t updated your tracking file.
  - The Consequence: The requirements.txt file in your GitHub repo is out of date. New lab members spend hours trying to debug “ModuleNotFoundError.”
  - The Fix: Run pip freeze > requirements.txt (for venv) or mamba env export > environment.yml (for Mamba) before every major push to GitHub.
- Python Version Drift: Your code uses syntax from a newer release, such as match statements (Python 3.10+) or the relaxed f-string quoting rules of Python 3.12, but your collaborator is on 3.9.
  - The Consequence: Syntax errors that look like code bugs but are actually version issues.
  - The Fix: Specifically define the Python version at the top of your environment.yml or in your README.md.
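The stale-requirements pitfall in particular can be caught mechanically. The sketch below compares what is installed in the active environment with the pins in requirements.txt; it assumes simple name==version lines as produced by pip freeze, and all helper names are hypothetical. Note that pip freeze leaves out pip itself, so tools like pip may show up in the result even when the file is current.

```python
import re
from importlib import metadata
from pathlib import Path


def normalize(name: str) -> str:
    """Normalize a distribution name as PEP 503 does (case, '-', '_', '.')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(text: str) -> dict:
    """Map normalized package name -> pinned version for 'name==version' lines."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "==" in line and not line.startswith("-"):
            name, version = line.split("==", 1)
            pins[normalize(name.strip())] = version.strip()
    return pins


def installed_versions() -> dict:
    """Map normalized name -> version for every distribution in this environment."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            found[normalize(name)] = dist.version
    return found


def untracked(pins: dict, installed: dict) -> list:
    """Installed packages missing from the pins or pinned to another version."""
    return sorted(n for n, v in installed.items() if pins.get(n) != v)


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        print(untracked(parse_pins(req.read_text(encoding="utf-8")), installed_versions()))
```

Anything printed is either a new package that was never exported or a version that changed since the last pip freeze.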
Summary table
| Feature | R Strategy | Python Strategy | Primary Goal |
|---|---|---|---|
| Language Version | Positron or rig | Positron or pyenv | Isolate the interpreter from OS updates. |
| Project Isolation | renv | venv or conda/mamba | Prevent projects from conflicting with each other. |
| Lockfile | renv.lock | requirements.txt or environment.yml | Record exact versions for 100% reproducibility. |
| Binary Source | Posit Package Manager | PyPI or Conda-Forge | Ensure fast, reliable installation of libraries. |
External resources & further reading
For a deeper dive into the tools and philosophies discussed in this SOP, we recommend the following resources:
R & package management
- rig: The R Installation Manager: The official repository for rig. Includes detailed installation guides for macOS, Windows, and Linux, and advanced commands for managing Rtools.
- renv: Introduction to Project Environments: The official “Get Started” guide for renv. It provides a clear overview of the package’s philosophy and a comprehensive list of commands.
- Using Posit Package Manager (PPM): The public instance of PPM. Use the “Setup” button here to generate the exact repository URL for a specific date or operating system.
- CRAN Task View: Reproducible Research: A curated list of R packages and tools dedicated to making research more reproducible, from literate programming (Quarto/Knitr) to specialized environment managers.
Python & environment management
- Real Python: Virtual Environments Guide: An excellent, beginner-friendly primer on why virtual environments are necessary and how to use the built-in venv module.
- Mamba Documentation: The user guide for Mamba. Essential reading if you are transitioning from standard Conda and want to understand why Mamba’s “solver” is so much faster.
- Conda-Forge: The Community Repository: Since emLab relies on conda-forge for spatial libraries, this guide explains how the community maintains these packages and how to troubleshoot version conflicts.