Version control with commits

A commit is a snapshot of your code at a particular moment, paired with a message describing what changed and why. The history of a project is the accumulation of its commits.

What a commit is

When you run git commit, git takes a snapshot of all the files you have staged and stores it permanently in the repository’s history. Each commit has:

  • A unique identifier (a SHA hash)
  • The name and email of the author
  • A timestamp
  • A commit message
  • A reference to its parent commit (or to several parents, for a merge commit), creating a chain

This chain is what allows you to see the history of any file, compare versions, and restore earlier states of the project.
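The fields listed above can be inspected directly. The following is a minimal sketch, not part of the original workflow: it builds a throwaway repository in a temporary directory, and the file name analysis.R and the author identity are placeholders chosen for illustration.

```shell
set -eu

# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example Author"         # placeholder identity
git config user.email "author@example.com"    # placeholder identity

# Two commits, so that the second has a parent to point at.
echo "x <- 1" > analysis.R
git add analysis.R
git commit --quiet -m "Add analysis script"

echo "y <- 2" >> analysis.R
git add analysis.R
git commit --quiet -m "Define y in analysis script"

# Print the fields of the latest commit: hash, author, date, subject, parent.
git log -1 --format='id:      %H%nauthor:  %an <%ae>%ndate:    %ad%nsubject: %s%nparent:  %P'

# The parent recorded in the latest commit is the first commit's hash.
parent=$(git log -1 --format=%P)
first=$(git rev-parse HEAD~1)
if [ "$parent" = "$first" ]; then
  echo "parent of the latest commit is the first commit"
fi
```

The very first commit in a repository has no parent, so the same command run on it would print an empty parent field.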

Staging changes

Before you commit, you stage the specific changes you want to include. This is done with git add:

git add scripts/01_clean.R           # Stage a single file
git add scripts/                     # Stage all changes in a directory
git add -p                           # Interactively stage chunks within files
git add .                            # Stage all changes in the current directory and below

Even if you have modified several files, you can group related changes into a single, focused commit rather than committing everything at once.
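As a rough illustration of selective staging, the sketch below modifies two files but commits only one of them. It runs in a temporary repository; scripts/02_model.R, the file contents, and the author identity are hypothetical and exist only for this example.

```shell
set -eu

# Throwaway repository with two tracked scripts.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example Author"         # placeholder identity
git config user.email "author@example.com"    # placeholder identity

mkdir scripts
echo 'raw <- read.csv("data.csv")' > scripts/01_clean.R
echo 'fit <- NULL' > scripts/02_model.R       # hypothetical second script
git add scripts/
git commit --quiet -m "Add cleaning and modeling scripts"

# Modify both files, as often happens during a work session.
echo 'raw <- raw[!is.na(raw$species_id), ]' >> scripts/01_clean.R
echo 'fit <- lm(y ~ x, data = raw)' >> scripts/02_model.R

# Stage only the cleaning change; the modeling change stays in the working tree.
git add scripts/01_clean.R
staged=$(git diff --cached --name-only)   # what the next commit will contain
unstaged=$(git diff --name-only)          # modified but not staged
echo "staged:   $staged"
echo "unstaged: $unstaged"

git commit --quiet -m "Drop rows with missing species ID"
```

After the commit, scripts/02_model.R is still listed as modified and can go into its own commit later.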

In Positron and VS Code, the Source Control panel (the branching icon in the Activity Bar) shows all modified files and lets you stage individual files or specific lines with a click.

Atomic commits

A commit should represent one logical change. This principle is sometimes called making commits atomic. An atomic commit might be: adding a new function, fixing a specific bug, renaming a variable for clarity, or updating the README. A single commit should not include multiple unrelated changes like cleaning data, fitting a model, generating three figures, and updating the README all in one.

Atomic commits have practical benefits. When something breaks, a focused commit history makes it much easier to identify which change introduced the problem. When a collaborator reviews your work, smaller and more targeted commits are easier to understand. When you need to undo a change, you can revert a specific commit without also undoing unrelated work.
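The last benefit can be sketched with git revert. This is an illustrative example under assumed conditions: a temporary repository, three unrelated commits with made-up file names, and a decision to undo only the middle one.

```shell
set -eu

# Throwaway repository with three small, unrelated commits.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example Author"         # placeholder identity
git config user.email "author@example.com"    # placeholder identity

echo "cleaning notes" > clean.txt
git add clean.txt
git commit --quiet -m "Add cleaning notes"

echo "model notes" > model.txt
git add model.txt
git commit --quiet -m "Add model notes"
bad=$(git rev-parse HEAD)                     # the commit we will later undo

echo "project overview" > README.md
git add README.md
git commit --quiet -m "Add README"

# Undo only the middle commit. Revert adds a new commit that reverses it,
# leaving the earlier and later commits (and their files) untouched.
git revert --no-edit "$bad"

git log --oneline
```

Because each commit touched a single concern, the revert removes model.txt and nothing else. Had all three changes been made in one commit, reverting it would have removed the README and the cleaning notes as well.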

Writing good commit messages

The commit message is addressed to a reader (a future collaborator, a reviewer, or yourself in six months) who is trying to understand why a change was made. What changed is already visible in the diff. The message should explain why.

A commit message has two parts: a subject line and an optional body.

The subject line should be a concise summary of the change. It should be capitalized, written in the imperative mood (as though giving an instruction), written without a trailing period, and no longer than 50 characters.[1] For example:

Filter observations with missing species ID

Rather than:

fixed the thing
updated script
more changes

The body, if needed, provides context: what motivated the change, what alternatives were considered, or what side effects to be aware of. Separate the body from the subject with a blank line.

A full example:

Filter observations with missing species ID

Previously, missing species IDs were carried through the pipeline and
caused the model to drop rows silently. This change makes the filtering
explicit and logs how many rows are removed at each step.

For the majority of commits, a one-line subject is sufficient. Reserve the body for changes that require explanation.
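Some of these conventions can be checked mechanically. The function below is a hypothetical helper, not a Git feature: check_message is a name invented for this sketch, and it covers only the subject length, capitalization, trailing period, and the blank line before the body. Imperative mood and the quality of the explanation still need a human reader.

```shell
# check_message MESSAGE
# Prints one line per violated convention and returns non-zero if any is found.
check_message() {
  msg=$1
  subject=$(printf '%s\n' "$msg" | sed -n '1p')   # first line of the message
  second=$(printf '%s\n' "$msg" | sed -n '2p')    # should be blank if a body follows
  status=0

  if [ "${#subject}" -gt 50 ]; then
    echo "subject is longer than 50 characters"
    status=1
  fi

  case $subject in
    [[:upper:]]*) ;;                              # starts with a capital letter
    *) echo "subject does not start with a capital letter"; status=1 ;;
  esac

  case $subject in
    *.) echo "subject ends with a period"; status=1 ;;
  esac

  if [ -n "$second" ]; then
    echo "no blank line between subject and body"
    status=1
  fi

  return $status
}

# Usage: the full example from above passes, a vague one-liner does not.
good="Filter observations with missing species ID

Previously, missing species IDs were carried through the pipeline and
caused the model to drop rows silently."

if check_message "$good"; then
  echo "message follows the conventions"
fi

check_message "fixed the thing." || echo "message rejected"
```

A check like this could be wired into a commit-msg hook, but that setup is outside the scope of this sketch.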

If your project uses GitHub Issues, you can reference the relevant issue in the commit message by writing # followed by the issue number (e.g., Add initial data visualization for mapping fishing effort (#42)). This creates a link between the commit and the issue. You can also use the commit message to close an issue by adding the keyword Closes before the reference (e.g., Add initial data visualization for mapping fishing effort (Closes #42)). GitHub then closes the issue automatically once the commit reaches the repository's default branch (usually main).

If you use Positron or VS Code and have GitHub Copilot enabled, you can get suggestions for commit messages based on the staged changes. This can be a helpful starting point, but always review and edit the suggested message to ensure it accurately reflects the change and provides useful context. To see a Copilot suggestion, stage your changes and then click the “Generate commit message” button (a small sparkle icon) in the Source Control panel.

Commit frequency

Commit often enough that no single commit contains a week’s worth of work, but not so often that individual commits stop being meaningful. A reasonable heuristic: commit whenever you complete a coherent unit of work and the code is in a state where it runs (or at minimum, you understand why it does not).

Do not wait until the end of a work session to commit. Small, frequent commits are much easier to review and debug than large, infrequent ones!

Viewing history

To see the commit history for the current branch:

git log --oneline

To see what changed in a specific commit:

git show <commit-sha>

To see the history of a specific file:

git log --oneline -- scripts/01_clean.R

GitHub’s web interface also provides a commit history view for every repository and file, which is often more convenient for browsing.

Footnotes

  1. See https://cbea.ms/git-commit/ for an explanation of the reasoning behind these rules.