1.6 Git and GitHub
Since most of our projects at emLab involve code, we use Git to track changes to our code and to facilitate collaboration by merging changes made by others, and we use GitHub to organize, share, and back up our code.
This section provides a brief overview of how Git and GitHub work, how to install them on your computer (and how to join the emLab GitHub page), and some general guidelines for using GitHub to organize code associated with emLab projects.
1.6.1 What are Git and GitHub?
Git is an open-source version control system designed for programmers. Git can operate as a standalone program on your computer, but it can also be operated through many other programs (or “clients”). GitHub (really github.com) is a hosting service that provides online storage for your Git projects. Think of Git as a little creature that keeps a record of all of the changes made to a file stored on your computer, and GitHub as a safe place on the internet where the little creature can go and put a copy of that file (and the changes you’ve made) when you tell it to do so.
There are a number of good tutorials with more information on how Git and GitHub work (as well as how you can set them up to sync directly through other programs such as RStudio). The Ocean Health Index team at the National Center for Ecological Analysis and Synthesis (NCEAS) here in Santa Barbara created a very detailed data science training that includes two excellent tutorials on setting up and collaborating with GitHub:
If you’re new to using Git and GitHub, the two tutorials listed above are a great place to start, since NCEAS and emLab often operate in similar ways. Additionally, see Software Carpentry’s lesson for Git novices. If you primarily use (or will use) R for coding, Jenny Bryan also has an excellent tutorial specifically about how to integrate Git and GitHub with R:
If you’re interested in learning more about all of the functionality GitHub has to offer, the Openscapes team at NCEAS has also created tutorials on using GitHub for publishing code and for project management:
1.6.2 Helpful Terminology
Git and GitHub use some weird terms that might be unfamiliar. Before installing and setting up Git and GitHub, here are a few key terms you may come across:
- repository (“repo”): a collection of files pertaining to the same project, document, goal, etc. Generally there’s a single repository for each project at emLab containing all of the code associated with that project. This repository can be organized with multiple folders and subfolders.
- commit: a snapshot of changes that a user has made to one or more files in a repository, recorded (with a short message describing the changes) in the repository’s history on the user’s local machine. Commits are what you send to GitHub.
- push: the action of sending your commits from your local machine to the remote copy of the repository on GitHub.
- pull: the action of retrieving any commits that have been made to the remote copy of the repository on GitHub but are NOT yet on your local machine, and merging them into your local copy.
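To make these terms concrete, here is a small, self-contained shell sketch of the commit, push, and pull cycle. It assumes Git is installed. So that it runs offline, it uses a temporary “bare” repository on your own disk as a stand-in for the copy on GitHub, plus two clones that play two collaborators. The file name, user names, and email addresses are hypothetical.

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)

# A bare repository plays the role of the remote copy on GitHub (assumption for this demo).
git init -q --bare "$demo/remote.git"

# Collaborator A: make a commit locally, then push it to the remote.
git clone -q "$demo/remote.git" "$demo/a"
cd "$demo/a"
git config user.name "Collaborator A"
git config user.email "a@example.com"
echo 'x <- 1' > analysis.R
git add analysis.R                      # stage the change
git commit -q -m "Add analysis script"  # record it as a commit (local only so far)
git push -q -u origin HEAD              # push: send the commit to the remote

# Collaborator B: get a copy, commit another change, and push it.
git clone -q "$demo/remote.git" "$demo/b"
cd "$demo/b"
git config user.name "Collaborator B"
git config user.email "b@example.com"
echo 'y <- 2' >> analysis.R
git commit -q -am "Add y to analysis script"
git push -q

# Back to A: pull retrieves B's commit, which A did not have locally.
cd "$demo/a"
git pull -q --ff-only
git log --oneline   # lists both commits, newest first
```

With a real project, the remote would be your repository on GitHub, and you would only run the add, commit, push, and pull steps.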
1.6.3 How to Install Git and GitHub
Most of the tutorials listed above include detailed instructions on how to install Git and set up GitHub. The short version (plus the steps specific to joining the emLab GitHub page) is listed below. For more detailed instructions, please refer to the tutorials listed above.
- Create a free GitHub account
  - Use your @ucsb.edu email
  - Make sure you remember your username and password; you’ll need them later
Since GitHub is a company used by many different types of organizations across many industries, it offers a few different pricing plans. As an individual, once you create a username and sign up for an account, you get an unlimited number of free public and private repositories. GitHub also offers a paid “Pro” plan that adds extra features for private repositories. However, for students, faculty, and research staff, as well as official nonprofit organizations and charities, GitHub waives this fee through its GitHub Education and GitHub for Good programs.
Good news! emLab qualifies as an educational organization through the GitHub Education program, and as a UCSB staff member you qualify for the individual educational discount. So, once you’ve signed up for a free account on GitHub…
- Go to the GitHub Education page and register as a researcher (this is why you should use your @ucsb.edu email when creating your account).
Click on the “Get benefits” link in the top right-hand corner and follow the directions to upgrade your account to a “Pro” account for free. You may need to take a picture of your UCSB ID card to submit as part of this process. GitHub may also periodically ask you to re-verify your eligibility for this program.
- Send Erin O’Reilly a Slack message (or an email if you must… email@example.com) with your new GitHub username so you can be added to the emLab GitHub page!
The emLab GitHub page is where the repositories for all emLab projects live (more on this later), and once you are a member of the organization you will be able to create new public and private repositories that appear here (as well as on your personal page).
- Install Git
If you’re very, very lucky, Git will already be installed on your computer. Open the shell for your operating system. If you’re using Mac OS X, this is called Terminal. If you’re using Windows, you have multiple types of shells, but you should be using a Git Bash shell (NOT PowerShell). Git Bash is installed along with Git for Windows, so if you can’t find it, Git probably isn’t installed yet. The easiest way to find out whether Git is already installed on your machine is to type the following:
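The check is a single command. It assumes that Git, if present, is on your shell’s PATH. When Git is installed, the command prints a line that starts with “git version” followed by the version number:

```shell
# Print the installed Git version (fails with "command not found" if Git is missing).
git --version
```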
If it returns a version number, you already have Git installed! However, if it returns something like “git: command not found”, you need to install Git.
If you’re using Mac OS X, Git can also be installed as part of the Xcode Command Line Tools, or you can install it using Homebrew. If you’re interested in either of those options, follow the corresponding directions in Jenny Bryan’s tutorial. If that sentence doesn’t mean anything to you, download the installer for your operating system from the official Git website (git-scm.com) and follow the prompts.
Once you’ve installed Git via whichever method you’ve chosen for your operating system, open the shell again and retype the same command to verify that the installation was successful:
It should now return a version number.
- Tell Git who you are
Git needs to know a little bit more about you in order to play nicely. In particular, there are two things that are helpful to configure: 1) the name that will be associated with any commits you make, and 2) the email address associated with your GitHub account. To set these two things, type the following into the shell, using your own name and email:
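For example, the two commands below set both values. The name and email are placeholders, so substitute your own. Use the email tied to your GitHub account. The --global flag applies these settings to every repository on your computer.

```shell
# Placeholders: replace with your full name and the email on your GitHub account.
git config --global user.name "Your Full Name"
git config --global user.email "your.name@ucsb.edu"
```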
The user name input here should be your full name (i.e. it does not need to be the same as your username for GitHub), but the email DOES need to be the same as that associated with your GitHub account.
You can then check to make sure these were entered correctly by typing:
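One way to check is to ask Git to list every setting it can see, along with the configuration file each setting comes from. Look for the user.name and user.email lines in the output:

```shell
# List all Git settings and the config file each one was read from.
git config --list --show-origin
```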
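- Optional: Store your credentials (so you don’t have to type them every time):
Git will sometimes want to make sure you are you when performing certain operations, for example when you clone a private repo or push changes to a repo. When you connect to GitHub over HTTPS, the “password” Git asks for is a personal access token created in your GitHub settings, not your GitHub account password, since GitHub no longer accepts account passwords for Git operations. If you don’t want to enter these credentials every time, you can tell Git to remember them. You can read more about Git’s credential management in the “Credential Storage” section of the Git documentation.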
In your terminal (or the Terminal pane within RStudio), navigate to a repository on your computer and type the following into the shell:
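Here is a minimal sketch using Git’s built-in “store” credential helper. This sketch assumes you want the setting for every repository, so it adds the --global flag. If you leave --global off and run the command from inside a repository, as described above, the setting applies only to that repository. Be aware that “store” saves your credentials unencrypted in a plain-text file in your home directory. The operating system’s keychain helpers, such as “osxkeychain” on macOS or Git Credential Manager on Windows, are safer alternatives.

```shell
# Save credentials after the next successful authentication.
# Warning: they are kept in plain text (by default in ~/.git-credentials).
git config --global credential.helper store

# Confirm the setting; this should print "store".
git config --global credential.helper
```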
What we just did was tell Git to store our credentials. So, after you type them one more time, you should not need to type them again.
- Optional: Install a client for Git to make your life easier
If you tried the previous step and you’re still reading this, you probably don’t spend a lot of time running commands in the shell, and that step may not have made a lot of sense. If that’s the case, you might also want to install a Git client to help you visualize what Git is actually doing. You do not need a Git client to take advantage of Git’s version-control functionality, since everything can be done in the shell (as in the previous step). However, the shell is not very user-friendly.
There are a number of Git/GitHub clients that you can download to interact with Git and GitHub in a more visual way.
If you use RStudio, there is a very basic Git client built in that may be enough to get you started (more on this later). Other nice free Git clients include:
- GitKraken (available for all platforms, plus the logo octopus is pretty sweet…)
- GitFiend (cross-platform)
- SourceTree (has some problems on Mac OS X)
- GitHub Desktop (not available for Linux)
- GitUp (only for Mac OS X)
There are many more. See Jenny Bryan’s tutorial if you’re not satisfied with those choices.
Once you’ve installed a Git client, follow the directions to connect to your GitHub account. Once you’ve done this, try opening the local copy of the repository you navigated to when storing your credentials, and notice the nice visual representation of its history and the changes you’ve made.
1.6.4 General Guidelines for using GitHub at emLab
In general, each emLab project should have its own repository. In some cases multiple repositories may be associated with the same project, but this should be avoided if possible. The project repository should be created within the emLab GitHub page (exceptions may exist, for example, if a partner organization requires that the project repository be created within their organization’s GitHub page).
While a project is ongoing, its repository can be public or private (depending on the nature of the project), but it should be made public when the project is complete.
Since many previous (and ongoing) projects were created within the personal GitHub pages of emLab members, ownership of these repositories should be transferred to the emLab GitHub page at the conclusion of the project, if possible.