The goal of this tutorial is to show how to use Git with Jupyter notebooks. The primary audience is data scientists and data analysts who have some experience with Jupyter notebooks but little to no experience with Git or the command line. To that end, this tutorial uses JupyterLab and the JupyterLab Git extension but also provides the equivalent git commands for the curious.
This tutorial is inspired by the Katacoda Git tutorial in that it follows the same basic flow of introducing Git commands but applied to Jupyter notebooks. The actual notebook used here is from this Google Colab notebook for teaching the pandas Python library.
There are multiple ways to run Jupyter notebooks. This tutorial has been designed to work with two options:
- Google Cloud Platform AI Platform Notebooks, which provides cloud instances of JupyterLab
- Local installation of JupyterLab
This section describes how to create a JupyterLab instance with GCP's AI Platform Notebooks, which automatically includes the Git extension. More detailed instructions can be found here.
- Open the AI Platform Notebooks console
- Click +NEW INSTANCE and select "Python 3"
  - Instance name: git-tutorial-python-[YOUR NAME]
  - Region: us-east1 (South Carolina)
  - Zone: us-east1-b
  - Instance properties: use all defaults
- Click CREATE
- Click OPEN JUPYTERLAB
This section describes how to install JupyterLab and the Git extension on your local machine. This section assumes you already have Python installed. Note that this section has only been tested on macOS, so far.
- (Optional, but recommended) Create a virtual environment. There are many options. If you don't have a preference, an arbitrary recommendation is virtualenvwrapper.
- Install Node.js
- Install JupyterLab and related packages: pip install jupyterlab~=2.2.9 jupyterlab-git==0.24.0
- Install the Git extension: jupyter labextension install @jupyterlab/git
- Install the nbdime extension: nbdime extensions --enable
- Run JupyterLab: jupyter-lab
To allow you to push and pull commits to and from this repo, you must create your own copy of it, known as creating a fork. The following instructions describe how to fork this repo. More detailed instructions can be found here.
- Create a GitHub account, if you don't have one
- Open https://github.com/hahns0lo/jupyter-git-tutorial in a browser
- In the upper right-hand corner, click Fork
- Select your user
To allow you to push and pull commits without being prompted for a password, you must set up your GitHub account with an SSH key. The following instructions describe how to do this in your JupyterLab instance. More detailed instructions can be found here.
- Open JupyterLab
- Open a terminal from the JupyterLab launcher
- Create an SSH key by following the Linux instructions here
- Add the SSH key to your GitHub account by following the Linux instructions here.
  - For step 1, use the following command instead: cat ~/.ssh/id_ed25519.pub
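As a rough sketch of what the linked key-creation instructions amount to, the commands below generate an Ed25519 key pair and print the public half. To keep the example harmless, it writes to a throwaway directory and uses an empty passphrase; both are assumptions made for this demo only. On your JupyterLab instance you would normally omit `-f` and `-N`, accept the default path `~/.ssh/id_ed25519`, and choose a passphrase. The email address is a placeholder.

```shell
# Demo only: write the key pair to a temporary directory, not ~/.ssh
keydir="$(mktemp -d)"

# -t ed25519  key type (matches the id_ed25519.pub file name used above)
# -C          comment stored in the public key, typically your email
# -f          output file (omit to be prompted; the default is ~/.ssh/id_ed25519)
# -N ""       empty passphrase (demo only; omit to be prompted for one)
ssh-keygen -q -t ed25519 -C "you@example.com" -f "$keydir/id_ed25519" -N ""

# Print the public key; this is the text you paste into GitHub
cat "$keydir/id_ed25519.pub"
```

Once the key has been added to your GitHub account, running `ssh -T git@github.com` is the usual way to check that authentication works.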
ReviewNB is a GitHub Marketplace app that provides visual diffs for Jupyter notebooks on GitHub. The following instructions describe how to set up ReviewNB with your fork.
- Open https://github.com/marketplace/review-notebook-app
- Under Pricing and setup, select Free and click Install it for free
- Click Complete order and begin installation
- Select Only select repositories, select [username]/jupyter-git-tutorial, and click Install
- Click Authorize Review Notebook App and you will be redirected to https://app.reviewnb.com/
- Clone your fork of this repo
  - Using the Git extension
    - Get the URI to your fork. In your browser, click Code, select "HTTPS", and copy the URI.
    - On the left-hand side of JupyterLab, click the Git icon to open the Git extension.
    - Click Clone a Repository
    - Paste the URI to your fork, e.g. https://github.com/[username]/jupyter-git-tutorial
  - Using the command line
    - Get the URI to your fork. In your browser, click Code, select "SSH", and copy the URI.
    - Open a terminal from the JupyterLab launcher
    - git clone git@github.com:[username]/jupyter-git-tutorial.git
- Make a copy of the tutorial notebook
  - Using JupyterLab
    - Open jupyter-git-tutorial
    - Create a new folder called tutorial
    - Copy and paste intro_to_pandas.ipynb into tutorial
  - Using the command line
    - cd jupyter-git-tutorial
    - mkdir tutorial
    - cp intro_to_pandas.ipynb tutorial
- Stage the notebook
  - Using the Git extension
    - Under Untracked, select intro_to_pandas.ipynb and click +
  - Using the command line
    - git status
    - git add tutorial/intro_to_pandas.ipynb
    - git status
- Commit the notebook
  - Using the Git extension
    - Summary: Adding copy of notebook
    - Click Commit
    - Enter your name and email
  - Using the command line
    - Set your email address: git config --global user.email "you@example.com"
    - Set your name: git config --global user.name "Your Name"
    - git commit -m "Adding copy of notebook"
    - git status
- Ignore Jupyter checkpoints
  - Open intro_to_pandas.ipynb
  - Create a new text file in the jupyter-git-tutorial folder called .gitignore and add the following:
      .ipynb_checkpoints
  - Stage and commit .gitignore
    - Using the Git extension
      - Under Untracked, select .gitignore and click +
      - Summary: Ignoring checkpoints
      - Click Commit
    - Using the command line
      - git status
      - git add .gitignore
      - git commit -m "Ignoring checkpoints"
      - git status
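The stage, commit, and ignore steps above can be tried end to end in a throwaway repository. This is only a sketch: the temporary directory, the empty `{}` files standing in for the notebook and its checkpoint, and the placeholder identity are all assumptions for the demo, not part of the tutorial repo.

```shell
# Work in a throwaway repository so nothing touches your real clone
demo="$(mktemp -d)"
cd "$demo"
git init -q

# Identity for this demo repository only (placeholder values)
git config user.email "you@example.com"
git config user.name "Your Name"

# Stand-ins for the notebook and the checkpoint folder Jupyter creates
mkdir -p tutorial/.ipynb_checkpoints
echo '{}' > tutorial/intro_to_pandas.ipynb
echo '{}' > tutorial/.ipynb_checkpoints/intro_to_pandas-checkpoint.ipynb

# Stage and commit the notebook
git add tutorial/intro_to_pandas.ipynb
git commit -q -m "Adding copy of notebook"

# Before .gitignore exists, the checkpoint folder is listed as untracked
git status --short

# Ignore checkpoints, then stage and commit .gitignore
echo ".ipynb_checkpoints" > .gitignore
git add .gitignore
git commit -q -m "Ignoring checkpoints"

# Now the status listing is empty: the checkpoint folder is no longer reported
git status --short

# Ask Git which rule is responsible for ignoring the checkpoint file
git check-ignore -v tutorial/.ipynb_checkpoints/intro_to_pandas-checkpoint.ipynb
```

Because the pattern `.ipynb_checkpoints` contains no slash, it matches a folder of that name at any depth, which is why a single line in the top-level `.gitignore` also covers `tutorial/.ipynb_checkpoints`.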
- Open intro_to_pandas.ipynb in the jupyter-git-tutorial/tutorial folder, run it, and save
- Check Git status
  - Using the Git extension
    - intro_to_pandas.ipynb should be listed under Changed
  - Using the command line
    - cd ~/jupyter-git-tutorial
    - git status
    - tutorial/intro_to_pandas.ipynb should be listed as modified under Changes not staged for commit
- Look at the changes
  - Using the Git extension
    - Under Changed, select intro_to_pandas.ipynb and click the icon with a + and -
    - Only outputs should have changed
  - Using the command line
    - git diff
      - Keep pressing space to scroll down or q to quit
    - git difftool
      - Use the up/down keys to scroll or the following sequence twice to quit
        - :q
        - Enter
- Stage the changes and view the changes again
  - Using the Git extension
    - Under Changed, select intro_to_pandas.ipynb and click +
    - Under Staged, select intro_to_pandas.ipynb and click the icon with a + and -
  - Using the command line
    - git status
    - git add tutorial/intro_to_pandas.ipynb
    - git status
    - git diff
      - Nothing should happen!
    - git diff --staged
    - git difftool --staged
- Commit the changes
  - Using the Git extension
    - Summary: Ran notebook
    - Click Commit
    - Enter your name and email
  - Using the command line
    - git commit -m "Ran notebook"
    - git status
- Look at the log
  - Using the Git extension
    - Click the History tab
  - Using the command line
    - git log
    - git log --pretty=format:"%h %an %ar - %s"
- Look at the last commit
  - Using the Git extension
    - Click the History tab
    - Click on the "Ran notebook" commit to expand
    - Click on intro_to_pandas.ipynb
  - Using the command line
    - Copy the long string of numbers and text after "commit". This is called the commit hash or commit SHA.
    - git show [commit hash]
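The difference between `git diff` and `git diff --staged`, and the log and show commands above, can be seen in a small throwaway repository. This is a sketch under stated assumptions: a one-line text file named `notebook.txt` stands in for the real notebook, the identity is a placeholder, and `--no-pager` is used only so the commands print directly instead of opening a scrollable view.

```shell
# Throwaway repository for the demo
demo="$(mktemp -d)"
cd "$demo"
git init -q
git config user.email "you@example.com"
git config user.name "Your Name"

# First commit: a one-line stand-in for the notebook
echo "outputs: none" > notebook.txt
git add notebook.txt
git commit -q -m "Adding copy of notebook"

# Modify the file; the change is unstaged, so plain `git diff` shows it
echo "outputs: table" > notebook.txt
git --no-pager diff

# After staging, `git diff` prints nothing and `git diff --staged` shows the change
git add notebook.txt
git --no-pager diff
git --no-pager diff --staged

git commit -q -m "Ran notebook"

# One line per commit: short hash, author name, relative date, subject
git --no-pager log --pretty=format:"%h %an %ar - %s"
echo

# HEAD names the most recent commit, so no hash needs to be copied by hand
git rev-parse HEAD
git --no-pager show --stat HEAD
```

`git show HEAD` is equivalent to running `git show` with the hash of the latest commit; copying a hash is only needed when you want to inspect an older commit.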
- Open intro_to_pandas.ipynb in the jupyter-git-tutorial/tutorial folder
- Modify the notebook
  - Find and replace the following
    - Sacramento to Los Angeles
    - 485199 to 379262
    - 197.92 to 468.97
  - Run the notebook and save
- Look at the changes
  - Using the Git extension
    - Look at the output after the pd.Series(['San Francisco', 'San Jose', 'Los Angeles']) cell
    - Hover over the red/green boxes under "Outputs changed" and click Show source
  - Using the command line
    - git diff
    - git difftool
  - Note that it may appear that all outputs have changed, even if you don't see any differences. This is because a difference in cell numbers counts as a change.
    - Try Run->Restart Kernel and Run All Cells... to reset cell numbering and look at the diff again
- Stage and commit the changes
  - Summary: "Replaced Sacramento with Los Angeles"
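The note above about cell numbers is easier to see in the notebook file itself. A notebook is stored as JSON, and each code cell records an `execution_count`. The sketch below builds a heavily trimmed, hypothetical code cell (not taken from the tutorial notebook), copies it with only the counter changed, and compares the two with the standard `diff` tool.

```shell
work="$(mktemp -d)"

# A trimmed code cell as Jupyter stores it (illustrative, not a full notebook).
# The counter appears on the cell and again on its execute_result output.
cat > "$work/before.json" <<'EOF'
{
 "cell_type": "code",
 "execution_count": 3,
 "outputs": [
  {
   "output_type": "execute_result",
   "execution_count": 3,
   "data": {"text/plain": ["2"]}
  }
 ],
 "source": ["1 + 1"]
}
EOF

# Same cell with the counter bumped, as happens when cells are re-run
sed 's/"execution_count": 3/"execution_count": 7/' "$work/before.json" > "$work/after.json"

# diff exits with status 1 when files differ, so `|| true` keeps a script going
diff "$work/before.json" "$work/after.json" || true
```

The source and the output data are identical in both files, yet the comparison still reports changed lines. Restarting the kernel and running all cells makes the counters start again from 1 in order, which removes this kind of noise.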
- Look at information about the remote repository
  - The Git extension does not have this feature
  - Using the command line
    - cd ~/jupyter-git-tutorial
    - git remote
    - git remote show origin
- Open https://github.com/[username]/jupyter-git-tutorial in a browser
  - Click on the N commits link next to the icon of a watch. It should not contain any of your commits.
- Push your commits
  - Using the Git extension
    - Click the cloud icon with an up arrow
    - Enter your GitHub username and password (GitHub no longer accepts account passwords for Git over HTTPS, so use a personal access token as the password)
  - Using the command line
    - git push
- Look at the log
  - Open https://github.com/[username]/jupyter-git-tutorial in a browser
  - Click on the N commits link again. The value of N should be larger and it should match the log history.
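The remote and push steps above can be rehearsed offline. In this sketch a bare repository on disk plays the role of your GitHub fork; that substitution, the file name `cities.txt`, and the placeholder identity are assumptions for the demo. Because the stand-in fork starts out empty, the first push uses `-u` to set the upstream branch, whereas a clone of a real fork already has one and plain `git push` is enough.

```shell
base="$(mktemp -d)"

# A bare repository on disk stands in for the fork on GitHub
git init -q --bare "$base/fork.git"

# Cloning creates a remote named "origin" that points back at the source
git clone -q "$base/fork.git" "$base/clone"
cd "$base/clone"
git config user.email "you@example.com"
git config user.name "Your Name"

# List remotes with their fetch and push URLs
git remote -v

# Make a commit locally; the stand-in fork does not have it yet
echo "Los Angeles" > cities.txt
git add cities.txt
git commit -q -m "Replaced Sacramento with Los Angeles"

# Push the current branch and record origin as its upstream
git push -q -u origin HEAD

# The local branch and its upstream now point at the same commit
git rev-parse HEAD
git rev-parse "@{u}"
```

The two hashes printed at the end are the command-line counterpart of checking that the commit count on GitHub matches your local log.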
- Open https://app.reviewnb.com/
- Select [username]/jupyter-git-tutorial
- Select the Commits tab
- Select the "Replaced Sacramento with Los Angeles" commit
- Click SEE ON GITHUB