How to Version Control Jupyter Notebooks

Hello all

I thought a few of you might be interested in this overview, How to Version Control Jupyter Notebooks. I tried to strike a balance between going into detail and covering a broad range of tools. Feedback and discussion welcome!




If I intend to publish the notebook ‘directly’ (nbviewer) with intact output cells, I tend to use print(…), save charts as external PNGs, and then insert those external images. Both habits keep the notebook files more manageable for git, and the diffs in particular stay quite clean.
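To see why embedded charts hurt diffs: an inline plot is stored in the .ipynb JSON as a base64 string under the output's "image/png" key, so every re-render rewrites a large blob. Here is a rough sketch, standard library only, that reports how many bytes of PNG a notebook is carrying. The function name is made up for illustration.

```python
import base64
import json


def embedded_png_bytes(notebook_path):
    """Return the decoded size in bytes of each PNG stored in code-cell outputs."""
    with open(notebook_path, encoding="utf-8") as fh:
        nb = json.load(fh)

    sizes = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            # Stream outputs have no "data" key, so fall back to an empty dict.
            png = output.get("data", {}).get("image/png")
            if png is None:
                continue
            # The notebook format allows long strings to be split into a list.
            if isinstance(png, list):
                png = "".join(png)
            sizes.append(len(base64.b64decode(png)))
    return sizes
```

If this returns anything substantial, those are the bytes that would otherwise end up in each diff; with external PNG files the notebook itself only holds a short image reference.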

I also do “restart kernel & clean” followed by “run all” before a commit. That adds the insurance that you haven’t introduced any state or execution-order problems, which just cleaning the outputs would not catch.
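If you would rather script that step than click through it, something like the following could work. It is only a sketch: it assumes nbconvert is installed, that the jupyter command is on your PATH, and that the notebook's kernel is available. The function names and the dry_run flag are invented for this example; the flags themselves (--to notebook, --execute, --inplace) are nbconvert's.

```python
import subprocess


def execute_command(notebook_path):
    """Build the nbconvert call that re-runs a notebook top to bottom, in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",   # run every cell in a fresh kernel
        "--inplace",   # overwrite the notebook with the fresh outputs
        notebook_path,
    ]


def rerun_notebooks(paths, dry_run=False):
    """Re-execute each notebook; with dry_run=True only return the commands."""
    commands = [execute_command(p) for p in paths]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if the command exits non-zero,
            # e.g. when a cell errors out during execution.
            subprocess.run(cmd, check=True)
    return commands
```

Because execution starts from a fresh kernel, a notebook that only worked thanks to leftover state should fail here rather than after it has been committed.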

Great summary. Chiming in wrt my workflow (hundreds of notebooks used for research, collaborative development of university courses, graduate student projects, etc.): I’ve found jupytext to be transformative. A couple of comments from my own experience:

  1. I keep the python version and the json version in different folders. This makes diffing, grepping, adding files to the git index etc. easier. The ipynb files reside one level up, but aren’t committed to my git repo. My global Jupyter config looks like this:
c.NotebookApp.contents_manager_class = "jupytext.TextFileContentsManager"  # noqa
c.ContentsManager.preferred_jupytext_formats_save = "py:percent" # noqa
c.ContentsManager.default_jupytext_formats = "ipynb,python//py" # noqa
c.ContentsManager.default_notebook_metadata_filter = "all,-language_info"
c.ContentsManager.default_cell_metadata_filter = "all"

This passes almost all metadata and pairs mynotes.ipynb with python/, creating the python folder if it doesn’t exist.

  2. I use the pre-commit package to run black, reorder-python-imports and flake8 on every python file, which is obviously a major improvement for informative diffs.

  3. For those of you using nbgrader: you’ll need to disable jupytext when you convert source notebooks to student released versions. You can do that by editing the formats metadata string for that notebook to “ipynb” only.

  4. Here’s an example of a set of teaching notebooks with their py:percent counterparts: EOSC 213

  5. Bottom line: jupytext is the missing piece I’ve been looking for since I started working with IPython notebooks.
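On the nbgrader point above, the formats edit can also be done programmatically. This is a minimal sketch, assuming the pairing is stored in the notebook JSON under metadata → jupytext → formats (that is where I understand jupytext keeps it, but check a notebook from your own setup first). The function name is hypothetical and only the standard library is used.

```python
import json


def set_jupytext_formats(notebook_path, formats="ipynb"):
    """Rewrite the jupytext 'formats' entry in a notebook's metadata."""
    with open(notebook_path, encoding="utf-8") as fh:
        nb = json.load(fh)

    # Assumed location of the pairing information: metadata.jupytext.formats
    jupytext_meta = nb.setdefault("metadata", {}).setdefault("jupytext", {})
    jupytext_meta["formats"] = formats

    with open(notebook_path, "w", encoding="utf-8") as fh:
        json.dump(nb, fh, indent=1, ensure_ascii=False)
        fh.write("\n")
    return jupytext_meta
```

Calling set_jupytext_formats("source/ps1/problem1.ipynb") (a made-up path) before generating the student version would leave that notebook unpaired, while the rest of the metadata is untouched.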

For me nbdime works extremely well, especially if you set it up so that git uses it for rendering diffs between notebooks.
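In case it helps anyone with the setup: nbdime ships a config-git subcommand that registers its diff and merge tooling with git. Below is a small sketch that wraps it, assuming nbdime is installed and on your PATH. The function name and the dry_run flag are mine, not part of nbdime.

```python
import subprocess


def enable_nbdime_for_git(global_scope=True, dry_run=False):
    """Register nbdime with git; with dry_run=True only return the command."""
    cmd = ["nbdime", "config-git", "--enable"]
    if global_scope:
        # Apply to the user's global git config instead of the current repo only.
        cmd.append("--global")
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

Once that has run, a plain git diff on an .ipynb file should go through nbdime rather than showing raw JSON.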

The only time people I work with run into trouble with this is when they attempt to put too much code into a notebook. However, I consider this a feature, not a bug, as the advice then is to put that code in a .py file instead. This keeps the notebook focussed on explaining stuff, and we get real IDE features for the code.

Do you just do the reset/run/commit every time or have you written a script to accomplish this? It’s simple enough, but I’m sure I could still somehow mess this up.

Since I use git add -p on principle, there is no persistent messing up.

And nbdime is preferable to ReviewNB because you want to use git rather than GitHub, or because of some additional flexibility? The ReviewNB folks just added GitHub commenting as part of their product, which looks pretty slick.

@schmudde, any thoughts on how to manage NBs in pull requests?

I’m reviewing a PR in BitBucket; it doesn’t render NBs nicely, so I end up scrolling up and down the page. What are the options in this case? Should I review the PR locally and not in BB?

If using GitHub, I’d highly suggest ReviewNB. It allows you to see the notebook diff for any commit or pull request. But that’s a GitHub plugin and you said you’re on BitBucket.

NBDime also offers version control integration, but with Git and Mercurial only.

What did you end up doing? Visual diffing locally seems like a manual process - but perhaps the only one available in this instance.

I ended up excluding .ipynb files from the diff page. That feature is still in beta (BitBucket Labs) at the moment.