How to Version Control Jupyter Notebooks

For me nbdime (https://nbdime.readthedocs.io/en/stable/) works extremely well, especially if you set it up so that git uses it for rendering diffs between notebooks.

The only time people I work with run into trouble with this is when they attempt to put too much code into a notebook. However, I consider this a feature, not a bug, as the advice then is to put that code in a .py file instead. This keeps the notebook focussed on explaining stuff, and we get real IDE features for the code.

Do you just do the reset/run/commit every time or have you written a script to accomplish this? It’s simple enough, but I’m sure I could still somehow mess this up.

Since I add -p on principle there is no persistent messing up.
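
For anyone who would rather script it, here is a minimal sketch. It assumes that "reset/run" means executing the notebook top to bottom in a fresh kernel via `jupyter nbconvert --execute`, followed by `git add -p` and `git commit`. The function names `build_commands` and `run_all` are made up for illustration; adjust the steps to your own workflow.

```python
import subprocess


def build_commands(notebook_path):
    """Return the three commands for a run-then-commit cycle (a sketch)."""
    return [
        # Execute the notebook in a fresh kernel and overwrite it in place.
        ["jupyter", "nbconvert", "--to", "notebook", "--execute",
         "--inplace", notebook_path],
        # Stage interactively, hunk by hunk, so nothing slips in unreviewed.
        ["git", "add", "-p", notebook_path],
        # Opens the editor for the commit message.
        ["git", "commit"],
    ]


def run_all(notebook_path):
    """Run each command in order and stop at the first failure."""
    for cmd in build_commands(notebook_path):
        subprocess.run(cmd, check=True)
```

Calling `run_all("analysis.ipynb")` (a placeholder filename) from the repository root would run the three steps in sequence. Because of `check=True`, a failing cell aborts the cycle before anything is staged.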

And nbdime is preferable to ReviewNB because you want to use git and not GitHub or some additional flexibility? The ReviewNB folks just added GitHub commenting as part of their product, which looks pretty slick.

@schmudde, any thoughts on how to manage NBs in pull requests?

I’m reviewing a PR in BitBucket; it doesn’t render NBs nicely, so I end up scrolling up and down the page. What are the options in this case? Should I review the PR locally and not in BB?

If using GitHub, I’d highly suggest ReviewNB. It allows you to see the notebook diff for any commit or pull request. But that’s a GitHub plugin and you said you’re on BitBucket.

NBDime also offers version control integration, but with Git and Mercurial only.

What did you end up doing? Visual diffing locally seems like a manual process - but perhaps the only one available in this instance.

I ended up excluding .ipynb from the diff page; that feature is still in Beta (BitBucket Labs) at the moment.

Hi! We’ve been building automatic version control support for all Jupyter notebooks. It’s basically an addon (that you can download and use for free) that adds a button to your notebook. The button a) runs your experiment on your cloud instance of choice (AWS, GCP, Azure) and b) stores a snapshot of the notebook so you can revert to any experiment you did 5 minutes ago or 5 years ago, including the notebook’s outputs (e.g. if you plot some graphs).

Screenshots, videos, descriptions and instructions here: https://blog.valohai.com/valohai-jupyter-notebook-extension

Please add feedback!

Sorry for (kind of) marketing plug here but I think it will be interesting to you.

We’ve recently built an extension to jupyter-notebooks and jupyter-lab that lets you version checkpoints by clicking a button and then you can browse versions and diff easily:

and I can easily share it with anyone by sending a link like this one.

We’ll be adding new features around this like commenting and stuff and I would love to hear what you think and what you would like to see there.
Anyway, I hope this helps.

Finally got to take a look at this. It’s really slick! Happy to see the inclusion of rich media in the diffs. Very cool. I’ll add it to the article.

I’m curious - how does it generate the versions?

Glad you liked it @schmudde!

If you install the extension you will get a new upload button in your jupyter lab/notebook.
Then, whenever you decide to upload a snapshot you click that button (you can name the snapshot and add a description too).

If you want to version your machine learning experiment runs inside of a versioned notebook (a lot of versioning I know), then those snapshots can be generated automatically whenever you run a cell with neptune.create_experiment() in it.
You can see an example of model training in notebook here.

Hi all!

In the spirit of moving this discussion about diffing/merging, version controlling, etc forward, we’re working on a new OPTIONAL file format JEP. Here’s the thread - Proposed-JEP: Investigate alternate, optional file formats

Please come join the discussion!

@schmudde,
Thanks for Jupytext.
I have a few questions, and I can’t find a better place to ask them.
My setup is JupyterLab on a TLJH instance.
I’ve got *.ipynb in my .gitignore so that I only save the *.py files.
Now, when I pull a new .py file, Jupytext does not automatically convert it to the notebook version,
and I need to run jupytext --to notebook file.py myself,
even though my .jupyter/jupyter_notebook_config.py file has

c.NotebookApp.contents_manager_class = "jupytext.TextFileContentsManager"
c.ContentsManager.notebook_extensions = "ipynb,py"

in it.
That seems to only sync from notebook to script.

Am I missing something?

If I generate the notebook with the --to command above,
I also need to touch the …py file.

I must be doing something wrong, because this would all be solved if the autopair worked in both directions.

Here’s a docker image that is behaving the way I expect (i.e. when I create and save a new notebook
both a .md and a .ipynb file are created, and when I change one of them in jupyter the other one
is modified).

Here’s the config file that does that for the spawned notebook:

Is there a way to configure Jupyter so that on exit it clears all outputs from the file, leaving basically just the code itself?

Hi @Royi - this issue might contain the info you need: Suggestion for content: Configuring jupyter to scrub notebook output · Issue #1803 · alan-turing-institute/the-turing-way · GitHub
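
In the meantime, here is a minimal sketch of the scrubbing step itself, using only the standard library. It assumes the notebook is nbformat 4 JSON, with a top-level "cells" list. It does not hook into Jupyter's exit; you would call it yourself, for example from a script you run before committing. The names `strip_outputs` and `strip_file` are made up for illustration.

```python
import json


def strip_outputs(notebook):
    """Clear outputs and execution counts from every code cell, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def strip_file(path):
    """Load a .ipynb file, scrub its outputs, and write it back."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    strip_outputs(notebook)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1, ensure_ascii=False)
        f.write("\n")
```

Markdown cells are left untouched, so only the code and prose remain in the saved file. From the command line, `jupyter nbconvert --clear-output --inplace` on the notebook does a similar job.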

Just to say - like others - I have got completely used to Jupytext, configured to save the notebook as .Rmd (RMarkdown), and I ignore the .ipynb files for version control. The .Rmd files (or your preferred text flavor) are so much easier to diff, and they don’t contain the output.

This is nice.
I hope saving the file without the output will become a built-in option or an inline magic command.

@matthew.brett, how can one integrate Jupytext into VS Code? (I use VS Code for Jupyter notebooks.)

Ah - I don’t know about VSCode - I guess that will depend on the Jupytext output format - and whether you need the notebook to be interactive in VSCode. I edit in Vim, and then go back to the notebook interface to test from time to time.

A big milestone here in viewing well-formed, nbformat-validated Jupyter Notebooks in their native JSON format:

This is powered by the excellent nbdime, mentioned a bit above, with the longer history of this effort here:

Because it’s powered by an actual Jupyter tool, and not an intern’s copy in the platform’s language du jour, it works for every version of the Jupyter Notebook Format, even those from before The Big Split.

And while this is, of course, on a proprietary platform owned by a Big Tech company of consistently dubious morals, as a community we can at least bask in being embraced for a while :hugs:.
