How to version Jupyter notebooks in git without output

certik · September 13, 2021, 4:45pm

What is the latest recommendation to store Jupyter notebooks in git?

I store them without output by clicking “Kernel->Restart & Clear Output”. My workflow is that I use the notebook, execute cells and then my git status as well as git diff is polluted by the metadata change and the outputs. I then have to “Kernel->Restart & Clear Output” and save the notebook to see the actual diff.
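For reference, the manual clean-up step could also be scripted. This is a minimal sketch using only the standard library, assuming the notebook is nbformat v4 JSON with a top-level `cells` list. The function names `clear_outputs` and `clear_outputs_in_file` are made up for illustration and are not part of Jupyter.

```python
import json


def clear_outputs(notebook: dict) -> dict:
    """Reset outputs and execution counts of all code cells (mutates in place)."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_outputs_in_file(path: str) -> None:
    """Rewrite the .ipynb file at `path` without outputs (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    clear_outputs(notebook)
    with open(path, "w", encoding="utf-8") as f:
        # indent=1 is an assumption chosen to keep diffs small
        json.dump(notebook, f, indent=1, ensure_ascii=False)
        f.write("\n")
```

Note that this leaves notebook-level and cell-level metadata untouched, so metadata-only changes would still show up in git status.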

I tried nbdime with the following ~/.jupyter/nbdime_config.json:

{
  "GitDiff": {
    "details": false,
    "outputs": false
  }
}

I installed it with nbdime config-git --enable, and now git diff is indeed fixed: it no longer shows a diff if I just run the notebook, only if I actually modify a cell.

But git status still shows the notebook (.ipynb) as modified, and git commit -a still commits the change.

It seems to me the best solution would be to configure Jupyter to treat .ipynb as read only, and save the metadata+output into a separate file that is not versioned. That would work for me. Is that possible?

By searching here, it seems the closest solution so far is to not use .ipynb, but rather some other format that only contains the input cells (such as .md or .py), check those into git, and then always convert to .ipynb. Is that the recommended workflow?

But then you have to manually convert to .ipynb to execute, and then manually convert back to .py to commit?

Some relevant threads that I found on this very topic:

* How to Version Control Jupyter Notebooks (March 2019)
* https://discourse.jupyter.org/t/using-git-hooks-to-maintain-a-cleaned-output-notebook-branch/2231 (September 2019)
* https://discourse.jupyter.org/t/should-jupyter-recommend-a-text-based-representation-of-the-notebook/3273 (February 2020)

(Can moderators please uncomment the above “code block”? I get “new users can only put 2 links into their posts” if I do it myself.)

certik · September 13, 2021, 5:22pm

Answering my own question:

Exactly that might not be possible, but simply treating .ipynb as the “metadata+output” file format (removing it from git), treating .py as the original notebook (adding it to git), and using jupytext seems to do exactly what I wanted.

Installation:

mamba install jupytext

Restart the notebook, then choose “File->Jupytext->Pair Notebook with light Script”. Remove the .ipynb from git and check the newly generated .py file into git.

Now when I make modifications to a cell and save, it appears in the .py file. If I only run the notebook, there is no change in .py file (only in .ipynb file which is not tracked by git anymore).
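To illustrate why only running cells leaves the script untouched, here is a rough stdlib-only sketch that writes nothing but cell inputs. It is a simplified stand-in, not jupytext's implementation or its actual light format, and the function name `inputs_only_script` is hypothetical. It also does not handle IPython magics.

```python
def inputs_only_script(notebook: dict) -> str:
    """Render only the cell inputs of an nbformat v4 dict as script text."""
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append(source)
        elif cell.get("cell_type") == "markdown":
            # Write markdown as comments so it is not read as code
            commented = [("# " + line).rstrip() for line in source.splitlines()]
            chunks.append("\n".join(commented))
    # Outputs and execution counts are never read, so they cannot affect the result
    return "\n\n".join(chunks) + "\n"
```

Because outputs and execution counts are never consulted, an executed notebook and a freshly cleared one produce the same text.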

When I run git clean -dfx and then jupyter notebook my_nb.py, the notebook starts as usual and everything seems to just work.

Conclusion: treat the “light Script” .py files as the original notebook, checked into git. Use exactly as before.

(I’ve only used this new workflow for 5 minutes, so I will report back after I use this for a longer time to see if this works well for me.)

matthew.brett · September 13, 2021, 5:42pm

Yes, right. Personally I use .Rmd format, with Jupytext, and the matching Jupyter extension installed. That means that when I save the notebook, it automatically gets saved as .Rmd, and that’s what I put into version control. You can also, incidentally, edit the .Rmd elsewhere and reload the edited notebook in the Jupyter UI, but that’s just icing on the cake.

certik · September 13, 2021, 9:54pm

Thanks @matthew.brett. In your experience, what is the advantage of the R Markdown (.Rmd) compared to “Markdown” and “MyST Markdown”? I assume all three are some flavor of markdown.

The other family seems to be the various “Script” options, of those I chose “light Script” since it seems the simplest. I assume they all save some kind of a .py file.

Regarding markdown vs. a .py file, markdown is probably better, since it is clear that it is a document. The .py file invites you to run it with Python directly. I tried that, but it fails: things like %pylab inline do not get executed, so later code breaks when it expects names to be imported and they are not.

matthew.brett · September 13, 2021, 11:08pm

One major advantage of RMarkdown is that it’s a fairly common format, because of R, so I have good support for it in Vim (your editor here). My usual workflow is that I start with a sketch in Jupyter, and then work it up in my editor, so I really need good syntax-highlighting etc for text and code, inside the editor.

I also find the metadata markup pleasant to use, example:

```{python tags=c("raises-exception")}
# This gives an error
1 / 0
```

Given this has worked well, I haven’t explored whether Myst-Markdown, for example, would work better.

choldgraf · September 14, 2021, 12:43am

I think in MyST the syntax would be pretty similar, something like:

```{code-cell} python
:tags: raises-exception
# This gives an error
1 / 0
```

or if you wanted more verbose or complex YAML cell metadata

```{code-cell} python
---
tags:
- raises-exception
- someothertag
someotherkey: someotherval
---
# This gives an error
1 / 0
```

certik · September 16, 2021, 1:47am

Thanks @choldgraf and @matthew.brett for the feedback.

I’ve been using jupytext for a few days now and it works really great, exactly as I wanted.

The only issue I found so far is that I used to type ju<TAB> in a terminal to get jupyter, but now it stops at jupyte, due to both jupyter and jupytext in the path. So it’s a little annoying, but that’s minor.

Geoff · June 20, 2023, 5:53pm

In a Jupyter ReadTheDocs page [LINK UPDATED January 2024 according to @nickolay below], they describe scrub_output_pre_save, which empties the output cells when saving (the outputs can be regenerated, so why save them?).
I used to use this with Jupyter (pre-Lab). I recently tried it with JupyterLab and it wasn’t working, but I didn’t investigate further.
I think this would be a nice option in ‘settings’.

nickolay · January 3, 2024, 8:46pm

The scrub_output_pre_save example is 404 now, here’s what used to be there: File save hooks — Jupyter Notebook 4.4.1 documentation

I had to rename jupyter_notebook_config.py to ~/.jupyter/jupyter_server_config.py, and it now works with JupyterLab 4.0.
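For readers who cannot reach the old page, the hook looks roughly like the sketch below, reconstructed along the lines of the "File save hooks" documentation example. The final assignment is left as a comment because `c` only exists when Jupyter loads the file as a config file (for example ~/.jupyter/jupyter_server_config.py); treat the details as an approximation and check them against your installed version.

```python
def scrub_output_pre_save(model, **kwargs):
    """Clear code-cell outputs before a notebook is written to disk."""
    # Only act on notebooks, not on plain files or directories
    if model["type"] != "notebook":
        return
    # Only act on nbformat v4 content
    if model["content"]["nbformat"] != 4:
        return

    for cell in model["content"]["cells"]:
        if cell["cell_type"] != "code":
            continue
        cell["outputs"] = []
        cell["execution_count"] = None


# In the Jupyter config file, where `c` is provided by Jupyter, register it with:
# c.FileContentsManager.pre_save_hook = scrub_output_pre_save
```

With this in place, outputs remain visible in the running session but are dropped from the file each time it is saved.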

Linus_J_Fernandes · August 20, 2024, 7:40am

Where is FileContentsManager.pre_save_hook located?
