I thought a few of you might be interested in this overview, How to Version Control Jupyter Notebooks. I tried to strike a balance between going into detail and covering a broad range of tools. Feedback and discussion welcome!
If I intend to publish the notebook "directly" (nbviewer) with intact output cells, I tend to use print(…) and save charts as external PNGs, then insert those images back in. Both keep the notebook files more manageable for git, and the diffs especially clean.
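The print-plus-external-PNG pattern above can be sketched like this (assuming matplotlib is installed; the file names and paths are just illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
from pathlib import Path

Path("charts").mkdir(exist_ok=True)

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
ax.set_title("growth")
fig.savefig("charts/growth.png", dpi=150)  # external PNG instead of an embedded output cell
plt.close(fig)

print("max value:", 9)  # plain print output diffs cleanly in git
```

In the notebook itself, the image then comes back in via a Markdown cell, e.g. ![growth](charts/growth.png), so the .ipynb file stays free of base64 blobs.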
And "restart kernel & clear output" followed by "run all" before a commit. That adds the insurance that you don't have any hidden state or execution-order problems, unlike just clearing the outputs.
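If you want to script the cleaning half of that, a notebook is just JSON, so the stdlib is enough; a minimal sketch (file names are illustrative, and the re-run step still needs a kernel, e.g. via jupyter nbconvert --to notebook --execute --inplace):

```python
import json
from pathlib import Path

def clear_outputs(path):
    """Strip outputs and execution counts from every code cell of a .ipynb file."""
    nb = json.loads(Path(path).read_text())
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    Path(path).write_text(json.dumps(nb, indent=1))

# demo: a one-cell notebook with stale output, then clean it
demo = {"nbformat": 4, "nbformat_minor": 5, "metadata": {}, "cells": [
    {"cell_type": "code", "source": "1 + 1", "metadata": {},
     "execution_count": 1,
     "outputs": [{"output_type": "execute_result", "execution_count": 1,
                  "data": {"text/plain": "2"}, "metadata": {}}]}]}
Path("demo.ipynb").write_text(json.dumps(demo))
clear_outputs("demo.ipynb")
```

After cleaning, a fresh top-to-bottom run (nbconvert's --execute, or "restart & run all" in the UI) regenerates the outputs in a known order.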
Great summary. Chiming in with my workflow (hundreds of notebooks used for research, collaborative development of university courses, graduate student projects, etc.), I've found jupytext to be transformative. A couple of comments from my own experience:
I keep the python version and the .ipynb version in different folders. This makes diffing, grepping, adding files to the git index, etc. easier. The .ipynb files reside one level up, but aren't committed to my git repo. My global jupyter_notebook_config.py looks like this:
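The config block itself got lost in the copy; using jupytext 1.x option names (treat the exact names as assumptions against your jupytext version, and note that newer releases prefer a jupytext.toml), a pairing like the one described would look roughly like:

```python
# ~/.jupyter/jupyter_notebook_config.py -- sketch, jupytext 1.x option names
# pair every notebook with a py:percent script in a python/ subfolder
c.ContentsManager.default_jupytext_formats = "ipynb,python//py:percent"

# keep (almost) all notebook and cell metadata when round-tripping
c.ContentsManager.default_notebook_metadata_filter = "all"
c.ContentsManager.default_cell_metadata_filter = "all"
```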
This passes almost all metadata and pairs mynotes.ipynb with python/mynotes.py, creating the python folder if it doesn't exist.
I use the pre-commit package to run black, reorder-python-imports and flake8 on every python file, which is obviously a major improvement for informative diffs.
For those of you using nbgrader: you'll need to disable jupytext when you convert source notebooks to student release versions. You can do that by editing the formats metadata string for that notebook to "ipynb" only.
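Since that formats string lives in the notebook's JSON metadata, the edit can also be scripted with the stdlib; a sketch (the file name and the original formats value are just examples):

```python
import json
from pathlib import Path

def pin_to_ipynb(path):
    """Set jupytext's formats metadata to 'ipynb' so no paired file is written."""
    nb = json.loads(Path(path).read_text())
    nb.setdefault("metadata", {}).setdefault("jupytext", {})["formats"] = "ipynb"
    Path(path).write_text(json.dumps(nb, indent=1))

# demo: a notebook previously paired to a python/ subfolder
Path("source_hw1.ipynb").write_text(json.dumps(
    {"nbformat": 4, "nbformat_minor": 5, "cells": [],
     "metadata": {"jupytext": {"formats": "ipynb,python//py:percent"}}}))
pin_to_ipynb("source_hw1.ipynb")
```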
Hereās an example of a set of teaching notebooks with their py:percent counterparts: EOSC 213
Bottom line: jupytext is the missing piece I've been looking for since I started working with IPython notebooks.
The only time people I work with run into trouble with this is when they attempt to put too much code into a notebook. However, I consider this a feature, not a bug: the advice then is to move that code into a .py file instead. This keeps the notebook focused on explaining things, and we get real IDE features for the code.
Do you just do the reset/run/commit every time, or have you written a script to accomplish this? It's simple enough, but I'm sure I could still somehow mess this up.
And nbdime is preferable to ReviewNB because you want to use git rather than GitHub, or for some additional flexibility? The ReviewNB folks just added GitHub commenting as part of their product, which looks pretty slick.
@schmudde, any thoughts on how to manage NBs in pull requests?
I'm reviewing a PR in Bitbucket; it doesn't render notebooks nicely, and therefore I end up scrolling up and down the page. What are the options in this case? Should I review the PR locally and not in Bitbucket?
If you're using GitHub, I'd highly suggest ReviewNB. It lets you see notebook diffs for any commit or pull request. But that's a GitHub plugin, and you said you're on Bitbucket.
nbdime also offers version control integration, but only with Git and Mercurial.
What did you end up doing? Visual diffing locally seems like a manual process - but perhaps the only one available in this instance.
Hi! We've been building automatic version control support for all Jupyter notebooks. It's basically an add-on (that you can download and use for free) that adds a button to your notebook. The button a) runs your experiment on your cloud instance of choice (AWS, GCP, Azure) and b) stores a snapshot of the notebook so you can revert to any experiment you did 5 minutes ago or 5 years ago, including the notebook's outputs (e.g. if you plot some graphs).
We'll be adding new features around this, like commenting, and I'd love to hear what you think and what you'd like to see there.
Anyway, I hope this helps.
Finally got to take a look at this. It's really slick! Happy to see the inclusion of rich media in the diffs. Very cool. I'll add it to the article.
I'm curious - how does it generate the versions?
If you install the extension you will get a new upload button in your jupyter lab/notebook.
Then, whenever you decide to upload a snapshot you click that button (you can name the snapshot and add a description too).
If you want to version your machine learning experiment runs inside a versioned notebook (a lot of versioning, I know), then those snapshots can be generated automatically whenever you run a cell with neptune.create_experiment() in it.
You can see an example of model training in a notebook here.
@schmudde,
First of all, thanks for jupytext.
I have a few questions, and I can't find a better place to ask them.
My setup is JupyterLab on a TLJH instance.
I've got *.ipynb in my .gitignore so that I only save the *.py files.
Now, when I pull a new .py file, jupytext does not automatically convert it to the notebook version,
and I need to run jupytext --to notebook file.py myself,
even though my .jupyter/jupyter_notebook_config.py file has the
Here's a docker image that is behaving the way I expect (i.e. when I create and save a new notebook, both a .md and an .ipynb file are created, and when I change one of them in Jupyter the other one is updated).
Here's the config file that does that for the spawned notebook:
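The config file itself didn't survive the copy; assuming jupytext 1.x option names (check them against your installed version), md/ipynb pairing for the spawned server would be roughly:

```python
# jupyter_notebook_config.py for the spawned server -- sketch only
# pair every notebook with a Markdown twin; saving either file updates the other
c.ContentsManager.default_jupytext_formats = "ipynb,md"
```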
Just to say - like others - I have got completely used to Jupytext, configured to save the notebook as .Rmd (RMarkdown), and I ignore the .ipynb files for version control. The .Rmd files (or your preferred text flavor) are so much easier to diff, and they don't contain the output.