What is the latest recommendation to store Jupyter notebooks in git?
I store them without output by clicking “Kernel->Restart & Clear Output”. My workflow is that I use the notebook and execute cells, and then my git status as well as git diff are polluted by the metadata changes and the outputs. I then have to click “Kernel->Restart & Clear Output” and save the notebook to see the actual diff.
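To make the manual step concrete, here is a minimal sketch of what clearing amounts to at the file level. It assumes the nbformat 4 JSON layout, where code cells carry `outputs` and `execution_count`; the function name `clear_outputs` and the sample notebook are my own for illustration, and the sketch does not touch cell or notebook metadata.

```python
import json


def clear_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs removed."""
    # Round-trip through JSON as a simple deep copy; the input is left untouched.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Hypothetical in-memory notebook with one executed code cell.
example = {
    "cells": [
        {
            "cell_type": "code",
            "source": "1 + 1",
            "metadata": {},
            "outputs": [{"output_type": "execute_result", "data": {"text/plain": "2"}}],
            "execution_count": 7,
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

print(json.dumps(clear_outputs(example)["cells"][0], indent=1))
```

Running something like this on every save or commit is the part I would like to avoid doing by hand.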
I tried nbdime with the following ~/.jupyter/nbdime_config.json:
    {
      "GitDiff": {
        "details": false,
        "outputs": false
      }
    }
I installed it with nbdime config-git --enable, and now git diff is indeed fixed: it no longer shows a diff if I just run the notebook, only if I actually modify a cell. But git status still shows the notebook (.ipynb) as modified, and git commit -a still commits the change.
It seems to me the best solution would be to configure Jupyter to treat .ipynb as read-only and save the metadata and output into a separate file that is not versioned. That would work for me. Is that possible?
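To illustrate what I mean by a separate, unversioned file, here is a rough sketch of the split. It is purely hypothetical: I do not know of a Jupyter setting that does this, the names `split_notebook` and `merge_notebook` are made up, and keying outputs by cell position is a simplification that would break as soon as cells are reordered.

```python
import copy


def split_notebook(notebook: dict) -> tuple:
    """Split a notebook dict into (inputs_only, sidecar_with_outputs)."""
    inputs = copy.deepcopy(notebook)
    sidecar = {}
    for index, cell in enumerate(inputs.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # Move outputs and execution count into the sidecar, keyed by position.
        sidecar[str(index)] = {
            "outputs": cell.get("outputs", []),
            "execution_count": cell.get("execution_count"),
        }
        cell["outputs"] = []
        cell["execution_count"] = None
    return inputs, sidecar


def merge_notebook(inputs: dict, sidecar: dict) -> dict:
    """Rebuild a full notebook from the versioned inputs and the sidecar."""
    merged = copy.deepcopy(inputs)
    for index, cell in enumerate(merged.get("cells", [])):
        saved = sidecar.get(str(index))
        if saved is not None and cell.get("cell_type") == "code":
            cell["outputs"] = copy.deepcopy(saved["outputs"])
            cell["execution_count"] = saved["execution_count"]
    return merged
```

With something like this, only the inputs-only file would be committed, and the sidecar would be listed in .gitignore.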
By searching here, it seems the closest solution so far is to not use .ipynb, but rather some other format that only contains the input cells (such as .md or .py), check those into git, and then always convert to .ipynb. Is that the recommended workflow? But then you have to manually convert to .ipynb to execute, and then manually convert back to .py to commit?
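For reference, this is the kind of input-only export I have in mind, as a minimal sketch. It assumes the nbformat 4 JSON layout and writes the `# %%` cell markers that several editors and converters recognize; the function name `notebook_to_script` is my own, raw cells are simply skipped, and a real converter would also need the reverse direction.

```python
def notebook_to_script(notebook: dict) -> str:
    """Render only the input cells of a notebook dict as a .py script."""
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # nbformat allows the source to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append("# %%\n" + source.rstrip("\n"))
        elif cell.get("cell_type") == "markdown":
            # Markdown is kept as comments so the script stays valid Python.
            commented = "\n".join(("# " + line).rstrip() for line in source.splitlines())
            chunks.append("# %% [markdown]\n" + commented)
    return "\n\n".join(chunks) + "\n"
```

The output contains no outputs or execution counts, so re-running the notebook would not change it, which is exactly the property I am after.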
Some relevant threads that I found on this very topic:
* How to Version Control Jupyter Notebooks (March 2019)
* https://discourse.jupyter.org/t/using-git-hooks-to-maintain-a-cleaned-output-notebook-branch/2231 (September 2019)
* https://discourse.jupyter.org/t/should-jupyter-recommend-a-text-based-representation-of-the-notebook/3273 (February 2020)
(Can moderators please uncomment the above “code block”? I get “new users can only put 2 links into their posts” if I do it myself.)