What is the latest recommendation for storing Jupyter notebooks in git?
I store them without output by clicking “Kernel->Restart & Clear Output”. My workflow is to use the notebook and execute cells, after which both git status and git diff are polluted by the metadata changes and the outputs. I then have to click “Kernel->Restart & Clear Output” and save the notebook to see the actual diff.
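To make concrete what I mean by "without output", here is a minimal sketch of the same clean-up done at the file level. It assumes the nbformat v4 JSON layout (a top-level "cells" list where code cells carry "outputs" and "execution_count"). The function names strip_outputs and strip_file are my own, not part of Jupyter or nbdime, and it leaves notebook-level metadata untouched:

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a v4 notebook dict with outputs and execution counts cleared."""
    # Deep copy via a JSON round trip so the caller's dict is not modified.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def strip_file(path: str) -> None:
    """Rewrite the notebook at `path` in place without outputs (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(strip_outputs(notebook), f, indent=1, ensure_ascii=False)
        f.write("\n")
```

Running strip_file("analysis.ipynb") (the file name is just an example) before committing would replace the manual click, but it still has to be run by hand every time.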
I tried nbdime with the following ~/.jupyter/nbdime_config.json:
{
  "GitDiff": {
    "details": false,
    "outputs": false
  }
}
I installed it with nbdime config-git --enable, and git diff is now indeed fixed: it no longer shows a diff if I just run the notebook, only when I actually modify a cell.
But git status still shows the notebook (.ipynb) as modified, and git commit -a still commits the change.
It seems to me the best solution would be to configure Jupyter to treat .ipynb as read only, and save the metadata+output into a separate file that is not versioned. That would work for me. Is that possible?
By searching here, it seems the closest solution so far is to not use .ipynb at all, but rather some other format that contains only the input cells (such as .md or .py), check those files into git, and convert to .ipynb whenever needed. Is that the recommended workflow?
But then you have to manually convert to .ipynb to execute, and then manually convert back to .py to commit?
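To illustrate the round trip I am asking about, here is a rough sketch of converting between .ipynb and a .py file that holds only the input cells. This is not how any existing tool does it; it assumes nbformat v4 JSON, uses a "# %%" line as a cell separator (my choice for the sketch), drops markdown cells, and would mis-split a cell that itself contains a "# %%" line. The function names are hypothetical:

```python
CELL_MARKER = "# %%"


def notebook_to_script(notebook: dict) -> str:
    """Join the source of all code cells, each preceded by a marker line."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown cells are dropped in this simplified sketch
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        chunks.append(f"{CELL_MARKER}\n{source.rstrip()}\n")
    return "\n".join(chunks)


def script_to_notebook(script: str) -> dict:
    """Rebuild a v4 notebook dict with one empty-output code cell per marker."""
    cells = []
    # Text before the first marker is ignored; each later chunk is one cell.
    for chunk in script.split(CELL_MARKER + "\n")[1:]:
        cells.append({
            "cell_type": "code",
            "metadata": {},
            "source": chunk.strip("\n"),
            "outputs": [],
            "execution_count": None,
        })
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
```

Even with helpers like these, both conversions are manual steps, which is exactly the overhead I would like to avoid.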
Some relevant threads that I found on this very topic:
* How to Version Control Jupyter Notebooks (March 2019)
* https://discourse.jupyter.org/t/using-git-hooks-to-maintain-a-cleaned-output-notebook-branch/2231 (September 2019)
* https://discourse.jupyter.org/t/should-jupyter-recommend-a-text-based-representation-of-the-notebook/3273 (February 2020)
(Can moderators please uncomment the above “code block”? I get “new users can only put 2 links into their posts” if I do it myself.)