I use iPython notebooks a lot in my software development nowadays. They’re a nice way to debug things without having to fire up pdb; I’ll often use one when I’m trying to debug and explore a new API.
Unfortunately, notebooks are really hard to diff in Git. I use magit and git diffs pretty extensively when I change code, and I rely heavily on them to make sure I haven’t introduced typos or bugs. iPython notebooks are just JSON blobs, though, so git gives me a horrible, incoherent mess. I basically commit them blindly without checking the code at all, which isn’t ideal.
So to resolve this I generate a readable version of the notebook, and check the diff for that. Specifically, I wrote a script that extracts only the Python code from the iPython notebook (which is essentially a JSON file). Then, whenever I commit a change to the iPython notebook, it:
- Automatically generates the Python-only version alongside the original notebook.
- Commits both files to the repository.
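As a rough illustration of the extraction step, here is a minimal sketch. It is not the actual hook code: the function names `extract_code` and `write_python_version` are made up for this example, and it assumes the standard nbformat 4 layout, where the notebook JSON has a top-level `cells` list and each cell has a `cell_type` and a `source` that is either a string or a list of lines.

```python
import json
from pathlib import Path


def extract_code(notebook: dict) -> str:
    """Return the source of every code cell, joined into one script.

    Hypothetical helper: assumes the nbformat 4 layout described above.
    """
    chunks = []
    for cell in notebook.get("cells", []):
        # Skip markdown and raw cells; only code cells matter for the diff.
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The source may be stored as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        # Drop empty cells so they don't add noise to the output.
        if source.strip():
            chunks.append(source.rstrip("\n"))
    if not chunks:
        return ""
    # Separate cells with a blank line and end the file with a newline.
    return "\n\n".join(chunks) + "\n"


def write_python_version(notebook_path: str) -> Path:
    """Write a .py file next to the notebook and return its path."""
    path = Path(notebook_path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    out_path = path.with_suffix(".py")
    out_path.write_text(extract_code(notebook), encoding="utf-8")
    return out_path
```

Calling `write_python_version("analysis.ipynb")` would write `analysis.py` next to the notebook. Cell magics such as `%matplotlib inline` are copied through unchanged, so the output is meant for reading in a diff, not necessarily for running.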
Here’s what the diff looks like:
To make sure it runs when I need it, I created a git pre-commit hook. Git’s default pre-commit hooks are a little difficult to use, so I built a hook for the pre-commit package. If you want to try it out, you can do so by setting up pre-commit, and then including the following code under the repos: key in your .pre-commit-config.yaml:
- repo: https://github.com/moonglow-ai/pre-commit-hooks
  rev: v0.1.1
  hooks:
    - id: clean-notebook
You can find the code for the hooks in the moonglow-ai/pre-commit-hooks repository on GitHub, and you can read more about it in the blog post “Diffing iPython notebook code in Git”.