This is slightly off topic for this forum unless regarded as a Jupyter workflow problem, so apols in advance if it is off-topic…
I have a git repository with:
a set of run notebooks in the master branch, with output cells populated;
the same notebooks, derived from master, in a clean branch with output cells stripped…
Tools such as nbstripout help automate the creation of “output cleaned” notebooks.
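For anyone unfamiliar with what the "cleaning" amounts to: a stripped notebook is the same JSON with outputs and execution counts blanked. A toy stand-in for what nbstripout automates (nbstripout also scrubs metadata; this just shows the idea in a scratch directory):

```shell
# Work in a scratch dir and make a tiny "run" notebook to strip.
cd "$(mktemp -d)"
cat > demo.ipynb <<'NB'
{"cells": [{"cell_type": "code", "execution_count": 1,
  "metadata": {}, "outputs": [{"output_type": "stream",
  "name": "stdout", "text": ["hi\n"]}], "source": ["print('hi')"]}],
 "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
NB

# Blank every code cell's outputs and execution count, in place.
python3 - demo.ipynb <<'EOF'
import json, sys

path = sys.argv[1]
with open(path) as f:
    nb = json.load(f)
for cell in nb.get("cells", []):
    if cell.get("cell_type") == "code":
        cell["outputs"] = []
        cell["execution_count"] = None
with open(path, "w") as f:
    json.dump(nb, f, indent=1)
EOF
```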
Is there a set of git commit hooks / filters I can use so that a commit of one or more notebooks to the master branch will result in a cleaned copy of the same notebook(s) being committed to the clean branch?
The master branch may contain a lot of notebooks in nested directories, so ideally I only want to run the notebook cleaner over notebooks that have been updated.
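A rough sketch of such a post-commit hook, run here in a scratch repo so it is self-contained: it finds only the notebooks the commit touched (via `git diff-tree`) and commits stripped copies to a `clean` branch through a throwaway worktree. The branch name and the inline Python scrub are assumptions — in real use, swap nbstripout in where the comment says so; renames and deletions are not handled:

```shell
# Scratch repo for the demo.
cd "$(mktemp -d)"
git init --quiet .
git config user.email you@example.com
git config user.name you
git commit --quiet --allow-empty -m root
git branch clean

cat > .git/hooks/post-commit <<'HOOK'
#!/bin/sh
set -e
# Guard: hooks are shared across worktrees; skip on the clean branch itself.
[ "$(git rev-parse --abbrev-ref HEAD)" = "clean" ] && exit 0
# Only the notebooks touched by the commit we just made:
changed=$(git diff-tree --no-commit-id --name-only -r HEAD -- '*.ipynb')
[ -z "$changed" ] && exit 0
# Check the clean branch out into a throwaway worktree and update it.
tmp=$(mktemp -d)
git worktree add --quiet "$tmp/wt" clean
echo "$changed" | while IFS= read -r nb; do
    mkdir -p "$tmp/wt/$(dirname "$nb")"
    # Stand-in output stripper -- use nbstripout here instead:
    git show "HEAD:$nb" | python3 -c '
import json, sys
nb = json.load(sys.stdin)
for c in nb.get("cells", []):
    if c.get("cell_type") == "code":
        c["outputs"], c["execution_count"] = [], None
json.dump(nb, sys.stdout)' > "$tmp/wt/$nb"
    git -C "$tmp/wt" add "$nb"
done
git -C "$tmp/wt" commit --quiet -m "Strip outputs for $(git rev-parse --short HEAD)"
git worktree remove "$tmp/wt"
HOOK
chmod +x .git/hooks/post-commit

# Commit a run notebook; the hook mirrors a stripped copy to clean.
cat > nb1.ipynb <<'NB'
{"cells": [{"cell_type": "code", "execution_count": 2, "metadata": {},
  "outputs": [{"output_type": "stream", "name": "stdout", "text": ["1\n"]}],
  "source": ["print(1)"]}],
 "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
NB
git add nb1.ipynb
git commit --quiet -m "Add run notebook"
```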
It would also be useful if the clean branch were configured from the start to ignore .ipynb_checkpoints/ notebooks.
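The checkpoint part at least is a one-liner: a single .gitignore entry on the clean branch catches checkpoint copies at any depth (note the plural — the directory Jupyter actually creates is .ipynb_checkpoints/). A quick scratch-repo check:

```shell
cd "$(mktemp -d)"
git init --quiet .
# One ignore rule, matching the checkpoints directory at any depth:
echo '.ipynb_checkpoints/' > .gitignore
# Verify: anything under an .ipynb_checkpoints/ dir is now ignored.
git check-ignore -q deep/nested/.ipynb_checkpoints/nb-checkpoint.ipynb \
  && echo "checkpoints ignored"
```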
I wonder what happens if you have a different .gitattributes in each branch and then cherry-pick commits from your master branch to the clean branch. Does that apply the filters defined in the .gitattributes of the clean branch?
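That is easy to test in a scratch repo. A minimal experiment, with a toy uppercasing clean filter standing in for nbstripout (branch and filter names are made up); the `git show` at the end tells you whether the cherry-picked content went through the clean branch's filter or not:

```shell
cd "$(mktemp -d)"
git init --quiet .
git config user.email you@example.com
git config user.name you
# Toy clean filter standing in for nbstripout: uppercases staged content.
git config filter.demo.clean 'tr a-z A-Z'

git commit --quiet --allow-empty -m root
git branch clean

# On the default branch: no attributes, plain lowercase content.
echo 'hello outputs' > file.txt
git add file.txt
git commit --quiet -m 'add file'
git tag picked   # remember this commit

# On clean: .gitattributes routes *.txt through the filter.
git checkout --quiet clean
echo '*.txt filter=demo' > .gitattributes
git add .gitattributes
git commit --quiet -m 'attributes'

git cherry-pick picked
git show HEAD:file.txt   # inspect: filtered (UPPERCASE) or not?
```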
The setup is mostly automatic, but the daily use requires (for now) some manual steps.
The advantage of this approach is that the whole history is “clean” and all diffs are readable. But still, having a branch with executed notebooks allows the outputs to be visible, e.g., on nbviewer.
My preferred way of working would be to commit .md files into master and then use commit hooks to run the md via jupytext etc to produce run notebooks automatically committed into a separate branch. Having that branch empty of commit messages would be interesting.
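The Jupytext end of that pipeline is a short hook body (sketched here as a script that is written out but not run, since it assumes jupytext and a kernel are installed; committing the resulting .ipynb to the derived branch is left out):

```shell
cd "$(mktemp -d)"
cat > md-to-run-notebook.sh <<'HOOK'
#!/bin/sh
# For each Markdown source changed in the last commit, build a run notebook.
git diff-tree --no-commit-id --name-only -r HEAD -- '*.md' |
while IFS= read -r md; do
    # jupytext converts the .md to .ipynb; --execute runs it first.
    jupytext --to notebook --execute "$md"
    # ...then commit the resulting .ipynb to the derived branch.
done
HOOK
chmod +x md-to-run-notebook.sh
```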
Unfortunately, I’m having trouble persuading anyone else of the merits: a) of a Jupytext-mediated approach; b) of empty notebooks, rather than run notebooks, as the thing committed by users. (The wider feeling is that a user should commit the run notebook, so they can see they are committing the notebook exactly as they expect it to run, with the output-stripped notebook then derived from that.)
I think the approach from my link above doesn’t work if you use Markdown files. Rebasing the “executed” branch would cause conflicts.
You could re-convert and re-execute all files (not only changed files) each time to avoid rebasing, then it could work. But it would be annoying if you have many notebooks (and you change only a few of them).
Instead of having executed notebooks in your repo, you can provide static HTML pages (including the cell outputs) with nbsphinx (full disclosure, I’m the author). See https://nbsphinx.readthedocs.io/en/0.4.3/custom-formats.html.
Here are two random example pages created from Markdown files:
Re: using Jupytext: yes, that would be the idea… Generate ipynb from md, essentially, and treat (as far as repo is concerned) the md as the first class source.
(By default, I use Jupytext in all my environments, with no pairing… it means I can edit py and md files in the notebook UI.)