I’ve been working on a tool to make it easier to work reproducibly. Despite their strengths, Jupyter notebooks have some traps one can fall into when they are used as a pipeline. So I created a cell magic that lets you declare the variables a cell depends on along with its outputs, so the cell can be run in a separate process, with its outputs cached with DVC and injected back into the kernel namespace.
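To make the "separate process" idea concrete, here is a minimal sketch of the general technique in plain Python. It is not how the package is implemented and it does not use the package's API: `run_isolated` is a hypothetical helper, and passing values with pickle through temporary files is an assumption made for illustration.

```python
import pickle
import subprocess
import sys
import tempfile
from pathlib import Path

# Small script executed in the child process. It loads the declared inputs,
# runs the cell code against them, and saves only the declared outputs.
_RUNNER = """
import pickle, sys
in_path, out_path, code, *out_names = sys.argv[1:]
with open(in_path, "rb") as f:
    ns = pickle.load(f)
exec(code, ns)
with open(out_path, "wb") as f:
    pickle.dump({name: ns[name] for name in out_names}, f)
"""


def run_isolated(code, deps, outs):
    """Hypothetical helper: run `code` in a fresh Python process.

    `deps` maps variable names to values the code may read, and `outs`
    lists the variable names to bring back. Nothing else is shared.
    """
    with tempfile.TemporaryDirectory() as tmp:
        in_path = Path(tmp) / "in.pkl"
        out_path = Path(tmp) / "out.pkl"
        in_path.write_bytes(pickle.dumps(deps))
        subprocess.run(
            [sys.executable, "-c", _RUNNER, str(in_path), str(out_path), code, *outs],
            check=True,  # raise if the cell code fails in the child process
        )
        return pickle.loads(out_path.read_bytes())


# Usage: the code only sees `n`, and only `total` is injected back.
namespace = {"n": 10, "unrelated": "not visible to the cell"}
namespace.update(
    run_isolated("total = sum(range(n))", deps={"n": namespace["n"]}, outs=["total"])
)
print(namespace["total"])
```

Because the child process starts empty, code that quietly relies on an undeclared variable fails right away instead of working by accident, which is one of the traps hidden kernel state tends to cause.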
The cell code is also hashed so its outputs are read directly from the cache if the dependencies and code have not changed, making it easy to run a notebook from scratch if you’ve pulled the cache with DVC.
You can install the package with `pip install calkit-python` and load the `%%stage` cell magic with `%load_ext calkit.magics`. A more detailed tutorial can be found here: calkit/docs/tutorials/notebook-pipeline.md at 436f195c1d8d5677e2b6dc77b444b349939b18a7 · calkit/calkit · GitHub
I hope you find it useful!