I created a Jupyter cell magic that allows you to use a notebook cell as a DVC pipeline stage

petebachant · December 1, 2024, 1:43pm

I’ve been working on a tool to make it easier to work reproducibly. Jupyter notebooks have their strengths, but there are definitely some traps one can fall into when using one as a pipeline. So I created this cell magic that lets you declare the variables a cell depends on along with its outputs, so the cell can be run in a separate process, with its outputs cached with DVC and injected back into the kernel namespace.

The cell code is also hashed so its outputs are read directly from the cache if the dependencies and code have not changed, making it easy to run a notebook from scratch if you’ve pulled the cache with DVC.

You can install the package with pip install calkit-python and load the %%stage cell magic with %load_ext calkit.magics. A more detailed tutorial can be found here: calkit/docs/tutorials/notebook-pipeline.md at 436f195c1d8d5677e2b6dc77b444b349939b18a7 · calkit/calkit · GitHub

I hope you find it useful!
