I wanted to share some stuff I’ve built and lessons learned regarding the use of Jupyter notebooks as part of a larger research project. Namely, how to keep things reproducible (with a single command) while retaining the interactivity benefits that notebooks provide.
That’s a nice project! I like the way you integrate the code, data, and setup into the workflow specification. You have pinned down the ML setup in a flexible, user-specific way using Docker and/or Conda/venv. Only few have realized that the “open setup” is (besides code and data) a growing issue for ML experiments, and that solutions are therefore needed!
I’ve recently published a research article (the preprint is on ResearchGate; I’m still waiting for the official DOI) in which I (1) searched for and compared solutions for reproducible machine learning and (2) proposed a framework of our own.
We compared the solutions based on their suitability for flexible Deep Learning experiments (GPU support, OS-level isolation, taggable image versions, flexibility in languages and libraries, support for custom builds, and IDE integration).
The proposed solution is GPU-Jupyter, which meets these criteria well. It offers a very robust and flexible image for Deep Learning. Moreover, it allows custom builds and makes the whole customized setup reproducible in a single line.
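To give an idea of what that single line looks like in practice, here is a minimal sketch (the tag is only an example; check Docker Hub for the current ones):

```bash
# Minimal sketch: start GPU-Jupyter with GPU access and mount a local data directory.
# The tag is only an example; see cschranz/gpu-jupyter on Docker Hub for current tags.
docker run --gpus all -d -p 8888:8888 \
  -v "$(pwd)/data:/home/jovyan/work" \
  cschranz/gpu-jupyter:v1.5_cuda-11.6_ubuntu-20.04
```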
However, the integration of data and the workflow is not strictly defined, mainly because engineering needs more flexibility. For classical ML experiments, we suggested referring to a codemeta.json as part of the FAIR4RS principles (see the demo repository on GitHub under iot-salzburg/reproducible-research-with-gpu-jupyter/blob/main/codemeta.json; I’m not allowed to post more than two links). This codemeta.json defines the Docker image with its additional installations to some degree, but I think this aspect is currently underrepresented.
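To sketch what I mean (this is a simplified, hypothetical example using standard CodeMeta properties, not the actual content of the file in the repository):

```json
{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "name": "reproducible-research-with-gpu-jupyter",
  "runtimePlatform": "Docker image based on cschranz/gpu-jupyter",
  "softwareRequirements": ["torchsummary"]
}
```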
I’ve seen that calkit allows specifying the Docker image. Therefore, a combination with our solution could be very interesting, and better than sticking to the codemeta.json specification.
What do you think about this quasi-standard?
How do you define whole data-science projects in calkit when they require a directed graph of preprocessing steps?
Btw, I stick to Jupyter’s Docker Stacks in GPU-Jupyter to provide a similar UI. I have already posted a question regarding GPU support here: Proposal: GPU-Support.
GPU-Jupyter looks super interesting. Workflows that use both notebooks and GPUs are becoming more prevalent, and the complexity will certainly make reproducibility more difficult.
I definitely think a combination of Calkit and GPU-Jupyter would be interesting to explore. Pre-processing steps are defined as additional stages in the pipeline. These can be notebooks, scripts, or even shell commands, and their outputs can be defined as inputs to other stages to form a DAG.
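To sketch the idea, here is roughly the kind of DVC pipeline a Calkit project compiles down to, with two preprocessing stages feeding a training stage (the stage and file names here are hypothetical):

```yaml
# Hypothetical compiled dvc.yaml: outputs of one stage are declared as deps of the next, forming a DAG.
stages:
  clean-data:
    cmd: python scripts/clean_data.py
    deps:
      - scripts/clean_data.py
      - data/raw
    outs:
      - data/clean
  extract-features:
    cmd: python scripts/extract_features.py
    deps:
      - scripts/extract_features.py
      - data/clean
    outs:
      - data/features
  train:
    cmd: python scripts/train.py
    deps:
      - scripts/train.py
      - data/features
    outs:
      - models/model.pt
```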
Also allowing non-sequential preprocessing brings very important flexibility and is an advantage over many existing solutions.
Yes, this example project is a good way to try GPU-Jupyter for reproducibility (the README will be shortened soon). I would be very glad to see a demo that uses GPU-Jupyter as the base image within calkit. I expect that both the base image plus a torchsummary installation and the custom image https://hub.docker.com/repository/docker/cschranz/reproducible-research-with-gpu-jupyter can be used in calkit, right?
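For the first case, I imagine something like the following Dockerfile (the tag is only an example):

```dockerfile
# Hypothetical Dockerfile: extend the GPU-Jupyter base image with torchsummary.
# The tag is only an example; see cschranz/gpu-jupyter on Docker Hub for current ones.
FROM cschranz/gpu-jupyter:v1.5_cuda-11.6_ubuntu-20.04
RUN pip install --no-cache-dir torchsummary
```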
I would also like to try calkit this week to better understand it.
Thanks @petebachant, I’ll check it out later this week.
What makes me wonder is that (some of) the resulting errors are different, even though only the notebook package was changed. Do you know why this is the case?
Additionally, there seem to be a lot of redundant files within the .calkit directory. This is necessary for version-controlling the state of the setup under which notebooks and code are executed, if I understand correctly, right?
I will have to look in more detail at how to compile the project using calkit. I may ask you about this point later.
I assume this non-reproducibility comes from PyTorch, but I’m not sure. Maybe it’s worth running it on different machines.
That’s right. Since Calkit compiles a DVC pipeline, which relies on files to determine stage staleness, Calkit generates a cleaned version of the notebook so the inputs can be isolated. It also generates some executed versions to save as artifacts. Whether or not they get committed to version control is configurable, however. For example, you can set the storage settings in the pipeline stage to null and those files will not be committed.
I eventually had time to dive into calkit. That’s an impressive and comprehensive software project you’ve created! It took quite some time to get into it, but I guess it would be much faster for subsequent projects. I think a hands-on video tutorial could help a lot of users.
Overall, I think calkit is very interesting, and it pins down each step needed to reproduce work with (IMHO) the minimal effort required, both for reproduction and for making existing work reproducible.
Supporting the CUDA drivers in Docker in an OS-agnostic way would be an important point for me, because without a GPU the calkit run command would take very long for deep learning projects.
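For reference, the host currently needs the NVIDIA Container Toolkit (i.e., Linux, or Windows via WSL2) for GPU passthrough to work; a quick smoke test looks something like this (the CUDA image tag is only an example):

```bash
# Quick check that Docker can access the GPU via the NVIDIA Container Toolkit.
# The CUDA image tag is only an example.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```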