A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks

Some folks at UFF and NYU did a study of the reproducibility of notebooks found on GitHub.

Highlighted data points: 1.4M notebooks found in 264k repos. 864k notebooks were executed; 24% completed without errors and 4% produced the “same” results (“same” is quite strictly defined, and their execution practices have drawn some criticism).

They made a few choices for measuring reproducibility that I find peculiar. In particular, when a notebook had out-of-order prompts, indicating re-run cells, they chose to follow the displayed execution order rather than what you’d get from “restart and run all”, which would be my definition. Additionally, if reproducibility requires byte-for-byte identical output, then you need to make sure there are no time- or memory-sensitive outputs (e.g. printing default object reprs, which embed memory addresses); these can differ across runs without any bearing on whether the computation actually reproduced. Byte-for-byte comparison is by far the simplest measure of reproducibility, and the strictest, but not the most useful.
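To make the memory-sensitive-output point concrete, here is a minimal sketch: a class with no custom `__repr__` gets Python’s default repr, which includes the object’s memory address, so two otherwise identical runs produce textually different output unless the address is normalized away first.

```python
import re

class Point:
    """No custom __repr__, so the default repr embeds a memory address."""
    def __init__(self, x, y):
        self.x, self.y = x, y

# Two objects built from the exact same code...
p1 = Point(1, 2)
p2 = Point(1, 2)

# ...print differently, e.g. <__main__.Point object at 0x7f3a2c1d0a90>,
# so a byte-for-byte comparison of cell output fails spuriously.
print(repr(p1))
print(repr(p2))
print(repr(p1) == repr(p2))   # False: the addresses differ

# Normalizing the address recovers a meaningful comparison.
def strip_addr(s):
    return re.sub(r"0x[0-9a-f]+", "0xADDR", s)

print(strip_addr(repr(p1)) == strip_addr(repr(p2)))  # True
```

This is why a strict byte comparison can report “not reproduced” for a notebook whose computation is perfectly deterministic.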

They also define some best practices for reproducibility, which I think are worth attention and discussion.


This article is indeed very nice. Among the proposed good practices, I’m a bit puzzled by the idea of testing notebooks. Although the authors point to some tools, how to actually do this does not seem obvious to me.
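One lightweight reading of “testing a notebook” (not necessarily what the authors or their tools do) is a smoke test: run every code cell top to bottom in a fresh namespace and fail on the first error, approximating “restart and run all”. A toy sketch using only the stdlib, where the notebook dict is a hypothetical stand-in for one loaded from an `.ipynb` file with `json.load`:

```python
def smoke_test_notebook(nb: dict) -> None:
    """Execute each code cell in order in a shared namespace,
    raising AssertionError on the first cell that errors."""
    ns = {}
    for i, cell in enumerate(nb["cells"]):
        if cell["cell_type"] != "code":
            continue
        src = "".join(cell["source"])
        try:
            exec(compile(src, f"<cell {i}>", "exec"), ns)
        except Exception as e:
            raise AssertionError(f"cell {i} failed: {e!r}") from e

# Hypothetical notebook content in the .ipynb JSON cell structure.
nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Analysis\n"]},
        {"cell_type": "code", "source": ["x = 2 + 2\n"]},
        {"cell_type": "code", "source": ["assert x == 4\n"]},
    ]
}
smoke_test_notebook(nb)  # passes silently; a broken cell would raise
```

This only checks that the notebook runs, not that outputs match the stored ones; comparing outputs would additionally need the normalization issues discussed above.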