During the Pangeo meeting today with @rabernat, @jhamman, @jsignell, @jwagemann, @robfatland, @amanda-tan, and @mrocklin, the question “How do you keep your notebooks fresh?” came up. There currently isn’t an agreed-upon solution for testing notebooks and detecting when they need to be updated. The purpose of this post is to collect working ideas for maintaining notebooks and to collaborate on a path forward.
State the problem
“We created an awesome demo notebook to teach people how to use Pangeo, but after 2 months some upstream changes were made and the notebook no longer runs.”
If we can catch these errors in a timely manner, the notebooks are easier to maintain.
A sampling of solutions
There isn’t one common solution to this, so many groups are inventing their own. Let’s identify the best parts and assemble them!
nbsmoke: does smoke testing and linting, and works with bokeh/holoviews output
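For reference, nbsmoke ships as a pytest plugin, so a smoke-test or lint pass can be as simple as the following (a minimal sketch using pytest’s Python entry point; the notebooks/ path is a placeholder, and the same flags work on the pytest command line):

```python
# Minimal sketch: invoke nbsmoke's pytest plugin from Python.
import pytest

# Smoke test: execute every notebook under notebooks/ and fail on errors.
pytest.main(["--nbsmoke-run", "notebooks/"])

# Lint: run static checks on the notebook code without executing it.
pytest.main(["--nbsmoke-lint", "notebooks/"])
```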
Some notebooks will require intensive compute steps or large data sets, so running them on the free tier of available CI systems might not be feasible. We don’t necessarily need to tackle this right now, but ideally a solution could be sufficiently extensible to address these cases.
The test scenario should be able to cover notebooks that will be running on Binder and on JupyterHub deployments. @rabernat uses a single docker image for both to avoid the scenario where the notebook runs on Binder but not JupyterHub, or vice versa.
Road Forward
Beyond maintaining examples, testing notebooks touches on best practices for reproducible science - this is an important problem!
Please post your current solutions or ideas on how to develop a solution for testing notebooks here. We can use these as a starting point to collaborate on a common approach.
You could use papermill for something like this as well, no? It could (1) make sure the notebooks actually run, and (2) output the executed notebooks to a folder that is exposed via Travis/CircleCI/etc.?
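If it helps, here is a minimal sketch of that two-step workflow using papermill’s Python API (the notebooks/ and executed/ directories are placeholders):

```python
# Execute each notebook with papermill and write the executed copy to a
# folder that the CI system can then expose (e.g. as build artefacts).
import pathlib
import papermill as pm

out_dir = pathlib.Path("executed")
out_dir.mkdir(exist_ok=True)

for nb in sorted(pathlib.Path("notebooks").glob("*.ipynb")):
    # execute_notebook raises on a cell error, which fails the CI job.
    pm.execute_notebook(str(nb), str(out_dir / nb.name))
```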
This suggestion hits an important scenario. Generally speaking, we would like to have an un-executed version of the notebook to run in binder. That’s what most of our existing example binders do. We don’t want the cells to be run, because we want the users to have the experience of seeing the output appear for the first time.
On the other hand, we would like to have fully “built” output for display in galleries, like our pangeo use cases.
Currently these are completely separate things. It would be great to be able to unify them somehow.
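One possible way to unify them (just a sketch, not an agreed approach): keep a single executed notebook for the gallery and derive the unexecuted Binder copy by clearing its outputs with nbconvert. Paths here are hypothetical:

```python
# Derive an unexecuted Binder copy from the fully "built" gallery notebook.
import nbformat
from nbconvert.preprocessors import ClearOutputPreprocessor

nb = nbformat.read("gallery/use_case.ipynb", as_version=4)
ClearOutputPreprocessor().preprocess(nb, {})
nbformat.write(nb, "binder/use_case.ipynb")
```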
This is a fairly long post linking out to various building blocks to illustrate a solution I once built for a customer.
The whole system uses CircleCI to execute several (unexecuted) notebooks with papermill, turns them into HTML, stores that HTML as an artefact, and has a bot that posts a link to it as a comment on the PR.
You’d mark the executed notebook (the one you just ran) as an “artefact”. That way it will be available after the CI job ends, and CircleCI lets you view artefacts directly in your browser. This means that it is easy to inspect your notebooks visually.
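The execute-then-render step could look roughly like this (a sketch with hypothetical paths, using papermill plus nbconvert’s HTML exporter; the artefact upload itself is CircleCI configuration):

```python
# Execute the notebook, then turn the executed copy into standalone HTML
# that CircleCI can store as a build artefact.
import papermill as pm
from nbconvert import HTMLExporter

pm.execute_notebook("notebooks/demo.ipynb", "executed/demo.ipynb")

body, _resources = HTMLExporter().from_filename("executed/demo.ipynb")
with open("artefacts/demo.html", "w") as f:
    f.write(body)
```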
The CircleCI config linked above uses papermill and a conda env on the CI node. However, there is also a version that uses something like repo2docker --editable . papermill path/to/notebook.ipynb to run the notebooks inside a repo2docker-built image on the CI.
Papermill has the concept of “execution engines”. These are extensions you can write (and contribute to the core package, or keep as a separate package) that take care of executing the notebook. One idea is to build a repo2docker execution engine that takes care of creating the container for you. A BinderHub execution engine that runs your notebook on a BinderHub is a logical next step/alternative. That way you could launch your compute-intensive notebooks on a BinderHub from CircleCI/Travis/etc.
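To make the idea concrete, here is a rough sketch of what a repo2docker engine could look like. The Engine subclass and registry come from papermill; the path handling and the repo2docker invocation are illustrative assumptions, not a tested implementation:

```python
# Illustrative only: a papermill execution engine that delegates the actual
# run to repo2docker, so the notebook executes inside an image built from
# the repository.
import subprocess
from papermill.engines import Engine, papermill_engines


class Repo2DockerEngine(Engine):
    @classmethod
    def execute_managed_notebook(cls, nb_man, kernel_name, input_path=None,
                                 output_path=None, **kwargs):
        # Assumption: the notebook paths are passed through; a real engine
        # would work with the managed notebook object (nb_man) directly.
        subprocess.run(
            ["repo2docker", "--editable", ".",
             "papermill", input_path, output_path],
            check=True,
        )
        return nb_man.nb


# Register locally so `papermill --engine repo2docker ...` can find it;
# a packaged engine would use the papermill.engine entry point instead.
papermill_engines.register("repo2docker", Repo2DockerEngine)
```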
Your CI command would then only change slightly: papermill --engine binderhub --some-engine-extra-args-here notebook.ipynb output/notebook.ipynb to run the notebook on mybinder.org.
@willirath Have you by any chance posted a write-up of how you went about setting up the nbval notebook testing?
I need to do something similar for some course notebooks and keep putting it off because I’m not sure where to start with things like Travis, CircleCI, etc.
I was going to mention nbval; I pretty much use it with all my notebook collections/repos. In the case of very lengthy computations, you can mark cells that should be skipped during the automated test, which is useful when doing loads of development in a short period of time.
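For anyone who hasn’t used it: nbval runs as a pytest plugin (pytest --nbval notebook.ipynb), and skipping a cell is just a magic comment at the top of the cell, e.g.:

```python
# NBVAL_SKIP
# nbval skips this cell during automated testing, which keeps a very
# lengthy computation from blocking CI (the function name is hypothetical).
results = run_expensive_simulation()
```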
I built treon recently for testing notebooks. It runs every notebook top to bottom in a fresh kernel and flags any execution errors. Additionally, you can add unit tests and doctests in notebook cells and they will get executed as well. More details in the README.
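For example, a notebook cell like the following would have its doctest run as part of a treon pass (the function is just an illustration):

```python
# A notebook cell with an embedded doctest; treon runs it in addition to
# the normal top-to-bottom execution of the notebook.
def add(a, b):
    """Add two numbers.

    >>> add(2, 3)
    5
    """
    return a + b
```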
It’s a command-line tool, so it can be used easily in any CI environment. I intend to build notebook CI at https://www.reviewnb.com/ with the help of treon.
To pick this up again: I’m playing with papermill in a repo2docker-built Docker image run on GitHub Actions here.
All of this basically works, but there’s some friction from the way GitHub Actions is set up if things are run directly in the container (uid 1001 for GH Actions vs. 1000 for repo2docker, non-interactive shells not properly activating conda, etc.). For example, I didn’t succeed in running a notebook that %pip installs packages along the way.