Testing notebooks

During the Pangeo meeting today with @rabernat, @jhamman, @jsignell, @jwagemann, @robfatland, @amanda-tan, and @mrocklin, the question “How do you keep your notebooks fresh?” came up, and there currently isn’t an agreed-upon solution for testing notebooks and detecting when they need to be updated. The purpose of this post is to collect working ideas for maintaining notebooks and to collaborate on a path forward.

State the problem

“We created an awesome demo notebook to teach people how to use Pangeo, but after 2 months some upstream changes were made and the notebook no longer runs :cry:”

If we can catch these errors in a timely manner, the notebooks will be easier to maintain.

A sampling of solutions

There isn’t one common solution to this, so many groups are inventing their own. Let’s identify the best parts and assemble them!

Edge cases to be aware of

Some notebooks will require intensive compute steps or large data sets, so running them on the free tier of available CI systems might not be feasible. We don’t necessarily need to tackle this right now, but ideally a solution could be sufficiently extensible to address these cases.

The test scenario should cover notebooks that run both on Binder and on JupyterHub deployments. @rabernat uses a single Docker image for both to avoid the scenario where a notebook runs on Binder but not on JupyterHub, or vice versa.

Road Forward

Beyond maintaining examples, testing notebooks touches on broader questions of best practice in reproducible science - this is an important problem!

Please post your current solutions or ideas on how to develop a solution for testing notebooks here. We can use these as a starting point to collaborate on a common approach.

You could use papermill for something like this as well, no? It could 1. make sure the notebooks actually run, and 2. output the executed notebooks (via papermill) into a folder that is exposed via Travis/Circle/etc.
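An untested sketch of what that CI step could look like with papermill’s Python API (`papermill.execute_notebook` is papermill’s real entry point; the helper names and directory layout here are made up for illustration):

```python
# Sketch: execute every notebook in a repo with papermill so CI fails
# if any cell raises. Helper names and paths are illustrative only.
from pathlib import Path


def find_notebooks(root):
    """Collect all .ipynb files under `root`, skipping checkpoint copies."""
    return sorted(
        p for p in Path(root).rglob("*.ipynb")
        if ".ipynb_checkpoints" not in p.parts
    )


def run_all(root, out_dir="executed"):
    """Execute each notebook, writing the run notebooks to `out_dir`."""
    import papermill as pm  # pip install papermill

    Path(out_dir).mkdir(exist_ok=True)
    for nb in find_notebooks(root):
        # Raises PapermillExecutionError on the first failing cell,
        # which fails the CI job.
        pm.execute_notebook(str(nb), str(Path(out_dir) / nb.name))

# In CI you would call something like: run_all("examples")
```

The `executed/` folder of run notebooks is then what you would publish as a CI artifact.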

@choldgraf: do you know of an example repo that does this? Seeing it in action would be helpful

Nope, I’ve never done it, but the idea just popped into my head so I mentioned it, haha

This suggestion hits an important scenario. Generally speaking, we would like to have an un-executed version of the notebook to run in binder. That’s what most of our existing example binders do. We don’t want the cells to be run, because we want the users to have the experience of seeing the output appear for the first time.

On the other hand, we would like to have fully “built” output for display in galleries, like our pangeo use cases.

Currently these are completely separate things. It would be great to be able to unify them somehow.

you could totally do that with papermill!

If we did this, how could we use CI to view the output (rendered notebooks) of a PR to a repo? I can’t quite imagine how that would work.

Take a look at fastai and quantecon’s tooling. IIRC both groups are autotesting notebooks for their courses and websites.

This is a fairly long post linking out to various building blocks to illustrate a solution I once built for a customer.

The whole system uses CircleCI to execute several (unexecuted) notebooks with papermill, turns them into HTML, stores that HTML as an artefact, and has a bot that will post a link to them in the PR as a comment.

You’d mark the executed notebook (the one you just ran) as an “artefact”. That way it will be available after the CI job ends.

The CircleCI config goes something like this: https://gist.github.com/betatim/0a31ea563289afa2ce077d99a64b0948
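The gist linked above has the real config; a rough sketch of the key steps (image name, paths, and package list here are assumptions, not the gist’s actual contents) might look like:

```yaml
# Sketch of a CircleCI 2.x job: execute notebooks, render HTML, store artefacts.
version: 2
jobs:
  build:
    docker:
      - image: continuumio/miniconda3
    steps:
      - checkout
      - run:
          name: Install dependencies
          command: pip install papermill jupyter nbconvert
      - run:
          name: Execute notebooks with papermill
          command: |
            mkdir -p executed
            for nb in notebooks/*.ipynb; do
              papermill "$nb" "executed/$(basename "$nb")"
            done
      - run:
          name: Render executed notebooks to HTML
          command: jupyter nbconvert --to html executed/*.ipynb
      - store_artifacts:
          path: executed
```

The `store_artifacts` step is what makes the rendered notebooks available for the bot to link to from the PR.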

We used a bot (source here: https://github.com/betatim/notebook-bot) that posts (and then updates) a comment in your PR linking to the rendered notebooks. Following the link takes you to an HTML-rendered notebook in your browser, which makes it easy to inspect your notebooks visually.

The CircleCI config linked above uses papermill and a conda env on the CI node. However, there is also a version that uses something like repo2docker --editable . papermill path/to/notebook.ipynb to run the notebooks inside a repo2docker-built image on the CI.

Papermill has the concept of “execution engines”. These are extensions you can write (and contribute to the core package or have as separate package) that take care of executing the notebook. One idea is to build a repo2docker execution engine that takes care of creating the container for you. A BinderHub execution engine that runs your notebook on a BinderHub is a logical next step/alternative. That way you can launch your compute intensive notebooks on a BinderHub from CircleCI/travis/etc.

Your CI command would only change to something like papermill --engine binderhub --some-engine-extra-args-here notebook.ipynb output/notebook.ipynb to run the notebook on mybinder.org.

My attempt to make a kaggle engine kinda worked. It is on hold because we can’t get the rendered notebook back from the Kaggle API :frowning:

HTH

Parcels uses nbval to ensure all tutorials work.

@willirath Have you by any chance posted a write up of how you went about setting up the nbval notebook testing?

I need to do something similar for some course notebooks and keep putting it off because I’m not sure where to start with things like travis, circle.ci etc.

thanks
–tony

I’m not allowed to put more than two links in a post, so the refs are a bit weird below.
Parcels repo: https://github.com/OceanParcels/parcels/

Requirements

See environment_py3_linux.yml in the Parcels repo for the full env. They get pytest and nbval from PyPI, and py (why, though?) from conda-forge.
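A rough sketch of the nbval-related part of such an environment file (the env name is made up; see environment_py3_linux.yml in the Parcels repo for the real, complete file):

```yaml
# Sketch only: pytest/nbval from PyPI, py from conda-forge, per the post above.
name: parcels-test
channels:
  - conda-forge
dependencies:
  - py
  - pip
  - pip:
      - pytest
      - nbval
```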

CI

Full Travis config in .travis.yml of the Parcels repo.

There is some issue with running nbval without a display, so they use

```shell
export DISPLAY=:99.0;
sh -e /etc/init.d/xvfb start;
sleep 3;
py.test -v -s --nbval-lax examples/;
```

in a linux image that seems to have xvfb preinstalled.

(Note that Parcels uses --nbval-lax and hence just checks for execution without error.)
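Wrapped into a Travis config, the fragment above might look like this (a sketch only; the real file is .travis.yml in the Parcels repo, and the availability of xvfb in the image is an assumption, as noted above):

```yaml
# Sketch of a .travis.yml fragment running nbval against example notebooks.
language: python
before_script:
  - export DISPLAY=:99.0
  - sh -e /etc/init.d/xvfb start
  - sleep 3
script:
  # --nbval-lax only checks that cells execute without error,
  # rather than comparing outputs against the stored ones.
  - py.test -v -s --nbval-lax examples/
```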

A similar workflow seems to be used by the vtkiorg/vtki repo.

CI logs

This is a sample output: https://travis-ci.org/OceanParcels/parcels/jobs/515320252#L1871

You can head to Introduce yourself! to introduce yourself to gain more karma. I’ll see if I can fix up some of the links for you.

This second post seems to have lifted me above the initial barrier.

Maybe relevant: https://github.com/ReviewNB/treon

I’m also using nbval; an X display isn’t needed. It’s not run automatically by Travis yet, but in theory it should be as easy as executing a script.

This is the test script: https://github.com/IDR/idr-notebooks/blob/9b639ca2fdb4cf6b4f64802f7e656f0f67aa9261/docker/test_notebooks.sh

This is all run inside a Docker container: https://github.com/IDR/idr-notebooks/blob/9b639ca2fdb4cf6b4f64802f7e656f0f67aa9261/docker/test.sh

I was going to mention nbval; I use it with pretty much all my notebook collections/repos. In the case of very lengthy computations, you can mark cells that should be skipped during the automated test, which is useful when doing loads of development in a short period of time.
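nbval recognizes special comment markers at the top of a cell; `# NBVAL_SKIP` tells it not to execute that cell. A marked cell might look like this (the computation here is a trivial stand-in for expensive work):

```python
# NBVAL_SKIP
# nbval sees the marker above and skips this cell during automated testing,
# so a long-running computation doesn't slow down (or time out) CI.
# The sum below is a trivial stand-in for the real heavy computation.
result = sum(i * i for i in range(10))
print(result)
```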

I built treon recently for testing notebooks. It runs through every notebook top to bottom in a fresh kernel and flags any execution errors. Additionally, you can add unittests and doctests in notebook cells and they will be executed as well. More details in the README.

It’s a command-line tool, so it can be used easily in any CI environment. I intend to build a notebook CI at https://www.reviewnb.com/ with the help of treon.