During the Pangeo meeting today with @rabernat, @jhamman, @jsignell, @jwagemann, @robfatland, @amanda-tan, and @mrocklin, the question “How do you keep your notebooks fresh?” came up. There currently isn’t an agreed-upon solution for testing notebooks and detecting when they need to be updated. The purpose of this post is to collect working ideas for maintaining notebooks and to collaborate on a path forward.
State the problem
“We created an awesome demo notebook to teach people how to use Pangeo, but after 2 months some upstream changes were made and the notebook no longer runs.”
If we can catch these errors in a timely manner, the notebooks are easier to maintain.
A sampling of solutions
There isn’t one common solution to this, so many groups are inventing their own. Let’s identify the best parts and assemble them!
nbsmoke: does smoke testing and linting, and works with bokeh/holoviews output
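For reference, nbsmoke ships as a pytest plugin, so a smoke-test or lint pass can be as simple as the following (a minimal sketch using pytest’s Python entry point; the notebooks/ path is a placeholder, and the same flags work on the pytest command line):

```python
# Minimal sketch: invoke nbsmoke's pytest plugin from Python.
import pytest

# Smoke test: execute every notebook under notebooks/ and fail on errors.
pytest.main(["--nbsmoke-run", "notebooks/"])

# Lint: run static checks on the notebook code without executing it.
pytest.main(["--nbsmoke-lint", "notebooks/"])
```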
Some notebooks will require intensive compute steps or large data sets, so running them on the free tier of available CI systems might not be feasible. We don’t necessarily need to tackle this right now, but ideally a solution could be sufficiently extensible to address these cases.
The test scenario should be able to cover notebooks that will be running on Binder and on JupyterHub deployments. @rabernat uses a single docker image for both to avoid the scenario where the notebook runs on Binder but not JupyterHub, or vice versa.
Road Forward
Beyond maintaining examples, testing notebooks touches on best practices for reproducible science - this is an important problem!
Please post your current solutions or ideas on how to develop a solution for testing notebooks here. We can use these as a starting point to collaborate on a common approach.
You could use papermill for something like this as well, no? It could (1) make sure the notebooks actually run, and (2) output the executed notebooks to a folder that is exposed via Travis/CircleCI/etc.?
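If it helps, here is a minimal sketch of that two-step workflow using papermill’s Python API (the notebooks/ and executed/ directories are placeholders):

```python
# Execute each notebook with papermill and write the executed copy to a
# folder that the CI system can then expose (e.g. as build artefacts).
import pathlib
import papermill as pm

out_dir = pathlib.Path("executed")
out_dir.mkdir(exist_ok=True)

for nb in sorted(pathlib.Path("notebooks").glob("*.ipynb")):
    # execute_notebook raises on a cell error, which fails the CI job.
    pm.execute_notebook(str(nb), str(out_dir / nb.name))
```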
This suggestion hits an important scenario. Generally speaking, we would like to have an un-executed version of the notebook to run in binder. That’s what most of our existing example binders do. We don’t want the cells to be run, because we want the users to have the experience of seeing the output appear for the first time.
On the other hand, we would like to have fully “built” output for display in galleries, like our pangeo use cases.
Currently these are completely separate things. It would be great to be able to unify them somehow.
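One possible way to unify them (just a sketch, not an agreed approach): keep a single executed notebook for the gallery and derive the unexecuted Binder copy by clearing its outputs with nbconvert. Paths here are hypothetical:

```python
# Derive an unexecuted Binder copy from the fully "built" gallery notebook.
import nbformat
from nbconvert.preprocessors import ClearOutputPreprocessor

nb = nbformat.read("gallery/use_case.ipynb", as_version=4)
ClearOutputPreprocessor().preprocess(nb, {})
nbformat.write(nb, "binder/use_case.ipynb")
```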
This is a fairly long post linking out to various building blocks to illustrate a solution I once built for a customer.
The whole system uses CircleCI to execute several (unexecuted) notebooks with papermill, turns them into HTML, stores that HTML as an artefact, and has a bot that posts a link to it as a comment on the PR.
You’d mark the executed notebook (the one you just ran) as an “artefact”. That way it will be available after the CI job ends, and CircleCI lets you view artefacts directly in your browser. This means that it is easy to inspect your notebooks visually.
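The execute-then-render step could look roughly like this (a sketch with hypothetical paths, using papermill plus nbconvert’s HTML exporter; the artefact upload itself is CircleCI configuration):

```python
# Execute the notebook, then turn the executed copy into standalone HTML
# that CircleCI can store as a build artefact.
import papermill as pm
from nbconvert import HTMLExporter

pm.execute_notebook("notebooks/demo.ipynb", "executed/demo.ipynb")

body, _resources = HTMLExporter().from_filename("executed/demo.ipynb")
with open("artefacts/demo.html", "w") as f:
    f.write(body)
```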
The CircleCI config linked above uses papermill and a conda env on the CI node. However, there is also a version that uses something like repo2docker --editable . papermill path/to/notebook.ipynb to run the notebooks inside a repo2docker-built image on the CI.
Papermill has the concept of “execution engines”. These are extensions you can write (and contribute to the core package, or keep as a separate package) that take care of executing the notebook. One idea is to build a repo2docker execution engine that takes care of creating the container for you. A BinderHub execution engine that runs your notebook on a BinderHub is a logical next step/alternative. That way you could launch your compute-intensive notebooks on a BinderHub from CircleCI/Travis/etc.
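To make the idea concrete, here is a rough sketch of what a repo2docker engine could look like. The Engine subclass and registry come from papermill; the path handling and the repo2docker invocation are illustrative assumptions, not a tested implementation:

```python
# Illustrative only: a papermill execution engine that delegates the actual
# run to repo2docker, so the notebook executes inside an image built from
# the repository.
import subprocess
from papermill.engines import Engine, papermill_engines


class Repo2DockerEngine(Engine):
    @classmethod
    def execute_managed_notebook(cls, nb_man, kernel_name, input_path=None,
                                 output_path=None, **kwargs):
        # Assumption: the notebook paths are passed through; a real engine
        # would work with the managed notebook object (nb_man) directly.
        subprocess.run(
            ["repo2docker", "--editable", ".",
             "papermill", input_path, output_path],
            check=True,
        )
        return nb_man.nb


# Register locally so `papermill --engine repo2docker ...` can find it;
# a packaged engine would use the papermill.engine entry point instead.
papermill_engines.register("repo2docker", Repo2DockerEngine)
```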
Your CI command would then only change slightly: papermill --engine binderhub --some-engine-extra-args-here notebook.ipynb output/notebook.ipynb to run the notebook on mybinder.org.
@willirath Have you by any chance posted a write-up of how you went about setting up the nbval notebook testing?
I need to do something similar for some course notebooks and keep putting it off because I’m not sure where to start with things like Travis, CircleCI, etc.
I was going to mention nbval; I pretty much use it with all my notebook collections/repos. In the case of very lengthy computations, you can mark cells that should be skipped during the automated test, which is useful when doing loads of development in a short period of time.
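For anyone who hasn’t used it: nbval runs as a pytest plugin (pytest --nbval notebook.ipynb), and skipping a cell is just a magic comment at the top of the cell, e.g.:

```python
# NBVAL_SKIP
# nbval skips this cell during automated testing, which keeps a very
# lengthy computation from blocking CI (the function name is hypothetical).
results = run_expensive_simulation()
```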
I built treon recently for testing notebooks. It runs every notebook top to bottom in a fresh kernel and flags any execution errors. Additionally, you can add unit tests and doctests in notebook cells and they will get executed as well. More details in the README.
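For example, a notebook cell like the following would have its doctest run as part of a treon pass (the function is just an illustration):

```python
# A notebook cell with an embedded doctest; treon runs it in addition to
# the normal top-to-bottom execution of the notebook.
def add(a, b):
    """Add two numbers.

    >>> add(2, 3)
    5
    """
    return a + b
```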
It’s a command-line tool, so it can be used easily in any CI environment. I intend to build notebook CI at https://www.reviewnb.com/ with the help of treon.
To pick this up again: I’m playing with papermill in a repo2docker-built Docker image run on GitHub Actions here.
All of this basically works, but there’s some friction from the way GitHub Actions is set up if things are run directly in the container (uid 1001 for GH Actions vs. 1000 for repo2docker, non-interactive shells not properly activating conda, etc.). For example, I didn’t succeed in running a notebook that %pip installs packages along the way.