This is an idea we came up with at the 2019 Pangeo Community Meeting. The aim of my post is to encourage people to work on something I don’t really have the expertise to do myself, but which I think would have a huge impact.
Many projects have an “example gallery” of notebooks (example: dask). It’s great to have these examples live in a binder, in which case the notebooks are stored with output cleared. But often we also want a fully executed notebook to live in a static documentation site. Where should this execution happen? For many build systems, like dask’s, it happens in CI.
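For concreteness, an in-CI execution step typically looks something like the following (a hypothetical CI config fragment; the `nbconvert` flags are real, the paths are made up):

```yaml
- name: Execute example notebooks
  run: |
    pip install jupyter nbconvert
    # re-run every notebook in place, failing the build on errors
    jupyter nbconvert --to notebook --execute --inplace examples/*.ipynb
```

This works fine as long as CI can faithfully reproduce the environment the notebooks need — which is exactly the assumption that breaks down below.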
But this is not always ideal. Sometimes the binder environment can be quite complex, involving a big set of dependencies or customized access to resources (as in Pangeo’s binder). This makes it hard to recreate the proper build environment in CI.
I am proposing a tool, and an associated bot, that uses the JupyterHub API to execute the notebooks within their own binder. Specifically, this tool would:
- Launch the repo’s binder via the JupyterHub API
- Use the API to run each notebook
- Download the executed notebooks out of the running binder
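To make the shape of this concrete, here is a rough sketch of steps 1 and 3 in Python. The BinderHub `/build` endpoint (a server-sent event stream whose `"ready"` event carries the server `url` and `token`) and the Jupyter Server `/api/contents` endpoint are real; the function names, error handling, and overall structure are my own assumptions, and step 2 (driving execution over the kernels websocket protocol) is omitted:

```python
"""Sketch of the proposed tool, not a working implementation."""
import json
import urllib.request

BINDER = "https://mybinder.org"


def build_url(provider: str, spec: str, binder: str = BINDER) -> str:
    """BinderHub build endpoint for a repo,
    e.g. build_url("gh", "dask/dask-examples/main")."""
    return f"{binder}/build/{provider}/{spec}"


def parse_event(line: str):
    """Parse one line of the server-sent event stream the build
    endpoint emits; non-data lines yield None."""
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):])


def launch(provider: str, spec: str) -> dict:
    """Step 1: launch the binder and wait for the "ready" event,
    which carries the running server's "url" and "token"."""
    with urllib.request.urlopen(build_url(provider, spec)) as stream:
        for raw in stream:
            event = parse_event(raw.decode())
            if event and event.get("phase") == "ready":
                return event
    raise RuntimeError("binder never became ready")


def download(server_url: str, token: str, path: str) -> dict:
    """Step 3: fetch a (now executed) notebook through the Jupyter
    Server contents API of the running binder."""
    req = urllib.request.Request(
        f"{server_url}api/contents/{path}",
        headers={"Authorization": f"token {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

The hard part, and the part this sketch dodges, is step 2: actually running each notebook inside the binder requires speaking the Jupyter kernel messaging protocol over a websocket.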
In bot form, this service could watch a repo for changes to the master branch and, when it detects a change to a notebook, run this workflow and generate a PR to a rendered branch. That way the repo could contain both blank and rendered notebooks, with the bot keeping them in sync. This sort of continuous integration would also serve as a form of quality control, keeping a binder fresh and functional as its content evolves.
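The bot’s trigger logic is the easy part. A toy sketch, assuming GitHub push webhooks (the `ref` and `commits[].added`/`modified` fields are real payload fields; everything else here is hypothetical):

```python
def notebooks_to_render(event: dict) -> list:
    """Given a GitHub push webhook payload, return the notebooks
    changed on master; an empty list means the bot does nothing."""
    if event.get("ref") != "refs/heads/master":
        return []
    changed = set()
    for commit in event.get("commits", []):
        changed.update(commit.get("added", []))
        changed.update(commit.get("modified", []))
    return sorted(f for f in changed if f.endswith(".ipynb"))
```

For each notebook this returns, the bot would run the launch/execute/download workflow above and commit the result to the rendered branch.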
I think such a tool could actually be the foundation of a notebook-based publication service, which has been discussed many times.
I made some tiny progress on the API stuff at the Pangeo hackathon, but got stuck because I don’t know how to do async programming:
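For anyone picking this up: I believe the concurrency pattern needed is roughly `asyncio.gather` over one coroutine per notebook. A toy sketch, where `execute` just sleeps in place of the real API round-trip:

```python
import asyncio


async def execute(notebook: str) -> str:
    # placeholder for the real call into the running binder
    await asyncio.sleep(0.01)
    return f"{notebook}: executed"


async def main(notebooks):
    # run all executions concurrently and collect results in order
    return await asyncio.gather(*(execute(nb) for nb in notebooks))


results = asyncio.run(main(["01-intro.ipynb", "02-dask.ipynb"]))
```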
@yuvipanda also has some relevant examples of using the hub API in hubtraf:
There may be some interest from @jsignell in working on this.
Keen to hear thoughts from the community.