Cautionary note: I’m not a Binder expert. Feel free to comment / discuss below, I’m happy to improve this post!
After spending the last two days debugging code which broke because of updates in our dependencies (e.g. a bad mpl bug), it occurred to me that our MyBinder tutorials would be broken as well: each new commit on a Binder repository triggers a new conda install, with the broken packages in it.
Solution 1: pin your packages
The recommended way to deal with the issue is to pin packages in your environment.yml file. However, with conda this is easier said than done (we have quite a few dependencies). More problematically, package versions and their interdependencies are not guaranteed to stay fixed on conda-forge: a pinned file that worked one day might not work another day.
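To get a feeling for how much pinning work is ahead, here is a minimal sketch that lists the entries of a dependency list which are not pinned. It assumes the conda dependencies are available as plain strings (the way they appear in environment.yml); the function name `unpinned` and the example versions are made up for illustration, and pip sub-sections are not handled.

```python
import re

# Accepts "name=version", "name==version" and "name=version=build".
# Bare names, ranges (">=1.16") and wildcards ("1.2.*") count as unpinned.
# Caveat: conda may treat a single "=" as a prefix match, so
# "name=version=build" is the strictest form of pin.
_PIN = re.compile(
    r"^[A-Za-z0-9_.\-]+={1,2}[0-9][A-Za-z0-9_.!+]*(=[A-Za-z0-9_.]+)?$"
)


def unpinned(specs):
    """Return the dependency specs that do not look like exact pins."""
    return [s for s in specs if not _PIN.match(s.strip())]


# Hypothetical dependency list, as it could appear in environment.yml
deps = [
    "python=3.7.3",
    "matplotlib",
    "numpy>=1.16",
    "xarray=0.12.1=py_0",
    "scipy=1.2.*",
]
print(unpinned(deps))
```

Even when this list comes back empty, the pinned file can still stop resolving later, which is the real weakness of this solution.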
Solution 2: pin your MyBinder link
Another way to deal with the issue is to make a MyBinder link which points to a commit which works (e.g. before a package update which broke your notebooks). As long as MyBinder remembers that it has built this commit in the past, the image will work. The problem with this pin is that you can’t update your notebook content (a new commit triggers a build), and MyBinder doesn’t make any guarantee that they will store our images “forever” (the MyBinder image registry is a temporary cache, as explained below).
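As a small illustration, the pinned link can be put together like this. This is only a sketch: it assumes a repository hosted on GitHub and the usual `mybinder.org/v2/gh/<org>/<repo>/<ref>` link layout, and the organisation, repository and commit hash below are placeholders, not our real ones.

```python
def pinned_binder_url(org, repo, ref):
    """Build a MyBinder link for a GitHub repository at a fixed ref.

    `ref` can be a branch name, a tag or a commit hash; using a commit
    hash is what "pins" the link to a known-good state.
    """
    return f"https://mybinder.org/v2/gh/{org}/{repo}/{ref}"


# Placeholder values, for illustration only
url = pinned_binder_url("example-org", "example-tutorials", "0123abc")
print(url)
```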
Solution 3: repo2docker, dockerhub and nbgitpuller
This solution is inspired by the separation of content and environment described here. We go one step further, with three repositories:
- an environment description repository (example) with all the config files (env, postBuild, etc). This environment is built on Travis (at each commit and each week with a cron-job) with repo2docker and pushed to dockerhub. Building on Travis lets us run tests (either on our dependencies or on our own content) and therefore ensures that the pushed images are “working”.
- a Binder env repository (example) which does nothing other than pull from dockerhub with a Dockerfile. This way, you can tag the exact version of the image you want to build and don’t rely on any other tool than dockerhub to store your images. This is the repository which we link to on Binder. When Binder builds an image from it, it is going to pull from dockerhub, which is usually faster than a full repo2docker build but can be slow and might have drawbacks, as explained below.
- one (or more) “content repositories” with the notebooks (or code) you would like to share on Binder (example). This content is pulled into your Binder envs with nbgitpuller (example documentation).
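To make the last two pieces more concrete, here is a rough sketch of how they fit together. The first function renders the one-line Dockerfile of the Binder env repository, the second builds a Binder link that asks nbgitpuller to pull a content repository into that environment. All names (image, tag, organisations, repositories) are placeholders, and the `git-pull?repo=...&branch=...` layout is the nbgitpuller link format as I understand it, so please check it against the nbgitpuller documentation before relying on it. nbgitpuller also has to be installed in the image for the link to work.

```python
from urllib.parse import urlencode


def render_dockerfile(image, tag):
    """Dockerfile content that only pulls a tagged image from dockerhub."""
    return f"FROM {image}:{tag}\n"


def binder_gitpull_url(env_org, env_repo, env_ref, content_repo_url,
                       content_branch=None):
    """Binder link to the env repository that pulls content via nbgitpuller."""
    inner = {"repo": content_repo_url}
    if content_branch:
        inner["branch"] = content_branch
    # nbgitpuller is reached through Binder's "urlpath" parameter, so the
    # inner query string ends up URL-encoded a second time.
    next_url = "git-pull?" + urlencode(inner)
    base = f"https://mybinder.org/v2/gh/{env_org}/{env_repo}/{env_ref}"
    return base + "?" + urlencode({"urlpath": next_url})


# Placeholder names, for illustration only
dockerfile = render_dockerfile("example-org/example-env", "20190601")
link = binder_gitpull_url(
    "example-org", "binder-env", "master",
    "https://github.com/example-org/notebooks", "master",
)
print(dockerfile)
print(link)
```

Changing the content only requires a commit in the content repository; the link and the environment stay the same.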
This set-up is of course more complex than a single “binder ready” repository. But there are a couple of advantages:
- you can update the content (notebooks) frequently without triggering an environment build (which you only rarely want to do).
- other people can use your environment with their own content (for example, Lizz improved and translated our notebooks into Spanish for a class).
- you can add tests to your Travis script, ensuring that the envs you are building work for you (edit: an undocumented - and better - alternative is explained below).
- as long as dockerhub exists, reproducibility and a proper “time machine” are ensured. You can also store your images elsewhere if you want.
- the actual reason why we did this in the first place is that we need the repo2docker images available on dockerhub in order to use them for our JupyterHub.
- this set-up is more flexible than pinning dependencies to a fixed version. Often, it is very hard to find a combination of packages that work together, and most of the time you actually want to update your dependencies. Pinning packages is a tedious process; downloading from dockerhub is arguably easier.