Pre-building images on BinderHub

Hello all!

For some context, my team created a plugin that inserts code blocks into pages, and those blocks are executable via BinderHub. We want users to be able to execute these code blocks as quickly as possible. We plan for the backend of these code blocks to be based on a Dockerfile, which we update constantly as we add more packages.

Right now, this is bottlenecked by the time it takes to request a server and build the image (even after minimizing the image size and number of layers). Is there a way for BinderHub to pre-build the image on each node, so we can reduce the time spent building it (since it’s constantly being updated)?

We also have a JupyterHub on the same cluster which uses the same image, so perhaps we could use the two in conjunction?

Thank you for any advice!

You can trigger a build of your repository each time something is merged into its master branch.
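One common way to do this is a small CI job that requests the BinderHub build endpoint whenever master changes. Here’s a rough sketch as a GitHub Actions workflow; `binder.example.org` and `my-org/my-repo` are placeholders for your own deployment and repository:

```yaml
# .github/workflows/prebuild-binder.yml -- sketch only; swap in your own
# BinderHub URL and repository spec.
name: Pre-build Binder image
on:
  push:
    branches: [master]
jobs:
  prebuild:
    runs-on: ubuntu-latest
    steps:
      # /build/gh/<owner>/<repo>/<ref> is the same endpoint the BinderHub web UI
      # uses; requesting it starts a build and streams progress as server-sent
      # events (-N stops curl from buffering the stream).
      - run: curl -sN https://binder.example.org/build/gh/my-org/my-repo/master
```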

There are also some tricks to how you write your Dockerfile so that rebuilds (where only some things have changed) are faster. The general principle is to install the things that take the longest to build first in your image, and the things that are fast later. For example, installing and compiling a big package is better done at the start of the Dockerfile, with notebooks and the README copied over later. That way the layer containing the expensive-to-build thing is reused when you only change the README.
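As a rough illustration (the packages and paths here are just placeholders, not a recommendation):

```dockerfile
# Sketch: expensive, rarely-changing layers first; cheap, frequently-changing ones last.
FROM python:3.11-slim

# Slow step: this layer is only rebuilt when the package list changes.
RUN pip install --no-cache-dir numpy scipy pandas

# Fast steps: editing the README or a notebook only invalidates these layers,
# so the expensive layer above is served from the Docker cache.
COPY README.md /home/jovyan/README.md
COPY notebooks/ /home/jovyan/notebooks/
```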

Can you explain a bit more what you mean? You shouldn’t need to build the image on each node. Once a particular image has been built, all other launches of it should pull it from your BinderHub’s Docker registry.

On mybinder.org we use the “sticky builds” feature of BinderHub. In a normal BinderHub a build is assigned to a “random” node in the cluster. With sticky builds enabled, BinderHub tries to assign builds of the same repository to the same node, which increases the chances that shared layers are already in the Docker cache.

Another thing we found on mybinder.org is that people will re-re-re-re-re-build their image a lot at the start, but after a few days of development they stop changing it quite so frequently (time between changes becomes >> time to build). The recommendation right now is to use repo2docker or docker build locally for fast-paced development, since it runs more quickly and you have easier access to the logs, etc.
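For example, assuming you have Docker running locally (the image name here is arbitrary):

```bash
# Sketch: build the image locally with repo2docker instead of waiting on BinderHub.
pip install jupyter-repo2docker

# --no-run builds the image without launching a notebook server, which is handy
# for quickly checking that the Dockerfile still builds after a change.
repo2docker --no-run --image-name my-project:dev .
```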


I believe that in JupyterHub (the Zero to JupyterHub Helm chart), images get pulled down from Docker Hub onto each node by the hook-image-puller (before you do an upgrade) and the continuous-image-puller, so that when a user launches a server the Docker image is already there. I was wondering if we could do the same with BinderHub, where images are pulled in advance so that a user wouldn’t have to wait for a build.

If we use docker build locally, that wouldn’t affect the time to get the image onto each node, correct?

Could I also ask how to enable the sticky builds feature on BinderHub?

I do realize that our use case might not align with BinderHub’s intended purpose, so thank you so much for your reply!

The BinderHub Helm chart depends on the JupyterHub one, so whatever you can do in the JupyterHub Helm chart also applies to the BinderHub Helm chart. Here’s an example of enabling the continuous image pre-puller for a BinderHub: https://github.com/alan-turing-institute/hub23-deploy/blob/87a268ab51d6d1fbdbdeb634aae2f5c18042e214/deploy/prod.yaml#L44
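Putting that together with sticky builds, a BinderHub values file might look roughly like this. This is a sketch, not copied from a working deployment, so check the key names against the chart versions you’re running; the image name and tag under `extraImages` are placeholders:

```yaml
# values.yaml for the BinderHub Helm chart (sketch)
config:
  BinderHub:
    sticky_builds: true      # route rebuilds of the same repo to the same node

jupyterhub:                  # options for the JupyterHub sub-chart live under this key
  prePuller:
    hook:
      enabled: true          # pull images onto nodes before a helm upgrade completes
    continuous:
      enabled: true          # keep pulling onto nodes as they join the cluster
    extraImages:             # pre-pull a known image, e.g. your constantly updated one
      myimage:
        name: my-org/my-image
        tag: latest
```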
