People often ask the question: how can I make my repository launch more quickly on mybinder.org?
This is a short and informal post to share some insights and a few suggestions.
What affects launch time?
The key difference between mybinder.org and other cloud services such as Colab is that Binder is meant to run arbitrary environments that you define in a GitHub repository. While most online notebook platforms run a “kitchen sink” environment with a lot of pre-installed software, Binder gives users control over the environment for their sessions, to encourage more reproducible and well-contained code, analyses, and communications. This flexible environment generation adds some time to launches.
Most of the time, when a repository is very slow to launch (more than 30s), it is because the environment for that session must be built and initialized. This mostly happens to people “developing” on a repository (constantly changing things and launching right away).
For most users of a Binder link, the environment is already built because someone else has previously launched the same version. This can still be slow, but not very slow (more than 30s).
mybinder.org runs on Kubernetes, which runs a cluster that grows and shrinks as necessary to take on new users. Each time a user clicks a Binder link, these things happen:
1. A slot (called a “pod”) is reserved on one of the cloud machines. This takes 1-2 seconds.
2. Binder looks to see if a Docker image exists for that repository.
   - If it doesn’t, Binder must first build the image for that repo using repo2docker (this takes time).
3. Binder looks for a built image on the machine the user will use.
   - If it isn’t on the machine, Binder must first pull the image onto that machine (this takes time).
4. Binder launches the user’s session. This includes:
   - a small amount of time to start the “init pods to limit network access”,
   - a few seconds for the Jupyter process to start,
   - a few seconds for BinderHub to notice,
   - and finally, your browser needs to follow the redirect.
Together, these steps determine how long it takes for a new session to start. How much each step contributes to the total launch time depends on the repository.
For example:
- if your repository results in a 30GB Docker image, then it will almost certainly take a long time on steps 2 and 3.
- if your repository is rarely launched, then when somebody launches it there is a good chance the Docker image won’t be on the machine. This means step 3 will take time instead of being instant.
Generally speaking, steps 2 and 3 contribute the most to Binder launch time. If the Docker image is both already built and already on the machine where a new user is starting their session, then the session should launch in a matter of seconds (our statistics say you should be waiting about 20s or so).
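To make the effect of steps 2 and 3 concrete, here is a rough sketch of how the steps add up. Only the 1-2 second pod reservation comes from this post; the build, pull, and session-start timings are made-up placeholders, and the names `LaunchTimings` and `estimate_launch_seconds` are hypothetical, not part of BinderHub.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchTimings:
    """Illustrative timings in seconds. These are NOT measured values."""
    reserve_pod: float = 2.0     # step 1: the post says 1-2 seconds
    build_image: float = 300.0   # step 2: assumed placeholder for a repo2docker build
    pull_image: float = 60.0     # step 3: assumed placeholder for pulling the image
    start_session: float = 15.0  # step 4: assumed placeholder (init pods, Jupyter, redirect)


def estimate_launch_seconds(image_built: bool, image_on_machine: bool,
                            t: LaunchTimings = LaunchTimings()) -> float:
    """Add up the launch steps, skipping the build and pull when they are not needed."""
    total = t.reserve_pod
    if not image_built:
        total += t.build_image   # the image must be built first
    if not image_on_machine:
        total += t.pull_image    # the image must be pulled onto the user's machine
    return total + t.start_session


# Best case: image already built and already on the machine.
print(estimate_launch_seconds(image_built=True, image_on_machine=True))
# Worst case: nothing is cached, so both the build and the pull are needed.
print(estimate_launch_seconds(image_built=False, image_on_machine=False))
```

With placeholder numbers like these, the build and pull dominate the total, which is why the tips below focus on avoiding them.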
How can I reduce my launch time?
With that in mind, try these steps to reduce the time it takes your repository to launch:
- Make your repository environment more lightweight: a repository with fewer dependencies and a smaller size will be faster to both build and download into the Binder session.
- Ensure your repository gets a lot of clicks: the more often a repository is launched, the more likely it will already be built and downloaded to a machine when a user starts a new session. As a result, the more popular a repository is, the faster its launches will tend to be.
- Use two repositories, one for the environment and one for your content: many people change their content much more often than they change the environment needed for it. However, Binder will re-build the environment for any change to a repository. A hack to get around this is to define an “environment repository” that Binder builds, and use a hook to pull in new content at launch from a “content repository”. This means that your “environment repository” changes less often, which should result in fewer new builds and reduced launch times. See the instructions in this post to get started.
- Use the nbgitpuller.link page to automate separate content/environment repos: the above step can be (mostly) automated by using nbgitpuller.link. This is a little web form that generates JupyterHub links for you. To quickly create a link for content/environment repositories, go to nbgitpuller.link?tab=binder and fill out the form. You can also pre-populate the form with some fields. For example, nbgitpuller.link/?tab=binder&repo=https://github.com/binder-examples/requirements will use the binder-examples/requirements repository as the “environment” repo.
- Contribute to the Binder project: mybinder.org is a volunteer-run service that uses cloud credits and donated infrastructure to operate. There are likely ways that we can improve the performance of launches, but this requires resources. Donating your time, money, or cloud infrastructure to the Binder project can help us improve Binder for everybody. See this contributing page for inspiration.
- Join the mybinder.org federation: mybinder.org is not a single BinderHub deployment, but is in fact a collection of BinderHub deployments run by various teams. If you’d like to run such a deployment, or help maintain and support one of the pre-existing deployments, this could result in more cloud resources available to mybinder.org, which may result in reduced launch times. See the mybinder.org federation page for more information.
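For the two-repository tip above, here is a small sketch of what the generated links can look like if you want to build them in a script instead of using the web form. The pre-filled form URL follows the example in this post (the `https://` scheme is an assumption). The direct Binder link assumes the `git-pull` URL layout that nbgitpuller links use and assumes nbgitpuller is installed in the environment repository. The content repository name is hypothetical, and both function names are made up for illustration.

```python
from urllib.parse import urlencode


def prefilled_form_url(env_repo_url: str) -> str:
    """Link to the nbgitpuller.link form with the environment repo pre-filled."""
    # safe=":/" keeps the repository URL readable, as in the example in this post.
    query = urlencode({"tab": "binder", "repo": env_repo_url}, safe=":/")
    return "https://nbgitpuller.link/?" + query


def binder_gitpull_url(env_org_repo: str, env_ref: str,
                       content_repo_url: str, content_branch: str) -> str:
    """Binder link that launches the environment repo and pulls in the content repo.

    Assumes the environment repo installs nbgitpuller, which serves `git-pull`.
    """
    # Inner URL handled by nbgitpuller inside the running session.
    inner = "git-pull?" + urlencode({"repo": content_repo_url,
                                     "branch": content_branch})
    # The inner URL is encoded a second time as Binder's `urlpath` parameter.
    return (f"https://mybinder.org/v2/gh/{env_org_repo}/{env_ref}?"
            + urlencode({"urlpath": inner}))


print(prefilled_form_url("https://github.com/binder-examples/requirements"))
print(binder_gitpull_url(
    "binder-examples/requirements", "HEAD",
    "https://github.com/example-user/example-content",  # hypothetical content repo
    "main",
))
```

Because only the content repository changes between pushes, a link like this keeps pointing at the same built environment image. It is worth checking the output against what the nbgitpuller.link form generates for your repositories.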
Those are a few tips that come to mind, and I hope that they give some inspiration for what you can do to speed things up! In case others have suggestions of their own, I’m marking this top post as a wiki, meaning that anybody can edit it.