Tip: speed up Binder launches by pulling github content in a Binder link with nbgitpuller

Ah, right… yes… understood. Other factors… :frowning:

Pondering this a bit more, e.g. in the context of this repo, which is gitpulled into this Binder build (discussion), I wonder…

GitHub conventionally uses the gh-pages branch as a “reserved” branch for constructing GitHub Pages docs related to a particular repo.

The binder/ directory in a repo can be used to partition Binder build requirements in a repo, but there are a couple of problems associated with this:

  • a maintainer may not want to have the binder/ directory cluttering their package repo;
  • any updates to the repo will force a rebuild of the Binder image next time the repo is run on a particular Binder node. (With Binder federation, if there are N hosts in the federation, after updating a repo, is it possible that my next N attempts to run the repo on MyBinder may require a rebuild if I am directed to a different host each time?)

If by convention something like a binder-build branch was used to contain the build requirements for a repo, then the process for calling a build (by default) could be simplified.

E.g. rather than having something like:
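Purely for illustration (the repo names here are hypothetical), the current long form nests the git-pull query inside the Binder launch URL, with the inner separators percent-encoded so the outer parser doesn’t split them:

```python
from urllib.parse import quote

# Hypothetical repos: the env repo holds the Binder build files,
# the content repo holds the notebooks to pull.
env_repo = "example/env-repo"
content_repo = "https://github.com/example/content-repo"

# Inner git-pull query: '&', ':' and '/' are percent-encoded so the outer
# Binder URL parser does not split the nested query apart.
inner = quote("repo={}&branch=master".format(content_repo), safe="=")
launch_url = "https://mybinder.org/v2/gh/{}/master?urlpath=git-pull?{}".format(
    env_repo, inner
)
print(launch_url)
```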


we would have something like:


which could simplify to something that defaults to a build from binder-build branch (the “build” branch) and nbgitpull from master (the “content” branch):
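As a sketch only — no such endpoint exists, and the path segment here is invented — the defaulted shorthand might reduce the whole thing to just the content repo:

```python
# Hypothetical shorthand: build the image from the repo's "binder-build"
# branch by convention, then nbgitpull "master" into it. The "gh-pull"
# endpoint is made up for illustration.
content_repo = "example/content-repo"
proposed = "https://mybinder.org/v2/gh-pull/{}".format(content_repo)
print(proposed)
```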


Complications could be added to support changing the build branch, the nbgitpull branch, the commit/ID of a particular build, etc?

It might overly complicate things further, but I could also imagine:

  • automatically injecting nbgitpuller into the Binder image and enabling it;
  • providing some sort of directive support so that if the content directory has a setup.py file the package from that content directory is installed.
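A minimal sketch of that second idea (nothing like this exists in nbgitpuller or repo2docker; the function name is mine): after a pull, check the content directory for a setup.py and pip-install it.

```python
import os
import subprocess
import sys

def install_if_package(content_dir):
    """Pip-install the pulled content directory if it ships a setup.py.

    Hypothetical post-pull hook, not part of any existing tool.
    """
    if os.path.exists(os.path.join(content_dir, "setup.py")):
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "--user", content_dir]
        )
        return True
    return False
```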

@choldgraf, thanks a lot for these instructions, they work like a charm!

I have a question though: does the syntax allow pulling into JupyterLab, and at a specific notebook location? I’ve been playing around with the URLs below, but without success:

content repo: https://github.com/OGGM/oggm-edu
env repo: https://github.com/OGGM/oggm-edu-r2d

The most basic of the links works fine:

Now, what if I want to open JupyterLab at a specific location? The following keeps opening the notebook at root:


@fmaussion I’ve found you need to bring escaping into play so that the different arguments are correctly associated with the right parts of the URL when it is parsed.

This pattern works for me:


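A sketch of building such a link, using the OGGM repos from this thread (the exact working pattern later in the thread also HTML-escapes the ampersands, giving %26amp%3B rather than plain %26):

```python
from urllib.parse import quote

env_repo = "OGGM/oggm-edu-r2d"                      # built by Binder
content_repo = "https://github.com/OGGM/oggm-edu"   # pulled by nbgitpuller
lab_path = "lab/tree/oggm-edu/notebooks/oggm-edu/welcome.ipynb"

# Escape the inner query so its '&' and '/' survive the outer URL parse.
inner = quote(
    "repo={}&branch=master&urlpath={}".format(content_repo, lab_path),
    safe="=",
)
link = "https://mybinder.org/v2/gh/{}/master?urlpath=git-pull?{}".format(
    env_repo, inner
)
print(link)
```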
@psychemedia this is great! Works with subpaths. And if I want to open in a jupyter lab? Adding lab/tree/ to the path gives a 404 error

I solved it: @psychemedia the subpath option is deprecated and should be replaced with urlpath (source)

@fmaussion ah, deprecation: good to know… thanks…

[UPDATE] Hmm… replacing subPath with urlpath breaks everything for me.


works fine (if Binder didn’t keep sending me to different clusters and having to build all the time!) but:


is very broken…


@psychemedia I’ve been playing around with this all day and got this to work for us. For example, I’ve just written this:


@betatim , you write:

I’d be -1 on adding nbgitpuller to repo2docker because it is niche.

I understand, but my first impression after moving from a single repository to a content + environment repository is quite good. And the use case I’ve explained in the link above is a highlight for me! Basically, instructors can now just focus on the content and write content based on the model, while not having to fight with mybinder configurations at all. Let’s see how it goes with time, but I’m quite excited about this mybinder+nbgitpuller configuration :wink:

Interesting thanks… what does the autodecode suffix do?

I think there is structure in the URLs with the urlpath mode relating to the repo name and other path elements, but I need to look at a couple more examples with a clear head to confirm I have the structure right…

On my to-do list is exploring adding a link-generating tab to the nbgitpuller link generator, but it won’t be for at least a week now… Isle of Festival mode for me for a few days now…

This is so delightful! I love the idea of having a reference binder (say for a team/lab) and everyone can point their repos to it. Can there be an endpoint that launches RStudio server instead of Jupyter? Pretty please?

I think that’s what folks above were trying out (maybe with limited success?). I think this would be good to support though, if we can figure out what’s preventing it from working

Not sure which point you were referring to? If it was the question re: running RStudio server rather than Jupyter, I think this generalises to “Using Jupyterhub & Binderhub to launch arbitrary containers”?

I generally try to test various combinations of running Jupyter notebook/lab (arbitrary kernels), RStudio (R, Shiny), and OpenRefine (Java), along with Postgres (as a headless service). I stalled trying to use BinderHub to launch just RStudio or just OpenRefine, decided to leave it for a week or two to see if inspiration about how to get it working came to mind, and haven’t had a chance to get back to it (albeit without any inspiration!)

Because why wouldn’t you use a docker image with so many tools pre-installed that it is a miracle it works as your general purpose environment to run notebooks?

How can I use this?

To use it, add the link to the Git repository that contains your notebook to the end of this URL:


Yes, there are two ? in that URL. It has to be like that.
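The shape looks like this (the env repo name here is a placeholder — the actual base URL is the one given in the post above; only the double-? structure is the point):

```python
# Placeholder env repo; the base URL itself is given earlier in the post.
base = "https://mybinder.org/v2/gh/example/kaggle-env/master?urlpath=git-pull?repo="
launch = base + "https://github.com/betatim/binderlyzer"

# The first '?' opens Binder's own query string; the second opens the
# nested query that nbgitpuller parses once the server is running.
print(launch)
```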

As an example, this link will launch the Kaggle environment and fill it with the contents of https://github.com/betatim/binderlyzer

What libraries are included?

Check out the (huge) Dockerfile from which the Kaggle kernels environment is built: https://github.com/Kaggle/docker-python/blob/11dc81f96a263027ff1dc4fd126bc922f1d76bac/Dockerfile. It probably contains the data science library you were thinking of using.

Why does it take so long to launch?

The Kaggle kernels docker image is about 9GB in size. It takes a while to move that image to a node that doesn’t have it in its cache.


Is that a challenge to try to put together a Dockerfile containing every PyPI application in one go?! Or a conda environment with every conda package?!

You mean the kaggle kernels Dockerfile? Looks more like a dare if you ask me.


If you all do this you should call it THE-KITCHEN-SINK

This is really cool! Has anyone managed to get nbgitpuller links working from within a running binder rather than built into the initial launch? For example in a tutorial setting, everyone launches the base binder and then pastes in links to several different repos without needing to know any git.

This approach definitely works from a dedicated jupyterhub (https://jupyterhub.github.io/nbgitpuller/link.html), but I’m seeing 403: Forbidden errors if I try to open links from within a binder session.

Could you make an example link @scottyhq? And explain again what you would like to do, I am not sure I get it :frowning:.

You start a new binder, then do some stuff in it, then the instructor says “now we need the content of repo X” and gives everyone a link to click that triggers an nbgitpuller action in the binder instance I am already running?


As a concrete example, we start @fmaussion’s really great tutorial : https://mybinder.org/v2/gh/OGGM/oggm-edu-r2d/master?urlpath=git-pull?repo=https://github.com/OGGM/oggm-edu%26amp%3Bbranch=master%26amp%3Burlpath=lab/tree/oggm-edu/notebooks/oggm-edu/welcome.ipynb%3Fautodecode

Which sends us to https://hub.gke.mybinder.org/user/oggm-oggm-edu-r2d-vurgegax/lab?autodecode

We work for a while and then want to try bringing in a new repo (w/o dealing with git and assuming we have all the required packages - https://github.com/ICESAT-2HackWeek/data-access).

Following https://jupyterhub.github.io/nbgitpuller/link.html we get a link that looks like this:

I’m guessing something isn’t compatible here with the /hub/user-redirect/
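For reference, here is a reconstruction of the shape the link generator emits (the hub host and urlpath are my assumptions for the elided link; only the /hub/user-redirect/git-pull pattern is standard). My suspicion is that user-redirect requires an authenticated JupyterHub session, which an anonymous mybinder.org browser session doesn’t hold, hence the 403:

```python
from urllib.parse import urlencode

# Assumed hub host and urlpath; /hub/user-redirect/git-pull is the
# documented nbgitpuller link-generator pattern.
hub = "https://hub.gke.mybinder.org"
params = {
    "repo": "https://github.com/ICESAT-2HackWeek/data-access",
    "branch": "master",
    "urlpath": "lab",
}
link = "{}/hub/user-redirect/git-pull?{}".format(hub, urlencode(params))
print(link)
```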


I really like this idea. As it currently works, it has an unexpected characteristic that I discovered while playing with the concept on an internal version. We have found it very convenient to be able to use git in jupyter to enable round-trip development by committing changes in the jupyter notebook/lab environment and pushing them back to the remote repo.

As you have this implemented today, the remote is the repo of the environment rather than the repo with the content. I did not expect this. I think that a user would always want any changes to be pushed upstream to the content repo, not the environment repo.

You can see this easily by running cat .git/config from a Terminal window. Changes would be pushed to choldgraf/binder-sandbox instead of the probably more desirable data/materials-fa17. Granted, you would not expect somebody without ownership rights to the content repo to try to push anything back, but if you owned it, or someone forked it, I think the remote content repo is the more desirable target.