Tip: embed custom GitHub content in a Binder link with nbgitpuller

@fmaussion ah, deprecation: good to know… thanks…

[UPDATE] Hmm… replacing subPath with urlpath breaks everything for me.

eg
https://gke.mybinder.org/v2/gh/ouseful-demos/binder-base-boxes/textanalysis/?urlpath=git-pull?repo=https://github.com/psychemedia/showntell%26branch=linguistics%26subpath=4.2.0%20Classics.ipynb

works fine (if Binder didn’t keep sending me to different clusters and having to build all the time!) but:

https://gke.mybinder.org/v2/gh/ouseful-demos/binder-base-boxes/textanalysis/?urlpath=git-pull?repo=https://github.com/psychemedia/showntell%26branch=linguistics%26urlpath=4.2.0%20Classics.ipynb

is very broken…

@psychemedia I’ve been playing around with this all day and got this to work for us. For example, I’ve just written this:

http://edu.oggm.org/en/latest/user_content.html

@betatim , you write:

I’d be -1 on adding nbgitpuller to repo2docker because it is niche.

I understand, but my first impression after moving from a single repository to a content + environment repository is quite good. And the use case I’ve explained in the link above is a highlight for me! Basically, instructors can now just focus on the content and write content based on the model, while not having to fight with mybinder configurations at all. Let’s see how it goes with time, but I’m quite excited about this mybinder+nbgitpuller configuration :wink:

Interesting thanks… what does the autodecode suffix do?

I think there is structure in the URLs with the urlpath mode relating to the repo name and other path elements, but I need to check a couple more examples with a clear head to check I have the structure right…
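For what it's worth, here is my current guess at that structure as a small Python sketch. The assumption is that subpath was a path inside the content repo, while urlpath is a path from the server root: an interface prefix, then the directory the repo gets cloned into (named after the repo), then the file. The function name and the `app` values are made up for illustration and are not part of nbgitpuller.

```python
def guess_urlpath(content_repo_url, path_in_repo, app="lab"):
    """Build a urlpath value of the form <prefix>/<repo-name>/<path>.

    Assumption: the content repo is cloned into a directory named after
    the repo, so the path has to include that directory.
    """
    # Last segment of the repo URL, without a trailing slash or ".git".
    repo_name = content_repo_url.rstrip("/").rsplit("/", 1)[-1]
    if repo_name.endswith(".git"):
        repo_name = repo_name[: -len(".git")]
    # "lab/tree" targets JupyterLab, "tree" the classic notebook interface.
    prefix = {"lab": "lab/tree", "classic": "tree"}[app]
    return f"{prefix}/{repo_name}/{path_in_repo.lstrip('/')}"


print(guess_urlpath("https://github.com/OGGM/oggm-edu",
                    "notebooks/oggm-edu/welcome.ipynb"))
# lab/tree/oggm-edu/notebooks/oggm-edu/welcome.ipynb
```

If that guess holds, a bare filename in urlpath would point at the server root rather than into the cloned repo, which might be why a straight swap of subpath for urlpath breaks.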

On my to-do list is to explore adding a link generating tab to the nbgitpuller link generator, but it won’t be for at least a week… Isle of Festival mode for me for a few days now…

This is so delightful! I love the idea of having a reference binder (say for a team/lab) and everyone can point their repos to it. Can there be an endpoint that launches RStudio server instead of Jupyter? Pretty please?

I think that’s what folks above were trying out (maybe with limited success?). I think this would be good to support though, if we can figure out what’s preventing it from working

Not sure which point you were referring to? If it was the question re: running RStudio server rather than Jupyter, I think this generalises to Using Jupyterhub & Binderhub to launch arbitrary containers ?

I generally try to test various combinations of running Jupyter notebooks/lab (arbitrary kernels), RStudio (R, shiny), and OpenRefine (java) along with Postgres (as headless service). I stalled trying to use Binderhub to just launch either RStudio or just OpenRefine, decided to leave it for a week or two to see if inspiration about how to get it working came to mind, and haven’t had a chance to get back to it (albeit without any inspiration!)

Because why wouldn’t you use a docker image with so many tools pre-installed that it is a miracle it works as your general purpose environment to run notebooks?

How can I use this?

To use it, add the link to the Git repository that contains your notebook to the end of this URL:

https://mybinder.org/v2/gh/betatim/kaggle-binder/master?urlpath=git-pull?repo=<URL_TO_YOUR_REPO_HERE>

Yes, there are two ? in that URL. It has to be like that.

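As an example, the link https://mybinder.org/v2/gh/betatim/kaggle-binder/master?urlpath=git-pull?repo=https://github.com/betatim/binderlyzer will launch the Kaggle environment and fill it with the contents of https://github.com/betatim/binderlyzer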
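If you would rather generate such links than paste them together by hand, a minimal Python sketch could look like the following. It percent-encodes the inner git-pull query and then encodes the whole thing again as the value of the outer urlpath parameter, so the second ? never appears bare. That relies on the assumption that Binder decodes urlpath once before redirecting to it. The function names and parameters are hypothetical; only the URL layout comes from the template above.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def binder_gitpull_link(env_repo, env_ref, content_repo, branch=None,
                        urlpath=None, binder_host="https://mybinder.org"):
    """Return a Binder launch link that pulls content_repo via nbgitpuller.

    env_repo is "owner/name" of the environment repo on GitHub; env_ref is
    the branch, tag or commit to build.
    """
    inner = {"repo": content_repo}
    if branch:
        inner["branch"] = branch
    if urlpath:
        inner["urlpath"] = urlpath
    # First level of encoding: the query string handed to git-pull.
    inner_path = "git-pull?" + urlencode(inner, quote_via=quote)
    # Second level: the whole inner path becomes one outer parameter value.
    outer = urlencode({"urlpath": inner_path}, quote_via=quote)
    return f"{binder_host}/v2/gh/{env_repo}/{env_ref}?{outer}"


def decode_gitpull_link(link):
    """Undo both levels of encoding; handy for checking a generated link."""
    inner_path = parse_qs(urlsplit(link).query)["urlpath"][0]
    return {k: v[0] for k, v in parse_qs(inner_path.split("?", 1)[1]).items()}


link = binder_gitpull_link("betatim/kaggle-binder", "master",
                           "https://github.com/betatim/binderlyzer")
print(link)
print(decode_gitpull_link(link))
```

The decode helper is only there as a sanity check that the repo URL survives the round trip.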

What libraries are included?

Check out the (huge) Dockerfile from which the Kaggle kernels environment is built: https://github.com/Kaggle/docker-python/blob/11dc81f96a263027ff1dc4fd126bc922f1d76bac/Dockerfile. It probably contains the data science library you were thinking of using.

Why does it take so long to launch?

The Kaggle kernels docker image is about 9GB in size. It takes a while to move that image to a node that doesn’t have it in its cache.

Is that a challenge to try to put together a Dockerfile containing every PyPI package in one go?! Or a conda env with every conda package?!

You mean the kaggle kernels Dockerfile? Looks more like a dare if you ask me.

If you all do this you should call it THE-KITCHEN-SINK

This is really cool! Has anyone managed to get nbgitpuller links working from within a running binder rather than built into the initial launch? For example in a tutorial setting, everyone launches the base binder and then pastes in links to several different repos without needing to know any git.

This approach definitely works from a dedicated jupyterhub (https://jupyterhub.github.io/nbgitpuller/link.html), but I’m seeing 403: Forbidden errors if I try to open links from within a binder session.

Could you make an example link @scottyhq? And explain again what you would like to do, I am not sure I get it :frowning:.

You start a new binder, then do some stuff in it, then the instructor says “now we need the content of repo X” and gives everyone a link to click that triggers an nbgitpuller action in the binder instance you are already running?

Exactly!

As a concrete example, we start @fmaussion’s really great tutorial: https://mybinder.org/v2/gh/OGGM/oggm-edu-r2d/master?urlpath=git-pull?repo=https://github.com/OGGM/oggm-edu%26branch=master%26urlpath=lab/tree/oggm-edu/notebooks/oggm-edu/welcome.ipynb%3Fautodecode

Which sends us to https://hub.gke.mybinder.org/user/oggm-oggm-edu-r2d-vurgegax/lab?autodecode

We work for a while and then want to try bringing in a new repo (w/o dealing with git and assuming we have all the required packages - https://github.com/ICESAT-2HackWeek/data-access).

Following https://jupyterhub.github.io/nbgitpuller/link.html we get a link that looks like this:
https://hub.gke.mybinder.org/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2FICESAT-2HackWeek%2Fdata-access&urlpath=lab%2Ftree%2Fdata-access%2Fnotebooks%2FNSIDC+DAAC+ICESat-2+Customize+and+Access.ipynb

I’m guessing something isn’t compatible here with the /hub/user-redirect/
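If that guess is right, one thing that might be worth trying is pointing the same git-pull query at the server that is already running, i.e. swapping /hub/user-redirect/ for the /user/<server-name>/ prefix taken from the session URL. This is untested on mybinder.org. The helper below is a hypothetical sketch that only rewrites the string, and it assumes the browser session is already authenticated against that server.

```python
from urllib.parse import urlsplit, urlunsplit


def retarget_to_session(session_url, redirect_link):
    """Rewrite a /hub/user-redirect/ link to target a running user server."""
    session = urlsplit(session_url)
    link = urlsplit(redirect_link)

    # Expect a session path like /user/<server-name>/lab
    parts = session.path.split("/")
    if len(parts) < 3 or parts[1] != "user" or not parts[2]:
        raise ValueError("session URL does not look like /user/<name>/...")

    prefix = "/hub/user-redirect/"
    if not link.path.startswith(prefix):
        raise ValueError("link does not start with /hub/user-redirect/")

    # Keep the git-pull query untouched; only the path prefix changes.
    new_path = f"/user/{parts[2]}/" + link.path[len(prefix):]
    return urlunsplit((session.scheme, session.netloc, new_path, link.query, ""))


print(retarget_to_session(
    "https://hub.gke.mybinder.org/user/oggm-oggm-edu-r2d-vurgegax/lab?autodecode",
    "https://hub.gke.mybinder.org/hub/user-redirect/git-pull"
    "?repo=https%3A%2F%2Fgithub.com%2FICESAT-2HackWeek%2Fdata-access",
))
```

Since every Binder session gets its own server name, a link rewritten this way would be per user, so it would not work as a single shared link for a whole class.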

@choldgraf
I really like this idea. As it currently works, it has an unexpected characteristic that I discovered while playing with the concept on an internal version. We have found it very convenient to be able to use git in jupyter to enable round-trip development by committing changes in the jupyter notebook/lab environment and pushing them back to the remote repo.

As you have this implemented today, the remote is the repo of the environment rather than the repo with the content. I did not expect this. I think that a user would always want any changes to be pushed upstream to the content repo, not the environment repo.

You can see this easily by running cat .git/config from a Terminal window. Changes would be pushed to choldgraf/binder-sandbox instead of the probably more desirable data/materials-fa17. Yes, you would not expect somebody who did not have ownership rights to the content repo to try to push anything back, but if you owned it, or someone forked it, I think the remote content repo is the more desirable target.

The repo is cloned to a subdirectory, ~/materials-fa17
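To compare the two remotes without reading the config files by eye, a small Python sketch along these lines may help. It is a deliberately simplified reader for the `[remote "..."]` sections of a .git/config file, not a full git config parser, and the function name and the sample URL are illustrative assumptions.

```python
import re


def remote_urls(config_text):
    """Map remote names to their url entries in a .git/config file's text."""
    remotes = {}
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r'\[remote "(.+)"\]', line)
        if header:
            # Entering a [remote "<name>"] section.
            current = header.group(1)
        elif line.startswith("["):
            # Any other section ends the current remote section.
            current = None
        elif current and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "url":
                remotes[current] = value
    return remotes


sample = """[core]
\trepositoryformatversion = 0
[remote "origin"]
\turl = https://github.com/choldgraf/binder-sandbox
\tfetch = +refs/heads/*:refs/remotes/origin/*
"""
print(remote_urls(sample))
```

Running it on the text of ~/.git/config and of ~/materials-fa17/.git/config should show which repo each directory would push to. In a terminal, `git remote -v` inside each directory gives the same information.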

@manics Ah. Of course! Thanks.

Riffing on this thread alongside Binderhub button - 'pull from referrer' (and maybe Binder template repositories) I wonder (feature creep ;-)…

The docs around @manics’ recent PR https://github.com/jupyterhub/binderhub/pull/891 suggest:

In a GitHub repo create a readme with a link to https://binder.example.org/autodetect, if it works the referrer will be parsed and converted into a link to launch the repo you came from.

So what if there was also a redirect saying: “(and) by default/convention look for a “binder-base” branch in the same repository; if it exists, build / pull that, and then top up with content from an nbgitpulled content repo”.

For example, running:

https://binder.example.org/autodetectwithbase

from https://github.com/user/example would:

  • autodetect https://github.com/user/example as referrer;
  • build/pull https://github.com/user/example/tree/binder-base
  • nbgitpull https://github.com/user/example into the binder image
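The three steps above could be sketched in Python roughly as follows. To be clear, nothing like autodetectwithbase exists in BinderHub as far as this thread goes: the function name, the returned dictionary and the default branch names are all hypothetical, and the launch path simply reuses the urlpath=git-pull?repo=... pattern from earlier in the thread.

```python
from urllib.parse import quote, urlencode, urlsplit


def plan_from_referrer(referrer, base_branch="binder-base",
                       content_branch="master"):
    """Turn a GitHub referrer URL into an environment plus content plan."""
    parts = urlsplit(referrer)
    if parts.netloc != "github.com":
        raise ValueError("only github.com referrers are handled in this sketch")
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2:
        raise ValueError("referrer does not name an owner and a repo")
    owner, repo = segments[0], segments[1]

    # Step 1: the referrer identifies the repo.
    content_repo = f"https://github.com/{owner}/{repo}"
    # Step 3: nbgitpuller query for the content branch of the same repo.
    inner = "git-pull?" + urlencode(
        {"repo": content_repo, "branch": content_branch}, quote_via=quote)
    # Step 2: build the environment from the base branch, then pull content.
    launch = f"/v2/gh/{owner}/{repo}/{base_branch}?" + urlencode(
        {"urlpath": inner}, quote_via=quote)
    return {
        "environment": (owner, repo, base_branch),
        "content": (content_repo, content_branch),
        "launch_path": launch,
    }


print(plan_from_referrer("https://github.com/user/example"))
```

Keeping the branch names as parameters leaves room for overriding them, rather than baking the convention into the logic.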

Complicating things further, users may also need to override the git-pulled branch name with an arbitrary one, and Binder may need to autodetect a referral from a branch and treat that branch as the content to be pulled in?

I’m worried there’s a bit too much magic here, at least to begin with :slight_smile:

As a compromise, perhaps there could be a new repo2docker buildpack, e.g. environmentrepo.giturl, that contains a link to the environment repository/branch? Or there could be a convention for specifying it in the README. Conceptually it’s a bit like a symlink to the files in the other environment repo; internally, of course, the repos would be handled separately. This requires the notebook repo owner to add this file, but I think it implements everything you’ve suggested without any complex logic.

@manics That “symlink” approach would work for me :slight_smile:

Just trying to think of a way where a user can easily say:

  • use this branch for the Binder image;
  • use this branch for the nbgitpull;
  • use the referrer as the location of the repo.

I guess things start to get even more complicated if you have the image builder branch and the content branch in different repos…!

However, if you enforce a convention, things get easier; eg easiest for magic to work might be:

  • content repo must be in master;
  • Binder build repo must be in binder-build and must include nbgitpuller;
  • both branches need to be branches of the same repo.

But being able to pop a link into a simple environment file to specify eg the Binder image branch would make sense (you wouldn’t want the link in the Binder build branch because that’s the one we’re trying not to change at all).

Thinking back on the template repo idea, if a template repo:

  • has a Binder image build branch;
  • has an autodetect path in the README binder button link;
  • has a reference that points back to the build branch using an absolute URL

a user could derive a repo from the template, update the content file, click the button, launch against the Binder image specified originally in the template repo.

Alternatively, you could clone a repo containing a content (master) branch and a build branch and specify the build branch relatively from within the content repo.

If all the metadata is encoded in the readme instead of (or as well as) an environmentrepo.giturl file, the readme for the notebook repo could be something like this:

# Notebook Repo

This repo contains just notebooks

Clicking this link will open this repository in binder by detecting the
referrer:

[open with mybinder](https://binder.example.org/autodetect)

This is a special metadata tag that tells binder to build the
environment for this notebook from a GitHub branch called
`branch` in `[example-user]/repository`:

environment-repo: https://github.com/[example-user]/repository/tree/branch
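
Reading such a tag back out of a readme would not need much logic. Here is a minimal Python sketch, assuming the tag sits on a line of its own in exactly the `environment-repo: https://github.com/<owner>/<repo>/tree/<ref>` form shown above. The tag itself is only a proposal in this thread, and the function name is made up.

```python
import re

# One tag per line: "environment-repo: https://github.com/<owner>/<repo>/tree/<ref>"
TAG = re.compile(
    r"^environment-repo:\s*https://github\.com/([^/\s]+)/([^/\s]+)/tree/(\S+)\s*$",
    re.MULTILINE,
)


def find_environment_repo(readme_text):
    """Return owner, repo and ref from the first tag, or None if absent."""
    match = TAG.search(readme_text)
    if match is None:
        return None
    owner, repo, ref = match.groups()
    return {"owner": owner, "repo": repo, "ref": ref}


readme = """# Notebook Repo

environment-repo: https://github.com/example-user/repository/tree/branch
"""
print(find_environment_repo(readme))
```

A readme without the tag returns None, in which case the notebook repo could presumably be built as an ordinary Binder repo.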