Tip: embed custom github content in a Binder link with nbgitpuller

I think we have consensus on what “custom base images in repo2docker” should look like: https://github.com/jupyter/repo2docker/issues/487#issuecomment-479794426 and the next few comments after the linked one (plus really the whole thread, for the several options that were considered, trade-offs, prototypes and alternatives). I like the converged-upon idea and would recommend we try to implement it before re-opening the discussion. Otherwise we spend forever talking and not so much doing. We aren’t yet a committee :slight_smile:

I’d be -1 on adding nbgitpuller to repo2docker because it is niche. A constant battle is the perception that repo2docker-created images contain “bloat” or that “for real use cases one needs a custom Dockerfile to remove bloat” etc etc. So we should make an effort to keep things slim (because they are!), which means only adding things to core repo2docker that are used very widely, even if (like nbgitpuller) they don’t actually increase the image size all that much. Instead, I’d favour more documentation and “cookie cutter” repos specialised to these use cases.

Those are my thoughts on how to address this.

On a more positive note: having new, different and more varied user interfaces and user experiences for creating your “binder link” is very cool. They can and should be hosted/built separately from BinderHub. The fact that it doesn’t need the central oversight committee to agree to any of it is a feature :slight_smile:

Just caught up with that thread: extensions / plugins for repo2docker, brilliant. Makes for easier community contributions. Here’s a related line of thinking from Simon Willison on datasette plugins.

Thanks for the link to datasette. Simon seems to have settled on pluggy and likes it, so I am adding that to my list of things to check out. I really like the idea of (one day) having something like plugincompat.herokuapp.com for repo2docker.

datasette also has a range of tools for packaging and deploying datasettes: datasette publish. There are various issues in the repo where Simon ruminated on various aspects of this, I think.

The use case is much more limited / constrained than repo2docker, but again, it shows ways of doing things, and if different ecosystems converge on good practice or common utils, that’s handy for future projects (eg datasette uses click for its command line interface, IIRC).

Apols if this is a distraction; I try to make sense of Jupyter, and where it might be developing, by trying to make sense of it in the context of other things I don’t really understand either! :wink:

@choldgraf Just been playing with this quickly and it’s absolutely bonkers in a brilliant way:-)

eg https://mybinder.org/v2/gh/ouseful-testing/binder-graphviz/master/?urlpath=git-pull?repo=https://github.com/hchasestevens/show_ast

This provides a mechanism for showing repo maintainers how their repo looks in MyBinder, and can also be used to demo the requirements for making it runnable outside of their repo and without requiring a PR on it.

Reminds me of URL hacking to chain different things across different APIs. Can it also accept a redirect to open into a specified notebook?

@yuvipanda Adding a tab for Binderhub to the nbgitpuller link generator would be really useful I think… especially if folk found out about it…

This is pretty wild :smiley: So wild we should publicise it more!

I’m wondering whether, as well as nbgitpuller, there could be a generic (curl, wget etc) pull that could fetch eg data files from a URL?

Not sure what I think about making it possible to fetch arbitrary URLs. For getting data my personal view is that one should use https://github.com/binder-examples/getting-data#how-to-get-data-into-your-binder.

One thing that feels uncomfortable to me with my mybinder.org-operator hat on is that if we let people construct URLs that make mybinder.org take action via its high bandwidth connection, we become a more attractive target for being the source of a DOS. You can trigger a mybinder.org launch from a very slow dial-up connection, and if that launch then starts a (very large) download you now have a way to amplify your impact. (This is probably also true for letting people pull from git repos, and generally true if we let people perform outgoing network connections from within mybinder.org, but it still feels like making it “too easy” to do :-/)
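
To put a rough number on the amplification concern, here is a back-of-the-envelope sketch. The byte counts are hypothetical and chosen only for illustration; they are not measurements from mybinder.org.

```python
def amplification_factor(request_bytes: int, download_bytes: int) -> float:
    """Ratio of traffic the service generates to traffic the requester sent."""
    return download_bytes / request_bytes


# Hypothetical figures, for illustration only:
# a launch request of about 1 kB that triggers a 1 GB download.
factor = amplification_factor(1_000, 1_000_000_000)
print(f"{factor:,.0f}x")
```

Under those assumed figures, each byte the requester sends causes about a million bytes of outgoing traffic from the service, which is why a slow connection is no real limit on the requester's impact.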

Ah, right… yes… understood. Other factors… :frowning:

Pondering this a bit more eg in context of this repo which is gitpulled into this Binder build (discussion), I wonder…

Github conventionally uses the gh-pages branch as a “reserved” branch for constructing Github Pages docs related to a particular repo.

The binder/ directory in a repo can be used to partition Binder build requirements in a repo, but there are a couple of problems associated with this:

  • a maintainer may not want to have the binder/ directory cluttering their package repo;
  • any updates to the repo will force a rebuild of the Binder image next time the repo is run on a particular Binder node. (With Binder federation, if there are N hosts in the federation, after updating a repo, is it possible that my next N attempts to run the repo on MyBinder may require a rebuild if I am directed to a different host each time?)

If by convention something like a binder-build branch was used to contain the build requirements for a repo, then the process for calling a build (by default) could be simplified.

Eg rather than having something like:

https://mybinder.org/v2/gh/colinleach/binder-box/master/?urlpath=git-pull?repo=https://github.com/colinleach/astro-Jupyter

we would have something like:

https://mybinder.org/v2/gh/colinleach/astro-Jupyter/binder-build/?urlpath=git-pull?repo=https://github.com/colinleach/astro-Jupyter

which could simplify to something that defaults to a build from binder-build branch (the “build” branch) and nbgitpull from master (the “content” branch):

https://mybinder.org/v2/gh/colinleach/astro-Jupyter?binder-build=True

Complications could be added to support changing the build branch, the nbgitpull branch, the commit/ID of a particular build, etc?

It might overly complicate things further, but I could also imagine:

  • automatically injecting nbgitpuller into the Binder image and enabling it;
  • providing some sort of directive support so that if the content directory has a setup.py file the package from that content directory is installed.
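
The proposed shorthand can be sketched as a simple URL expansion. This is only an illustration of the suggested convention: the `binder-build` branch name and the `?binder-build=True` behaviour are the proposal in this post, not an existing BinderHub feature, and the function name `expand_binder_build` is made up for this example.

```python
def expand_binder_build(owner: str, repo: str,
                        build_branch: str = "binder-build",
                        content_branch: str = "master") -> str:
    """Expand the proposed shorthand into the long-form Binder + git-pull URL.

    The environment is built from `build_branch` and the content is pulled
    from `content_branch` of the same repository (hypothetical convention).
    """
    content_repo = f"https://github.com/{owner}/{repo}"
    url = (f"https://mybinder.org/v2/gh/{owner}/{repo}/{build_branch}/"
           f"?urlpath=git-pull?repo={content_repo}")
    # Only name the content branch when it differs from the assumed default.
    if content_branch != "master":
        url += f"%26branch={content_branch}"
    return url


print(expand_binder_build("colinleach", "astro-Jupyter"))
```

With the defaults, this reproduces the second long-form URL shown in the post; changing `build_branch` or `content_branch` corresponds to the "complications" mentioned there.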

@choldgraf, thanks a lot for these instructions, they work like a charm!

I have a question though: does the syntax allow pulling into JupyterLab, and opening at a specific notebook location? I’ve been playing around with the URLs below, but without success:

content repo: https://github.com/OGGM/oggm-edu
env repo: https://github.com/OGGM/oggm-edu-r2d

The most basic of the links works fine:
https://mybinder.org/v2/gh/OGGM/oggm-edu-r2d/master?urlpath=git-pull?repo=https://github.com/OGGM/oggm-edu

Now, what if I want to open jupyter-lab at a specific location? The following keeps opening the notebook on root:
https://mybinder.org/v2/gh/OGGM/oggm-edu-r2d/master?urlpath=git-pull?repo=https://github.com/OGGM/oggm-edu&urlpath=lab/tree/notebooks/oggm-edu/welcome.ipynb

Thanks!
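
A quick way to see what may be going wrong with the last link is to run it through a generic query-string parser. The sketch below uses Python's standard `urllib.parse`; how BinderHub itself handles a repeated `urlpath` parameter is not stated in this thread, so treat this only as an illustration of the ambiguity.

```python
from urllib.parse import urlsplit, parse_qs

url = (
    "https://mybinder.org/v2/gh/OGGM/oggm-edu-r2d/master"
    "?urlpath=git-pull?repo=https://github.com/OGGM/oggm-edu"
    "&urlpath=lab/tree/notebooks/oggm-edu/welcome.ipynb"
)

# Everything after the first "?" is the outer query string; the unescaped
# "&" splits it into two separate top-level "urlpath" parameters.
params = parse_qs(urlsplit(url).query)
for value in params["urlpath"]:
    print(value)
```

The parser sees two competing `urlpath` values rather than one git-pull instruction that carries its own options, so the part after the `&` never reaches nbgitpuller as part of the git-pull request.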

@fmaussion I’ve found you need to bring escaping into play so that different arguments are correctly associated with different parts of the URL when it comes to be parsed.

This pattern works for me:

https://mybinder.org/v2/gh/USER/REPO/BUILDBRANCH/?urlpath=git-pull?repo=https://github.com/CONTENTUSER/CONTENTREPO%26amp%3Bbranch=CONTENTBRANCH%26amp%3BsubPath=CONTENTFILENAME.ipynb
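
The escaping can also be done programmatically. This is a minimal sketch using Python's standard `urllib.parse.quote`; the function name and its arguments are made up for illustration, it assumes that a plain percent-encoded `&` (`%26`) is the intended separator between the inner git-pull arguments, and it uses the `subPath` argument name as it appears in this thread.

```python
from urllib.parse import quote


def binder_gitpull_url(env_repo, env_ref, content_repo,
                       branch=None, subpath=None,
                       binder_host="https://mybinder.org"):
    """Build a Binder launch URL whose urlpath is an nbgitpuller git-pull link."""
    # Inner link, addressed to nbgitpuller inside the launched server.
    inner = f"git-pull?repo=https://github.com/{content_repo}"
    if branch:
        inner += f"&branch={branch}"
    if subpath:
        inner += f"&subPath={subpath}"
    # Escape "&" (and spaces etc.) so the inner arguments stay inside the
    # single outer "urlpath" value; "/", ":", "?" and "=" are left readable.
    return (f"{binder_host}/v2/gh/{env_repo}/{env_ref}"
            f"?urlpath={quote(inner, safe='/:?=')}")


print(binder_gitpull_url("USER/REPO", "BUILDBRANCH",
                         "CONTENTUSER/CONTENTREPO",
                         branch="CONTENTBRANCH",
                         subpath="CONTENTFILENAME.ipynb"))
```

Values that themselves contain `&` or `=` would need to be encoded individually before being joined, which this sketch does not attempt.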

@psychemedia this is great! Works with subpaths. And if I want to open in a jupyter lab? Adding lab/tree/ to the path gives a 404 error

I solved it: @psychemedia the subpath option is deprecated and should be replaced with urlpath (source)

@fmaussion ah, deprecation: good to know… thanks…

[UPDATE] Hmm… replacing subPath with urlpath breaks everything for me.

eg
https://gke.mybinder.org/v2/gh/ouseful-demos/binder-base-boxes/textanalysis/?urlpath=git-pull?repo=https://github.com/psychemedia/showntell%26amp%3Bbranch=linguistics%26amp%3Bsubpath=4.2.0%20Classics.ipynb

works fine (if Binder didn’t keep sending me to different clusters and having to build all the time!) but:

https://gke.mybinder.org/v2/gh/ouseful-demos/binder-base-boxes/textanalysis/?urlpath=git-pull?repo=https://github.com/psychemedia/showntell%26amp%3Bbranch=linguistics%26amp%3Burlpath=4.2.0%20Classics.ipynb

is very broken…

@psychemedia I’ve been playing around with this all day and got this to work for us. For example, I’ve just written this:

http://edu.oggm.org/en/latest/user_content.html

@betatim , you write:

I’d be -1 on adding nbgitpuller to repo2docker because it is niche.

I understand, but my first impression after moving from a single repository to a content + environment repository is quite good. And the use case I’ve explained in the link above is a highlight for me! Basically, instructors can now just focus on the content and write content based on the model, while not having to fight with mybinder configurations at all. Let’s see how it goes with time, but I’m quite excited about this mybinder+nbgitpuller configuration :wink:

Interesting thanks… what does the autodecode suffix do?

I think there is structure in the URLs with the urlpath mode relating to the repo name and other path elements, but I need to check a couple more examples with a clear head to check I have the structure right…

On my to-do list is to explore adding a link-generating tab to the nbgitpuller link generator, but it won’t be for at least a week now… Isle of Festival mode for me for a few days now…

This is so delightful! I love the idea of having a reference binder (say for a team/lab) and everyone can point their repos to it. Can there be an endpoint that launches RStudio server instead of Jupyter? Pretty please?

I think that’s what folks above were trying out (maybe with limited success?). I think this would be good to support though, if we can figure out what’s preventing it from working

Not sure which point you were referring to? If it was the question re: running RStudio server rather than Jupyter, I think this generalises to Using Jupyterhub & Binderhub to launch arbitrary containers ?

I generally try to test various combinations of running Jupyter notebooks/lab (arbitrary kernels), RStudio (R, shiny), and OpenRefine (java) along with Postgres (as headless service). I stalled trying to use Binderhub to just launch either RStudio or just OpenRefine, decided to leave it for a week or two to see if inspiration about how to get it working came to mind, and haven’t had a chance to get back to it (albeit without any inspiration!)