Would a "The Littlest Binder" be useful?

Y’all should check out a feature of BinderHub (undocumented, I think) for banning repositories. It uses a regex, so I think you should be able to do something like “match anything that is not the repository you want to allow”.

See here for where this is configured in mybinder.org:

We should definitely add this to the documentation. See here for an issue I just opened:

Been thinking again about a “littlest Binderhub” and how it might relate to a couple of contexts:

  • a class that provides several Jupyter Book books that each run against a different docker container;
  • electron wrapped courses as per the spacy course (ANN) (which in turn makes me wonder: electron wrapped Jupyter Book?)

In this case, it might be useful to have an easily installed Binderhub environment that can launch one or more containers in order to support a range of courses / books?

See also the discussion around snakestagram / custom conda envts for baking into nteract app.

Thank you @choldgraf!

For anyone interested in this, the syntax is as follows:

config:
  GitHubRepoProvider:
    banned_specs:
      - ^(?!org\/repo).*$ 

where org and repo are the ones you want to allow.
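
To see what that negative lookahead does, here is a minimal Python sketch. It assumes the pattern is applied with `re.match` against a spec of the form `org/repo/ref`, which is my reading of how BinderHub uses it rather than something confirmed in this thread; the spec strings are made up for illustration.

```python
import re

# Pattern from the post above; "org" and "repo" are placeholders.
BANNED = re.compile(r"^(?!org\/repo).*$")

# A slightly stricter variant (my own suggestion): the trailing slash stops
# repos whose names merely start with "repo" from slipping through.
BANNED_STRICT = re.compile(r"^(?!org\/repo\/).*$")


def is_banned(pattern: re.Pattern, spec: str) -> bool:
    """Return True if the spec matches the ban pattern, i.e. is NOT allowed."""
    return pattern.match(spec) is not None


# Hypothetical specs, assumed to look like "org/repo/ref".
specs = ["org/repo/master", "someone/else/master", "org/repo-fork/master"]

for spec in specs:
    print(spec, is_banned(BANNED, spec), is_banned(BANNED_STRICT, spec))
```

Note that the original pattern only checks the prefix, so a spec such as `org/repo-fork/master` would still be allowed; the stricter variant bans it.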

Will also add to the issue.

I spent all day procrastinating on other important work I had to do, so here’s a demo of an instance of The Littlest Binder!

You can access it at http://34.74.126.139. I’ll probably keep that running for a few more days or so.

The core of this is a new repo2dockerspawner. Images are built if needed with repo2docker every time a user server starts. So if you are pointing to ‘master’ and it has new commits, new users will automatically get an image built from them.

There’s a lot of work left to do, but this is a good start. Soon enough, we can have a TLJH Plugin that turns it into The Littlest Binder. repo2dockerspawner probably has a lot of other single-node uses too.

Ooooh… exciting…

So is this “just” a JupyterHub with a spawner that gives it the ability to launch against a repo, rebuilding if necessary? So presumably the JupyterHub landing page could also give a user a selection of several environments (named docker containers, git repos) from which they could launch a specific environment?

And TLJH plugins look really interesting too…

Yeah, DockerSpawner already lets you pick from a whitelist of images, so mapping these onto repos instead should be a pretty small change.

If you require the admin to build images beforehand rather than enabling builds as part of spawn, then the default DockerSpawner + image_whitelist should be all you need. It’s less automatic and therefore less “bindery” but might fit, depending on your needs.
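
For reference, a rough sketch of what the pre-built-images route could look like in `jupyterhub_config.py`. This is a config fragment, not a standalone script: `c` is the config object JupyterHub provides when it loads the file. The display names and image tags are hypothetical, and the images are assumed to have been built by the admin beforehand (for example with repo2docker).

```python
# jupyterhub_config.py (fragment)
c = get_config()  # noqa: F821 -- injected by JupyterHub when loading this file

# Spawn each user server in a Docker container.
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"

# Map the names shown to the user onto pre-built images.
# Both the names and the image tags below are made up for illustration.
c.DockerSpawner.image_whitelist = {
    "Course A book": "example/course-a:latest",
    "Course B book": "example/course-b:latest",
}
```

As far as I know, newer DockerSpawner releases call this option `allowed_images`, so check the version you have installed.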

So… erm… could / does TLJH support a py / conda env spawner? So you could have separate envts, user selectable, on TLJH?

[Update: hmm; I suppose a better way to offer different environments is just to install different kernels that a user could open a notebook with. ]

Yup, this is an easy next step.

:heart: Them. Helps keep TLJH small and focused while allowing other cool things to build on top of it.

Yes… the plugins look like a really useful thing…

And made me wonder… If I have a binder directory with a conda.yml file (for example), then it should be relatively easy to figure out what I’d need to put into an environment customisation plugin?

Yep! In fact, I’ve already written most of the code for that in https://github.com/pangeo-data/pangeo-stacks/blob/193204e330bb3771855b1225fa41ad4663d63aca/onbuild/r2d_overlay.py. Needs to be extracted into its own package though.

Just trying to keep up by explaining to myself what I think this all means / makes possible / requires! :wink:

Hi @yuvipanda (Sorry it’s taken me so long to play with this!)

This looks perfect for the case studies in the Turing Way! It seemed super fast to load too, which is great from a user perspective! Thank you for developing it :sparkles:

What would the installation look like if I wanted something like this on an autoscaling Kubernetes cluster?

Glad you like it, @sgibson91

This is currently optimized for running on a single node only. I think having it be the way you have it running, with a BinderHub installation limited to one repo only, is the easiest way to run this on an autoscaling Kubernetes cluster. A lot of the performance optimizations and simplifications we can do here aren’t possible in a distributed system like Kubernetes…

Ok that’s good to know. Though I guess at the moment, we’re not sure how popular the book is going to be! We could monitor this for a few months and see how many people are actually using the Binder links in the book (yet to come!). It may be more cost-effective for us to run a littlest BinderHub deployment if we’re not getting a lot of traffic to the autoscaling Hub.

Awesome :slight_smile: Keep me posted.

@mathematicalmichael I’d love for you to take a look at the demo too, and see if this is what you had in mind.

I’m mostly trying to gauge interest in repo2dockerspawner so I can prioritize it appropriately.

Hey @yuvipanda thank you for pinging me! I hadn’t seen the updates to this thread yet!

@choldgraf mentioned a way to use regex (clarified by @sgibson91) to achieve a similar result with a fork of littlest-binderhub. I believe the use case they mention regarding the lack of resources on mybinder.org is most closely aligned to what I have in mind.

I checked out your link, (also thank you… I had never seen the ipython repo before! It’s so full of useful stuff), and was very happy with how quickly it loaded by comparison to the binder link on their github. I love the next step that @psychemedia asked about wherein a dropdown list is used (I implemented this once in a Jupyterhub so students could poke around and explore the differences between a few environments while maintaining access to the same persistent files).
(I use DockerSpawner and am fairly sure I pulled that code from something I saw @minrk post in an Issue regarding image_whitelist).
I think I understand what he mentioned about a solution that relies on re-building images (but that is something I would like to avoid doing manually. If I push to master, I expect the link to just be ready the next time someone visits, the same way binder does now). Problem is, nothing I do right now is popular enough to live in cache, so those load times are great to avoid!

I’m very much interested in helping others publish open source “textbooks” (sites) with interactive examples (like spacy mixed with jupyterbook or the various similar static-site alternatives), and simply want it to be the case that it is easy to set up to scale traffic, so I don’t have to worry about a server I rent going down.

So again, same use case as @sgibson91. If someone visits a link and an ephemeral container is created for them with only a single node but perhaps more memory, that’s already a good start. multi-node/ability to parallelize is much lower priority but firmly in the ‘would be nice’ column. I don’t quite understand the nuances of performance optimizations, but I can already tell that this is fairly close to what I have in mind. I have yet to set up anything on kubernetes that can scale, but from my understanding, using a binder with the whitelist seems to be an appropriate way to handle scaling the computational needs of an online course. I really appreciate your work on this (and more).

My other use-case for this is (building up towards those ambitions) building interactive demonstrations of my thesis work on my website. There, the problem isn’t so much resources as it is lag time for binder to start up. Which this project (and scaling with single-nodes in kube) … most certainly solves. Just to clarify, the link you provided is running on a single-node server and can handle multiple users at once?

@mathematicalmichael Not sure how relevant this is but I recently tried wrapping a Jupyter Book in an electron app shell and it seemed to work okay against a local kernel.

This provides the advantage of being able to distribute the static textbook elements in a self-serving, cross-platform way (in principle at least; the current demo targets a build to Mac only), with ThebeLab making code executable against a kernel outside the electron app.

Three things that would make the demo more compelling:

  1. a recipe that lets you target a build to Win/Mac/Linux;
  2. ability to resize the app window;
  3. ability to set/select alternative kernel runners (MyBinder, other BinderHubs, BinderHub by URL, remote kernel IP address/port, maybe even local notebook server autodiscovery).

Down the line, the ability to pack and run a kernel inside the app, and/or run a notebook within the electron browser against a Pyodide kernel (I have no idea if the latter is even possible!)

very cool! that’s certainly an interesting approach, (perhaps thinking too far ahead here), but having the ability for a student to open up a website on an ipad and have their “textbook” be there is a real appeal. I know jupyterlab still isn’t quite up to par on that device, but lots of websites most certainly are (like https://jupyter.org/jupyter-book/intro.html). The thinking is that even a smartphone can suffice for self-teaching if the website is well-designed enough.

The electron app is neat but (I feel) the second you ask someone to download a thing, you add some immediate friction. For packaging a software solution for a client (something that’s been a struggle before on smaller projects with non-technical people), your solution is pretty perfect, and something I’m keeping on my radar.

Yes, the download step is a friction step, but if you want to read the text content (even if not execute the notebook code) in an offline setting, the electron app lets you do that without having to install and run a notebook server or find an HTTP server to serve the Jupyter Book (or maybe it works without an HTTP server anyway? I haven’t checked.)

Using ThebeLab to launch a Binder kernel from the electron app / book also means that if you do have a network connection, you can execute the code from the app book without having to install a Jupyter server yourself.

Thinking about TLJH plugins… If I were an instructor, and packaged up course requirements as a plugin, that’d give me an easy route to build and test my setup locally, then give it to a Faculty IT person (for example) to customise a Faculty-managed TLJH server.

But what if one of my students wanted to run the set up in their own, local, single user notebook environment? Could the plugin be used to set that up?

(I’ve often thought it might make sense to try to define course requirements as a Python package that contains nothing but a README and a set of package requirements; this would mean a student could install the required course environment using something like pip install myUni-py101).
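
A rough sketch of what such a requirements-only package might look like, as a `setup.py` that sits next to the README. Everything here is hypothetical: the package name comes from the example above, and the dependency list is a placeholder for whatever the course actually needs.

```python
# setup.py for a hypothetical course-environment package.
# It ships no code of its own; installing it just pulls in the requirements.
from setuptools import setup

setup(
    name="myUni-py101",
    version="0.1.0",
    description="Package requirements for the (hypothetical) myUni py101 course",
    packages=[],  # intentionally empty: requirements only
    install_requires=[
        # Placeholder dependencies; replace with the real course requirements.
        "numpy",
        "pandas",
        "matplotlib",
    ],
)
```

A student could then run `pip install .` from a checkout, or `pip install myUni-py101` if the package were published to an index.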
