Building a "The Littlest BinderHub"

(This is different from the thread "Would a 'The Littlest Binder' be useful?", which focuses on just one repo rather than a fully functional BinderHub.)

I opened the issue "Support running without kubernetes, just docker" (Issue #1318, jupyterhub/binderhub on GitHub) to work on adding a pure Docker (no Kubernetes) backend to BinderHub. @manics is doing related work on repo2docker.

There’s nothing inherent about BinderHub that requires Kubernetes. What it really needs is:

  • A way to spawn repo2docker
  • A way to tell a JupyterHub to run users with a specific image
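
A minimal sketch of those two pieces, with no Kubernetes involved. The `jupyter-repo2docker` flags and the JupyterHub `POST /users/{name}/server` endpoint are real; the function names, the example repo and image name, and the assumption that the hub's spawner honours an `image` key in the request body are hypothetical and depend on how the hub is configured.

```python
import json
import shlex
from typing import List, Tuple


def repo2docker_build_argv(repo_url: str, ref: str, image_name: str) -> List[str]:
    """Return the argv that asks repo2docker to build, but not run, an image."""
    return [
        "jupyter-repo2docker",
        "--no-run",                  # build only; launching is left to JupyterHub
        "--image-name", image_name,  # tag for the resulting image
        "--ref", ref,                # branch, tag, or commit to build
        repo_url,
    ]


def hub_spawn_request(hub_api_url: str, username: str, image: str) -> Tuple[str, str]:
    """Return (url, json_body) for a JupyterHub REST API spawn request.

    Assumption: the hub's spawner is configured to read "image" from the
    options sent in the request body.
    """
    url = f"{hub_api_url.rstrip('/')}/users/{username}/server"
    body = json.dumps({"image": image})
    return url, body


if __name__ == "__main__":
    # Hypothetical repo, ref, image name, and hub address.
    argv = repo2docker_build_argv(
        "https://github.com/example-org/example-repo", "main", "example-image:main"
    )
    print(shlex.join(argv))
    print(hub_spawn_request("http://127.0.0.1:8081/hub/api", "alice", "example-image:main"))
```

The argv could be handed to `subprocess.run` on the same VM, or run inside a container that has access to the Docker socket.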

This can be done with just Docker, and will enable us to build a true ‘Littlest BinderHub’, providing the entire functionality of BinderHub but within a single VM. This has a couple of advantages:

  1. Makes it much easier to run a BinderHub that you don’t expect a lot of traffic for (esp. when used with Auth)
  2. Enables an abstraction that would make BinderHub usable in broader contexts in the future, such as on HPC machines

I’ve laid out a pathway for how to get there in the issue, and am slowly working on it. Please comment here on Discourse if this is something you would be interested in, and on the GitHub issue if you have opinions on how this should be accomplished.


We’d use this extensively at UBC, especially for upper-year courses based on jupyter-books and pangeo docker images. Would this take a different approach than tljh-repo2docker (plasmabio/tljh-repo2docker on GitHub), the plugin for The Littlest JupyterHub that builds multiple user environments with repo2docker? One additional feature that we would be happy to help with would be deploying with ansible, following the example of @jtp and colleagues: plasma/tljh.yml at 7883a6a1266b69ab49f353bdf9974be408bf0709 in plasmabio/plasma on GitHub.

Yes - this will be running the BinderHub software (that powers mybinder.org) directly. tljh-repo2docker uses repo2docker directly and has its own UI instead.

For a distribution, I’d imagine this would get deployed as two docker containers. Should be pretty easy to wrap the container deploys in ansible.

For a lot of edu use cases, I could see this being really helpful in orgs where it might be institutionally difficult to get k8s support but achievable to run a service from a single container.

With such a server, it’d presumably be easy enough to lock it down to run just a single whitelisted repo, or one of a few whitelisted services, ideally from a really simple list of whitelisted repo URLs?
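
A rough sketch of what such a simple list could look like, assuming the check happens before any build is started. This is not BinderHub's own configuration mechanism; `ALLOWED_REPOS`, `normalize_repo_url`, and `is_allowed` are hypothetical names, and the repo URL is a placeholder.

```python
from urllib.parse import urlparse

# Hypothetical list of the only repos this server is willing to build.
ALLOWED_REPOS = {
    "https://github.com/example-org/course-notebooks",
}


def normalize_repo_url(url: str) -> str:
    """Lower-case scheme and host, and drop a trailing slash or '.git' suffix."""
    parsed = urlparse(url.strip())
    path = parsed.path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"{parsed.scheme.lower()}://{parsed.netloc.lower()}{path}"


def is_allowed(url: str, allowed=ALLOWED_REPOS) -> bool:
    """Return True only if the normalized URL is on the list."""
    return normalize_repo_url(url) in {normalize_repo_url(a) for a in allowed}


if __name__ == "__main__":
    print(is_allowed("https://GitHub.com/example-org/course-notebooks.git"))
    print(is_allowed("https://github.com/someone-else/other-repo"))
```

Normalizing before comparing keeps the list itself short: one entry covers the `.git` and trailing-slash spellings of the same repo.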

There is already a related issue open: "Document how to ban repositories in binderhub" (Issue #848, jupyterhub/binderhub on GitHub).

Hmmm… what permissions are required to launch a container from another docker container? The same permissions would presumably apply to launching a set of linked containers using docker-compose? I think docker-compose has just been bumped up to a first class member of the docker CLI (docs). BinderHub doesn’t do docker compose, I think? I’m guessing repo2docker doesn’t either? But if a docker-compose.yml file was referring to separate images built from different subdirs, I guess that you just mean calling repo2docker on each, then running them via docker-compose. So repo2docker would have to parse the docker-compose.yml to look for build: steps?
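
To make the "look for build: steps" idea concrete, here is a small sketch that collects the build context of each service. It assumes the docker-compose.yml has already been parsed into a plain dict (for example by a YAML library), and `build_contexts` is a hypothetical helper, not something repo2docker or BinderHub provides.

```python
from typing import Dict


def build_contexts(compose: dict) -> Dict[str, str]:
    """Map service name -> build context directory for services that have a build step."""
    contexts = {}
    for name, service in compose.get("services", {}).items():
        build = service.get("build")
        if build is None:
            continue  # service uses a prebuilt image, nothing to build
        if isinstance(build, str):
            contexts[name] = build  # short form: build: ./subdir
        else:
            # long form: build: {context: ./subdir, ...}
            # assumption: a missing context means the project directory
            contexts[name] = build.get("context", ".")
    return contexts


if __name__ == "__main__":
    # Hypothetical compose file contents, already parsed.
    example = {
        "services": {
            "notebook": {"build": "./notebook"},
            "api": {"build": {"context": "./api"}},
            "db": {"image": "postgres:16"},
        }
    }
    print(build_contexts(example))
```

Each returned directory is a candidate for its own repo2docker build.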

wrap the container deploys in ansible.

indeed… there’s a lot of stuff that could just be done by ansible, though i’ve struggled to keep track of the current state of things with their package reorg.

On the repo front, an in-house binder could be much more aware of where it’s pulling stuff, e.g. actually use API calls in the UI to autocomplete repo/branch names.

I think this again raises the question of whether repo2x might be solved for more than x=docker. In some HPC settings, for example, the docker play might be substantially more complicated, where they have slurm|whatever and that’s the way they likes it. Somebody mentioned a packer-based approach, which has a lot of legs, as it can target just about anything (including docker).

Alternately, and not to bang the old conda drum too much, but with a bit of time in the conda-forge mine for some hub deps (i gave up when traefik needed bzr to build from source), conda-packs could be a compelling target. packs are dumb tarballs that can manage anything that fits in an as installed conda PREFIX which could include pip|npm|gem installed stuff, or just ./configure --prefix $PREFIX && make && make install. Instead of running a docker registry, these tarballs could be stored by their input commits, e.g. /opt/repo2pack/<sha> or potentially with some salt from e.g. the builder major version, or something content-addressable.
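
A sketch of that storage scheme, assuming packs are keyed either directly by the input commit or by a hash of the commit salted with the builder's major version. `PACK_ROOT` follows the `/opt/repo2pack/<sha>` example above; `pack_path` and the salt format are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath
from typing import Optional

PACK_ROOT = PurePosixPath("/opt/repo2pack")


def pack_path(commit_sha: str, builder_major: Optional[int] = None) -> PurePosixPath:
    """Return where the pack for a given input commit would be stored.

    Without a builder version the commit sha is used as-is. With one, the key
    is a SHA-256 over "<major>:<sha>", so a new builder major version never
    reuses a pack made by an older one.
    """
    if builder_major is None:
        key = commit_sha
    else:
        key = hashlib.sha256(f"{builder_major}:{commit_sha}".encode("utf-8")).hexdigest()
    return PACK_ROOT / key


if __name__ == "__main__":
    sha = "0123abcd"  # placeholder commit id
    print(pack_path(sha))
    print(pack_path(sha, builder_major=1))
```

A lookup before building then replaces the registry query: if the path exists, the pack can be unpacked and used directly.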

The win here is, provided ingress to 443 was handled, nothing else would probably need to run with elevated permissions.

See:

And am sure others.


“repo2packer” was my hacked repo:

All it does is convert the internal Dockerfile generated by repo2docker into a shell-script, and then adds an optional packer template. I think we’re stuck with using the Dockerfile as the interface for now, though longer term there may be other options, such as CNCF buildpacks:
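
As a rough illustration of the Dockerfile-to-shell-script idea (not the actual repo2packer code), the sketch below handles only `RUN`, `WORKDIR`, and the `KEY=VALUE` form of `ENV`, and leaves every other instruction as a comment. `dockerfile_to_shell` is a hypothetical name.

```python
def dockerfile_to_shell(dockerfile_text: str) -> str:
    """Naively translate a Dockerfile into a POSIX shell script."""
    script = ["#!/bin/sh", "set -e"]
    # Join backslash-continued lines so each instruction sits on one line.
    joined = dockerfile_text.replace("\\\n", " ")
    for raw in joined.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        instruction, _, rest = line.partition(" ")
        instruction = instruction.upper()
        rest = rest.strip()
        if instruction == "RUN":
            script.append(rest)
        elif instruction == "ENV":
            script.append(f"export {rest}")  # only correct for KEY=VALUE
        elif instruction == "WORKDIR":
            script.append(f"mkdir -p {rest} && cd {rest}")
        else:
            # FROM, COPY, USER, etc. have no direct shell equivalent here.
            script.append(f"# skipped: {line}")
    return "\n".join(script) + "\n"


if __name__ == "__main__":
    example = (
        "FROM ubuntu:22.04\n"
        "ENV APP_HOME=/srv/app\n"
        "WORKDIR /srv/app\n"
        "RUN echo hello\n"
    )
    print(dockerfile_to_shell(example))
```

The skipped instructions are the ones a real conversion has to think hardest about, since `COPY` and `USER` in particular carry meaning that a plain script does not.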

Talking of Ansible, if you want to go all-in, this might entertain you: manics/jupyterhub-ansiblespawner on GitHub, which spawns JupyterHub single-user notebook servers using Ansible.
