Building a "The Littlest BinderHub"

yuvipanda · July 1, 2021, 1:14pm

(Different from the topic "Would a 'The Littlest Binder' be useful?", which focuses on just one repo, not a fully functional BinderHub.)

I opened "Support running without kubernetes, just docker" (Issue #1318 in jupyterhub/binderhub on GitHub) to work on adding a pure Docker (no Kubernetes) backend to BinderHub. @manics is doing related work on repo2docker.
There’s nothing inherent about BinderHub that requires Kubernetes. What it really needs is:

A way to spawn repo2docker
A way to tell a JupyterHub to run users with a specific image

This can be done with just Docker, and will enable us to build a true ‘Littlest BinderHub’, providing the entire functionality of BinderHub but within a single VM. This has a couple of advantages:

Makes it much easier to run a BinderHub that you don’t expect a lot of traffic for (especially when used with auth)
Enables abstractions that would make BinderHub usable in broader contexts in the future, such as on HPC machines
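
As a rough illustration of the first requirement above (a way to spawn repo2docker), here is a minimal sketch that only assembles a build-only repo2docker command line. The function name, the example repository and the image name are assumptions made for illustration; this is not how BinderHub itself does it.

```python
import shlex


def repo2docker_build_command(repo_url, ref, image_name):
    """Return the argv for a build-only repo2docker run (hypothetical helper)."""
    return [
        "jupyter-repo2docker",
        "--no-run",                  # build the image but do not start a container
        "--ref", ref,                # commit, branch or tag to build
        "--image-name", image_name,  # name to give the resulting image
        repo_url,
    ]


# Example values are placeholders chosen for this sketch.
cmd = repo2docker_build_command(
    "https://github.com/binder-examples/requirements",
    "HEAD",
    "littlest-binder/requirements:head",
)
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run` on a machine with Docker and repo2docker installed. The second requirement would then come down to passing the same image name to whatever spawner the JupyterHub uses, for example through DockerSpawner's `image` setting.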

I’ve laid out a pathway for how to get there in the issue, and am slowly working on it. Please comment here on Discourse if this is something you would be interested in, and on the GitHub issue if you have opinions on how this should be accomplished.

phaustin · July 1, 2021, 2:54pm

We’d use this extensively at UBC, especially for upper-year courses based on jupyter-books and pangeo docker images. Would this take a different approach than plasmabio/tljh-repo2docker (a plugin for The Littlest JupyterHub to build multiple user environments with repo2docker)? One additional feature that we would be happy to help with would be deploying with ansible, following the example of @jtp and colleagues: plasma/tljh.yml at 7883a6a1266b69ab49f353bdf9974be408bf0709 in plasmabio/plasma on GitHub.

yuvipanda · July 1, 2021, 2:57pm

Yes - this will be running the BinderHub software (that powers mybinder.org) directly. tljh-repo2docker uses repo2docker directly and has its own UI instead.

For a distribution, I’d imagine this would get deployed as two docker containers. Should be pretty easy to wrap the container deploys in ansible.

psychemedia · July 1, 2021, 7:17pm

For a lot of edu use cases, I could see this being really hard in orgs where it might be institutionally difficult to get k8s support but achievable to run a service from a single container.

With such a server, it’d presumably be easy enough to lock it down to run just a single whitelisted repo, or one of a few whitelisted services, ideally from a really simple list of whitelisted repo URLs?

There is a related issue to this open already: Document how to ban repositories in binderhub · Issue #848 · jupyterhub/binderhub · GitHub
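
For what the "really simple list" idea above could look like, here is a minimal sketch of an allowlist check on repo URLs. Everything in it (the names, the example entries, the normalisation rules) is an assumption for illustration and is not taken from BinderHub's configuration.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; in practice this might be read from a small text file.
ALLOWED_REPOS = {
    "github.com/binder-examples/requirements",
    "github.com/binder-examples/conda",
}


def normalize_repo_url(url):
    """Reduce a repo URL to 'host/owner/name'.

    Ignores the scheme, letter case, a trailing slash and a '.git' suffix.
    """
    parsed = urlparse(url.strip())
    path = parsed.path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"{parsed.netloc.lower()}/{path.lower()}"


def is_allowed(url):
    """Return True only if the normalised URL is on the allowlist."""
    return normalize_repo_url(url) in ALLOWED_REPOS
```

A launch request would then be rejected before any build starts unless `is_allowed(repo_url)` is true. Matching on a normalised form rather than the raw string is a design choice made here so that trivial URL variations do not slip past or get wrongly refused.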

psychemedia · July 1, 2021, 7:26pm

Hmmm… what permissions are required to launch a container from another docker container? The same permissions would presumably apply to launching a set of linked containers using docker-compose? I think docker-compose has just been bumped up to a first class member of the docker CLI (docs). Binderhub doesn’t do docker compose, I think? I’m guessing repo2docker doesn’t either? But if a docker-compose.yml file was referring to separate images built from different subdirs, I guess that would just mean calling repo2docker on each, then running them via docker-compose. So repo2docker would have to parse the docker-compose.yml to look for build: steps?

bollwyvl · July 2, 2021, 12:01am

wrap the container deploys in ansible.

indeed… there’s a lot of stuff that could just be done by ansible, though i’ve struggled to keep track of the current state of things with their package reorg.

On the repo front, an in-house binder could be much more aware of where it’s pulling stuff, e.g. actually use API calls in the UI to autocomplete repo/branch names.

I think this again raises the question of whether repo2x might be solved for more than x=docker. In some HPC settings, for example, the docker play might be substantially more complicated, where they have slurm|whatever and that’s the way they likes it. Somebody mentioned a packer-based approach, which has a lot of legs, as it can target just about anything (including docker).

Alternately, and not to bang the old conda drum too much, but with a bit of time in the conda-forge mine for some hub deps (i gave up when traefik needed bzr to build from source), conda-packs could be a compelling target. packs are dumb tarballs that can manage anything that fits in an as installed conda PREFIX which could include pip|npm|gem installed stuff, or just ./configure --prefix $PREFIX && make && make install. Instead of running a docker registry, these tarballs could be stored by their input commits, e.g. /opt/repo2pack/<sha> or potentially with some salt from e.g. the builder major version, or something content-addressable.

The win here is, provided ingress to 443 was handled, nothing else would probably need to run with elevated permissions.
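
A minimal sketch of the storage layout floated above, assuming packs live under /opt/repo2pack and are keyed either by the input commit or by a hash of the commit plus the builder major version. The function name and the exact salting scheme are made up for illustration.

```python
import hashlib
from pathlib import PurePosixPath

PACK_ROOT = PurePosixPath("/opt/repo2pack")  # location suggested in the post


def pack_path(commit_sha, builder_major_version=None):
    """Return where the pack for a given commit would be stored (hypothetical)."""
    if builder_major_version is None:
        # Plain variant: key the tarball directly by its input commit.
        return PACK_ROOT / commit_sha
    # Salted variant: mix the builder major version into the key, so a new
    # builder does not reuse packs produced by an older one.
    salted = f"{commit_sha}:{builder_major_version}".encode("utf-8")
    return PACK_ROOT / hashlib.sha256(salted).hexdigest()
```

With this, `pack_path("abc123")` gives `/opt/repo2pack/abc123`, while passing a builder version gives a different, stable key per version. A fully content-addressable variant would hash the tarball bytes instead of the inputs.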

yuvipanda · July 2, 2021, 5:17am

See:

And am sure others.

manics · July 2, 2021, 5:03pm

“repo2packer” was my hacked repo:

All it does is convert the internal Dockerfile generated by repo2docker into a shell-script, and then adds an optional packer template. I think we’re stuck with using the Dockerfile as the interface for now, though longer term there may be other options, such as CNCF buildpacks:

github.com/jupyterhub/repo2docker

Explore CNCF v3 buildpacks

opened 04:41PM - 26 Jun 19 UTC

yuvipanda

From @jchesterpivotal in https://github.com/jupyter/repo2docker/issues/707#issue…comment-505904267

> By way of warning, what follows is hilariously biased: I've several times worked on two generations of buildpack technology over the past 5 years. Pride makes me defensive.
>
> As it was related to me by a Red Hatter I asked, `s2i` was created largely because the previous generations of buildpack lifecycles from Heroku (v2a) and Cloud Foundry (v2b) were optimised to a rootfs+tarball target (Heroku's term is "slug", Cloud Foundry's is "droplet"). That was considered unsuitable for OpenShift v3, which was an image-centric architecture.
>
> Whereas Heroku and Cloud Foundry would meet you at code and hid the underlying container infrastructure, OpenShift would meet you at the image, so the latter (this is a personal opinion) had a business need for something _like_ buildpacks to reduce the convenience gap.
>
> But `s2i` never really found a home outside of OpenShift, while buildpacks have flourished in two massive, independent but genetically-related ecosystems.
>
> **Critically, the emergence of the v2 registry API enables features (particularly layer rebasing) that were previously impossible. In addition Google's Container Tools team developed and maintain the `google-gocontainerregistry` library which allows us to perform construction and rebasing operations with or without the docker daemon. The design of CNBs takes full advantage of both of these advances.**
>
> By way of speed improvements: We have observed some Java rebuilds drop from minutes to milliseconds. We expect large-cluster rollouts to drop from dozens of hours to potentially minutes.
>
> Edit: **I should add, your reasons for moving off `s2i` would apply to v2a and v2b buildpack lifecycles as well. One of the motivating problems faced by both Pivotal and Heroku has been exactly this sort of combinatorial explosion; CNBs are designed to make it possible to more easily compose buildpacks developed independently of one another.**

I've bolded the bits that I think are most relevant to us.

It would be great if someone could take a look at https://buildpacks.io to see if we can base repo2docker off v3 of buildpacks. http://words.yuvi.in/post/why-not-s2i/ contains reasons why we moved off s2i (which is similar to v2 of buildpacks).

A useful test case would be to try to make:

1. A buildpack for environment.yml
2. A buildpack for install.R
3. A buildpack for postBuild
4. A buildpack for apt.txt

And then see how easy / hard it is to have a repo with any combination of these 4 files produce one single image. My rudimentary math skills tell me that there's `4!` possible combinations here (24), and we shouldn't have to write more than 4 buildpacks...
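
As a quick check on the counting in that test case, here is a small sketch that enumerates which of the four files a repo could contain. It suggests 15 non-empty combinations (2^4 - 1) rather than 24, since 4! counts orderings of the files, not subsets. The point about not wanting one buildpack per combination stands either way.

```python
from itertools import combinations
from math import factorial

CONFIG_FILES = ["environment.yml", "install.R", "postBuild", "apt.txt"]


def non_empty_subsets(items):
    """Return every non-empty combination of items, ignoring order."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


subsets = non_empty_subsets(CONFIG_FILES)
print(len(subsets))                # 15, i.e. 2**4 - 1 repos to test
print(factorial(len(CONFIG_FILES)))  # 24, the number of orderings of 4 files
```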

Talking of Ansible, if you want to go all-in, this might entertain you: manics/jupyterhub-ansiblespawner on GitHub (spawn JupyterHub single user notebook servers using Ansible).

stevejpurves · October 26, 2022, 11:15am

I’ve been playing with the setup in binderhub/testing/local-binder-local-hub to achieve the same thing. Would a LittlestBinderHub be quite close to this? What would be the main differences?

yuvipanda · September 15, 2023, 7:30pm

@stevejpurves it would be very similar, if not the same! This topic predates that directory.
