Do I have to use a registry with BinderHub? If not, how do I run the images locally?

Hello, I have set up BinderHub (using the Zero to BinderHub guide) on our company server, and I want our users to have access to it to build and run their Jupyter notebook images. Potentially dozens of users will be using it, creating images that might take GBs of storage space.

My question is: the manual insists on setting up a registry (Docker Hub, Microsoft Azure, or others), but from what I understand, an image registry is not strictly necessary, as it is only used as a “transport” middleman between BinderHub and JupyterHub, correct? Also, since users could potentially create thousands of images, that kind of upload/download traffic to/from Docker Hub could be very taxing, right?

I want to run everything locally on my server: BinderHub, JupyterHub, images, everything on one machine under one public IP in the same Kubernetes cluster. (I would love to dodge Kubernetes and just install it on the machine like normal apps, but I guess that does not work with BinderHub and I have to use Helm?)

So is it possible to just have BinderHub and JupyterHub locally next to each other in the same Kubernetes cluster, and forget about using a registry? If I set `use_registry` to `false` in config.yaml, it seems to skip the upload to Docker Hub, but then it fails on starting the image, so do I somehow have to install JupyterHub into the Kubernetes cluster with Helm as well?
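For reference, the relevant part of my config.yaml currently looks roughly like this (just what I have been trying; I am not even sure `hub_url` is the right thing to point at my own machine):

```yaml
config:
  BinderHub:
    use_registry: false
    # supposed to be the address of the JupyterHub that launches the built images;
    # I have tried my machine's public IP as well as 127.0.0.1 here
    hub_url: http://127.0.0.1
```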

So my MAIN question is: if I indeed don’t have to use an image registry and can just run BinderHub and everything locally, how do I run the images, and how do I set up the path between BinderHub and JupyterHub? I thought that installing BinderHub with Helm also installs its own JupyterHub (because how else would it run the image after building it); is that true? Or do I have to use the Zero to JupyterHub guide to install JupyterHub with Helm separately to run my images? I am quite confused about how to actually run the built notebook images. They are supposed to run in JupyterHub, but I have to provide the IP of the JupyterHub, which I don’t get, because I thought JupyterHub just runs locally in the same Kubernetes cluster. Why would someone run BinderHub on one machine and launch the images on a JupyterHub running on a completely different machine, uploading and downloading them through Docker Hub? If I run BinderHub and JupyterHub locally, do I just refer to the JupyterHub IP as 127.0.0.1?

So how do I go from “I can successfully build images in BinderHub” to “running them locally on the same machine so that users can build and run their image through the browser”? Thanks a lot; I am very confused about the Helm usage and setup, about what role JupyterHub plays, and about how I give it the images and run them.

BTW, I actually already run JupyterHub on the same machine (it was working before I even thought about BinderHub), but with batchspawner (PBSSpawner to be exact), so it spawns Jupyter notebook servers on the compute nodes of our cluster (a supercomputer). Right now it is enough for me if I can run BinderHub images just locally (with JupyterHub’s local process spawner, I suppose?), but in the future I would like BinderHub to build my users’ images and start them on a compute node. (I don’t mind having a separate second JupyterHub for that, as I don’t expect my JupyterHub users to mix notebook servers created from JupyterHub with notebooks created from BinderHub.)

Thanks a ton, I hope I explained correctly what I want. If I didn’t, just forget about my setup and how I think I should run things, and point me to a working tutorial where BinderHub builds and runs images without uploading them to Docker Hub, without assuming I know anything: just how to set this up on a freshly installed Ubuntu machine. (I am mainly confused about where I get the JupyterHub and how to run images in it.)

I’m not sure if a registry is mandatory. It doesn’t have to be an external registry, though; you can run your own local registry in your K8s cluster, along the lines of the sketch below.
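For example, a bare-bones in-cluster registry could look roughly like this (a sketch using the stock registry:2 image; the names are placeholders, and there is no auth or TLS, which you would want to add for real use):

```yaml
# registry.yaml - minimal Docker registry inside the cluster (sketch, no auth/TLS)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: docker-registry
spec:
  replicas: 1
  selector:
    matchLabels:
      app: docker-registry
  template:
    metadata:
      labels:
        app: docker-registry
    spec:
      containers:
        - name: registry
          image: registry:2
          ports:
            - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: docker-registry
spec:
  selector:
    app: docker-registry
  ports:
    - port: 5000
      targetPort: 5000
```

You would `kubectl apply -f registry.yaml` and then point BinderHub at that in-cluster address instead of Docker Hub; check the BinderHub chart docs for the exact registry configuration keys.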

However, since it’s your first time deploying BinderHub I’d recommend you start with a standard installation including a registry and get everything working, then modify your config and see if it’s possible to run without the registry. Otherwise it’ll be very difficult to tell whether problems are due to the lack of a registry, or whether it’s some unrelated issue with your setup.
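For reference, the “standard installation” config from the Zero to BinderHub guide is roughly this (Docker Hub as the registry; the Docker ID, prefix, and password are placeholders, and depending on the chart version the credentials go in secret.yaml or directly in config.yaml):

```yaml
# config.yaml (sketch following the Zero to BinderHub guide)
config:
  BinderHub:
    use_registry: true
    image_prefix: <docker-id>/binder-dev-

# secret.yaml (registry credentials)
registry:
  username: <docker-id>
  password: <password>
```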

Currently BinderHub only spawns images with a version of KubeSpawner. In principle it could use any spawner that launches a Docker image since that’s what repo2docker builds, but if you want to go down this route I’d again recommend starting with something known to work, then try modifying it.

If you follow the Zero to BinderHub guide, JupyterHub should be set up on the same K8s cluster too. Are you seeing something different?
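To be explicit about how the pieces connect: the BinderHub Helm chart pulls in the JupyterHub chart as a dependency, so a single install deploys both into the same namespace, and `hub_url` in config.yaml just has to point at the public address of that JupyterHub’s proxy service. A rough sketch (chart version and namespace are placeholders):

```bash
# one install deploys BinderHub *and* its JupyterHub into the same namespace
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart
helm repo update
helm install binderhub jupyterhub/binderhub --version=<chart-version> \
  --namespace=binderhub --create-namespace \
  -f secret.yaml -f config.yaml

# the JupyterHub proxy service gets an external IP; that's what hub_url should point at
kubectl --namespace=binderhub get svc proxy-public
```

Once proxy-public has an external IP, put it into config.yaml as `hub_url: http://<that-ip>` and re-run the install as `helm upgrade`. Note that 127.0.0.1 generally won’t work, because that address has to be reachable from your users’ browsers, not just from the server itself.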

There’s some very preliminary exploratory work into running BinderHub without Kubernetes (Support running without kubernetes, just docker · Issue #1318 · jupyterhub/binderhub · GitHub), but that’s a longer-term project.

In a different topic next to this one, called “Denied:…”, I am trying to solve the problem of pushing built images to Docker Hub, as they are improperly named (they don’t start with user/), so right now I can’t even get to the part where JupyterHub starts my image.

However, I tried to install BinderHub on my stock Ubuntu desktop, where there is nothing to interfere with anything (unlike on the company server), and if I set `use_registry: false`, it doesn’t try to push to Docker Hub; instead it ends up with something like “Failed launching server...”. So I wonder how I connect BinderHub with its local JupyterHub. The Zero to BinderHub manual just says to put the IP of the machine where JupyterHub runs into the config, and it should find it. I tried the public IP of my computer, as well as 127.0.0.1, and the image just does not launch.

Could you maybe paste here the content of every YAML file you use in the helm install or helm upgrade command to make the most basic stock BinderHub build and run images? (As if you were a newbie who just followed the Zero to BinderHub guide until you ended up with a BinderHub that can build and run images: what are the actual contents of the configs?) I must be misunderstanding something, or the authors of the Zero to BinderHub manual implicitly assume something, I don’t know; all I know is I can’t make even the simplest BinderHub work and it is getting annoying. I will have to make so many adjustments to make it work for our users (HTTPS, logging users in through our Kerberos, probably some local image registry, and other changes I don’t even know about yet), and yet I can’t even make the basic BinderHub run. Thank you!

Does this guide help? It’s set up for Azure but the helm and kubectl commands will be the same: the-turing-way/zero-to-binderhub.md at master · alan-turing-institute/the-turing-way · GitHub

There are also these instructions which describe setting up a local BinderHub with minikube, but these are for developing BinderHub only, not running a production service: binderhub/CONTRIBUTING.md at master · jupyterhub/binderhub · GitHub

More generally, BinderHub isn’t really set up to be run locally. Most users deploy a Littlest JupyterHub (TLJH) server with the tljh-repo2docker plugin installed. The docker-only implementation Simon linked above would be another step towards a simpler BinderHub setup, but that is a longer-term development effort.
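If a single-machine setup is acceptable, that route is much simpler than Kubernetes + Helm. A rough sketch on a fresh Ubuntu machine (check the TLJH and tljh-repo2docker docs for the current installer flags; Docker also needs to be installed for the image builds):

```bash
# install The Littlest JupyterHub with the repo2docker plugin (sketch)
curl -L https://tljh.jupyter.org/bootstrap.py \
  | sudo -E python3 - \
      --admin <admin-user> \
      --plugin tljh-repo2docker
```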


Thank you for mentioning the TLJH-repo2docker plugin; it actually does almost exactly what I wanted. Now if only I could combine it with the PBSSpawner from batchspawner, I could have everything I need inside one single JupyterHub, instead of running one JupyterHub for PBS spawning and a second JupyterHub with the repo2docker plugin that runs images locally. I wish there were an easy way to run these images remotely through SSH, but that actually requires installing some stuff in the image itself to communicate properly with the hub, and copying the image to the remote system.

HOWEVER, I have one issue with the TLJH-repo2docker plugin: the built Docker images are randomly disappearing from the system. If I go into the Environments menu and start building images, they all build successfully, but either immediately or within a few minutes (or hours) those images disappear from the Environments menu as if I had never built them, and they also disappear from the `docker images` listing; they just completely vanish from the system. Has anybody encountered that, Docker randomly deleting built images?