JupyterHub / Amalthea

Hi,

While researching how to combine JupyterHub with GitLab (beyond using the latter for authentication), I came across Renku, a project from the Swiss Data Science Center that does part of what I would like to accomplish. It is available at https://renkulab.io.

They used to use JupyterHub as part of their deployment, but for the latest release of their platform they moved to their own Kubernetes Jupyter server manager, called Amalthea.

The full explanation of their reasons can be found in the README file. Basically, JupyterHub is seen as a complete, self-sufficient application that end users can install and use directly (authentication and user management are included, as are Jupyter server spawning, status tracking, etc.), while Amalthea only handles the management of Jupyter servers in Kubernetes so that it can be integrated into other setups. From that point of view, system admins are responsible for providing authentication (currently OpenID Connect only) as well as user management.

JupyterHub uses the KubeSpawner class to create the Jupyter server pods, but I was wondering whether some ideas or concepts from Amalthea could be useful there, and whether the project could be of wider interest to the Jupyter community.

What do you guys think?


It sounds like a neat idea! Effectively they’ve partially implemented JupyterHub as a Kubernetes controller, and are managing single-user servers as custom resources (defined through a CRD). That makes sense where you’re all-in on Kubernetes, have no need for all of JupyterHub’s features, and want an easy way to add custom features without worrying about JupyterHub compatibility.
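As a rough illustration of the controller idea, here is a toy reconcile step. It only compares two in-memory dictionaries; a real controller would watch the Kubernetes API for changes to its custom resources. The function name and the data shapes are hypothetical and are not taken from Amalthea.

```python
def reconcile(desired, running):
    """Return the actions needed to make `running` match `desired`.

    desired: dict mapping server name -> image, as declared by custom resources
    running: dict mapping server name -> image, as currently observed
    (Both shapes are assumptions made for this sketch.)
    """
    actions = []
    for name, image in sorted(desired.items()):
        if name not in running:
            # Declared but not running yet
            actions.append(("create", name))
        elif running[name] != image:
            # Running, but with a different image than declared
            actions.append(("update", name))
    for name in sorted(running):
        if name not in desired:
            # Running without a matching declaration
            actions.append(("delete", name))
    return actions
```

The point of the pattern is that users only declare the servers they want, and the controller works out the create, update, and delete steps on its own.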

There’s been some previous discussion about using Deployments or StatefulSets in KubeSpawner, see “switch from raw Pod to Deployment” (jupyterhub/kubespawner issue #138).

One aspect I like is how they’ve used Jinja2 templates to define all K8s resources instead of creating K8s objects with the Python API. This looks a lot cleaner and more easily extensible. For example, “Support to create the networkpolicy dynamically” (jupyterhub/kubespawner issue #523) is a request for dynamic control of NetworkPolicies. This would be very difficult to make configurable in Python code, but having it as a template would make it easily extensible or overridable in subclasses.
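To make the template idea concrete, here is a minimal sketch. It uses Python’s standard-library `string.Template` as a stand-in for Jinja2 so that it runs without extra dependencies. The class names and the manifest contents are made up for illustration and are not taken from Amalthea or KubeSpawner.

```python
import json
from string import Template


class NetworkPolicyRenderer:
    """Renders a NetworkPolicy manifest from a text template (hypothetical class)."""

    # Illustrative template: allows ingress on a single TCP port.
    template = Template("""
{
  "apiVersion": "networking.k8s.io/v1",
  "kind": "NetworkPolicy",
  "metadata": {"name": "$name-policy"},
  "spec": {
    "podSelector": {"matchLabels": {"app": "$name"}},
    "policyTypes": ["Ingress"],
    "ingress": [{"ports": [{"protocol": "TCP", "port": $port}]}]
  }
}
""")

    def render(self, **values):
        # substitute() raises KeyError if a placeholder has no value;
        # values without a matching placeholder are ignored.
        return json.loads(self.template.substitute(**values))


class DenyAllIngressRenderer(NetworkPolicyRenderer):
    """Subclass that changes the policy by overriding only the template."""

    template = Template("""
{
  "apiVersion": "networking.k8s.io/v1",
  "kind": "NetworkPolicy",
  "metadata": {"name": "$name-policy"},
  "spec": {
    "podSelector": {"matchLabels": {"app": "$name"}},
    "policyTypes": ["Ingress"],
    "ingress": []
  }
}
""")
```

Calling `NetworkPolicyRenderer().render(name="jupyter-alice", port=8888)` returns the manifest as a dictionary. The subclass changes the resulting policy without touching any rendering code, which is the kind of override that is awkward when the objects are built field by field in Python.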


Disclaimer: I’m a maintainer of Amalthea.

Thanks @sgaist for taking note of our project and mentioning it here. In fact, it has been on my to-do list to reach out to the Jupyter community about the project, but you were faster. However, now is definitely a good moment to talk about it, as we’ve been running Amalthea in production as part of the RenkuLab platform for 2-3 months, and after some small initial hiccups (like k8s probes preventing the culling of idle servers) Amalthea is now smoothly managing up to 300 parallel user sessions on renkulab.io. So I feel confident in claiming that the project is past the “proof-of-concept” stage.

One aspect that I would like to stress is the extensibility of the custom resources, which was already mentioned by @manics. This was indeed one of the guiding principles when designing Amalthea, but something like adding a network policy could be achieved even more easily by adding a patch to the JupyterServer object spec. In fact, we heavily use the patching option in RenkuLab as can be seen here. For example, we add a sidecar container to each JupyterServer which acts as a proxy for the HTTP traffic to the hosted Git repository that is cloned into the JupyterServer. As this is a very RenkuLab-specific use case, we intentionally kept it out of Amalthea.
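To give a feel for what patching a spec to add a sidecar means, here is a small sketch of a single JSON Patch (RFC 6902) `add` operation applied to a dictionary. The `apply_add` helper, the manifest layout, and the container names and images are assumptions made for this illustration; they do not reproduce the actual JupyterServer schema or the patches used in RenkuLab.

```python
import copy


def apply_add(document, path, value):
    """Apply one RFC 6902 'add' operation and return a patched copy.

    Simplified subset: the path must be non-empty and '~0'/'~1'
    escapes are not handled.
    """
    doc = copy.deepcopy(document)
    *parents, last = path.lstrip("/").split("/")
    node = doc
    for token in parents:
        # List elements are addressed by index, dict members by key
        node = node[int(token)] if isinstance(node, list) else node[token]
    if isinstance(node, list):
        if last == "-":
            node.append(value)  # '-' means "append to the end of the array"
        else:
            node.insert(int(last), value)
    else:
        node[last] = value
    return doc


# Hypothetical, heavily simplified manifest (not the real JupyterServer spec)
manifest = {
    "spec": {
        "containers": [
            {"name": "jupyter-server", "image": "example/notebook:latest"},
        ]
    }
}

# Hypothetical sidecar definition
sidecar = {"name": "git-proxy", "image": "example/git-proxy:latest"}

patched = apply_add(manifest, "/spec/containers/-", sidecar)
```

After the call, `patched` contains both containers, while `manifest` is left unchanged because the helper works on a deep copy.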

If there’s interest in the Jupyter community in learning more about Amalthea, we’re more than happy to share our ideas and experience in more detail. We are also very open to any form of collaboration with, or contribution back to, Project Jupyter if what we have developed is seen as useful for a larger scope.


@ableuler Are you interested in doing a short presentation/demo, either at one of the JupyterHub monthly team meetings (next one is Thu 20 Jan 17:00 UTC) or the Jupyter Community call (next one is Tue January 25 08:00 Pacific / 16:00 UTC)?

Sure, with pleasure. I can make both dates, so whatever you think is the more appropriate format for this works for me.

The JupyterHub meeting tends to be more technically focussed on JupyterHub. A few JupyterHub devs have already discussed Amalthea and whether there’s potential for sharing code between it and e.g. KubeSpawner in the longer term, so if you’ve got any thoughts it’d be great to hear them as well as see a demo!

In contrast the Community call has a mix of advanced users, developers, sysadmins, and non-technical users, and is a good forum to show off what you’re doing at a high level. It’s recorded so also gets more views afterwards. With my academic hat on I’d say it’s a good place to get more awareness and promote what you’re doing.

I’m biased and interested in the technical details so I’d say come to the JupyterHub meeting :smiley: , but really the community call is equally good. You can even do both if you want!

Sounds great, then I’ll join the JupyterHub meeting on January 20 and talk about those technical details :smiley:. I’d also be happy to take the opportunity to present Amalthea to the larger community call audience at some point.
