Does JupyterHub support multitenancy?

Hi all,
We want to use JupyterHub in our enterprise production environment, so each time a new customer is onboarded we want to provide a separate authentication configuration, list of admins, etc., and maybe even a different singleuser configuration.
What are our options? Is adding additional hubs supported?
Any help with supporting multitenancy would be highly appreciated!
I’ve seen the Jupyter Enterprise Gateway project, but I’m not sure it provides the capabilities described above, or whether it is still under development.

Thank you in advance,
Yan

It sounds like you want multiple deployments of JupyterHub, each tailored to one of your tenants. What would you expect “multitenancy” to mean in the context of JupyterHub?

Hi @manics, thank you so much for your response!
We are considering multiple deployments, but we are afraid that with a large number of customers such a setup will be difficult to maintain. If this is a common approach, do you have a document that explains best practices for such a setup? Maybe the deployment for each new customer could include only the hub or proxy and share the other services, so we don’t need to deploy everything each time?

Alternatively, I think it would be nice if, when provisioning each new customer, we could use a REST API to:

  • add a new hub configuration (are multiple configurations supported for a single deployment/hub, selected by URL or something similar?), or have only one hub/proxy that is deployed per customer.
  • add some number of concurrent users, etc.

Thank you
Yan

The hub and proxy are the only required JupyterHub services. What other services are you thinking of?

As far as having multiple “customers” is concerned, it depends how much segregation (in terms of configurability and security) you need between them. If you wanted to do it entirely within a single JupyterHub you could try splitting them by group. JupyterHub is extensible enough that you can probably achieve what you want if you’re prepared to write some code (potentially quite a lot of code and configuration, depending on what exactly you need). However, you have to trust your users, since they can effectively execute anything they want inside their singleuser servers. What level of security are your customers expecting?
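For illustration, with the JupyterHub Helm chart mentioned below, a group-based split might look roughly like this sketch. The group names, usernames, and images are hypothetical placeholders, and the hook assumes a spawner with an `image` trait (e.g. KubeSpawner):

```yaml
# values.yaml -- a minimal sketch of splitting tenants by group inside a
# single JupyterHub (Zero to JupyterHub Helm chart). All names below are
# hypothetical placeholders.
hub:
  config:
    JupyterHub:
      load_groups:
        tenant-a: [alice, bob]
        tenant-b: [carol]
  extraConfig:
    tenantImages: |
      # Hypothetical per-tenant image mapping, applied just before spawn.
      TENANT_IMAGES = {
          "tenant-a": "registry.example.com/tenant-a/notebook:latest",
          "tenant-b": "registry.example.com/tenant-b/notebook:latest",
      }

      def pre_spawn_hook(spawner):
          # Pick the singleuser image based on the user's tenant group.
          groups = {group.name for group in spawner.user.groups}
          for tenant, image in TENANT_IMAGES.items():
              if tenant in groups:
                  spawner.image = image
                  break

      c.Spawner.pre_spawn_hook = pre_spawn_hook
```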

If I had to deploy multiple JupyterHubs then personally I’d avoid all manual configuration and follow infrastructure-as-code principles. Once you’ve done the hard work of deploying one JupyterHub, it’s easy to deploy 5, 10, or 100 (if you can afford to pay for the compute 🙂).

On Kubernetes I’d use the JupyterHub Helm Chart with some other orchestration, e.g. Terraform for bringing up the cloud infrastructure, and optionally also for deploying the multiple Helm charts. If you search, there are plenty of other options using Docker or Ansible that aren’t reliant on Kubernetes.

If your infrastructure is defined in code then once it’s working you can hook it up to a continuous deployment system: deploying a new JupyterHub just requires adding a new config file to your code repo and leaving your CD system to deploy it automatically. You also have all your configs visible in one place.
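To make that concrete, a per-customer override file checked into the repo might look something like the sketch below; the authenticator, registry, and all names are hypothetical, and the CD job would apply it with something like `helm upgrade --install`:

```yaml
# values-customer-a.yaml -- hypothetical per-customer overrides for the
# Zero to JupyterHub Helm chart; a CD pipeline might apply it with:
#   helm upgrade --install jhub-customer-a jupyterhub/jupyterhub \
#     --namespace customer-a --create-namespace -f values-customer-a.yaml
hub:
  config:
    JupyterHub:
      authenticator_class: generic-oauth   # per-tenant auth backend
    Authenticator:
      admin_users: [admin-a]               # per-tenant admin list
singleuser:
  image:
    name: registry.example.com/customer-a/notebook
    tag: "1.0"
```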

QHub could be worth looking at. I haven’t tried it but it sounds potentially useful.

Hi,
Yes, I do use k8s and a Helm deployment.
After a single deployment I see several pods (maybe using the term ‘service’ was not correct): a few continuous pullers and an awaiter, a few hook image pullers, a jupyter-jupyter pod, the hub, the proxy, schedulers, and some more. Altogether 13 pods, most of them up and running. With 100 customers that’s 1,300 pods in a single namespace, not taking into consideration the pods that will be created for users.

Security: we will provide the users of each tenant with separate credentials so that only their organization’s data is available to them (schema separation); maybe the interaction between user pods is an issue. The authentication configuration for each tenant may be different.

Thank you,
Yan

Several of the pods are intended to improve the user experience where there’s a single JupyterHub deployment with full use of the K8s cluster:

The continuous pullers pull a user image when a new node is created (manually or through auto-scaling), so that a user who gets assigned to a new node doesn’t have to wait for the image to be pulled. The scheduler pods try to assign new users to existing nodes rather than new ones where possible.

It’s safe to disable all of those. The scheduler pods may conflict anyway, since you’d have multiple deployments all trying to control assignment and autoscaling.
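For example, with the Zero to JupyterHub Helm chart, values along these lines should switch those optional pods off (a sketch; check the chart reference for your chart version):

```yaml
# values.yaml -- sketch: disable the optional user-experience pods when
# running many small hubs on one cluster (Zero to JupyterHub chart).
prePuller:
  hook:
    enabled: false      # no pre-install image-puller job
  continuous:
    enabled: false      # no continuous image-puller daemonset
scheduling:
  userScheduler:
    enabled: false      # fall back to the default k8s scheduler
  podPriority:
    enabled: false
  userPlaceholder:
    replicas: 0         # no placeholder pods reserving capacity
```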
