Jupyterhub multi instance/node

rufytn · November 24, 2021, 9:07am

Hi Team,

I would like to know whether a multi-node JupyterHub setup is supported.

At the moment we have a single-node JupyterHub instance on an edge node of a Hadoop cluster, and notebooks are spawned via YarnSpawner.
The requirement is to run JupyterHub across multiple instances/nodes.
Is it possible to have a multi-node setup where end users have a single entry point and notebooks are served by one of the Hub nodes, depending on the request?
Could TraefikEtcdProxy help?

P.S. We can’t use Kubernetes.

manics · November 25, 2021, 6:38pm

Hi! What do you mean by “multi instance/node”? Do you mean having each user’s single-user server running on a different VM or in its own Docker container?

rufytn · November 26, 2021, 8:22am

@manics By multi instance/node I mean only the Hub component. Each user’s single-user server already runs on a worker node of the Hadoop cluster via YarnSpawner.
From the documentation, it seems that JupyterHub can only be installed on one node/VM. Can the Hub component run on, for example, two VMs, so that end users have a single entry point and notebooks are served by one of the Hub nodes, depending on the request?

minrk · November 26, 2021, 11:16am

That’s right, a given JupyterHub deployment can have only one Hub instance. For more information, see “Support HA (High Availability)”, Issue #1932 in jupyterhub/jupyterhub on GitHub.

Craig_Weber · January 4, 2022, 11:04pm

I noticed that #1932 links to two other issues (“Revoke db access from Authenticator, Spawner”, Issue #3700, and “moving state out of BaseHandler”, Issue #3699, both in jupyterhub/jupyterhub on GitHub), but it’s not clear to me whether these are the only issues holding up #1932 or whether there is more to do. I would also like to understand more about the db session lifecycle and how it limits our ability to run multiple Hub instances. Are there any existing documents or threads that you could link me to?

minrk · January 5, 2022, 10:50am

This is the main summary.

Only revoking db access is relevant to this work. The other issue is more of a refactor that isn’t strictly necessary; it is related in that the HA goal should inform exactly how we do the refactor, but it’s more parallel work.

But those are just first steps. The biggest task is changing the lifecycle of our ORM objects so that they no longer persist for the lifetime of the process, and a new session is always created for each request instead. We would have to give up a lot of performance optimizations to do this, so it’s tricky long-term work.
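
To make the lifecycle point concrete, here is a minimal sketch of why state that persists for the whole process gets in the way of running several Hub instances. It is not JupyterHub code: it uses the standard-library `sqlite3` module instead of JupyterHub's ORM, and the `users` table, the `server_url` column, and the `CachingHub` / `PerRequestHub` classes are hypothetical names chosen only for illustration.

```python
import os
import sqlite3
import tempfile


def read_server_url(db_path, name):
    """Open a fresh connection, read one row, and close it again."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT server_url FROM users WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else None
    finally:
        conn.close()


def write_server_url(db_path, name, url):
    """Stand-in for another Hub instance updating the shared database."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success
            conn.execute(
                "INSERT OR REPLACE INTO users (name, server_url) VALUES (?, ?)",
                (name, url),
            )
    finally:
        conn.close()


class CachingHub:
    """Hypothetical hub that keeps loaded state for the life of the process."""

    def __init__(self, db_path):
        self.db_path = db_path
        self._cache = {}

    def server_url(self, name):
        # Fast, but never notices writes made by another process.
        if name not in self._cache:
            self._cache[name] = read_server_url(self.db_path, name)
        return self._cache[name]


class PerRequestHub:
    """Hypothetical hub that reloads state from the database on every request."""

    def __init__(self, db_path):
        self.db_path = db_path

    def server_url(self, name):
        # Slower (one query per request), but always sees the current row.
        return read_server_url(self.db_path, name)


def demo():
    fd, db_path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    try:
        conn = sqlite3.connect(db_path)
        with conn:
            conn.execute(
                "CREATE TABLE users (name TEXT PRIMARY KEY, server_url TEXT)"
            )
        conn.close()

        write_server_url(db_path, "alice", "http://worker1:8888")
        cached, fresh = CachingHub(db_path), PerRequestHub(db_path)
        before = (cached.server_url("alice"), fresh.server_url("alice"))

        # A second hub instance moves alice's server to another worker.
        write_server_url(db_path, "alice", "http://worker2:8888")
        after = (cached.server_url("alice"), fresh.server_url("alice"))
        return before, after
    finally:
        os.remove(db_path)


if __name__ == "__main__":
    print(demo())
```

In this sketch the caching hub keeps answering with the old URL after the second write, while the per-request hub picks up the change at the cost of a query on every call. That is the same trade-off described above, in miniature.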
