JupyterHub multi instance/node

Hi Team,

I am exploring whether JupyterHub can support a multi-node setup.

At the moment we have a single-node JupyterHub instance on an edge node of a Hadoop cluster, and the notebooks are spawned via YarnSpawner.
The requirement is to run JupyterHub across multiple instances/nodes.
I wanted to check whether a multi-node setup is possible in which end users have a single entry point and notebooks are served by one of the hub nodes, depending on the request.
Could TraefikEtcdProxy help?

P.S. We can’t use Kubernetes.

Hi! What do you mean by “multi instance/node”? Do you mean having each user’s singleuser-server running on a different VM or in its own Docker container?

@manics By “multi instance/node” I mean only the Hub component. Each user’s singleuser-server runs on a worker node of the Hadoop cluster through YarnSpawner.
From the documentation, it seems that JupyterHub can only be installed on one node/VM. Could the Hub component, for example, run on two VMs, so that end users have a single entry point and notebooks are served by one of the hub nodes, depending on the request?

That’s right, a given JupyterHub deployment can have only one Hub instance. For more information: Support HA (High Availability) · Issue #1932 · jupyterhub/jupyterhub · GitHub

I noticed #1932 links to two other issues (Revoke db access from Authenticator, Spawner · Issue #3700 · jupyterhub/jupyterhub · GitHub, and moving state out of BaseHandler · Issue #3699 · jupyterhub/jupyterhub · GitHub) but it’s not clear to me whether these are the only issues holding up #1932 or if there is more to do as well? Also, I would like to understand more about the db session lifecycle and how this limits our ability to run multiple hub instances. Are there any existing documents or threads that you could link me to?

This is the main summary.

Only revoking db access is relevant to this work. The other is more of a refactor that isn’t strictly necessary, though it is related in that the goal should inform exactly how we do the refactor. It’s more parallel work.

But those are just first steps. The biggest task is changing the lifecycle of our ORM objects to no longer be persistent for the process, instead always creating a new session for each request. We have to lose lots of performance optimizations to do this, so it’s tricky long-term work.
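As a rough illustration of why process-lifetime state gets in the way of running several Hubs, here is a minimal sketch. It is not JupyterHub code: plain `sqlite3` stands in for the ORM, and the `servers` table, `CachingHub`, `PerRequestHub`, and `other_hub_starts_server` are all hypothetical names made up for the example. One "hub" keeps what it loaded for the life of the process, the other re-reads the database on every request.

```python
import os
import sqlite3
import tempfile

# Hypothetical stand-in for the Hub database: one table of server states.
db_path = os.path.join(tempfile.mkdtemp(), "hub.sqlite")
with sqlite3.connect(db_path) as conn:
    conn.execute("CREATE TABLE servers (user TEXT PRIMARY KEY, state TEXT)")
    conn.execute("INSERT INTO servers VALUES ('alice', 'stopped')")


def read_state(user):
    """Open a fresh connection, read one row, and close the connection."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT state FROM servers WHERE user = ?", (user,)
        ).fetchone()
        return row[0] if row else None
    finally:
        conn.close()


class CachingHub:
    """Loads state once and keeps it in memory for the life of the process."""

    def __init__(self):
        self._cache = {}

    def server_state(self, user):
        if user not in self._cache:
            self._cache[user] = read_state(user)
        return self._cache[user]


class PerRequestHub:
    """Goes back to the database on every request (slower, never stale)."""

    def server_state(self, user):
        return read_state(user)


def other_hub_starts_server(user):
    """Simulates a second Hub process writing to the shared database."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits the UPDATE on exit
        conn.execute(
            "UPDATE servers SET state = 'running' WHERE user = ?", (user,)
        )
    conn.close()


caching, per_request = CachingHub(), PerRequestHub()
before = (caching.server_state("alice"), per_request.server_state("alice"))
other_hub_starts_server("alice")
after = (caching.server_state("alice"), per_request.server_state("alice"))

print(before)  # ('stopped', 'stopped')
print(after)   # ('stopped', 'running')
```

The caching variant avoids a database round trip per request, which is the kind of optimization that would be lost, but it keeps reporting the old state after another process changes the row.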
