Core component resilience/reliability

I’m setting up JupyterHub on K8s in a large-scale enterprise environment and anticipate thousands of concurrent users.

Is there any documentation on improving the resilience/reliability of the core components (proxy, hub, and spawner)?

I’d like to make sure that, at a bare minimum, I have at least one backup pod for each core component.

I’m also curious about what the common failure points are when traffic exceeds a certain threshold.

Thank you!

One day I hope to write up a doc about this, specifically for using zero-to-jupyterhub-k8s, but until then there are some recent(ish) related threads that might help you get started [1][2][3][4][5].

Since the hub does not (natively) support HA [6], you can’t run multiple replicas of it and scale horizontally that way. And because it’s a single Python process, the hub (with KubeSpawner running in the same process) effectively gets one CPU, so keep an eye on CPU usage. To keep CPU usage down and API response times low, you will likely need to tune the various config options related to reporting notebook activity, so that your thousands of users and notebook pods aren’t storming the hub API with activity updates and DB writes, which consume CPU and starve the hub.
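To give a feel for the scale involved, here is a back-of-the-envelope sketch with illustrative numbers (not measurements from any deployment): each running single-user server reports activity to the hub API on an interval (JUPYTERHUB_ACTIVITY_INTERVAL, 5 minutes by default if I remember right), so the steady-state request rate grows linearly with the number of active servers and shrinks as you lengthen that interval.

```python
# Back-of-envelope only; the numbers are illustrative, not measurements.
def activity_requests_per_second(active_servers: int, interval_seconds: int) -> float:
    """Approximate steady-state rate of activity reports hitting the hub API."""
    return active_servers / interval_seconds

# Each report is an API request plus potential DB writes on the hub side.
print(activity_requests_per_second(3000, 300))   # ~10 req/s at a 5-minute interval
print(activity_requests_per_second(3000, 3600))  # <1 req/s at a 1-hour interval
```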

You will also want to keep an eye on the cull-idle script if you have thousands of users on a single hub. In our case we reduced its concurrency to 1 to lessen its load on the API, set the timeout to 5 days, and run it every hour; the notebooks cull themselves (and delete the pod) after an hour of inactivity anyway. We tuned the cull-idle that way because we also have it configured to cull users. A GET /users request with thousands of users can currently take a while because of the lack of paging and server-side filtering in the hub DB [7].
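For reference, here is a minimal sketch of that cull tuning, written against the standalone jupyterhub-idle-culler package as a hub-managed service in jupyterhub_config.py; in zero-to-jupyterhub the same knobs live under the cull: section of the Helm values (cull.timeout, cull.every, cull.concurrency, cull.users). The values just mirror the settings described above, they aren’t recommended defaults.

```python
# Sketch of the cull-idle settings described above (illustrative, not a recommendation).
import sys

c.JupyterHub.services = [
    {
        "name": "idle-culler",
        "command": [
            sys.executable, "-m", "jupyterhub_idle_culler",
            "--timeout=432000",   # cull after 5 days of inactivity
            "--cull-every=3600",  # run the culler once per hour
            "--concurrency=1",    # one API request at a time to limit load on the hub
            "--cull-users",       # also remove idle users, not just their servers
        ],
    },
]
# Depending on your JupyterHub version the culler also needs admin rights or an
# idle-culler role so it can list and delete users/servers via the API.
```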

[1] Identifying JupyterHub api performance bottleneck
[2] Scheduler "insufficient memory.; waiting" errors - any suggestions?
[3] Minimum specs for JupyterHub infrastructure VMs?
[4] Background for JupyterHub / Kubernetes cost calculations?
[5] Confusion of the db instance
[6] https://github.com/jupyterhub/jupyterhub/issues/1932
[7] https://github.com/jupyterhub/jupyterhub/issues/2954

There are a couple of specific things I can point out here if you’re using zero-to-jupyterhub-k8s:

  1. The hub API will return a 429 response with a Retry-After header if you’ve hit the concurrentSpawnLimit. We see that happening at the start of a large user event, so make sure client-side tooling can handle that 429 response and retry appropriately (see the sketch after this list).
  2. If you hit the consecutiveFailureLimit, the hub will crash. Kubernetes should restart the hub pod, but depending on how many users you have in the database and how your cull-idle service is set up (it runs on hub restart), the restart could take longer than you want. In our experience, as long as we have the notebook images pre-pulled on the user nodes and enough idle placeholders pre-created for a large user event, we don’t hit the consecutive failure limit. See [1] for more details.
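
For item 1, here is a minimal sketch of what handling the 429 on the client side can look like, talking to the hub REST API directly. The hub URL, token, and retry policy are placeholders, not anything from a real deployment.

```python
# Sketch: start a user's server via the hub REST API, backing off politely when
# the hub returns 429 because concurrentSpawnLimit has been reached.
import time
import requests

HUB_API = "https://hub.example.com/hub/api"  # placeholder
API_TOKEN = "..."                            # placeholder hub API token

def start_server(username: str, max_attempts: int = 5) -> requests.Response:
    headers = {"Authorization": f"token {API_TOKEN}"}
    resp = None
    for attempt in range(max_attempts):
        resp = requests.post(f"{HUB_API}/users/{username}/server", headers=headers)
        if resp.status_code != 429:
            return resp
        # The hub's Retry-After header says how long to wait before retrying.
        wait = int(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)
    return resp
```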

[1] Optimizations — Zero to JupyterHub with Kubernetes documentation

@mriedem, thank you so much for the incredibly in-depth response. I support your decision to create consolidated documentation on this topic and am happy to help in any way that I can.

Would you feel comfortable providing an approximate upper limit (or range) of concurrent pods before performance degrades?

The upper limit depends on a few things that Matt mentioned. Activity tracking and KubeSpawner were the biggest performance issues we’ve seen so far. Increasing hub_activity_interval and activity_resolution helped. We also saw a good improvement from changing last_activity_interval [1]. If you’re using zero-to-jupyterhub, it sets that value to 1 minute, which is far too frequent; it had a noticeable effect on performance until we changed it back to the default of 5 minutes.
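For concreteness, here is roughly how those settings look in jupyterhub_config.py; in zero-to-jupyterhub they can be set through the hub’s extra configuration and the single-user environment in the Helm values. The specific numbers are only examples, not a recommendation.

```python
# Activity-related settings discussed above (illustrative values).

# How often the hub refreshes last-activity from the proxy; the JupyterHub
# default is 300s (5 minutes), which z2jh overrode to 60s at the time.
c.JupyterHub.last_activity_interval = 300

# Activity timestamps closer together than this many seconds are treated as
# equal, which avoids a DB write for every tiny update.
c.JupyterHub.activity_resolution = 600

# How often each single-user server reports its activity back to the hub
# (the singleuser-side hub_activity_interval, set via an environment variable).
c.Spawner.environment = {"JUPYTERHUB_ACTIVITY_INTERVAL": "3600"}
```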

We also saw great improvements by making some changes to the kubespawner that are detailed here [2].

All of that is a long way of saying I don’t know exactly what the upper limit is. With the stock kubespawner we saw performance problems at ~1000 pods. With those issues fixed we’ve scaled up to 3000 pods without any issue, and we could likely go higher with more Kubernetes nodes. Steady-state performance seems to be dominated by the various activity interval settings: the less often you update that information, the more concurrent pods you can support.

Posting the rest of my links as it wouldn’t let me add them all to the previous comment.

[1] https://jupyterhub.readthedocs.io/en/stable/api/app.html#jupyterhub.app.JupyterHub.last_activity_interval
[2] https://github.com/jupyterhub/kubespawner/issues/423

@rmoe 3000 pods without issue is music to my ears. That will cover us for a long time, and gives us more than enough runway to implement the two-hub + router solution mentioned in one of the GitHub issues.

I can’t help but wonder if there’s an opportunity to follow the paradigm that Dask Gateway adopted and support two implementations: one that interfaces with a standalone database for backends that require it, and one that uses a backend-native database. Instead of managing state on its own, Dask Gateway extends the Kubernetes API with a CRD and relies on etcd for state persistence.

This encapsulates much of those changes.

Obviously gargantuan in scope, but figured it was worth mentioning.

Thanks a ton for the follow-up. I’m really looking forward to contributing upstream in the not-so-distant future.

I’m very sorry to barge in on this thread, but I’m working through your excellent suggestions and am now trying to work out the canonical way to upgrade kubespawner to the development version with your fix from https://github.com/jupyterhub/kubespawner/issues/423. Do I have to make my own Helm chart for that, and my own Helm chart repository?

Sorry to punish you for all your helpful advice.

You can use the latest dev version of zero-to-jupyterhub (from https://jupyterhub.github.io/helm-chart/#development-releases-jupyterhub). It includes kubespawner 0.13, which has the PR mentioned here.

@rmoe’s PR is beautiful, look at the CPU usage reduction here:

The change was deployed on 09/05 sometime, and you can see the big difference it makes.

More importantly, you can see the change in response latencies.

We were encountering many, many requests with 1s+ latencies! This basically made the hub unavailable: it was dropping requests on the floor, so many requests didn’t even make it to the hub.

UC Berkeley’s infra is now stable thanks to @rmoe’s work. THANK YOU

Thanks - that’s very helpful.

Just to add, since I explored a bit more: I could have found the kubespawner version in the latest Helm chart by looking at https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/cd1eff7a5f093453de4d5868d1c4148044c0db23/images/hub/requirements.txt

Wow thank you for sharing this @yuvipanda and thank you @rmoe for your work! :heart: :tada:!