One day I hope to write up a doc about this, specifically for using zero-to-jupyterhub-k8s, but until then there are some recent(ish) related threads that might help you get started [1][2][3][4][5].
Since the hub does not (natively) support HA [6], you can’t run multiple replicas of it and scale horizontally that way. And since it’s a single Python process, the hub (and KubeSpawner, which runs in the same process) gets at most 1 CPU, so keep an eye on CPU usage. To keep CPU usage down and API response times low, you will likely need to tune various config options related to reporting notebook activity, so that your thousands of users and notebook pods aren’t storming the hub API with activity updates and DB writes, which consume CPU and starve the hub.
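As a rough illustration of what that tuning can look like, here is a minimal `jupyterhub_config.py` sketch. The option names are real JupyterHub/Spawner settings, but the values are only examples of "report less often", not recommendations for your deployment:

```python
# jupyterhub_config.py -- sketch of activity-reporting knobs
# (values are illustrative; tune for your own load).

# Only write an activity update to the DB if it advances last_activity
# by at least this many seconds (default is 30). Raising it cuts DB writes.
c.JupyterHub.activity_resolution = 600

# How often (seconds) the hub refreshes last-activity info from the proxy.
c.JupyterHub.last_activity_interval = 600

# How often each single-user server reports its activity back to the hub;
# raising this reduces the volume of POSTs from thousands of notebook pods.
c.KubeSpawner.environment = {"JUPYTERHUB_ACTIVITY_INTERVAL": "900"}
```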
You will also want to keep an eye on the cull-idle script if you have thousands of users on a single hub. In our case we reduced its concurrency to 1 to lessen its load on the API, set the timeout to 5 days, and run it every hour, though the notebooks will cull themselves (and delete the pod) after an hour of inactivity. We keep the cull-idle concurrency low because we also have it configured to cull users, and doing a GET /users request with thousands of users can currently take a while because of the lack of paging and server-side filtering in the hub DB [7].
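For reference, a setup like the one described above can be expressed as a hub-managed service running jupyterhub-idle-culler. This is a sketch, not our exact config; the flags are real jupyterhub-idle-culler options, and the values mirror the numbers mentioned in this post:

```python
# jupyterhub_config.py -- sketch of an idle-culler service using the
# settings described above (5-day timeout, hourly runs, concurrency 1).
import sys

c.JupyterHub.services = [
    {
        "name": "idle-culler",
        "command": [
            sys.executable, "-m", "jupyterhub_idle_culler",
            "--timeout=432000",   # cull servers idle for 5 days (seconds)
            "--cull-every=3600",  # run the culler once an hour
            "--concurrency=1",    # serialize its API requests to reduce hub load
            "--cull-users",       # also cull the idle (non-admin) users
        ],
    }
]
```

If you deploy with zero-to-jupyterhub-k8s, the chart exposes equivalent settings under its `cull` values rather than requiring you to define the service by hand.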
[1] Identifying JupyterHub api performance bottleneck
[2] Scheduler "insufficient memory.; waiting" errors - any suggestions?
[3] Minimum specs for JupyterHub infrastructure VMs?
[4] Background for JupyterHub / Kubernetes cost calculations?
[5] Confusion of the db instance
[6] https://github.com/jupyterhub/jupyterhub/issues/1932
[7] https://github.com/jupyterhub/jupyterhub/issues/2954