We are running Z2JH (Zero to JupyterHub with Kubernetes) on a multi-zonal GKE cluster. We noticed that during spawning, the pod’s persistent volume is provisioned in a different availability zone from the one where the pod itself is scheduled. This causes the pod to crash and JupyterHub to log a “spawner timeout error”. Is there any way to tell Z2JH to provision the pod’s persistent volume in the same availability zone where the pod is spawned?
Hi! Please could you:
- show us your full Z2JH config with secrets redacted
- provide your Z2JH version
- tell us how you set up your GKE cluster?
Can you try disabling the user-scheduler, if you haven’t already?
scheduling:
  userScheduler:
    enabled: false
Solved the problem by creating a custom storage class with the volumeBindingMode property set to WaitForFirstConsumer.
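A minimal sketch of what such a storage class might look like. With WaitForFirstConsumer, Kubernetes delays provisioning the volume until the pod has been scheduled, so the disk is created in the pod's zone. The class name, provisioner, and disk type here are assumptions for illustration, not the poster's actual manifest:

```yaml
# StorageClass manifest (apply with kubectl); the name is hypothetical
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: jupyterhub-user-wffc
# Assumes the GKE Persistent Disk CSI driver is enabled on the cluster
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-standard  # assumed disk type
# Delay volume binding and provisioning until a pod using the PVC is scheduled
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
```

The Z2JH config would then need to point user storage at that class, for example:

```yaml
# Z2JH Helm values; the class name must match the StorageClass above
singleuser:
  storage:
    dynamic:
      storageClass: jupyterhub-user-wffc
```

Existing PVCs keep the storage class they were created with, so this only affects volumes provisioned after the change.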
I do have a follow-up question: what does the user-scheduler do, and what happens when it is enabled versus disabled?
The user-scheduler is discussed here; it enables some special configuration of how pods are assigned to nodes. The main thing this does is ensure that user pods are packed together, ensuring efficient scale-down of the cluster.
From the docs above:
NOTE: If you don’t want to scale down the nodes you have, it would make more sense to let the users spread out and utilize all available nodes. Only activate the user scheduler if you have an autoscaling node pool.
@minrk Thanks for your response. We do use a dedicated autoscaling node pool for the users’ JupyterLab server pods, because we provide GPU support for these pods.
Given the guidance above, we will continue to enable the user-scheduler (unless you feel it’s a bad idea.)