Problem with Spawning Pods with Persistent Volumes in a Multi-Zonal GKE deployment

mcberma · April 23, 2021, 12:31am

We are running Z2JH (Zero to JupyterHub on Kubernetes) on a multi-zonal GKE cluster. We noticed that during spawning, the pod’s persistent volume is provisioned in a different availability zone from the one where the pod itself is scheduled. This causes the pod to crash and JupyterHub to log a “spawner timeout error”. Is there any way to tell Z2JH to provision the pod’s persistent volume in the same availability zone as the pod?

manics · April 23, 2021, 6:40pm

Hi! Please could you:

show us your full Z2JH config with secrets redacted
provide your Z2JH version
tell us how you set up your GKE cluster

Can you also try disabling the user-scheduler, if you haven’t already?

scheduling:
  userScheduler:
    enabled: false

Thanks!

mcberma · April 27, 2021, 9:16pm

Solved the problem by creating a custom storage class, shown below, with the volumeBindingMode property set to WaitForFirstConsumer:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    kubernetes.io/cluster-service: "true"
  name: jupyterhub-user-ssd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
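
For readers following along: with WaitForFirstConsumer, Kubernetes delays binding and provisioning the volume until a pod that uses the claim has been scheduled, so the disk is created in the zone of the node the pod lands on. Because the storage class above is not marked as the default class, the hub also has to be told to use it for user volumes. A minimal sketch of the Helm values, assuming the chart's singleuser.storage.dynamic.storageClass option and the class name from the manifest above (the capacity value is only an illustrative placeholder, not from the original post):

```yaml
# config.yaml (Helm values for the Z2JH chart); a sketch, not the poster's actual config
singleuser:
  storage:
    type: dynamic
    # Illustrative size only; adjust to your needs
    capacity: 10Gi
    dynamic:
      # Must match metadata.name of the custom StorageClass
      storageClass: jupyterhub-user-ssd
```

Existing user volumes created under the old storage class are not migrated by this change; it only affects claims created afterwards.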

I do have a follow-up question: what does the user-scheduler do, and what changes when it is enabled versus disabled?

minrk · April 28, 2021, 9:29am

The user-scheduler is discussed here; it enables some special configuration of how pods are assigned to nodes. The main thing this does is ensure that user pods are packed together, ensuring efficient scale-down of the cluster.

From the docs above:

NOTE: If you don’t want to scale down the nodes you have, it would make more sense to let the users spread out and utilize all available nodes. Only activate the user scheduler if you have an autoscaling node pool.
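
As a point of reference, turning the user-scheduler on for a cluster with an autoscaling node pool is the mirror image of the snippet earlier in this thread. This is a minimal sketch of the relevant Helm values only, assuming no other scheduling options are set:

```yaml
# config.yaml (Helm values for the Z2JH chart)
scheduling:
  userScheduler:
    # Packs user pods onto the most-used nodes so emptier nodes can be scaled down
    enabled: true
```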

mcberma · April 28, 2021, 3:06pm

@minrk Thanks for your response. We do use a dedicated auto-scaled node-pool for the users’ Jupyter Lab Server pods. That’s because we provide GPU support for these pods.

Given the guidance above, we will continue to enable the user-scheduler (unless you feel it’s a bad idea.)
