User's pod with pending state in JupyterHub

Hi all, I am running load tests to verify that my infrastructure supports 450 users on a JupyterHub deployed with z2jh. However, some user pods stay in the Pending state for more than 10 minutes. As you can see in the event log of one of these pods below, the cluster autoscales correctly, but even while the culling process is removing pods due to inactivity and freeing resources, the user pod never starts.

What can cause this behavior?

Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  16m (x4 over 17m)      default-scheduler  0/35 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/35 nodes are available: 35 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  16m                    default-scheduler  0/36 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/36 nodes are available: 36 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  9m23s (x7 over 16m)    default-scheduler  0/37 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/37 nodes are available: 37 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  8m1s                   default-scheduler  0/38 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/38 nodes are available: 38 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  6m29s (x6 over 7m29s)  default-scheduler  0/41 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/41 nodes are available: 41 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  83s (x3 over 7m50s)    default-scheduler  0/40 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/40 nodes are available: 40 Preemption is not helpful for scheduling.

You’ll need to figure out why the PVCs could not be bound, e.g. do you have a dynamic storage provisioner, is it limited to certain regions/zones, have you configured it correctly, etc.? Are you using a standard managed Kubernetes service on a public cloud, or are you running your own cluster?
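
For example, to see which StorageClass the user PVCs are requesting and whether it can actually provision volumes, something like this can help (the class name “standard” below is just a placeholder for whatever your PVCs reference):

  # List storage classes, their provisioners and volume binding mode
  kubectl get storageclass

  # Inspect the class your PVCs use (replace "standard" with your class name)
  kubectl get storageclass standard -o yaml

If that shows volumeBindingMode: Immediate, the scheduler is waiting for the provisioner to bind the claim before it will place the pod, which matches the “unbound immediate PersistentVolumeClaims” message in your events.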


And to figure that out, it can be good to run “kubectl get pvc”, then “kubectl describe pvc <pvc-name>”; towards the end of the output you may see events saying “failed to …” etc.
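
Concretely, something like this (assuming the hub is installed in a namespace called jhub; that and the claim name are placeholders for your setup):

  # List the user claims and their status (Pending vs Bound)
  kubectl get pvc -n jhub

  # Describe one stuck claim; provisioning errors show up under Events
  kubectl describe pvc claim-someuser -n jhub

The Events section at the bottom of the describe output usually names the real problem, e.g. a provisioner error, a quota limit, or a zone mismatch.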