Server requested Spawn failed: did not start in 300 seconds

Hi there,

I’ve installed z2jh on my Kubernetes cluster using Helm chart 1.2.0, and it worked for a few months without issue, but today it started failing to spawn my users’ servers. The pods stay in Pending for a while, and the spawn fails after 300 s.

When I describe the pending pod, I get virtually no useful information; even the Events section is empty:

Volumes:
  volume-cbotek:
    Type:        PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:   claim-cbotek
    ReadOnly:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     hub.jupyter.org/dedicated=user:NoSchedule
                 hub.jupyter.org_dedicated=user:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>
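For reference, here is roughly where else I looked for clues (the `jhub` namespace and `hub` deployment names are from my install and may differ in yours):

```shell
# Recent events in the hub namespace, newest last; scheduling problems
# often show up here even when the pod's own Events section is empty
kubectl get events -n jhub --sort-by=.lastTimestamp

# The hub pod logs each spawn attempt and the eventual timeout
kubectl logs -n jhub deploy/hub --tail=200
```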

I also checked the available resources on my cluster and confirmed that there is plenty of CPU, memory, and disk available.
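In case it matters, this is roughly how I checked (the second command assumes metrics-server is installed):

```shell
# Requested vs. allocatable CPU/memory per node
kubectl describe nodes | grep -A 7 "Allocated resources"

# Live per-node usage; requires metrics-server
kubectl top nodes
```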

Do you have any idea where I could find logs that would help me identify the root cause of this?
Any suggestion about the cause would help too.

Thank you,

Chris

Update:
I might have found the issue: too many pods, services, and jobs had accumulated on my cluster after months of launching Spark jobs.
I simply ran the following for every namespace where I run Spark jobs.

kubectl delete pods --all -n mynamespace
kubectl delete jobs --all -n mynamespace
kubectl delete services --all -n mynamespace
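If you have many namespaces, a loop saves some typing. This is just a sketch: the namespace names are placeholders, and note that `--all` also deletes running pods, so the pod line below uses a field selector to remove only completed ones.

```shell
# Placeholder namespace names; replace with your own
for ns in spark-team-a spark-team-b; do
  # Remove only completed pods, then leftover jobs and services
  kubectl delete pods --field-selector=status.phase=Succeeded -n "$ns"
  kubectl delete jobs --all -n "$ns"
  kubectl delete services --all -n "$ns"
done
```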

Hope it helps others!