Hub pod stuck on pending - timed out binding volumes

Hello, I’m trying to follow the Zero to JupyterHub guide, using AWS and EKS.
As the docs aren’t fully up to date with eksctl, I also used this post on the GitHub issue as a reference.

I was able to launch the cluster fine using eksctl, but I’m getting stuck at the installation of JupyterHub - the hub pod does not move past pending (the others are running fine).

[cloudshell-user@ip-10-0-165-104 ~]$ cat cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: Z2JHKubernetesCluster
  region: eu-west-2

nodeGroups:
  - name: ng-1
    instanceType: t2.medium
    volumeSize: 10

[cloudshell-user@ip-10-0-165-104 ~]$ eksctl create cluster -f cluster.yaml

For now I’m using the chart configuration file from the instructions, which consists only of comments, and I used the following command to install the chart:

helm upgrade --cleanup-on-fail \
  --install z2jh jupyterhub/jupyterhub \
  --namespace jhub \
  --create-namespace \
  --version=2.0.0 \
  --values config.yaml

I’ve been trying to see what’s going on using kubectl get pod and kubectl describe pod. All pods except the hub were in Running status after a few seconds.

$ kubectl --namespace=jhub get pod
NAME                             READY   STATUS    RESTARTS   AGE
continuous-image-puller-fvxsp    1/1     Running   0          31m
continuous-image-puller-gcvtp    1/1     Running   0          32m
hub-786cbd7b46-b2kgk             0/1     Pending   0          32m
proxy-58fbcfc8b4-jt8gd           1/1     Running   0          32m
user-scheduler-699549567-b7d4v   1/1     Running   0          32m
user-scheduler-699549567-cwjkf   1/1     Running   0          32m

After about 10 minutes, a FailedScheduling warning appears in the describe output, as shown below.

$ kubectl --namespace=jhub describe pod hub-786cbd7b46-b2kgk
Name:           hub-786cbd7b46-b2kgk
Namespace:      jhub
Priority:       0
Node:           <none>
Labels:         app=jupyterhub
                component=hub
                hub.jupyter.org/network-access-proxy-api=true
                hub.jupyter.org/network-access-proxy-http=true
                hub.jupyter.org/network-access-singleuser=true
                pod-template-hash=786cbd7b46
                release=z2jh
Annotations:    checksum/config-map: 1f53da94c6f18f51205e774b58c880e71974b294d25e181a0fefe3efcf585c38
                checksum/secret: ee2c2a4aebbcee22a100918b40700dec56d1ed8bdd0b8a6d472406f871151c44
                kubernetes.io/psp: eks.privileged
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  ReplicaSet/hub-786cbd7b46
Containers:
  hub:
    Image:      jupyterhub/k8s-hub:2.0.0
    Port:       8081/TCP
    Host Port:  0/TCP
    Args:
      jupyterhub
      --config
      /usr/local/etc/jupyterhub/jupyterhub_config.py
      --upgrade-db
    Liveness:   http-get http://:http/hub/health delay=300s timeout=3s period=10s #success=1 #failure=30
    Readiness:  http-get http://:http/hub/health delay=0s timeout=1s period=2s #success=1 #failure=1000
    Environment:
      PYTHONUNBUFFERED:        1
      HELM_RELEASE_NAME:       z2jh
      POD_NAMESPACE:           jhub (v1:metadata.namespace)
      CONFIGPROXY_AUTH_TOKEN:  <set to the key 'hub.config.ConfigurableHTTPProxy.auth_token' in secret 'hub'>  Optional: false
    Mounts:
      /srv/jupyterhub from pvc (rw)
      /usr/local/etc/jupyterhub/config/ from config (rw)
      /usr/local/etc/jupyterhub/jupyterhub_config.py from config (rw,path="jupyterhub_config.py")
      /usr/local/etc/jupyterhub/secret/ from secret (rw)
      /usr/local/etc/jupyterhub/z2jh.py from config (rw,path="z2jh.py")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bzzlv (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      hub
    Optional:  false
  secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hub
    Optional:    false
  pvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  hub-db-dir
    ReadOnly:   false
  kube-api-access-bzzlv:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 hub.jupyter.org/dedicated=core:NoSchedule
                             hub.jupyter.org_dedicated=core:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  3m37s (x3 over 23m)  default-scheduler  running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition

Being new to Kubernetes and all this, I’m not sure where to go next, and I couldn’t find any similar issues in previous discussions. Should the hub have launched successfully within 10 minutes, or does the timeout need extending somehow?

Thanks very much for any help you’re able to provide!

It sounds like there’s a problem with your dynamic storage volumes. JupyterHub requests a volume from k8s, and k8s should respond by automatically provisioning it, but that isn’t happening.
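
A quick way to confirm is to look at the claim and the storage class directly - something along these lines (the claim name is taken from your describe output above):

$ kubectl --namespace=jhub get pvc hub-db-dir
$ kubectl --namespace=jhub describe pvc hub-db-dir
$ kubectl get storageclass

If hub-db-dir is stuck in Pending, the events in the describe output usually say whether a provisioner exists for the requested storage class and why the volume isn’t being created.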

Have you tried the AWS persistent storage guide? I’ve successfully used the Amazon EBS CSI driver on Amazon EKS, but not with eksctl.

Thank you for that. I followed the AWS persistent storage page and encountered a similar problem on the test project - so it looks like it’s nothing to do with JupyterHub, anyway!

I’ll keep investigating, and will post back if I get any further. I may try setting up the cluster without eksctl to see whether a crucial step is being missed that way.

Hi @tsutch
Did you happen to find the solution to this?
I’m getting exactly the same error as you: the hub-db-dir PVC is stuck in Pending, which leaves the hub pod Pending forever.

Thanks!

I suspect you have run into the complexity of getting persistent storage allocated when upgrading to or using EKS 1.23+.

What you need is to set up the EKS cluster addon aws-ebs-csi-driver, and to provide permissions for the pods it will start running in kube-system.

The permissions part I don’t fully understand. Maybe it’s enough to include the “addons” entry seen in my linked issue in your eksctl config, mentioning wellKnownPolicies. Maybe some policy is also needed on the nodes where the controller runs - I’m not 100% sure. I recommend adding the addons section as in that issue, and only adding a policy to the nodes if needed.
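
As a rough sketch - the exact fields may depend on your eksctl version, so double-check against the eksctl docs - the addons entry could look something like this in the cluster config:

addons:
  - name: aws-ebs-csi-driver
    wellKnownPolicies:
      ebsCSIController: true

With that, eksctl should create an IAM role for the addon’s service account, so the CSI controller pods in kube-system get the permissions they need to manage EBS volumes.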

In Update AWS EKS terraform jinja templates to so we can use k8s 1.24 · Issue #2054 · 2i2c-org/infrastructure · GitHub I’ve done work to resolve this while managing a k8s cluster, with links out to various related resources.

Hi @consideRatio, thanks for the info. I just recalled that I had asked this here.
I was able to get it through by adding the aws-ebs-csi-driver addon to the YAML file with an attachPolicy section for the IAM permissions.

addons: 
  - name: aws-ebs-csi-driver
    attachPolicy: 
      Version: "2012-10-17"
      Statement:
      - Effect: Allow
        Action:
        - "ec2:AttachVolume"
        - "ec2:CreateSnapshot"
        - "ec2:CreateTags"
        - "ec2:CreateVolume"
        - "ec2:DeleteSnapshot"
        - "ec2:DeleteTags"
        - "ec2:DeleteVolume"
        - "ec2:DescribeInstances"
        - "ec2:DescribeSnapshots"
        - "ec2:DescribeTags"
        - "ec2:DescribeVolumes"
        - "ec2:DetachVolume"
        Resource: '*'

Hope this helps someone else hitting this issue.
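
In case it helps: for a cluster that already exists, my understanding is that the addon can be installed from the same config file with something like the command below (worth verifying against eksctl create addon --help for your version):

eksctl create addon -f cluster.yaml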

I’m getting the exact same error with k3s, using the default local-path storage class. Any idea how I can resolve it?

kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                STORAGECLASS   REASON   AGE
pvc-a24b35bc-6ee5-49d5-82af-dce32f7ec5be   1Gi        RWO            Delete           Bound    default/hub-db-dir   local-path              60m
kubectl get pvc
NAME                STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nbgrader-exchange   Pending                                                                        local-path     60m
hub-db-dir          Bound     pvc-a24b35bc-6ee5-49d5-82af-dce32f7ec5be   1Gi        RWO            local-path     60m
kubectl get pod
NAME                     READY   STATUS    RESTARTS   AGE
proxy-5dbf95cf76-5mjgl   1/1     Running   0          60m
hub-78db84f8fc-97xcd     0/1     Pending   0          60m
kubectl describe pod/hub-78db84f8fc-97xcd | tail
                             hub.jupyter.org_dedicated=core:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  51m                default-scheduler  running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition
  Warning  FailedScheduling  31m (x2 over 41m)  default-scheduler  running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition
  Warning  FailedScheduling  21m                default-scheduler  running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition
  Warning  FailedScheduling  52s (x2 over 11m)  default-scheduler  running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition
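
From my get pvc output above it’s the nbgrader-exchange claim that is Pending (hub-db-dir bound fine), so my next step will be to describe that claim and see what the binding is waiting on:

kubectl describe pvc nbgrader-exchange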