Unexpectedly long `PodInitializing` state after `Started container notebook`

Hi all,

I am running into a problem where singleuser pod for one particular binderhub-built image takes forever to complete PodInitializing state (Message: Started container notebook), for example:

jupyter-simexp-2dfmriprep-2dnoise-2dbenchmark-2dj8bc29h6   0/1     PodInitializing   0             54m

Eventually after a looong wait, it changes to running, but a latency of 1hr+ is not really acceptable. I am not running into this with other binder-built images. Any idea how to debug this problem to find the culprit? I’d be glad if you could help. Below are the logs and pod descriptions. Thank you!

Logs:

Error from server (BadRequest): container "notebook" in pod "jupyter-simexp-2dfmriprep-2dnoise-2dbenchmark-2dj2h9ifm4" is waiting to start: PodInitializing

Logs (-v 8):

0516 14:30:11.714967  770558 request.go:1181] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"container \"notebook\" in pod \"jupyter-simexp-2dfmriprep-2dnoise-2dbenchmark-2dj2h9ifm4\" is waiting to start: PodInitializing","reason":"BadRequest","code":400}
I0516 14:30:11.715427  770558 helpers.go:219] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "container \"notebook\" in pod \"jupyter-simexp-2dfmriprep-2dnoise-2dbenchmark-2dj2h9ifm4\" is waiting to start: PodInitializing",
  "reason": "BadRequest",
  "code": 400
}]
F0516 14:30:11.715502  770558 helpers.go:118] Error from server (BadRequest): container "notebook" in pod "jupyter-simexp-2dfmriprep-2dnoise-2dbenchmark-2dj2h9ifm4" is waiting to start: PodInitializing

Describe pod:

Name:         jupyter-simexp-2dfmriprep-2dnoise-2dbenchmark-2dj2h9ifm4
Namespace:    binderhub
Priority:     0
Node:         neurolibre-node1/192.168.73.213
Start Time:   Tue, 16 May 2023 13:56:50 -0400
Labels:       app=jupyterhub
              chart=jupyterhub-1.1.2
              component=singleuser-server
              heritage=jupyterhub
              hub.jupyter.org/network-access-hub=true
              hub.jupyter.org/servername=
              hub.jupyter.org/username=simexp-2dfmriprep-2dnoise-2dbenchmark-2dj2h9ifm4
              release=binderhub
Annotations:  hub.jupyter.org/username: simexp-fmriprep-noise-benchmark-j2h9ifm4
Status:       Pending
IP:           10.244.1.42
IPs:
  IP:  10.244.1.42
Init Containers:
  block-cloud-metadata:
    Container ID:  docker://f2df1a86f1cb342ce70ab408828dea61886cfa8a09ccbc040117daa5c58b97eb
    Image:         jupyterhub/k8s-network-tools:1.1.2
    Image ID:      docker-pullable://jupyterhub/k8s-network-tools@sha256:5238f5fc8281cb6e27ca22b5887fe6af97780a3696bf304fad35d207b2114c66
    Port:          <none>
    Host Port:     <none>
    Command:
      iptables
      -A
      OUTPUT
      -d
      169.254.169.254
      -j
      DROP
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 16 May 2023 13:56:56 -0400
      Finished:     Tue, 16 May 2023 13:56:56 -0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:         <none>
Containers:
  notebook:
    Container ID:  
    Image:         binder-registry.conp.cloud/binder-registry.conp.cloud/binder-simexp-2dfmriprep-2ddenoise-2dbenchmark-d0ef34:26e5cfea22cbd763feca0fbeed29ee0db96c0b29
    Image ID:      
    Port:          8888/TCP
    Host Port:     0/TCP
    Args:
      python3
      -c
      import os
      import sys
      
      try:
          import jupyterlab
          major = int(jupyterlab.__version__.split(".", 1)[0])
      except Exception:
          have_lab = False
      else:
          have_lab = major >= 3
      
      if have_lab and "NotebookApp.default_url" not in " ".join(sys.argv):
          # if recent-enough lab is available, make it the default UI
          sys.argv.insert(1, "--NotebookApp.default_url=/lab/")
      
      # launch the notebook server
      os.execvp("jupyter-notebook", sys.argv)
      
      --ip=0.0.0.0
      --port=8888
      --NotebookApp.base_url=/jupyter/user/simexp-fmriprep-noise-benchmark-j2h9ifm4/
      --NotebookApp.token=1TTJnFrDRLmqU1arwoNLkg
      --NotebookApp.trust_xheaders=True
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  8589934592
    Requests:
      cpu:     1
      memory:  4294967296
    Environment:
      BINDER_LAUNCH_HOST:             https://test.conp.cloud/
      BINDER_PERSISTENT_REQUEST:      v2/gh/SIMEXP/fmriprep-denoise-benchmark/26e5cfea22cbd763feca0fbeed29ee0db96c0b29
      BINDER_REF_URL:                 https://github.com/SIMEXP/fmriprep-denoise-benchmark/tree/26e5cfea22cbd763feca0fbeed29ee0db96c0b29
      BINDER_REPO_URL:                https://github.com/SIMEXP/fmriprep-denoise-benchmark
      BINDER_REQUEST:                 v2/gh/SIMEXP/fmriprep-denoise-benchmark/main
      CPU_GUARANTEE:                  1.0
      JPY_API_TOKEN:                  ead19dfcbebe4699bb5a04116fdd4a2b
      JUPYTERHUB_ACTIVITY_URL:        http://hub:8081/jupyter/hub/api/users/simexp-fmriprep-noise-benchmark-j2h9ifm4/activity
      JUPYTERHUB_ADMIN_ACCESS:        1
      JUPYTERHUB_API_TOKEN:           ead19dfcbebe4699bb5a04116fdd4a2b
      JUPYTERHUB_API_URL:             http://hub:8081/jupyter/hub/api
      JUPYTERHUB_BASE_URL:            /jupyter/
      JUPYTERHUB_CLIENT_ID:           jupyterhub-user-simexp-fmriprep-noise-benchmark-j2h9ifm4
      JUPYTERHUB_HOST:                
      JUPYTERHUB_OAUTH_CALLBACK_URL:  /jupyter/user/simexp-fmriprep-noise-benchmark-j2h9ifm4/oauth_callback
      JUPYTERHUB_OAUTH_SCOPES:        ["access:servers!server=simexp-fmriprep-noise-benchmark-j2h9ifm4/", "access:servers!user=simexp-fmriprep-noise-benchmark-j2h9ifm4"]
      JUPYTERHUB_SERVER_NAME:         
      JUPYTERHUB_SERVICE_PREFIX:      /jupyter/user/simexp-fmriprep-noise-benchmark-j2h9ifm4/
      JUPYTERHUB_SERVICE_URL:         http://0.0.0.0:8888/jupyter/user/simexp-fmriprep-noise-benchmark-j2h9ifm4/
      JUPYTERHUB_USER:                simexp-fmriprep-noise-benchmark-j2h9ifm4
      JUPYTER_IMAGE:                  binder-registry.conp.cloud/binder-registry.conp.cloud/binder-simexp-2dfmriprep-2ddenoise-2dbenchmark-d0ef34:26e5cfea22cbd763feca0fbeed29ee0db96c0b29
      JUPYTER_IMAGE_SPEC:             binder-registry.conp.cloud/binder-registry.conp.cloud/binder-simexp-2dfmriprep-2ddenoise-2dbenchmark-d0ef34:26e5cfea22cbd763feca0fbeed29ee0db96c0b29
      MEM_GUARANTEE:                  4294967296
      MEM_LIMIT:                      8589934592
    Mounts:
      /home/jovyan/data from shared-data (ro)
      /mnt/books from book-data (rw)
      /mnt/data from repo2data (rw)
      /usr/local/share/fill_submission_metadata.bash from files (rw,path="fill_submission_metadata")
      /usr/local/share/jb_build.bash from files (rw,path="jb_build")
      /usr/local/share/repo2data.bash from files (rw,path="repo2data")
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  files:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  singleuser
    Optional:    false
  shared-data:
    Type:          HostPath (bare host directory volume)
    Path:          /DATA
    HostPathType:  
  book-data:
    Type:          HostPath (bare host directory volume)
    Path:          /DATA/book-artifacts
    HostPathType:  
  repo2data:
    Type:          HostPath (bare host directory volume)
    Path:          /DATA
    HostPathType:  
QoS Class:         Burstable
Node-Selectors:    <none>
Tolerations:       hub.jupyter.org/dedicated=user:NoSchedule
                   hub.jupyter.org_dedicated=user:NoSchedule
                   node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                   node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From                      Message
  ----    ------     ----  ----                      -------
  Normal  Scheduled  24m   binderhub-user-scheduler  Successfully assigned binderhub/jupyter-simexp-2dfmriprep-2dnoise-2dbenchmark-2dj2h9ifm4 to neurolibre-node1
  Normal  Pulled     24m   kubelet                   Container image "jupyterhub/k8s-network-tools:1.1.2" already present on machine
  Normal  Created    24m   kubelet                   Created container block-cloud-metadata
  Normal  Started    24m   kubelet                   Started container block-cloud-metadata
  Normal  Pulled     24m   kubelet                   Container image "binder-registry.conp.cloud/binder-registry.conp.cloud/binder-simexp-2dfmriprep-2ddenoise-2dbenchmark-d0ef34:26e5cfea22cbd763feca0fbeed29ee0db96c0b29" already present on machine
  Normal  Created    24m   kubelet                   Created container notebook
  Normal  Started    24m   kubelet                   Started container notebook
1 Like

OK now that I check the process again, I noticed that entrypoints we configured by injecting some files via Configuration Reference — Zero to JupyterHub with Kubernetes documentation were actually running during this PodInitializing phase.

That explains the long wait, still I’d be glad if someone can confirm that this is the expected behaviour.

Yeah I am also seeing similar behavior though I am also occasionally just getting a 500 if I try to reload after encountering this issue.