JupyterHub on Kubernetes: spawn fails for users with `timeout 3`. How do I increase the pod creation timeout?

I am having an issue where the hub pod is unable to spawn user pods and fails within about 30 seconds with a timeout error, even though `singleuser.startTimeout` is set to 30000. While trying to create the pod, the logs say `Attempting to create pod jupyter-user3, with timeout 3`, and the spawn eventually fails after roughly 30 seconds with `user3's server failed to start in 30000 seconds, giving up.`, even though nowhere near 30000 seconds have elapsed in total.

Is there a way to increase the `timeout 3` that it uses while "attempting to create the pod"?

My cluster has a lot of resources, so pod creation and similar operations can take up to a minute or two at times.

I see the following in the hub logs:

[D 2023-04-25 17:28:31.789 JupyterHub user:430] Creating <class 'kubespawner.spawner.KubeSpawner'> for user3:
[D 2023-04-25 17:28:31.792 JupyterHub pages:213] Triggering spawn with default options for user3
[D 2023-04-25 17:28:31.792 JupyterHub base:934] Initiating spawn for user3
[D 2023-04-25 17:28:31.792 JupyterHub base:938] 0/64 concurrent spawns
[D 2023-04-25 17:28:31.792 JupyterHub base:943] 0 active servers
[I 2023-04-25 17:28:31.803 JupyterHub provider:651] Creating oauth client jupyterhub-user-user3
[D 2023-04-25 17:28:31.821 JupyterHub user:743] Calling Spawner.start for user3
[I 2023-04-25 17:28:31.822 JupyterHub spawner:2509] Attempting to create pvc claim-user3, with timeout 3
[I 2023-04-25 17:28:31.823 JupyterHub log:186] 302 GET /hub/spawn/user3 -> /hub/spawn-pending/user3 (user3@100.64.31.165) 36.76ms
[D 2023-04-25 17:28:31.844 JupyterHub scopes:796] Checking access via scope servers
[D 2023-04-25 17:28:31.844 JupyterHub scopes:623] Argument-based access to /hub/spawn-pending/user3 via servers
[I 2023-04-25 17:28:31.844 JupyterHub pages:394] user3 is pending spawn
[D 2023-04-25 17:28:31.845 JupyterHub log:186] 304 GET /hub/spawn-pending/user3 (user3@100.64.31.165) 3.24ms
[I 2023-04-25 17:28:31.846 JupyterHub spawner:2525] PVC claim-user3 already exists, so did not create new pvc.
[I 2023-04-25 17:28:31.851 JupyterHub spawner:2469] Attempting to create pod jupyter-user3, with timeout 3
[D 2023-04-25 17:28:31.858 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.55ms
[D 2023-04-25 17:28:31.928 JupyterHub scopes:796] Checking access via scope read:servers
[D 2023-04-25 17:28:31.928 JupyterHub scopes:623] Argument-based access to /hub/api/users/user3/server/progress via read:servers
[D 2023-04-25 17:28:31.929 JupyterHub spawner:2308] progress generator: jupyter-user3
[D 2023-04-25 17:28:32.901 JupyterHub reflector:362] pods watcher timeout
[D 2023-04-25 17:28:32.902 JupyterHub reflector:281] Connecting pods watcher
[D 2023-04-25 17:28:33.541 JupyterHub reflector:362] events watcher timeout
[D 2023-04-25 17:28:33.541 JupyterHub reflector:281] Connecting events watcher
[D 2023-04-25 17:28:33.859 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.94ms
[D 2023-04-25 17:28:33.859 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.92ms
[I 2023-04-25 17:28:34.983 JupyterHub spawner:2469] Attempting to create pod jupyter-user3, with timeout 3
[D 2023-04-25 17:28:35.858 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.65ms
[D 2023-04-25 17:28:37.858 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.61ms
[I 2023-04-25 17:28:38.181 JupyterHub spawner:2469] Attempting to create pod jupyter-user3, with timeout 3
[D 2023-04-25 17:28:39.858 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.72ms
[I 2023-04-25 17:28:41.730 JupyterHub spawner:2469] Attempting to create pod jupyter-user3, with timeout 3
[D 2023-04-25 17:28:41.859 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.69ms
[D 2023-04-25 17:28:42.913 JupyterHub reflector:362] pods watcher timeout
[D 2023-04-25 17:28:42.913 JupyterHub reflector:281] Connecting pods watcher
[D 2023-04-25 17:28:43.566 JupyterHub reflector:362] events watcher timeout
[D 2023-04-25 17:28:43.566 JupyterHub reflector:281] Connecting events watcher
[D 2023-04-25 17:28:43.859 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.92ms
[D 2023-04-25 17:28:43.859 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.81ms
[D 2023-04-25 17:28:45.859 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.66ms
[I 2023-04-25 17:28:46.027 JupyterHub spawner:2469] Attempting to create pod jupyter-user3, with timeout 3
[D 2023-04-25 17:28:47.858 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.70ms
[D 2023-04-25 17:28:49.858 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.63ms
[I 2023-04-25 17:28:51.422 JupyterHub spawner:2469] Attempting to create pod jupyter-user3, with timeout 3
[D 2023-04-25 17:28:51.858 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.67ms
[D 2023-04-25 17:28:52.922 JupyterHub reflector:362] pods watcher timeout
[D 2023-04-25 17:28:52.923 JupyterHub reflector:281] Connecting pods watcher
[D 2023-04-25 17:28:53.587 JupyterHub reflector:362] events watcher timeout
[D 2023-04-25 17:28:53.588 JupyterHub reflector:281] Connecting events watcher
[D 2023-04-25 17:28:53.859 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.94ms
[D 2023-04-25 17:28:53.859 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.89ms
[D 2023-04-25 17:28:55.858 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.69ms
[D 2023-04-25 17:28:57.858 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.63ms
[I 2023-04-25 17:28:57.899 JupyterHub spawner:2469] Attempting to create pod jupyter-user3, with timeout 3
[D 2023-04-25 17:28:59.858 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.63ms
[D 2023-04-25 17:29:01.858 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.67ms
[I 2023-04-25 17:29:02.208 JupyterHub spawner:2469] Attempting to create pod jupyter-user3, with timeout 3
[D 2023-04-25 17:29:02.934 JupyterHub reflector:362] pods watcher timeout
[D 2023-04-25 17:29:02.934 JupyterHub reflector:281] Connecting pods watcher
[D 2023-04-25 17:29:03.613 JupyterHub reflector:362] events watcher timeout
[D 2023-04-25 17:29:03.614 JupyterHub reflector:281] Connecting events watcher
[D 2023-04-25 17:29:03.858 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.81ms
[D 2023-04-25 17:29:03.858 JupyterHub log:186] 200 GET /hub/health (@100.64.24.192) 0.82ms
[W 2023-04-25 17:29:05.211 JupyterHub user:824] user3's server failed to start in 30000 seconds, giving up.
    
    Common causes of this timeout, and debugging tips:
    
    1. Everything is working, but it took too long.
       To fix: increase `Spawner.start_timeout` configuration
       to a number of seconds that is enough for spawners to finish starting.
    2. The server didn't finish starting,
       or it crashed due to a configuration issue.
       Check the single-user server's logs for hints at what needs fixing.
    
[D 2023-04-25 17:29:05.211 JupyterHub user:930] Stopping user3
[D 2023-04-25 17:29:05.216 JupyterHub user:950] Deleting oauth client jupyterhub-user-user3
[D 2023-04-25 17:29:05.224 JupyterHub user:953] Finished stopping user3
[W 2023-04-25 17:29:05.233 JupyterHub base:1030] 3 consecutive spawns failed.  Hub will exit if failure count reaches 5 before succeeding
[E 2023-04-25 17:29:05.233 JupyterHub gen:630] Exception in Future <Task finished name='Task-1601' coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at /usr/local/lib/python3.9/site-packages/jupyterhub/handlers/base.py:954> exception=TimeoutError('Could not create pod dev-jupyterhub/jupyter-user3')> after timeout
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/tornado/gen.py", line 625, in error_callback
        future.result()
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/handlers/base.py", line 961, in finish_user_spawn
        await spawn_future
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/user.py", line 850, in spawn
        raise e
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/user.py", line 747, in spawn
        url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
      File "/usr/local/lib/python3.9/site-packages/kubespawner/spawner.py", line 2663, in _start
        await exponential_backoff(
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/utils.py", line 236, in exponential_backoff
        raise asyncio.TimeoutError(fail_message)
    asyncio.exceptions.TimeoutError: Could not create pod dev-jupyterhub/jupyter-user3
[I 2023-04-25 17:29:05.234 JupyterHub log:186] 200 GET /hub/api/users/user3/server/progress (user3@100.64.31.165) 33307.70ms
[D 2023-04-25 17:29:05.325 JupyterHub proxy:884] Proxy: Fetching GET http://proxy-api:8001/api/routes
[D 2023-04-25 17:29:05.327 JupyterHub proxy:395] Checking routes
[I 2023-04-25 17:29:05.495 JupyterHub log:186] 200 GET /hub/api/ (jupyterhub-idle-culler@127.0.0.1) 7.92ms
[D 2023-04-25 17:29:05.498 JupyterHub scopes:796] Checking access via scope list:users

Kubernetes Version: 1.23
Helm Chart Version: 2.0.0

Can you show us your Z2JH configuration file? Can you also get the logs from the failed pod, and show us the output from `kubectl describe pod ...`? You'll need to catch it before it's terminated.

Please find my values file attached below (the image registry has been updated so that I can deploy it within our company, but the images are the same as the default chart):

# fullnameOverride and nameOverride distinguishes blank strings, null values,
# and non-blank strings. For more details, see the configuration reference.
fullnameOverride: ""
nameOverride:

# custom can contain anything you want to pass to the hub pod, as all passed
# Helm template values will be made available there.
custom: {}


# imagePullSecrets is configuration to reference the k8s Secret resources the
# Helm chart's pods can get credentials from to pull their images.
imagePullSecrets: 
  - name: jdtf-artifactory

# hub relates to the hub pod, responsible for running JupyterHub, its configured
# Authenticator class KubeSpawner, and its configured Proxy class
# ConfigurableHTTPProxy. KubeSpawner creates the user pods, and
# ConfigurableHTTPProxy speaks with the actual ConfigurableHTTPProxy server in
# the proxy pod.
hub:
  revisionHistoryLimit:
  config:
    # Authenticator:  
    #   auto_login: true
    #   enable_auth_state: true
    # AzureAdOAuthenticator:
    #   client_id: your-client-id
    #   client_secret: your-client-secret
    #   oauth_callback_url: https://your-jupyterhub-domain/hub/oauth_callback
    #   tenant_id: your-tenant-id
    # JupyterHub:
    #   authenticator_class: azuread
    #   admin_access: true
    Authenticator:
      admin_users:
        - notregular
      allowed_users:
        - user3
        - user4
    DummyAuthenticator:
      password: a-shared-
    JupyterHub:
      admin_access: true
      authenticator_class: dummy

  service:
    type: ClusterIP
    annotations: {}
    extraPorts: []
  baseUrl: /
  cookieSecret:
  initContainers: []
  nodeSelector: {}
  tolerations: []
  concurrentSpawnLimit: 64
  consecutiveFailureLimit: 5
  activeServerLimit:
  deploymentStrategy:
    ## type: Recreate
    ## - sqlite-pvc backed hubs require the Recreate deployment strategy as a
    ##   typical PVC storage can only be bound to one pod at the time.
    ## - JupyterHub isn't designed to support being run in parallel. More work
    ##   needs to be done in JupyterHub itself before a fully highly available (HA)
    ##   deployment of JupyterHub on k8s is possible.
    type: Recreate
  db:
    type: sqlite-pvc
    upgrade:
    pvc:
      annotations: {}
      selector: {}
      accessModes:
        - ReadWriteOnce
      storage: 1Gi
      subPath:
      storageClassName:
    url:
    password:
  labels: {}
  annotations: {}
  command: []
  args: []
  extraConfig: {}
  extraFiles: {}
  extraEnv: {}
  extraContainers: []
  extraVolumes: []
  extraVolumeMounts: []
  image:
    name: jdtf-docker.artifactrepo.jnj.com/jupyterhub/k8s-hub
    tag: "2.0.0"
    pullPolicy:
    pullSecrets: 
      - name: jdtf-artifactory
  # resources:
  #   limits:
  #     cpu: 500m # 0m - 1000m
  #     memory: 2Gi # 200Mi - 4Gi
  podSecurityContext:
    fsGroup: 1000
  containerSecurityContext:
    runAsUser: 1000
    runAsGroup: 1000
    allowPrivilegeEscalation: false
  lifecycle: {}
  loadRoles: {}
  services: {}
  pdb:
    enabled: false
    maxUnavailable:
    minAvailable: 1
  networkPolicy:
    enabled: false
    ingress: []
    egress: []
    egressAllowRules:
      cloudMetadataServer: true
      dnsPortsPrivateIPs: true
      nonPrivateIPs: true
      privateIPs: true
    interNamespaceAccessLabels: ignore
    allowedIngressPorts: []
  allowNamedServers: false
  namedServerLimitPerUser:
  authenticatePrometheus:
  redirectToServer:
  shutdownOnLogout:
  templatePaths: []
  templateVars: {}
  livenessProbe:
    # The livenessProbe's aim is to give JupyterHub sufficient time to start up but
    # be able to restart if it becomes unresponsive for ~5 min.
    enabled: true
    initialDelaySeconds: 300
    periodSeconds: 10
    failureThreshold: 30
    timeoutSeconds: 3
  readinessProbe:
    # The readinessProbe's aim is to provide a successful startup indication,
    # but following that never become unready before its livenessProbe fail and
    # restarts it if needed. To become unready following startup serves no
    # purpose as there are no other pod to fallback to in our non-HA deployment.
    enabled: true
    initialDelaySeconds: 0
    periodSeconds: 2
    failureThreshold: 1000
    timeoutSeconds: 1
  existingSecret:
  serviceAccount:
    create: true
    name:
    annotations: {}
  extraPodSpec: {}

rbac:
  create: true

# proxy relates to the proxy pod, the proxy-public service, and the autohttps
# pod and proxy-http service.
proxy:
  secretToken:
  annotations: {}
  deploymentStrategy:
    ## type: Recreate
    ## - JupyterHub's interaction with the CHP proxy becomes a lot more robust
    ##   with this configuration. To understand this, consider that JupyterHub
    ##   during startup will interact a lot with the k8s service to reach a
    ##   ready proxy pod. If the hub pod during a helm upgrade is restarting
    ##   directly while the proxy pod is making a rolling upgrade, the hub pod
    ##   could end up running a sequence of interactions with the old proxy pod
    ##   and finishing up the sequence of interactions with the new proxy pod.
    ##   As CHP proxy pods carry individual state this is very error prone. One
    ##   outcome when not using Recreate as a strategy has been that user pods
    ##   have been deleted by the hub pod because it considered them unreachable
    ##   as it only configured the old proxy pod but not the new before trying
    ##   to reach them.
    type: Recreate
    ## rollingUpdate:
    ## - WARNING:
    ##   This is required to be set explicitly blank! Without it being
    ##   explicitly blank, k8s will let eventual old values under rollingUpdate
    ##   remain and then the Deployment becomes invalid and a helm upgrade would
    ##   fail with an error like this:
    ##
    ##     UPGRADE FAILED
    ##     Error: Deployment.apps "proxy" is invalid: spec.strategy.rollingUpdate: Forbidden: may not be specified when strategy `type` is 'Recreate'
    ##     Error: UPGRADE FAILED: Deployment.apps "proxy" is invalid: spec.strategy.rollingUpdate: Forbidden: may not be specified when strategy `type` is 'Recreate'
    rollingUpdate:
  # service relates to the proxy-public service
  service:
    type: ClusterIP
    labels: {}
    annotations: {}
    nodePorts:
      http:
      https:
    disableHttpPort: false
    extraPorts: []
    loadBalancerIP:
    loadBalancerSourceRanges: []
  # chp relates to the proxy pod, which is responsible for routing traffic based
  # on dynamic configuration sent from JupyterHub to CHP's REST API.
  chp:
    revisionHistoryLimit:
    containerSecurityContext:
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    image:
      name: jdtf-docker.artifactrepo.jnj.com/jupyterhub/configurable-http-proxy
      # tag is automatically bumped to new patch versions by the
      # watch-dependencies.yaml workflow.
      #
      tag: "4.5.3" # https://github.com/jupyterhub/configurable-http-proxy/releases
      pullPolicy:
      pullSecrets: []
    extraCommandLineFlags: []
    livenessProbe:
      enabled: true
      initialDelaySeconds: 60
      periodSeconds: 10
      failureThreshold: 30
      timeoutSeconds: 3
    readinessProbe:
      enabled: true
      initialDelaySeconds: 0
      periodSeconds: 2
      failureThreshold: 1000
      timeoutSeconds: 1
    resources: {}
    defaultTarget:
    errorTarget:
    extraEnv: {}
    nodeSelector: {}
    tolerations: []
    networkPolicy:
      enabled: false
      ingress: []
      egress: []
      egressAllowRules:
        cloudMetadataServer: true
        dnsPortsPrivateIPs: true
        nonPrivateIPs: true
        privateIPs: true
      interNamespaceAccessLabels: ignore
      allowedIngressPorts: [http, https]
    pdb:
      enabled: false
      maxUnavailable:
      minAvailable: 1
    extraPodSpec: {}
  # traefik relates to the autohttps pod, which is responsible for TLS
  # termination when proxy.https.type=letsencrypt.
  traefik:
    revisionHistoryLimit:
    containerSecurityContext:
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    image:
      name: traefik
      # tag is automatically bumped to new patch versions by the
      # watch-dependencies.yaml workflow.
      #
      tag: "v2.8.4" # ref: https://hub.docker.com/_/traefik?tab=tags
      pullPolicy:
      pullSecrets: []
    hsts:
      includeSubdomains: false
      preload: false
      maxAge: 15724800 # About 6 months
    resources: {}
    labels: {}
    extraInitContainers: []
    extraEnv: {}
    extraVolumes: []
    extraVolumeMounts: []
    extraStaticConfig: {}
    extraDynamicConfig: {}
    nodeSelector: {}
    tolerations: []
    extraPorts: []
    networkPolicy:
      enabled: false
      ingress: []
      egress: []
      egressAllowRules:
        cloudMetadataServer: true
        dnsPortsPrivateIPs: true
        nonPrivateIPs: true
        privateIPs: true
      interNamespaceAccessLabels: ignore
      allowedIngressPorts: [http, https]
    pdb:
      enabled: false
      maxUnavailable:
      minAvailable: 1
    serviceAccount:
      create: true
      name:
      annotations: {}
    extraPodSpec: {}
  secretSync:
    containerSecurityContext:
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    image:
      name: jupyterhub/k8s-secret-sync
      tag: "2.0.0"
      pullPolicy:
      pullSecrets: []
    resources: {}
  labels: {}
  https:
    enabled: false
    type: letsencrypt
    #type: letsencrypt, manual, offload, secret
    letsencrypt:
      contactEmail:
      # Specify custom server here (https://acme-staging-v02.api.letsencrypt.org/directory) to hit staging LE
      acmeServer: https://acme-v02.api.letsencrypt.org/directory
    manual:
      key:
      cert:
    secret:
      name:
      key: tls.key
      crt: tls.crt
    hosts: []

# singleuser relates to the configuration of KubeSpawner which runs in the hub
# pod, and its spawning of user pods such as jupyter-myusername.
singleuser:
  defaultUrl: "/lab"
  extraEnv:
    JUPYTERHUB_SINGLEUSER_APP: "jupyter_server.serverapp.ServerApp"
  nodeSelector: {}
  extraNodeAffinity:
    required: []
    preferred: []
  networkTools:
    image:
      name: jdtf-docker.artifactrepo.jnj.com/jupyterhub/k8s-network-tools
      tag: "2.0.0"
      pullPolicy: IfNotPresent
      pullSecrets: 
        - name: jdtf-artifactory
    resources: {}
  # cloudMetadata:
  #   # block set to true will append a privileged initContainer using the
  #   # iptables to block the sensitive metadata server at the provided ip.
  #   blockWithIptables: true
  #   ip: 169.254.169.254
  networkPolicy:
    enabled: false
    ingress: []
    egress: []
    egressAllowRules:
      cloudMetadataServer: false
      dnsPortsPrivateIPs: true
      nonPrivateIPs: true
      privateIPs: false
    interNamespaceAccessLabels: ignore
    allowedIngressPorts: []
  events: true
  extraLabels:
    hub.jupyter.org/network-access-hub: "true"
  uid: 1000
  fsGid: 100
  serviceAccountName:
  storage:
    type: dynamic
    extraLabels: {}
    extraVolumes: []
    extraVolumeMounts: []
    static:
      pvcName:
      subPath: "{username}"
    capacity: 5Gi
    homeMountPath: /home/jovyan
    dynamic:
      storageClass: gp3
      pvcNameTemplate: claim-{username}{servername}
      volumeNameTemplate: volume-{username}{servername}
      storageAccessModes: [ReadWriteOnce]
  image:
    name: jdtf-docker.artifactrepo.jnj.com/jupyterhub/k8s-singleuser-sample
    tag: "2.0.0"
    pullPolicy:
    pullSecrets: 
      - name: jdtf-artifactory
  startTimeout: 3600
  cpu:
    limit: 0.5
    guarantee: 0.2
  memory:
    limit: 4G
    guarantee: 1G
  # extraResource:
  #   limits: {}
  #   guarantees: {}
  cmd: jupyterhub-singleuser

# scheduling relates to the user-scheduler pods and user-placeholder pods.
scheduling:
  userScheduler:
    enabled: true
    revisionHistoryLimit:
    replicas: 1
    logLevel: 4
    # plugins are configured on the user-scheduler to make us score how we
    # schedule user pods in a way to help us schedule on the most busy node. By
    # doing this, we help scale down more effectively. It isn't obvious how to
    # enable/disable scoring plugins, and configure them, to accomplish this.
    #
    # plugins ref: https://kubernetes.io/docs/reference/scheduling/config/#scheduling-plugins-1
    # migration ref: https://kubernetes.io/docs/reference/scheduling/config/#scheduler-configuration-migrations
    #
    plugins:
      score:
        # These scoring plugins are enabled by default according to
        # https://kubernetes.io/docs/reference/scheduling/config/#scheduling-plugins
        # 2022-02-22.
        #
        # Enabled with high priority:
        # - NodeAffinity
        # - InterPodAffinity
        # - NodeResourcesFit
        # - ImageLocality
        # Remains enabled with low default priority:
        # - TaintToleration
        # - PodTopologySpread
        # - VolumeBinding
        # Disabled for scoring:
        # - NodeResourcesBalancedAllocation
        #
        disabled:
          # We disable these plugins (with regards to scoring) to not interfere
          # or complicate our use of NodeResourcesFit.
          - name: NodeResourcesBalancedAllocation
          # Disable plugins to be allowed to enable them again with a different
          # weight and avoid an error.
          - name: NodeAffinity
          - name: InterPodAffinity
          - name: NodeResourcesFit
          - name: ImageLocality
        enabled:
          - name: NodeAffinity
            weight: 14631
          - name: InterPodAffinity
            weight: 1331
          - name: NodeResourcesFit
            weight: 121
          - name: ImageLocality
            weight: 11
    pluginConfig:
      # Here we declare that we should optimize pods to fit based on a
      # MostAllocated strategy instead of the default LeastAllocated.
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
            type: MostAllocated
    containerSecurityContext:
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    image:
      # IMPORTANT: Bumping the minor version of this binary should go hand in
      #            hand with an inspection of the user-scheduler's RBAC resources
      #            that we have forked in
      #            templates/scheduling/user-scheduler/rbac.yaml.
      #
      #            Debugging advice:
      #
      #            - Is configuration of kube-scheduler broken in
      #              templates/scheduling/user-scheduler/configmap.yaml?
      #
      #            - Is the kube-scheduler binary's compatibility to work
      #              against a k8s api-server that is too new or too old?
      #
      #            - You can update the GitHub workflow that runs tests to
      #              include "deploy/user-scheduler" in the k8s namespace report
      #              and reduce the user-scheduler deployments replicas to 1 in
      #              dev-config.yaml to get relevant logs from the user-scheduler
      #              pods. Inspect the "Kubernetes namespace report" action!
      #
      #            - Typical failures are that kube-scheduler fails to search for
      #              resources via its "informers", and won't start trying to
      #              schedule pods before they succeed which may require
      #              additional RBAC permissions or that the k8s api-server is
      #              aware of the resources.
      #
      #            - If "successfully acquired lease" can be seen in the logs, it
      #              is a good sign kube-scheduler is ready to schedule pods.
      #
      name: jdtf-docker.artifactrepo.jnj.com/jupyterhub/kube-scheduler
      # tag is automatically bumped to new patch versions by the
      # watch-dependencies.yaml workflow. The minor version is pinned in the
      # workflow, and should be updated there if a minor version bump is done
      # here.
      #
      tag: "v1.23.10" # ref: https://github.com/kubernetes/website/blob/main/content/en/releases/patch-releases.md
      pullPolicy:
      pullSecrets: []
    nodeSelector: {}
    tolerations: []
    labels: {}
    annotations: {}
    pdb:
      enabled: true
      maxUnavailable: 1
      minAvailable:
    resources: {}
    serviceAccount:
      create: true
      name:
      annotations: {}
    extraPodSpec: {}
  podPriority:
    enabled: false
    globalDefault: false
    defaultPriority: 0
    imagePullerPriority: -5
    userPlaceholderPriority: -10
  userPlaceholder:
    enabled: false
    image:
      name: jdtf-docker.artifactrepo.jnj.com/jupyterhub/pause
      # tag is automatically bumped to new patch versions by the
      # watch-dependencies.yaml workflow.
      #
      # If you update this, also update prePuller.pause.image.tag
      #
      tag: "3.8"
      pullPolicy:
      pullSecrets: 
        - name: jdtf-artifactory
    revisionHistoryLimit:
    replicas: 2
    labels: {}
    annotations: {}
    containerSecurityContext:
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    resources: {}
  corePods:
    tolerations:
      - key: hub.jupyter.org/dedicated
        operator: Equal
        value: core
        effect: NoSchedule
      - key: hub.jupyter.org_dedicated
        operator: Equal
        value: core
        effect: NoSchedule
    nodeAffinity:
      matchNodePurpose: prefer
  userPods:
    tolerations:
      - key: hub.jupyter.org/dedicated
        operator: Equal
        value: user
        effect: NoSchedule
      - key: hub.jupyter.org_dedicated
        operator: Equal
        value: user
        effect: NoSchedule
    nodeAffinity:
      matchNodePurpose: prefer

# prePuller relates to the hook|continuous-image-puller DaemonSets
prePuller:
  revisionHistoryLimit:
  labels: {}
  annotations: {}
  resources: {}
  containerSecurityContext:
    runAsUser: 65534 # nobody user
    runAsGroup: 65534 # nobody group
    allowPrivilegeEscalation: false
  extraTolerations: []
  # hook relates to the hook-image-awaiter Job and hook-image-puller DaemonSet
  hook:
    enabled: true
    pullOnlyOnChanges: true
    # image and the configuration below relates to the hook-image-awaiter Job
    image:
      name: jdtf-docker.artifactrepo.jnj.com/jupyterhub/k8s-image-awaiter
      tag: "2.0.0"
      pullPolicy:
      pullSecrets: []
    containerSecurityContext:
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    podSchedulingWaitDuration: 10
    nodeSelector: {}
    tolerations: []
    resources: {}
    serviceAccount:
      create: true
      name:
      annotations: {}
  continuous:
    enabled: false
  pullProfileListImages: true
  extraImages: {}
  pause:
    containerSecurityContext:
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    image:
      name: jdtf-docker.artifactrepo.jnj.com/jupyterhub/pause
      # tag is automatically bumped to new patch versions by the
      # watch-dependencies.yaml workflow.
      #
      # If you update this, also update scheduling.userPlaceholder.image.tag
      #
      tag: "3.8"
      pullPolicy:
      pullSecrets: 
        - name: jdtf-artifactory

ingress:
  enabled: true
  annotations: 
    cert-manager.io/cluster-issuer: sectigo
  ingressClassName: nginx
  hosts: 
    - dev.jupyterhub.jetdecisionengine.apps.jnj.com
  pathType: ImplementationSpecific
  tls:
    - hosts:
        - dev.jupyterhub.jetdecisionengine.apps.jnj.com
      secretName: dev-jupyterhub-tls

# cull relates to the jupyterhub-idle-culler service, responsible for evicting
# inactive singleuser pods.
#
# The configuration below, except for enabled, corresponds to command-line flags
# for jupyterhub-idle-culler as documented here:
# https://github.com/jupyterhub/jupyterhub-idle-culler#as-a-standalone-script
#
cull:
  enabled: true
  users: false # --cull-users
  adminUsers: true # --cull-admin-users
  removeNamedServers: false # --remove-named-servers
  timeout: 3600 # --timeout
  every: 600 # --cull-every
  concurrency: 10 # --concurrency
  maxAge: 0 # --max-age

debug:
  enabled: true

global:
  safeToShowValues: false

The user pod never starts up, so there are no logs for it. The hub emits the timeout error I mentioned above in its logs while trying to spin up the user pod.

I believe my issue will be resolved if I can find a way to increase the timeout that appears in the log line above as 3.

As far as the configuration file goes, I have posted it, but it has been hidden by the bot. Hopefully it will show up soon.
No user pod gets created, so there are no logs or `kubectl describe pod` output for it. The hub pod provides the logs that I pasted in the initial post.

@manics I started looking into some GitHub issues and ended up finding where the `timeout 3` comes from in the spawner: kubespawner/spawner.py at 9f31d48569025b7bf58dd9bcfe77984cf204c562 · jupyterhub/kubespawner (github.com)

The default is set to 3 seconds, which is what gets used, and based on the logs it doesn't look like `startTimeout` overrides it. Is there a way to pass in a different value for that?
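
In the meantime, one possible workaround seems to be setting KubeSpawner's API request timeouts directly through hub.extraConfig. The sketch below is untested and assumes the kubespawner version bundled with chart 2.0.0 exposes the k8s_api_request_timeout and k8s_api_request_retry_timeout traits; the snippet key and the numbers are only illustrative:

hub:
  extraConfig:
    # Hypothetical snippet key; any name under extraConfig is executed as
    # Python inside jupyterhub_config.py.
    10-k8s-api-timeouts: |
      # Timeout for a single k8s API request made while creating the PVC/pod
      # (the "with timeout 3" seen in the logs; default: 3 seconds).
      c.KubeSpawner.k8s_api_request_timeout = 30
      # Total time to keep retrying a failing k8s API request before the
      # spawn fails with "Could not create pod ..." (default: 30 seconds).
      c.KubeSpawner.k8s_api_request_retry_timeout = 120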

@manics I have found the issue and opened a PR to fix it. Can you please get it merged and included in the upcoming Helm chart release?
Adding support for spawner ‘k8s_api_request_timeout’ parameter override by JunaidChaudry · Pull Request #3104 · jupyterhub/zero-to-jupyterhub-k8s (github.com)