Hub is failing to start single-user server

Hello everyone,

I am facing an issue while setting up JupyterHub.
Both the Hub and the proxy are running, but when I try to log in (authenticator disabled) and get redirected to the single-user server, the Hub crashes after a few attempts. Adding my Hub logs for more detail:

```
[I 2023-03-14 10:11:20.211 JupyterHub pages:396] sunkumar is pending spawn
[I 2023-03-14 10:11:20.214 JupyterHub log:186] 200 GET /hub/spawn-pending/sunkumar (sunkumar@10.244.43.64) 5.17ms
[D 2023-03-14 10:11:20.520 JupyterHub scopes:863] Checking access via scope read:servers
[D 2023-03-14 10:11:20.520 JupyterHub scopes:690] Argument-based access to /hub/api/users/sunkumar/server/progress via read:servers
[D 2023-03-14 10:11:20.521 JupyterHub spawner:2328] progress generator: jupyter-sunkumar
[D 2023-03-14 10:11:20.929 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.56ms
[D 2023-03-14 10:11:21.605 JupyterHub log:186] 200 GET /hub/static/components/font-awesome/fonts/fontawesome-webfont.woff2?v=4.7.0 (@10.44.186.205) 1.12ms
[D 2023-03-14 10:11:22.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.61ms
[I 2023-03-14 10:11:22.938 JupyterHub spawner:2529] Attempting to create pvc claim-sunkumar, with timeout 3
[D 2023-03-14 10:11:24.932 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 1.25ms
[I 2023-03-14 10:11:25.985 JupyterHub spawner:2529] Attempting to create pvc claim-sunkumar, with timeout 3
[D 2023-03-14 10:11:26.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.55ms
[D 2023-03-14 10:11:28.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.53ms
[I 2023-03-14 10:11:29.116 JupyterHub spawner:2529] Attempting to create pvc claim-sunkumar, with timeout 3
[D 2023-03-14 10:11:30.929 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.58ms
[I 2023-03-14 10:11:32.710 JupyterHub spawner:2529] Attempting to create pvc claim-sunkumar, with timeout 3
[D 2023-03-14 10:11:32.929 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.53ms
[D 2023-03-14 10:11:34.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.62ms
[I 2023-03-14 10:11:36.428 JupyterHub spawner:2529] Attempting to create pvc claim-sunkumar, with timeout 3
[D 2023-03-14 10:11:36.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.61ms
[D 2023-03-14 10:11:38.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.51ms
[D 2023-03-14 10:11:40.929 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.55ms
[I 2023-03-14 10:11:42.881 JupyterHub spawner:2529] Attempting to create pvc claim-sunkumar, with timeout 3
[D 2023-03-14 10:11:42.929 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.56ms
[D 2023-03-14 10:11:43.904 JupyterHub proxy:880] Proxy: Fetching GET http://proxy-api:8001/api/routes
[D 2023-03-14 10:11:43.908 JupyterHub proxy:392] Checking routes
[D 2023-03-14 10:11:44.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.56ms
[D 2023-03-14 10:11:46.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.57ms
[D 2023-03-14 10:11:48.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.60ms
[I 2023-03-14 10:11:50.884 JupyterHub spawner:2529] Attempting to create pvc claim-sunkumar, with timeout 3
[D 2023-03-14 10:11:50.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.64ms
[D 2023-03-14 10:11:52.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.59ms
[W 2023-03-14 10:11:53.885 JupyterHub user:825] sunkumar's server failed to start in 300 seconds, giving up.

    Common causes of this timeout, and debugging tips:

    1. Everything is working, but it took too long.
       To fix: increase `Spawner.start_timeout` configuration
       to a number of seconds that is enough for spawners to finish starting.
    2. The server didn't finish starting,
       or it crashed due to a configuration issue.
       Check the single-user server's logs for hints at what needs fixing.

[D 2023-03-14 10:11:53.885 JupyterHub user:931] Stopping sunkumar
[D 2023-03-14 10:11:54.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.56ms
[D 2023-03-14 10:11:56.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.57ms
[D 2023-03-14 10:11:58.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.53ms
[D 2023-03-14 10:12:00.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.52ms
[D 2023-03-14 10:12:02.929 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.58ms
[D 2023-03-14 10:12:04.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.55ms
[D 2023-03-14 10:12:06.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.54ms
[D 2023-03-14 10:12:08.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.58ms
[D 2023-03-14 10:12:10.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.53ms
[D 2023-03-14 10:12:12.931 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.99ms
[D 2023-03-14 10:12:14.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.79ms
[D 2023-03-14 10:12:16.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.54ms
[D 2023-03-14 10:12:18.930 JupyterHub log:186] 200 GET /hub/health (@10.44.186.212) 0.51ms
[E 2023-03-14 10:12:20.418 JupyterHub reflector:385] Initial list of pods failed
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 383, in start
        await self._list_and_update()
      File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 228, in _list_and_update
        initial_resources_raw = await list_method(**kwargs)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
        response_data = await self.request(
                        ^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 193, in GET
        return (await self.request("GET", url,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 177, in request
        r = await self.pool_manager.request(**args)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/aiohttp/client.py", line 467, in _request
        with timer:
      File "/usr/local/lib/python3.11/site-packages/aiohttp/helpers.py", line 720, in __exit__
        raise asyncio.TimeoutError from None
    TimeoutError

[E 2023-03-14 10:12:20.420 JupyterHub reflector:385] Initial list of events failed
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 383, in start
        await self._list_and_update()
      File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 228, in _list_and_update
        initial_resources_raw = await list_method(**kwargs)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
        response_data = await self.request(
                        ^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 193, in GET
        return (await self.request("GET", url,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 177, in request
        r = await self.pool_manager.request(**args)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/aiohttp/client.py", line 467, in _request
        with timer:
      File "/usr/local/lib/python3.11/site-packages/aiohttp/helpers.py", line 720, in __exit__
        raise asyncio.TimeoutError from None
    TimeoutError

[E 2023-03-14 10:12:20.508 JupyterHub user:843] Failed to cleanup sunkumar's server that failed to start
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/user.py", line 841, in spawn
        await self.stop(spawner.name)
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/user.py", line 935, in stop
        status = await spawner.poll()
                 ^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubespawner/spawner.py", line 212, in async_method
        await self.pod_reflector.first_load_future
      File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 383, in start
        await self._list_and_update()
      File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 228, in _list_and_update
        initial_resources_raw = await list_method(**kwargs)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
        response_data = await self.request(
                        ^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 193, in GET
        return (await self.request("GET", url,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 177, in request
        r = await self.pool_manager.request(**args)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/aiohttp/client.py", line 467, in _request
        with timer:
      File "/usr/local/lib/python3.11/site-packages/aiohttp/helpers.py", line 720, in __exit__
        raise asyncio.TimeoutError from None
    TimeoutError

[E 2023-03-14 10:12:20.510 JupyterHub spawner:2422] Reflector for pods failed to start.
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/kubespawner/spawner.py", line 2420, in catch_reflector_start
        await f
      File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 383, in start
        await self._list_and_update()
      File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 228, in _list_and_update
        initial_resources_raw = await list_method(**kwargs)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
        response_data = await self.request(
                        ^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 193, in GET
        return (await self.request("GET", url,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 177, in request
        r = await self.pool_manager.request(**args)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/aiohttp/client.py", line 467, in _request
        with timer:
      File "/usr/local/lib/python3.11/site-packages/aiohttp/helpers.py", line 720, in __exit__
        raise asyncio.TimeoutError from None
    TimeoutError

Task was destroyed but it is pending!
task: <Task pending name='Task-233' coro=<KubeSpawner._start_reflector.<locals>.catch_reflector_start() running at /usr/local/lib/python3.11/site-packages/kubespawner/spawner.py:2420> wait_for=<Task finished name='Task-232' coro=<ResourceReflector.start() done, defined at /usr/local/lib/python3.11/site-packages/kubespawner/reflector.py:370> exception=TimeoutError()>>
Task exception was never retrieved
future: <Task finished name='Task-232' coro=<ResourceReflector.start() done, defined at /usr/local/lib/python3.11/site-packages/kubespawner/reflector.py:370> exception=TimeoutError()>
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 383, in start
    await self._list_and_update()
  File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 228, in _list_and_update
    initial_resources_raw = await list_method(**kwargs)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
    response_data = await self.request(
                    ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 193, in GET
    return (await self.request("GET", url,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 177, in request
    r = await self.pool_manager.request(**args)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiohttp/client.py", line 467, in _request
    with timer:
  File "/usr/local/lib/python3.11/site-packages/aiohttp/helpers.py", line 720, in __exit__
    raise asyncio.TimeoutError from None
TimeoutError
Task was destroyed but it is pending!
task: <Task pending name='Task-229' coro=<shared_client.<locals>.close_client_task() running at /usr/local/lib/python3.11/site-packages/kubespawner/clients.py:58> wait_for=<Future pending cb=[Task.task_wakeup()]>>
Exception ignored in: <coroutine object shared_client.<locals>.close_client_task at 0x7f4459687100>
RuntimeError: coroutine ignored GeneratorExit
Task exception was never retrieved
future: <Task finished name='Task-231' coro=<KubeSpawner._start_reflector.<locals>.catch_reflector_start() done, defined at /usr/local/lib/python3.11/site-packages/kubespawner/spawner.py:2418> exception=SystemExit(1)>
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/kubespawner/spawner.py", line 2420, in catch_reflector_start
    await f
  File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 383, in start
    await self._list_and_update()
  File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 228, in _list_and_update
    initial_resources_raw = await list_method(**kwargs)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
    response_data = await self.request(
                    ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 193, in GET
    return (await self.request("GET", url,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 177, in request
    r = await self.pool_manager.request(**args)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiohttp/client.py", line 467, in _request
    with timer:
  File "/usr/local/lib/python3.11/site-packages/aiohttp/helpers.py", line 720, in __exit__
    raise asyncio.TimeoutError from None
TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/jupyterhub/app.py", line 3350, in launch_instance
    loop.start()
  File "/usr/local/lib/python3.11/site-packages/tornado/platform/asyncio.py", line 215, in start
    self.asyncio_loop.run_forever()
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 607, in run_forever
    self._run_once()
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1922, in _run_once
    handle._run()
  File "/usr/local/lib/python3.11/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.11/site-packages/kubespawner/spawner.py", line 2423, in catch_reflector_start
    sys.exit(1)
SystemExit: 1
Task was destroyed but it is pending!
task: <Task pending name='Task-248' coro=<RequestHandler._execute() running at /usr/local/lib/python3.11/site-packages/tornado/web.py:1713> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[_HandlerDelegate.execute.<locals>.<lambda>() at /usr/local/lib/python3.11/site-packages/tornado/web.py:2361]>
```

I am not able to work out the root cause of this behaviour so that I can fix it; I am quite new to this setup. Any help would be greatly appreciated.

Hi, there are some similarities with the error messages that I was getting → After OAuthenticator successfully returned the user, kubespawner stuck at creating PVC - #9 by vvcb

Maybe entirely unrelated, though. Are you able to provide some information on what infrastructure you are running this on?

It looks like KubeSpawner is not able to talk to the Kubernetes API at all. This may have to do with network policies, service accounts, or similar. Can you share more about how you deployed Kubernetes and the configuration you used?
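
A quick way to check that from inside the hub pod, assuming the standard z2jh deployment name `hub` and a Python interpreter in the hub image (both assumptions on my part), is something like:

```bash
# Exec into the hub pod and try to reach the in-cluster kubernetes API.
# Any HTTP response (even a 401/403 error) proves network connectivity;
# a hang or timeout points at a network policy or routing problem.
kubectl -n <namespace> exec deploy/hub -- python3 -c '
import ssl, urllib.request
ctx = ssl._create_unverified_context()  # skip CA verification; this is only a reachability check
print(urllib.request.urlopen("https://kubernetes.default.svc/healthz",
                             context=ctx, timeout=5).read())
'
```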

The K8s cluster I am using is already running on bare-metal machines with 3 master nodes and 10 worker nodes. It is behind a corporate proxy.
The cluster already had kube-scheduler and pause images available, so I disabled the scheduler component of the JupyterHub chart.
I can see the RBAC role and role binding created after the Helm deployment, and the services are up.

values config:

```yaml
fullnameOverride: ""
nameOverride:

custom: {}

imagePullSecret:
  create: false
  automaticReferenceInjection: true
  registry:
  username:
  password:
  email:

imagePullSecrets: []

hub:
  revisionHistoryLimit:
  config:
    JupyterHub:
      admin_access: true
      authenticator_class: dummy
  service:
    type: ClusterIP
    annotations: {}
    ports:
      nodePort:
    extraPorts: []
    loadBalancerIP:
  baseUrl: /
  cookieSecret:
  initContainers: []
  nodeSelector: {}
  tolerations: []
  concurrentSpawnLimit: 64
  consecutiveFailureLimit: 5
  activeServerLimit:
  deploymentStrategy:
    type: Recreate
  db:
    type: sqlite-pvc
    upgrade:
    pvc:
      annotations: {}
      selector: {}
      accessModes:
        - ReadWriteOnce
      storage: 1Gi
      subPath:
      storageClassName: longhorn
    url:
    password:
  labels: {}
  annotations: {}
  command: []
  args: []
  extraConfig: {}
  extraFiles: {}
  extraEnv: {}
  extraContainers: []
  extraVolumes: []
  extraVolumeMounts: []
  image:
    name: <redacted>
    tag: "0.3.0"
    pullPolicy:
    pullSecrets: []
  resources:
    limits:
      cpu: 450m
      memory: 1000Mi
    requests:
      cpu: 250m
      memory: 768Mi
  podSecurityContext:
    fsGroup: 1000
  containerSecurityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 1000
    allowPrivilegeEscalation: false
  lifecycle: {}
  loadRoles: {}
  services: {}
  pdb:
    enabled: false
    maxUnavailable:
    minAvailable: 1
  networkPolicy:
    enabled: true
    ingress: []
    egress: []
    egressAllowRules:
      cloudMetadataServer: false
      dnsPortsPrivateIPs: true
      nonPrivateIPs: true
      privateIPs: false
    interNamespaceAccessLabels: ignore
    allowedIngressPorts: []
  allowNamedServers: false
  namedServerLimitPerUser:
  authenticatePrometheus:
  redirectToServer:
  shutdownOnLogout:
  templatePaths: []
  templateVars: {}
  livenessProbe:
    enabled: true
    initialDelaySeconds: 300
    periodSeconds: 10
    failureThreshold: 30
    timeoutSeconds: 3
  readinessProbe:
    enabled: true
    initialDelaySeconds: 0
    periodSeconds: 2
    failureThreshold: 1000
    timeoutSeconds: 1
  existingSecret:
  serviceAccount:
    create: true
    name:
    annotations: {}
  extraPodSpec: {}

rbac:
  create: true


proxy:
  secretToken:
  annotations: {}
  deploymentStrategy:
    type: Recreate
  service:
    type: ClusterIP
    labels: {}
    annotations: {}
    nodePorts:
      http:
      https:
    disableHttpPort: false
    extraPorts: []
    loadBalancerIP:
    loadBalancerSourceRanges: []

  chp:
    revisionHistoryLimit:
    containerSecurityContext:
      runAsNonRoot: true
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    image:
      name: <redacted>
      tag: "0.2.0" # https://github.com/jupyterhub/configurable-http-proxy/releases
      pullPolicy: IfNotPresent
      pullSecrets: []
    extraCommandLineFlags: []
    livenessProbe:
      enabled: true
      initialDelaySeconds: 60
      periodSeconds: 10
      failureThreshold: 30
      timeoutSeconds: 3
    readinessProbe:
      enabled: true
      initialDelaySeconds: 0
      periodSeconds: 2
      failureThreshold: 1000
      timeoutSeconds: 1
    resources:
      limits:
        cpu: 450m
        memory: 1000Mi
      requests:
        cpu: 250m
        memory: 768Mi
    defaultTarget:
    errorTarget:
    extraEnv: {}
    nodeSelector: {}
    tolerations: []
    networkPolicy:
      enabled: true
      ingress: []
      egress: []
      egressAllowRules:
        cloudMetadataServer: false
        dnsPortsPrivateIPs: true
        nonPrivateIPs: true
        privateIPs: false
      interNamespaceAccessLabels: ignore
      allowedIngressPorts: [http, https]
    pdb:
      enabled: false
      maxUnavailable:
      minAvailable: 1
    extraPodSpec: {}
  # traefik relates to the autohttps pod, which is responsible for TLS
  # termination when proxy.https.type=letsencrypt.
  traefik:
    revisionHistoryLimit:
    containerSecurityContext:
      runAsNonRoot: true
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    image:
      name: <redacted>
      # tag is automatically bumped to new patch versions by the
      # watch-dependencies.yaml workflow.
      #
      tag: "0.1.0" # ref: https://hub.docker.com/_/traefik?tab=tags
      pullPolicy:
      pullSecrets: []
    hsts:
      includeSubdomains: false
      preload: false
      maxAge: 15724800 # About 6 months
    resources:
      limits:
        cpu: 450m
        memory: 1000Mi
      requests:
        cpu: 250m
        memory: 768Mi
    labels: {}
    extraInitContainers: []
    extraEnv: {}
    extraVolumes: []
    extraVolumeMounts: []
    extraStaticConfig: {}
    extraDynamicConfig: {}
    nodeSelector: {}
    tolerations: []
    extraPorts: []
    networkPolicy:
      enabled: true
      ingress: []
      egress: []
      egressAllowRules:
        cloudMetadataServer: false
        dnsPortsPrivateIPs: false
        nonPrivateIPs: true
        privateIPs: true
      interNamespaceAccessLabels: ignore
      allowedIngressPorts: [http, https]
    pdb:
      enabled: false
      maxUnavailable:
      minAvailable: 1
    serviceAccount:
      create: true
      name:
      annotations: {}
    extraPodSpec: {}
  secretSync:
    containerSecurityContext:
      runAsNonRoot: true
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    image:
      name: <redacted>
      tag: "0.2.0"
      pullPolicy:
      pullSecrets: []
    resources: 
      limits:
        cpu: 450m
        memory: 1000Mi
      requests:
        cpu: 250m
        memory: 768Mi
  labels: {}
  https:
    enabled: false
    type: letsencrypt
    #type: letsencrypt, manual, offload, secret
    letsencrypt:
      contactEmail:
      # Specify custom server here (https://acme-staging-v02.api.letsencrypt.org/directory) to hit staging LE
      acmeServer: https://acme-v02.api.letsencrypt.org/directory
    manual:
      key:
      cert:
    secret:
      name:
      key: tls.key
      crt: tls.crt
    hosts: []


singleuser:
  podNameTemplate:
  extraTolerations: []
  nodeSelector: {}
  extraNodeAffinity:
    required: []
    preferred: []
  extraPodAffinity:
    required: []
    preferred: []
  extraPodAntiAffinity:
    required: []
    preferred: []
  networkTools:
    image:
      name: <redacted>
      tag: "0.2.0"
      pullPolicy:
      pullSecrets: []
    resources: 
      limits:
        cpu: 450m
        memory: 1000Mi
      requests:
        cpu: 250m
        memory: 768Mi
  cloudMetadata:
    # blockWithIptables set to true will append a privileged initContainer that
    # uses iptables to block the sensitive metadata server at the provided IP.
    blockWithIptables: true
    ip: 169.254.169.254
  networkPolicy:
    enabled: true
    ingress: []
    egress: []
    egressAllowRules:
      cloudMetadataServer: false
      dnsPortsPrivateIPs: true
      nonPrivateIPs: true
      privateIPs: false
    interNamespaceAccessLabels: ignore
    allowedIngressPorts: []
  events: true
  extraAnnotations: {}
  extraLabels:
    hub.jupyter.org/network-access-hub: "true"
  extraFiles: {}
  extraEnv: {}
  lifecycleHooks: {}
  initContainers: []
  extraContainers: []
  allowPrivilegeEscalation: false
  uid: 1000
  fsGid: 100
  serviceAccountName:
  storage:
    type: dynamic
    extraLabels: {}
    extraVolumes: []
    extraVolumeMounts: []
    static:
      pvcName:
      subPath: "{username}"
    capacity: 1Gi
    homeMountPath: /storage/data/jupyterhub/sunkumar
    dynamic:
      storageClass: longhorn
      pvcNameTemplate: claim-{username}{servername}
      volumeNameTemplate: volume-{username}{servername}
      storageAccessModes: [ReadWriteOnce]
  image:
    name: <redacted>
    tag: "0.3.0"
    pullPolicy:
    pullSecrets: []
  startTimeout: 300
  cpu:
    limit: 2
    guarantee: 1
  memory:
    limit: 2G
    guarantee: 1G
  extraResource:
    limits: {}
    guarantees: {}
  cmd: jupyterhub-singleuser
  defaultUrl:
  extraPodConfig: {}
  profileList: []

# scheduling relates to the user-scheduler pods and user-placeholder pods.
scheduling:
  userScheduler:
    enabled: false
    revisionHistoryLimit:
    replicas: 2
    logLevel: 4
    # plugins are configured on the user-scheduler to score nodes so that user
    # pods are scheduled onto the most busy node. By doing this, we help scale
    # down more effectively. It isn't obvious how to enable/disable scoring
    # plugins, and configure them, to accomplish this.
    #
    # plugins ref: https://kubernetes.io/docs/reference/scheduling/config/#scheduling-plugins-1
    # migration ref: https://kubernetes.io/docs/reference/scheduling/config/#scheduler-configuration-migrations
    #
    plugins:
      score:
        # These scoring plugins are enabled by default according to
        # https://kubernetes.io/docs/reference/scheduling/config/#scheduling-plugins
        # 2022-02-22.
        #
        # Enabled with high priority:
        # - NodeAffinity
        # - InterPodAffinity
        # - NodeResourcesFit
        # - ImageLocality
        # Remains enabled with low default priority:
        # - TaintToleration
        # - PodTopologySpread
        # - VolumeBinding
        # Disabled for scoring:
        # - NodeResourcesBalancedAllocation
        #
        disabled:
          # We disable these plugins (with regards to scoring) to not interfere
          # or complicate our use of NodeResourcesFit.
          - name: NodeResourcesBalancedAllocation
          # Disable plugins to be allowed to enable them again with a different
          # weight and avoid an error.
          - name: NodeAffinity
          - name: InterPodAffinity
          - name: NodeResourcesFit
          - name: ImageLocality
        enabled:
          - name: NodeAffinity
            weight: 14631
          - name: InterPodAffinity
            weight: 1331
          - name: NodeResourcesFit
            weight: 121
          - name: ImageLocality
            weight: 11
    pluginConfig:
      # Here we declare that we should optimize pods to fit based on a
      # MostAllocated strategy instead of the default LeastAllocated.
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
            type: MostAllocated
    containerSecurityContext:
      runAsNonRoot: true
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    image:
      name: <redacted>
      tag: "v1.24.8" 
      pullPolicy: IfNotPresent
      pullSecrets: []
    nodeSelector: {}
    tolerations: []
    labels: {}
    annotations: {}
    pdb:
      enabled: true
      maxUnavailable: 1
      minAvailable:
    resources: 
      limits:
        cpu: 450m
        memory: 1000Mi
      requests:
        cpu: 250m
        memory: 768Mi
    serviceAccount:
      create: true
      name:
      annotations: {}
    extraPodSpec: {}
  podPriority:
    enabled: false
    globalDefault: false
    defaultPriority: 0
    imagePullerPriority: -5
    userPlaceholderPriority: -10
  userPlaceholder:
    enabled: false
    image:
      name: <redacted>
      # tag is automatically bumped to new patch versions by the
      # watch-dependencies.yaml workflow.
      #
      # If you update this, also update prePuller.pause.image.tag
      #
      tag: "3.7"
      pullPolicy: IfNotPresent
      pullSecrets: []
    # The livenessProbe aims to give JupyterHub sufficient time to start up,
    # but to restart it if it becomes unresponsive for ~5 min.
    livenessProbe:
      enabled: true
      initialDelaySeconds: 300
      periodSeconds: 10
      failureThreshold: 30
      timeoutSeconds: 3
    # The readinessProbe's aim is to provide a successful startup indication,
    # but after that never to become unready before the livenessProbe fails and
    # restarts the pod if needed. Becoming unready after startup serves no
    # purpose, as there are no other pods to fall back to in this non-HA
    # deployment.
    readinessProbe:
      enabled: true
      initialDelaySeconds: 0
      periodSeconds: 2
      failureThreshold: 1000
      timeoutSeconds: 1
    revisionHistoryLimit:
    replicas: 0
    labels: {}
    annotations: {}
    containerSecurityContext:
      runAsNonRoot: true
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    resources: 
      limits:
        cpu: 450m
        memory: 1000Mi
      requests:
        cpu: 250m
        memory: 768Mi
  corePods:
    tolerations:
      - key: hub.jupyter.org/dedicated
        operator: Equal
        value: core
        effect: NoSchedule
      - key: hub.jupyter.org_dedicated
        operator: Equal
        value: core
        effect: NoSchedule
    nodeAffinity:
      matchNodePurpose: prefer
  userPods:
    tolerations:
      - key: hub.jupyter.org/dedicated
        operator: Equal
        value: user
        effect: NoSchedule
      - key: hub.jupyter.org_dedicated
        operator: Equal
        value: user
        effect: NoSchedule
    nodeAffinity:
      matchNodePurpose: prefer

# prePuller relates to the hook|continuous-image-puller DaemonSets
prePuller:
  revisionHistoryLimit:
  labels: {}
  annotations: {}
  resources:
    limits:
      cpu: 450m
      memory: 1000Mi
    requests:
      cpu: 250m
      memory: 768Mi
  containerSecurityContext:
    runAsNonRoot: true
    runAsUser: 65534 # nobody user
    runAsGroup: 65534 # nobody group
    allowPrivilegeEscalation: false
  extraTolerations: []
  # hook relates to the hook-image-awaiter Job and hook-image-puller DaemonSet
  hook:
    enabled: false
    pullOnlyOnChanges: true
    image:
      name: <redacted>
      tag: "0.1.0"
      pullPolicy: IfNotPresent
      pullSecrets: []
    containerSecurityContext:
      runAsNonRoot: true
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    livenessProbe:
      enabled: true
      initialDelaySeconds: 300
      periodSeconds: 10
      failureThreshold: 30
      timeoutSeconds: 3
    readinessProbe:
      enabled: true
      initialDelaySeconds: 0
      periodSeconds: 2
      failureThreshold: 1000
      timeoutSeconds: 1
    podSchedulingWaitDuration: 10
    nodeSelector: {}
    tolerations: []
    resources:
      limits:
        cpu: 450m
        memory: 1000Mi
      requests:
        cpu: 250m
        memory: 768Mi
    serviceAccount:
      create: true
      name:
      annotations: {}
  continuous:
    enabled: false
  pullProfileListImages: true
  extraImages: {}
  pause:
    containerSecurityContext:
      runAsNonRoot: true
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    image:
      name: <redacted>
      tag: "3.7"
      pullPolicy: IfNotPresent
      pullSecrets: []

ingress:
  enabled: true
  annotations:
    ingress.kubernetes.io/rewrite-target: /
    ingress.kubernetes.io/ingress.allow-http: "true"
  ingressClassName:
  hosts:
    - kubeingress.xyz.com<redacted>
  pathSuffix:
  pathType: Prefix
  tls: []

cull:
  enabled: true
  users: false # --cull-users
  adminUsers: true # --cull-admin-users
  removeNamedServers: false # --remove-named-servers
  timeout: 3600 # --timeout
  every: 600 # --cull-every
  concurrency: 10 # --concurrency
  maxAge: 0 # --max-age

debug:
  enabled: true

global:
  safeToShowValues: false
```

From your network policy config:

it looks like you’re explicitly banning the Hub from accessing anything private to the cluster, which means it won’t be able to access the Kubernetes API, which it needs to function. The Hub probably needs `privateIPs: true` in general, but you might be able to get it to work if you add a single egress rule for the Kubernetes API server (I’m not exactly sure what that should be).
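
For example (an untested sketch; exact addresses depend on your cluster), the simplest change in your values would be:

```yaml
hub:
  networkPolicy:
    egressAllowRules:
      # let the hub reach private IPs in general, which includes
      # the in-cluster kubernetes API endpoint
      privateIPs: true
```

or, for a narrower rule, something along these lines:

```yaml
hub:
  networkPolicy:
    egress:
      # a single egress rule for the API server only; 10.96.0.1/32 is a
      # placeholder - substitute the ClusterIP of the `kubernetes` service
      # in the default namespace (and the port may be 6443 in some setups)
      - to:
          - ipBlock:
              cidr: 10.96.0.1/32
        ports:
          - port: 443
            protocol: TCP
```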


Thanks @minrk, adding a separate egress network policy resolved this issue.
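
A standalone policy granting the hub pod egress to the API server could look roughly like this (a sketch only; the name, namespace, and CIDR are assumptions, since the exact manifest wasn't shared):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: hub-apiserver-egress   # hypothetical name
  namespace: jupyterhub        # hypothetical namespace of the deployment
spec:
  podSelector:
    matchLabels:
      # standard labels on the z2jh hub pod
      app: jupyterhub
      component: hub
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 10.96.0.1/32   # placeholder: your kubernetes API server address
      ports:
        - port: 443
          protocol: TCP
```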

Hello, we have encountered similar issues. Can you provide us with the configuration you modified? Thank you!