Spawn failed: Could not create PVC claim-foobar

Hi there - where can I start to find out why JupyterHub couldn’t make a PVC for a test user? This is with dummy auth turned on. All pods are healthy, using k8s 1.27, Cilium (CNI), MetalLB (load balancer), Longhorn (CSI).

My config.yaml used in Helm:

singleuser:
  storage:
    capacity: 100Mi
    dynamic:
      storageClass: longhorn
Logs

hub pod logs

[W 2024-01-02 23:11:37.079 JupyterHub user:876] foobar's server failed to start in 300 seconds, giving up.

    Common causes of this timeout, and debugging tips:

    1. Everything is working, but it took too long.
       To fix: increase `Spawner.start_timeout` configuration
       to a number of seconds that is enough for spawners to finish starting.
    2. The server didn't finish starting,
       or it crashed due to a configuration issue.
       Check the single-user server's logs for hints at what needs fixing.

[W 2024-01-02 23:11:37.117 JupyterHub base:1057] 3 consecutive spawns failed.  Hub will exit if failure count reaches 5 before succeeding
[E 2024-01-02 23:11:37.117 JupyterHub gen:630] Exception in Future <Task finished name='Task-3582' coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at /usr/local/lib/python3.11/site-packages/jupyterhub/handlers/base.py:981> exception=TimeoutError('Could not create PVC claim-foobar')> after timeout
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/tornado/gen.py", line 625, in error_callback
        future.result()
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/handlers/base.py", line 988, in finish_user_spawn
        await spawn_future
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/user.py", line 902, in spawn
        raise e
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/user.py", line 798, in spawn
        url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubespawner/spawner.py", line 2706, in _start
        await exponential_backoff(
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/utils.py", line 237, in exponential_backoff
        raise asyncio.TimeoutError(fail_message)
    TimeoutError: Could not create PVC claim-foobar

Longhorn logs

I1231 21:45:57.666042       1 volume_store.go:97] Starting save volume queue
I1231 21:45:57.767083       1 controller.go:860] Started provisioner controller driver.longhorn.io_csi-provisioner-65cb5cc4ff-lpxz4_0dfa0311-522c-4a9d-9902-7552d5123113!
I0102 22:44:26.849063       1 controller.go:1337] provision "quant/hub-db-dir" class "longhorn": started
W0102 22:44:26.858453       1 controller.go:620] "fstype" is deprecated and will be removed in a future release, please use "csi.storage.k8s.io/fstype" instead
I0102 22:44:26.858494       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"quant", Name:"hub-db-dir", UID:"1a7dd209-570f-42b7-90cf-99184e84ed58", APIVersion:"v1", ResourceVersion:"2649558", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "quant/hub-db-dir"
I0102 22:44:29.065338       1 controller.go:923] successfully created PV pvc-1a7dd209-570f-42b7-90cf-99184e84ed58 for PVC hub-db-dir and csi volume name pvc-1a7dd209-570f-42b7-90cf-99184e84ed58
I0102 22:44:29.065375       1 controller.go:1420] provision "quant/hub-db-dir" class "longhorn": volume "pvc-1a7dd209-570f-42b7-90cf-99184e84ed58" provisioned
I0102 22:44:29.065393       1 controller.go:1433] provision "quant/hub-db-dir" class "longhorn": succeeded
I0102 22:44:29.080611       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"quant", Name:"hub-db-dir", UID:"1a7dd209-570f-42b7-90cf-99184e84ed58", APIVersion:"v1", ResourceVersion:"2649558", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-1a7dd209-570f-42b7-90cf-99184e84ed58

Z2JH assumes you’ve got a fully working Kubernetes cluster:

Is your dynamic volume provisioner (longhorn) definitely working properly? What happens if you manually creating a minimal PVC and pod manifest (outside of Z2JH)- do you get a new volume?

What does kubectl describe pvc claim-foobar show?

hey, cluster is healthy and the CSI (longhorn) has no failed pods, and i can see PVs are attached for other unrelated projects like Airflow

claim-foobar doesn’t exist in the namespace JH is running in. The only PVC is hub-db-dir

(so volumes are definitely bound as far as i can tell)

Appreciate any thoughts at all

Can you turn on debug logging, and show us the full hub logs after a restart?

Done and logs below :slight_smile:

Logs: first login as foobarbaz (no mentiom about PVC failure in UI)
[root@kube1-lon ~]# k logs -nquant hub-c58f764b8-2smqz
[D 2024-01-04 19:32:06.372 JupyterHub application:908] Looking for /usr/local/etc/jupyterhub/jupyterhub_config in /srv/jupyterhub
Loading /usr/local/etc/jupyterhub/secret/values.yaml
No config at /usr/local/etc/jupyterhub/existing-secret/values.yaml
[D 2024-01-04 19:32:06.683 JupyterHub application:929] Loaded config file: /usr/local/etc/jupyterhub/jupyterhub_config.py
[I 2024-01-04 19:32:06.704 JupyterHub app:2859] Running JupyterHub version 4.0.2
[I 2024-01-04 19:32:06.704 JupyterHub app:2889] Using Authenticator: jupyterhub.auth.DummyAuthenticator-4.0.2
[I 2024-01-04 19:32:06.704 JupyterHub app:2889] Using Spawner: kubespawner.spawner.KubeSpawner-6.2.0
[I 2024-01-04 19:32:06.704 JupyterHub app:2889] Using Proxy: jupyterhub.proxy.ConfigurableHTTPProxy-4.0.2
[D 2024-01-04 19:32:06.713 JupyterHub app:1833] Connecting to db: sqlite:///jupyterhub.sqlite
[D 2024-01-04 19:32:06.731 JupyterHub orm:1016] database schema version found: 0eee8c825d24
[D 2024-01-04 19:32:06.737 JupyterHub orm:1016] database schema version found: 0eee8c825d24
[D 2024-01-04 19:32:06.786 JupyterHub app:2104] Loading roles into database
[I 2024-01-04 19:32:06.795 JupyterHub app:1984] Not using allowed_users. Any authenticated user will be allowed.
[D 2024-01-04 19:32:06.821 JupyterHub app:2343] Purging expired APITokens
[D 2024-01-04 19:32:06.823 JupyterHub app:2343] Purging expired OAuthCodes
[D 2024-01-04 19:32:06.824 JupyterHub app:2179] Loading role assignments from config
[D 2024-01-04 19:32:06.838 JupyterHub app:2502] Initializing spawners
[D 2024-01-04 19:32:06.839 JupyterHub app:2633] Loaded users:

[I 2024-01-04 19:32:06.839 JupyterHub app:2928] Initialized 0 spawners in 0.002 seconds
[I 2024-01-04 19:32:06.843 JupyterHub metrics:278] Found 2 active users in the last ActiveUserPeriods.twenty_four_hours
[I 2024-01-04 19:32:06.844 JupyterHub metrics:278] Found 6 active users in the last ActiveUserPeriods.seven_days
[I 2024-01-04 19:32:06.844 JupyterHub metrics:278] Found 6 active users in the last ActiveUserPeriods.thirty_days
[I 2024-01-04 19:32:06.844 JupyterHub app:3142] Not starting proxy
[D 2024-01-04 19:32:06.845 JupyterHub proxy:880] Proxy: Fetching GET http://proxy-api:8001/api/routes
[D 2024-01-04 19:32:06.851 JupyterHub proxy:953] Omitting non-jupyterhub route '/'
[I 2024-01-04 19:32:06.851 JupyterHub app:3178] Hub API listening on http://:8081/hub/
[I 2024-01-04 19:32:06.851 JupyterHub app:3180] Private Hub API connect url http://hub:8081/hub/
[I 2024-01-04 19:32:06.851 JupyterHub app:3189] Starting managed service jupyterhub-idle-culler
[I 2024-01-04 19:32:06.851 JupyterHub service:385] Starting service 'jupyterhub-idle-culler': ['python3', '-m', 'jupyterhub_idle_culler', '--url=http://localhost:8081/hub/api', '--timeout=3600', '--cull-every=600', '--concurrency=10']
[I 2024-01-04 19:32:06.852 JupyterHub service:133] Spawning python3 -m jupyterhub_idle_culler --url=http://localhost:8081/hub/api --timeout=3600 --cull-every=600 --concurrency=10
[D 2024-01-04 19:32:06.853 JupyterHub spawner:1384] Polling subprocess every 30s
[D 2024-01-04 19:32:06.854 JupyterHub proxy:389] Fetching routes to check
[D 2024-01-04 19:32:06.854 JupyterHub proxy:880] Proxy: Fetching GET http://proxy-api:8001/api/routes
[D 2024-01-04 19:32:06.855 JupyterHub proxy:953] Omitting non-jupyterhub route '/'
[D 2024-01-04 19:32:06.855 JupyterHub proxy:392] Checking routes
[I 2024-01-04 19:32:06.855 JupyterHub proxy:477] Adding route for Hub: / => http://hub:8081
[D 2024-01-04 19:32:06.855 JupyterHub proxy:880] Proxy: Fetching POST http://proxy-api:8001/api/routes/
[I 2024-01-04 19:32:06.858 JupyterHub app:3247] JupyterHub is now running, internal Hub API at http://hub:8081/hub/
[D 2024-01-04 19:32:06.859 JupyterHub app:2852] It took 0.489 seconds for the Hub to start
[D 2024-01-04 19:32:07.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 1.01ms
[D 2024-01-04 19:32:07.358 JupyterHub base:299] Recording first activity for <APIToken('1d7d...', service='jupyterhub-idle-culler', client_id='jupyterhub')>
[I 2024-01-04 19:32:07.370 JupyterHub log:191] 200 GET /hub/api/ (jupyterhub-idle-culler@::1) 14.53ms
[D 2024-01-04 19:32:07.372 JupyterHub scopes:877] Checking access to /hub/api/users via scope list:users
[D 2024-01-04 19:32:07.373 JupyterHub scopes:677] Unrestricted access to /hub/api/users via list:users
[I 2024-01-04 19:32:07.377 JupyterHub log:191] 200 GET /hub/api/users?state=[secret] (jupyterhub-idle-culler@::1) 5.91ms
[D 2024-01-04 19:32:09.116 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.93ms
[D 2024-01-04 19:32:09.641 JupyterHub base:344] Refreshing auth for foobar1
[I 2024-01-04 19:32:09.644 JupyterHub login:44] User logged out: foobar1
[I 2024-01-04 19:32:09.647 JupyterHub log:191] 302 GET /hub/logout -> /hub/login (@::ffff:10.3.129.170) 25.84ms
[I 2024-01-04 19:32:09.678 JupyterHub log:191] 200 GET /hub/login (@::ffff:10.3.129.170) 21.53ms
[D 2024-01-04 19:32:11.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.76ms
[D 2024-01-04 19:32:13.115 JupyterHub log:191] 200 GET /hub/health
[... omitted more lines for brevity]
[D 2024-01-04 19:32:19.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.91ms
[D 2024-01-04 19:32:20.279 JupyterHub roles:281] Assigning default role to User foobarbaz
[I 2024-01-04 19:32:20.282 JupyterHub roles:238] Adding role user for User: foobarbaz
[D 2024-01-04 19:32:20.307 JupyterHub roles:281] Assigning default role to User foobarbaz
[D 2024-01-04 19:32:20.310 JupyterHub base:587] Setting cookie jupyterhub-session-id: {'httponly': True, 'path': '/'}
[D 2024-01-04 19:32:20.310 JupyterHub base:591] Setting cookie for foobarbaz: jupyterhub-hub-login
[D 2024-01-04 19:32:20.310 JupyterHub base:587] Setting cookie jupyterhub-hub-login: {'httponly': True, 'path': '/hub/'}
[I 2024-01-04 19:32:20.310 JupyterHub base:837] User logged in: foobarbaz
[D 2024-01-04 19:32:20.310 JupyterHub user:431] Creating <class 'kubespawner.spawner.KubeSpawner'> for foobarbaz:
[I 2024-01-04 19:32:20.313 JupyterHub log:191] 302 POST /hub/login?next= -> /hub/spawn (foobarbaz@::ffff:10.3.129.170) 39.16ms
[D 2024-01-04 19:32:20.322 JupyterHub base:299] Recording first activity for <User(foobarbaz 0/1 running)>
[D 2024-01-04 19:32:20.333 JupyterHub scopes:877] Checking access to /hub/spawn via scope servers
[D 2024-01-04 19:32:20.333 JupyterHub scopes:690] Argument-based access to /hub/spawn via servers
[D 2024-01-04 19:32:20.333 JupyterHub pages:217] Triggering spawn with default options for foobarbaz
[D 2024-01-04 19:32:20.334 JupyterHub base:961] Initiating spawn for foobarbaz
[D 2024-01-04 19:32:20.334 JupyterHub base:965] 0/64 concurrent spawns
[D 2024-01-04 19:32:20.334 JupyterHub base:970] 0 active servers
[I 2024-01-04 19:32:20.365 JupyterHub provider:659] Creating oauth client jupyterhub-user-foobarbaz
[D 2024-01-04 19:32:20.397 JupyterHub user:794] Calling Spawner.start for foobarbaz
[I 2024-01-04 19:32:20.399 JupyterHub log:191] 302 GET /hub/spawn -> /hub/spawn-pending/foobarbaz (foobarbaz@::ffff:10.3.129.170) 78.65ms
[D 2024-01-04 19:32:20.408 JupyterHub scopes:877] Checking access to /hub/spawn-pending/foobarbaz via scope servers
[D 2024-01-04 19:32:20.409 JupyterHub scopes:690] Argument-based access to /hub/spawn-pending/foobarbaz via servers
[I 2024-01-04 19:32:20.409 JupyterHub pages:398] foobarbaz is pending spawn
[I 2024-01-04 19:32:20.411 JupyterHub log:191] 200 GET /hub/spawn-pending/foobarbaz (foobarbaz@::ffff:10.3.129.170) 5.65ms
[D 2024-01-04 19:32:20.465 JupyterHub scopes:877] Checking access to /hub/api/users/foobarbaz/server/progress via scope read:servers
[D 2024-01-04 19:32:20.465 JupyterHub scopes:690] Argument-based access to /hub/api/users/foobarbaz/server/progress via read:servers
[D 2024-01-04 19:32:21.115 JupyterHub log:191] 200 GET /hub/health 
[... omitted more lines for brevity]
[D 2024-01-04 19:33:05.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.73ms
[D 2024-01-04 19:33:06.859 JupyterHub proxy:880] Proxy: Fetching GET http://proxy-api:8001/api/routes
[D 2024-01-04 19:33:06.863 JupyterHub proxy:392] Checking routes
[D 2024-01-04 19:33:07.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.78ms
[... omitted more lines for brevity]
(@10.3.132.180) 0.78ms
[E 2024-01-04 19:33:21.025 JupyterHub reflector:397] Initial list of events failed
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 395, in start
        await self._list_and_update()
      File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 231, in _list_and_update
        initial_resources_raw = await list_method(**kwargs)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
        response_data = await self.request(
                        ^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 196, in GET
        return (await self.request("GET", url,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 180, in request
        r = await self.pool_manager.request(**args)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/aiohttp/client.py", line 500, in _request
        with timer:
      File "/usr/local/lib/python3.11/site-packages/aiohttp/helpers.py", line 735, in __exit__
        raise asyncio.TimeoutError from None
    TimeoutError

[E 2024-01-04 19:33:21.028 JupyterHub spawner:2414] Reflector with key ('events', 'quant') failed to start.
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/kubespawner/spawner.py", line 2412, in catch_reflector_start
        await func
      File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 395, in start
        await self._list_and_update()
      File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 231, in _list_and_update
        initial_resources_raw = await list_method(**kwargs)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
        response_data = await self.request(
                        ^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 196, in GET
        return (await self.request("GET", url,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 180, in request
        r = await self.pool_manager.request(**args)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/aiohttp/client.py", line 500, in _request
        with timer:
      File "/usr/local/lib/python3.11/site-packages/aiohttp/helpers.py", line 735, in __exit__
        raise asyncio.TimeoutError from None
    TimeoutError

[E 2024-01-04 19:33:21.030 JupyterHub reflector:397] Initial list of pods failed
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 395, in start
        await self._list_and_update()
      File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 231, in _list_and_update
        initial_resources_raw = await list_method(**kwargs)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
        response_data = await self.request(
                        ^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 196, in GET
        return (await self.request("GET", url,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 180, in request
        r = await self.pool_manager.request(**args)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/aiohttp/client.py", line 500, in _request
        with timer:
      File "/usr/local/lib/python3.11/site-packages/aiohttp/helpers.py", line 735, in __exit__
        raise asyncio.TimeoutError from None
    TimeoutError

[E 2024-01-04 19:33:21.031 JupyterHub spawner:2414] Reflector with key ('pods', 'quant') failed to start.
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/kubespawner/spawner.py", line 2412, in catch_reflector_start
        await func
      File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 395, in start
        await self._list_and_update()
      File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 231, in _list_and_update
        initial_resources_raw = await list_method(**kwargs)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
        response_data = await self.request(
                        ^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 196, in GET
        return (await self.request("GET", url,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 180, in request
        r = await self.pool_manager.request(**args)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/aiohttp/client.py", line 500, in _request
        with timer:
      File "/usr/local/lib/python3.11/site-packages/aiohttp/helpers.py", line 735, in __exit__
        raise asyncio.TimeoutError from None
    TimeoutError

[E 2024-01-04 19:33:21.032 JupyterHub web:1871] Uncaught exception GET /hub/api/users/foobarbaz/server/progress?_xsrf=2%7C0e27187c%7Ca89dca201f228530c3d23565d8fa1cf7%7C1704396729 (::ffff:10.3.129.170)
    HTTPServerRequest(protocol='http', host='10.3.209.0', method='GET', uri='/hub/api/users/foobarbaz/server/progress?_xsrf=2%7C0e27187c%7Ca89dca201f228530c3d23565d8fa1cf7%7C1704396729', version='HTTP/1.1', remote_ip='::ffff:10.3.129.170')
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/tornado/web.py", line 1786, in _execute
        result = await result
                 ^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/apihandlers/users.py", line 752, in get
        async for event in events:
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/utils.py", line 623, in iterate_until
        yield item_future.result()
              ^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/spawner.py", line 1274, in _generate_progress
        async for event in progress:
      File "/usr/local/lib/python3.11/site-packages/kubespawner/spawner.py", line 2301, in progress
        await self._start_watching_events()
      File "/usr/local/lib/python3.11/site-packages/kubespawner/spawner.py", line 2444, in _start_watching_events
        return await self._start_reflector(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubespawner/spawner.py", line 2380, in _start_reflector
        await previous_reflector.first_load_future
      File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 395, in start
        await self._list_and_update()
      File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 231, in _list_and_update
        initial_resources_raw = await list_method(**kwargs)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
        response_data = await self.request(
                        ^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 196, in GET
        return (await self.request("GET", url,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 180, in request
        r = await self.pool_manager.request(**args)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/aiohttp/client.py", line 500, in _request
        with timer:
      File "/usr/local/lib/python3.11/site-packages/aiohttp/helpers.py", line 735, in __exit__
        raise asyncio.TimeoutError from None
    TimeoutError

[E 2024-01-04 19:33:21.035 JupyterHub web:1274] Cannot send error response after headers written
[I 2024-01-04 19:33:21.035 JupyterHub log:191] 200 GET /hub/api/users/foobarbaz/server/progress?_xsrf=[secret] (foobarbaz@::ffff:10.3.129.170) 60571.84ms
[W 2024-01-04 19:33:21.036 JupyterHub user:876] foobarbaz's server failed to start in 300 seconds, giving up.

    Common causes of this timeout, and debugging tips:

    1. Everything is working, but it took too long.
       To fix: increase `Spawner.start_timeout` configuration
       to a number of seconds that is enough for spawners to finish starting.
    2. The server didn't finish starting,
       or it crashed due to a configuration issue.
       Check the single-user server's logs for hints at what needs fixing.

[D 2024-01-04 19:33:21.036 JupyterHub user:982] Stopping foobarbaz
[D 2024-01-04 19:33:21.043 JupyterHub user:1002] Deleting oauth client jupyterhub-user-foobarbaz
[D 2024-01-04 19:33:21.066 JupyterHub user:1005] Finished stopping foobarbaz
[E 2024-01-04 19:33:21.092 JupyterHub gen:630] Exception in Future <Task finished name='Task-47' coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at /usr/local/lib/python3.11/site-packages/jupyterhub/handlers/base.py:981> exception=TimeoutError()> after timeout
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/tornado/gen.py", line 625, in error_callback
        future.result()
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/handlers/base.py", line 988, in finish_user_spawn
        await spawn_future
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/user.py", line 902, in spawn
        raise e
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/user.py", line 798, in spawn
        url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/tornado/web.py", line 1786, in _execute
        result = await result
                 ^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/apihandlers/users.py", line 752, in get
        async for event in events:
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/utils.py", line 623, in iterate_until
        yield item_future.result()
              ^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/spawner.py", line 1274, in _generate_progress
        async for event in progress:
      File "/usr/local/lib/python3.11/site-packages/kubespawner/spawner.py", line 2301, in progress
        await self._start_watching_events()
      File "/usr/local/lib/python3.11/site-packages/kubespawner/spawner.py", line 2444, in _start_watching_events
        return await self._start_reflector(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubespawner/spawner.py", line 2380, in _start_reflector
        await previous_reflector.first_load_future
      File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 395, in start
        await self._list_and_update()
      File "/usr/local/lib/python3.11/site-packages/kubespawner/reflector.py", line 231, in _list_and_update
        initial_resources_raw = await list_method(**kwargs)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
        response_data = await self.request(
                        ^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 196, in GET
        return (await self.request("GET", url,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubernetes_asyncio/client/rest.py", line 180, in request
        r = await self.pool_manager.request(**args)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/aiohttp/client.py", line 500, in _request
        with timer:
      File "/usr/local/lib/python3.11/site-packages/aiohttp/helpers.py", line 735, in __exit__
        raise asyncio.TimeoutError from None
    TimeoutError

[D 2024-01-04 19:33:21.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.52ms
[D 2024-01-04 19:33:23.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.80ms
[D 2024-01-04 19:33:24.057 JupyterHub scopes:877] Checking access to /hub/api/users/foobarbaz/server/progress via scope read:servers
[D 2024-01-04 19:33:24.057 JupyterHub scopes:690] Argument-based access to /hub/api/users/foobarbaz/server/progress via read:servers
[I 2024-01-04 19:33:24.058 JupyterHub log:191] 200 GET /hub/api/users/foobarbaz/server/progress?_xsrf=[secret] (foobarbaz@::ffff:10.3.129.170) 4.33ms
[D 2024-01-04 19:33:25.116 JupyterHub log:191] 200 GET /hub/health
[... omitted more lines for brevity]
[D 2024-01-04 19:34:05.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.68ms
[D 2024-01-04 19:34:06.859 JupyterHub proxy:880] Proxy: Fetching GET http://proxy-api:8001/api/routes
[D 2024-01-04 19:34:06.863 JupyterHub proxy:392] Checking routes
[D 2024-01-04 19:34:07.116 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.82ms
[... omitted more lines for brevity]
[D 2024-01-04 19:35:05.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.71ms
[D 2024-01-04 19:35:06.858 JupyterHub proxy:880] Proxy: Fetching GET http://proxy-api:8001/api/routes
[D 2024-01-04 19:35:06.862 JupyterHub proxy:392] Checking routes
[D 2024-01-04 19:35:07.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.76ms

Logs: clicking 'Start My Server' (PVC failure mentioned in UI)
[D 2024-01-04 19:36:06.859 JupyterHub proxy:880] Proxy: Fetching GET http://proxy-api:8001/api/routes
[D 2024-01-04 19:36:06.863 JupyterHub proxy:392] Checking routes
[D 2024-01-04 19:36:07.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.73ms
[... omitted more lines for brevity]
(@10.3.132.180) 0.93ms
[D 2024-01-04 19:37:05.117 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 1.11ms
[I 2024-01-04 19:37:05.851 JupyterHub log:191] 200 GET /hub/home (foobarbaz@::ffff:10.3.129.170) 40.45ms
[D 2024-01-04 19:37:05.919 JupyterHub log:191] 200 GET /hub/static/js/home.js?v=20240104193206 (@::ffff:10.3.129.170) 9.08ms
[D 2024-01-04 19:37:05.993 JupyterHub log:191] 200 GET /hub/static/components/moment/moment.js?v=20240104193206 (@::ffff:10.3.129.170) 34.64ms
[D 2024-01-04 19:37:05.999 JupyterHub log:191] 200 GET /hub/static/js/jhapi.js?v=20240104193206 (@::ffff:10.3.129.170) 5.17ms
[D 2024-01-04 19:37:06.009 JupyterHub log:191] 200 GET /hub/static/js/utils.js?v=20240104193206 (@::ffff:10.3.129.170) 0.69ms
[D 2024-01-04 19:37:06.859 JupyterHub proxy:880] Proxy: Fetching GET http://proxy-api:8001/api/routes
[D 2024-01-04 19:37:06.864 JupyterHub proxy:392] Checking routes
[D 2024-01-04 19:37:07.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.81ms
[D 2024-01-04 19:37:07.931 JupyterHub scopes:877] Checking access to /hub/spawn/foobarbaz via scope servers
[D 2024-01-04 19:37:07.931 JupyterHub scopes:690] Argument-based access to /hub/spawn/foobarbaz via servers
[D 2024-01-04 19:37:07.931 JupyterHub user:297] Discarding failed spawner foobarbaz
[D 2024-01-04 19:37:07.931 JupyterHub user:431] Creating <class 'kubespawner.spawner.KubeSpawner'> for foobarbaz:
[D 2024-01-04 19:37:07.934 JupyterHub pages:217] Triggering spawn with default options for foobarbaz
[D 2024-01-04 19:37:07.934 JupyterHub base:961] Initiating spawn for foobarbaz
[D 2024-01-04 19:37:07.934 JupyterHub base:965] 0/64 concurrent spawns
[D 2024-01-04 19:37:07.934 JupyterHub base:970] 0 active servers
[I 2024-01-04 19:37:07.966 JupyterHub provider:659] Creating oauth client jupyterhub-user-foobarbaz
[D 2024-01-04 19:37:08.001 JupyterHub user:794] Calling Spawner.start for foobarbaz
[I 2024-01-04 19:37:08.002 JupyterHub log:191] 302 GET /hub/spawn/foobarbaz -> /hub/spawn-pending/foobarbaz (foobarbaz@::ffff:10.3.129.170) 73.29ms
[I 2024-01-04 19:37:08.003 JupyterHub spawner:2551] Attempting to create pvc claim-foobarbaz, with timeout 3
[D 2024-01-04 19:37:08.010 JupyterHub scopes:877] Checking access to /hub/spawn-pending/foobarbaz via scope servers
[D 2024-01-04 19:37:08.010 JupyterHub scopes:690] Argument-based access to /hub/spawn-pending/foobarbaz via servers
[I 2024-01-04 19:37:08.010 JupyterHub pages:398] foobarbaz is pending spawn
[I 2024-01-04 19:37:08.011 JupyterHub log:191] 200 GET /hub/spawn-pending/foobarbaz (foobarbaz@::ffff:10.3.129.170) 2.88ms
[D 2024-01-04 19:37:08.050 JupyterHub scopes:877] Checking access to /hub/api/users/foobarbaz/server/progress via scope read:servers
[D 2024-01-04 19:37:08.050 JupyterHub scopes:690] Argument-based access to /hub/api/users/foobarbaz/server/progress via read:servers
[D 2024-01-04 19:37:08.051 JupyterHub spawner:2303] progress generator: jupyter-foobarbaz
[D 2024-01-04 19:37:09.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.80ms
[I 2024-01-04 19:37:11.052 JupyterHub spawner:2551] Attempting to create pvc claim-foobarbaz, with timeout 3
[D 2024-01-04 19:37:11.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.68ms
[D 2024-01-04 19:37:13.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.73ms
[I 2024-01-04 19:37:14.345 JupyterHub spawner:2551] Attempting to create pvc claim-foobarbaz, with timeout 3
[D 2024-01-04 19:37:15.116 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.90ms
[D 2024-01-04 19:37:15.116 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.97ms
[D 2024-01-04 19:37:17.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.69ms
[I 2024-01-04 19:37:18.111 JupyterHub spawner:2551] Attempting to create pvc claim-foobarbaz, with timeout 3
[D 2024-01-04 19:37:19.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.71ms
[D 2024-01-04 19:37:21.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.71ms
[I 2024-01-04 19:37:21.329 JupyterHub spawner:2551] Attempting to create pvc claim-foobarbaz, with timeout 3
[D 2024-01-04 19:37:23.116 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.75ms
[D 2024-01-04 19:37:25.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 1.12ms
[D 2024-01-04 19:37:25.116 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 1.24ms
[I 2024-01-04 19:37:25.916 JupyterHub spawner:2551] Attempting to create pvc claim-foobarbaz, with timeout 3
[D 2024-01-04 19:37:27.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.72ms
[D 2024-01-04 19:37:29.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.71ms
[D 2024-01-04 19:37:31.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.70ms
[D 2024-01-04 19:37:33.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.70ms
[I 2024-01-04 19:37:33.524 JupyterHub spawner:2551] Attempting to create pvc claim-foobarbaz, with timeout 3
[D 2024-01-04 19:37:35.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.97ms
[D 2024-01-04 19:37:35.116 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 1.22ms
[W 2024-01-04 19:37:36.526 JupyterHub user:876] foobarbaz's server failed to start in 300 seconds, giving up.

    Common causes of this timeout, and debugging tips:

    1. Everything is working, but it took too long.
       To fix: increase `Spawner.start_timeout` configuration
       to a number of seconds that is enough for spawners to finish starting.
    2. The server didn't finish starting,
       or it crashed due to a configuration issue.
       Check the single-user server's logs for hints at what needs fixing.

[D 2024-01-04 19:37:36.526 JupyterHub user:982] Stopping foobarbaz
[D 2024-01-04 19:37:36.533 JupyterHub user:1002] Deleting oauth client jupyterhub-user-foobarbaz
[D 2024-01-04 19:37:36.550 JupyterHub user:1005] Finished stopping foobarbaz
[W 2024-01-04 19:37:36.564 JupyterHub base:1057] 2 consecutive spawns failed.  Hub will exit if failure count reaches 5 before succeeding
[E 2024-01-04 19:37:36.565 JupyterHub gen:630] Exception in Future <Task finished name='Task-666' coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at /usr/local/lib/python3.11/site-packages/jupyterhub/handlers/base.py:981> exception=TimeoutError('Could not create PVC claim-foobarbaz')> after timeout
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/tornado/gen.py", line 625, in error_callback
        future.result()
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/handlers/base.py", line 988, in finish_user_spawn
        await spawn_future
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/user.py", line 902, in spawn
        raise e
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/user.py", line 798, in spawn
        url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubespawner/spawner.py", line 2706, in _start
        await exponential_backoff(
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/utils.py", line 237, in exponential_backoff
        raise asyncio.TimeoutError(fail_message)
    TimeoutError: Could not create PVC claim-foobarbaz

[I 2024-01-04 19:37:36.566 JupyterHub log:191] 200 GET /hub/api/users/foobarbaz/server/progress?_xsrf=[secret] (foobarbaz@::ffff:10.3.129.170) 28517.02ms
[D 2024-01-04 19:37:37.116 JupyterHub log:191] 200 GET /hub/health
[... omitted more lines for brevity]
[D 2024-01-04 19:38:06.859 JupyterHub proxy:880] Proxy: Fetching GET http://proxy-api:8001/api/routes
[D 2024-01-04 19:38:06.863 JupyterHub proxy:392] Checking routes
[D 2024-01-04 19:38:07.115 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 0.70ms
[... omitted more lines for brevity]
[D 2024-01-04 19:38:25.117 JupyterHub log:191] 200 GET /hub/health (@10.3.132.180) 1.26ms

(omitted some spammy GET /hub/health lines for brevity)

I think the storage problem is a side effect of the actual problem- it looks like JupyterHub can’t access the kubernetes API.

Do you have a fully functioning network controller? You could try disabling all Network Policies on Z2JH Security — Zero to JupyterHub with Kubernetes documentation

The CNI being used is Cilium which reports all is rosey. A connectivity test reported no obvious problems.
image

I can also exec into the hub pod and reach the API server:

However the hub and proxy pods’ IP addresses (k get pods -o wide) don’t respond to pings…

I’ll note that the pod CIDR (10.3.x.x) in this cluster is routable inside the cluster but not externally accessible. There is an externally routable LoadBalancer which is how I’m viewing the UI and that seems to be fine.

I would love to test if the hub can speak to the proxy but I’m not sure how to ascertain this. ping isn’t installed in the image and I seem unable to escalate to root inside the hub container.

Partially solved. I spotted some Policy denied traffic from hub to proxy (using Hubble, a Cilium visualisation tool):

Do you know what yaml/config I could add, and where, to allow this :80 network traffic?

Notebook running and looks good:


image

If you’re using the standard Z2JH config the hub pod should be connecting to the proxy pod on port 8001 not port 80. This is the case according to your earlier logs:

The proxy should be receiving public traffic on port 8000 (via a load-balancer or ingress).

Do you have any other configuration, extensions, etc?

Nothing comes to mind. I do see proxy-public’s endpoint on port 8000 (which responds). Services in the namespace show hub=8081, proxy-api=8001 and proxy-public=80

image

Happy to hear if you have more ideas but at this point it’s a good outcome to have this running mostly. The load balancer being used is MetalLB if that makes a difference (this is an on prem / bare metal cluster). Thanks for the help!

Can you check your debug hub logs, see where a query to the proxy on port 80 is being made?

I think this is a ‘standard’ cilium issue. With cilium you have to use a cilium-specific network policy to allow access to the API server. If the hub is covered by a network policy, then this means it will not be able to connect to the API server if network policies are enabled. Disabling network policies in values.yaml should allow it to work.

But it is better to keep network policy enabled and to add a network policy to allow access to the API server,:

kind: CiliumNetworkPolicy
apiVersion: cilium.io/v2
metadata:
  name: allow-access-to-api-server
  namespace: jupyterhub
spec:
  endpointSelector:
    matchLabels: {}
  egress:
    - toEntities:
        - kube-apiserver
    - toPorts:
        - ports:
            - port: "6443"
              protocol: TCP

The matchLabels above could be more specific.

1 Like