After OAuthenticator successfully returns the user, KubeSpawner gets stuck creating the PVC

Hello community,
I am using the JupyterHub Helm chart version 2.0.0 (with some updates), with the generic OAuthenticator. The issue is that after successful authentication the username is returned correctly and KubeSpawner tries to spawn a user pod, but in the process it gets stuck creating the PVC and eventually gives up, reporting that the server failed to start. Here’s the log:

Looking for /usr/local/etc/jupyterhub/jupyterhub_config in /srv/jupyterhub
Loading /usr/local/etc/jupyterhub/secret/values.yaml
No config at /usr/local/etc/jupyterhub/existing-secret/values.yaml
[D 2022-12-01 17:24:44.771 JupyterHub application:858] Loaded config file: /usr/local/etc/jupyterhub/jupyterhub_config.py
[I 2022-12-01 17:24:44.832 JupyterHub app:2775] Running JupyterHub version 3.0.0
[I 2022-12-01 17:24:44.832 JupyterHub app:2805] Using Authenticator: oauthenticator.generic.GenericOAuthenticator-15.1.0
[I 2022-12-01 17:24:44.832 JupyterHub app:2805] Using Spawner: kubespawner.spawner.KubeSpawner-4.2.0
[I 2022-12-01 17:24:44.832 JupyterHub app:2805] Using Proxy: jupyterhub.proxy.ConfigurableHTTPProxy-3.0.0


[D 2022-12-01 17:25:34.847 JupyterHub roles:281] Assigning default role to User xxx_yyy
[I 2022-12-01 17:25:34.851 JupyterHub roles:238] Adding role user for User: xxx_yyy
[D 2022-12-01 17:25:34.865 JupyterHub roles:281] Assigning default role to User xxx_yyy
[D 2022-12-01 17:25:34.870 JupyterHub base:563] Setting cookie for xxx_yyy: jupyterhub-hub-login
[D 2022-12-01 17:25:34.870 JupyterHub base:559] Setting cookie jupyterhub-hub-login: {'httponly': True, 'path': '/hub/'}
[I 2022-12-01 17:25:34.871 JupyterHub base:810] User logged in: xxx_yyy
[I 2022-12-01 17:25:34.871 JupyterHub log:186] 302 GET /hub/oauth_callback?state=[secret]&code=[secret] -> /hub/ (@::ffff:172.18.218.144) 1001.92ms
[D 2022-12-01 17:25:34.923 JupyterHub base:275] Recording first activity for <User(xxx_yyy 0/1 running)>
[D 2022-12-01 17:25:34.931 JupyterHub user:430] Creating <class 'kubespawner.spawner.KubeSpawner'> for xxx_yyy:
[I 2022-12-01 17:25:34.936 JupyterHub log:186] 302 GET /hub/ -> /hub/spawn (xxx_yyy@::ffff:172.18.218.144) 15.48ms
[D 2022-12-01 17:25:34.985 JupyterHub scopes:796] Checking access via scope servers
[D 2022-12-01 17:25:34.985 JupyterHub scopes:623] Argument-based access to /hub/spawn via servers
[D 2022-12-01 17:25:34.985 JupyterHub pages:213] Triggering spawn with default options for xxx_yyy
[D 2022-12-01 17:25:34.985 JupyterHub base:934] Initiating spawn for xxx_yyy
[D 2022-12-01 17:25:34.985 JupyterHub base:938] 0/64 concurrent spawns
[D 2022-12-01 17:25:34.985 JupyterHub base:943] 0 active servers
[I 2022-12-01 17:25:35.000 JupyterHub provider:651] Creating oauth client jupyterhub-user-xxx_yyy
[D 2022-12-01 17:25:35.019 JupyterHub user:743] Calling Spawner.start for xxx_yyy
[I 2022-12-01 17:25:35.020 JupyterHub spawner:2509] Attempting to create pvc claim-xxx-5fyyy, with timeout 3
[I 2022-12-01 17:25:35.022 JupyterHub log:186] 302 GET /hub/spawn -> /hub/spawn-pending/xxx_yyy (xxx_yyy@::ffff:172.18.218.144) 38.92ms
[D 2022-12-01 17:25:35.079 JupyterHub scopes:796] Checking access via scope servers
[D 2022-12-01 17:25:35.079 JupyterHub scopes:623] Argument-based access to /hub/spawn-pending/xxx_yyy via servers
[I 2022-12-01 17:25:35.079 JupyterHub pages:394] xxx_yyy is pending spawn
[I 2022-12-01 17:25:35.083 JupyterHub log:186] 200 GET /hub/spawn-pending/xxx_yyy (xxx_yyy@::ffff:172.18.218.144) 6.20ms
[D 2022-12-01 17:25:35.146 JupyterHub log:186] 304 GET /hub/static/css/style.min.css?v=bff49b4a161afb17ee3b71927ce7d6c4e5b0e4b9ef6f18ca3e356a05f29e69776d3a76aee167060dd2ae2ee62d3cfdcf203b4b0090b1423f7d629ea7daa3f9da (@::ffff:172.18.218.144) 0.89ms
[D 2022-12-01 17:25:35.152 JupyterHub log:186] 304 GET /hub/static/components/requirejs/require.js?v=bd1aa102bdb0b27fbf712b32cfcd29b016c272acf3d864ee8469376eaddd032cadcf827ff17c05a8c8e20061418fe58cf79947049f5c0dff3b4f73fcc8cad8ec (@::ffff:172.18.218.144) 0.70ms
[D 2022-12-01 17:25:35.154 JupyterHub log:186] 304 GET /hub/static/components/jquery/dist/jquery.min.js?v=69528a4518bf43f615fb89a3a0a06c138c771fe0647a0a0cfde9b8e8d3650aa3539946000e305b78d79f371615ee0894a74571202b6a76b6ea53b89569e64d5c (@::ffff:172.18.218.144) 0.62ms
[D 2022-12-01 17:25:35.156 JupyterHub log:186] 304 GET /hub/static/components/bootstrap/dist/js/bootstrap.min.js?v=a014e9acc78d10a0a7a9fbaa29deac6ef17398542d9574b77b40bf446155d210fa43384757e3837da41b025998ebfab4b9b6f094033f9c226392b800df068bce (@::ffff:172.18.218.144) 0.67ms
[D 2022-12-01 17:25:35.204 JupyterHub log:186] 200 GET /hub/logo (@::ffff:172.18.218.144) 0.80ms
[D 2022-12-01 17:25:35.221 JupyterHub log:186] 304 GET /hub/static/components/font-awesome/fonts/fontawesome-webfont.woff2?v=4.7.0 (@::ffff:172.18.218.144) 0.68ms
[D 2022-12-01 17:25:35.227 JupyterHub scopes:796] Checking access via scope read:servers
[D 2022-12-01 17:25:35.227 JupyterHub scopes:610] Unrestricted access to /hub/api/users/xxx_yyy/server/progress via read:servers
[D 2022-12-01 17:25:35.229 JupyterHub spawner:2308] progress generator: jupyter-xxx-5fyyy
[D 2022-12-01 17:25:35.825 JupyterHub log:186] 200 GET /hub/health (@172.18.217.225) 0.69ms
[D 2022-12-01 17:25:37.825 JupyterHub log:186] 200 GET /hub/health (@172.18.217.225) 0.65ms
[I 2022-12-01 17:25:38.040 JupyterHub spawner:2509] Attempting to create pvc claim-xxx-5fyyy, with timeout 3
[D 2022-12-01 17:25:39.826 JupyterHub log:186] 200 GET /hub/health (@172.18.217.225) 0.72ms
[I 2022-12-01 17:25:41.370 JupyterHub spawner:2509] Attempting to create pvc claim-xxx-5fyyy, with timeout 3
[D 2022-12-01 17:25:41.825 JupyterHub log:186] 200 GET /hub/health (@172.18.217.225) 0.74ms
[D 2022-12-01 17:25:43.825 JupyterHub log:186] 200 GET /hub/health (@172.18.217.225) 0.66ms
[I 2022-12-01 17:25:44.432 JupyterHub spawner:2509] Attempting to create pvc claim-xxx-5fyyy, with timeout 3
[D 2022-12-01 17:25:45.106 JupyterHub proxy:884] Proxy: Fetching GET http://proxy-api:8001/api/routes
[D 2022-12-01 17:25:45.110 JupyterHub proxy:395] Checking routes
[D 2022-12-01 17:25:45.825 JupyterHub log:186] 200 GET /hub/health (@172.18.217.225) 0.66ms
[D 2022-12-01 17:25:47.825 JupyterHub log:186] 200 GET /hub/health (@172.18.217.225) 0.66ms
[I 2022-12-01 17:25:47.993 JupyterHub spawner:2509] Attempting to create pvc claim-xxx-5fyyy, with timeout 3
[D 2022-12-01 17:25:49.825 JupyterHub log:186] 200 GET /hub/health (@172.18.217.225) 0.66ms
[D 2022-12-01 17:25:51.826 JupyterHub log:186] 200 GET /hub/health (@172.18.217.225) 0.70ms
[I 2022-12-01 17:25:52.390 JupyterHub spawner:2509] Attempting to create pvc claim-xxx-5fyyy, with timeout 3
[D 2022-12-01 17:25:53.825 JupyterHub log:186] 200 GET /hub/health (@172.18.217.225) 0.68ms
[D 2022-12-01 17:25:55.825 JupyterHub log:186] 200 GET /hub/health (@172.18.217.225) 0.68ms
[D 2022-12-01 17:25:57.825 JupyterHub log:186] 200 GET /hub/health (@172.18.217.225) 0.65ms
[W 2022-12-01 17:25:58.774 JupyterHub web:1796] 403 GET /hub/metrics (172.18.219.176): Access to metrics requires scope 'read:metrics'
[D 2022-12-01 17:25:58.775 JupyterHub base:1342] No template for 403
[W 2022-12-01 17:25:58.775 JupyterHub log:186] 403 GET /hub/metrics (@172.18.219.176) 1.38ms
[D 2022-12-01 17:25:59.825 JupyterHub log:186] 200 GET /hub/health (@172.18.217.225) 0.67ms
[I 2022-12-01 17:26:00.394 JupyterHub spawner:2509] Attempting to create pvc claim-xxx-5fyyy, with timeout 3
[D 2022-12-01 17:26:01.825 JupyterHub log:186] 200 GET /hub/health (@172.18.217.225) 0.66ms
[D 2022-12-01 17:26:03.825 JupyterHub log:186] 200 GET /hub/health (@172.18.217.225) 0.69ms
[D 2022-12-01 17:26:05.825 JupyterHub log:186] 200 GET /hub/health (@172.18.217.225) 0.67ms
[I 2022-12-01 17:26:07.405 JupyterHub spawner:2509] Attempting to create pvc claim-xxx-5fyyy, with timeout 3
[D 2022-12-01 17:26:07.825 JupyterHub log:186] 200 GET /hub/health (@172.18.217.225) 0.66ms
[D 2022-12-01 17:26:09.825 JupyterHub log:186] 200 GET /hub/health (@172.18.217.225) 0.68ms
[W 2022-12-01 17:26:10.406 JupyterHub user:824] xxx_yyy's server failed to start in 900 seconds, giving up.

Common causes of this timeout, and debugging tips:

1. Everything is working, but it took too long.
   To fix: increase `Spawner.start_timeout` configuration
   to a number of seconds that is enough for spawners to finish starting.
2. The server didn't finish starting,
   or it crashed due to a configuration issue.
   Check the single-user server's logs for hints at what needs fixing.

[D 2022-12-01 17:26:10.406 JupyterHub user:930] Stopping xxx_yyy
[D 2022-12-01 17:26:11.825 JupyterHub log:186] 200 GET /hub/health (@172.18.217.225) 0.64ms
[D 2022-12-01 17:26:13.825 JupyterHub log:186] 200 GET /hub/health (@172.18.217.225) 0.63ms

Can anyone help with this, please?

It sounds like Kubernetes is unable to create the PVC.

Can you show us your Z2JH configuration?

How did you set up K8s? Are dynamic volumes working for all other applications?

Thank you! The issue was that https_proxy and http_proxy were set for authentication. For the spawner, I had to delete those env variables, and then k8s was able to create the PVC and pods.
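For anyone hitting the same wall, a minimal sketch of that workaround, assuming (as here) that the hub does not actually need the proxy for the in-cluster Kubernetes API calls; it could live in jupyterhub_config.py (e.g. via hub.extraConfig):

import os

# Drop the proxy variables from the hub process so kubespawner's
# Kubernetes API client connects to the API server directly.
# pop() with a default tolerates variables that are not set.
for var in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
    os.environ.pop(var, None)

Note that if the authenticator genuinely needs the proxy for its token requests, removing the variables this way will break login; the rest of this thread is about exactly that tension.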


Hi, I am facing the same issue.

Our AKS setup is as follows:

  • Private AKS cluster in a peered VNET connected to on-prem network in a hub-spoke model
  • AAD authentication
  • On AKS, both http_proxy and https_proxy variables set to an 'http' corporate proxy

Depending on how http_proxy/https_proxy are set in hub.extraEnv, I get a variety of error messages:

  1. If neither is set, I get a 503 error with the hub unable to reach the proxy API endpoint.
  2. If I set http_proxy="", effectively forcing the use of the https_proxy variable, the hub starts up and I am able to authenticate with our AD and reach a custom profile page. But then I get the same error as above (failure to create a PVC) when I try to spawn a server.
  3. I then set https_proxy="" as well and do another helm upgrade, and this time the server spins up fine.
  4. However, if I now log out (after both http and https proxy have been set to ""), I am unable to log back in!!!

And amongst all of the above, aiohttp throws a 'Service Unavailable' exception when trying to reach the proxy!

I am happy to provide detailed logs if someone would be kind enough to have a look.

In fact the error is indeed in aiohttp.

File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 905, in _create_connection

_, proto = await self._create_proxy_connection(req, traces, timeout)

File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 1288, in _create_proxy_connection

raise ClientHttpProxyError(
    aiohttp.client_exceptions.ClientHttpProxyError: 503, 
    message='Service Unavailable',
      url=URL('http://proxy-url:800/'
              )
)


Task was destroyed but it is pending!
task: <Task pending name='Task-149' coro=<shared_client.<locals>.close_client_task() running at /usr/local/lib/python3.9/site-packages/kubespawner/clients.py:58> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f08d3182a00>()]>>
Exception ignored in: <coroutine object shared_client.<locals>.close_client_task at 0x7f08d30ba3c0>
RuntimeError: coroutine ignored GeneratorExit
Task exception was never retrieved
future: <Task finished name='Task-151' coro=<KubeSpawner._start_reflector.<locals>.catch_reflector_start() done, defined at /usr/local/lib/python3.9/site-packages/kubespawner/spawner.py:2398> exception=SystemExit(1)>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/kubespawner/spawner.py", line 2400, in catch_reflector_start
    await f
  File "/usr/local/lib/python3.9/site-packages/kubespawner/reflector.py", line 383, in start
    await self._list_and_update()
  File "/usr/local/lib/python3.9/site-packages/kubespawner/reflector.py", line 228, in _list_and_update
    initial_resources_raw = await list_method(**kwargs)
  File "/usr/local/lib/python3.9/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
    response_data = await self.request(
  File "/usr/local/lib/python3.9/site-packages/kubernetes_asyncio/client/rest.py", line 193, in GET
    return (await self.request("GET", url,
  File "/usr/local/lib/python3.9/site-packages/kubernetes_asyncio/client/rest.py", line 177, in request
    r = await self.pool_manager.request(**args)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/client.py", line 535, in _request
    conn = await self._connector.connect(
  File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 542, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 905, in _create_connection
    _, proto = await self._create_proxy_connection(req, traces, timeout)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 1288, in _create_proxy_connection
    raise ClientHttpProxyError(
aiohttp.client_exceptions.ClientHttpProxyError: 503, message='Service Unavailable', url=URL('http://proxy-url:800/')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/jupyterhub/app.py", line 3313, in launch_instance
    loop.start()
  File "/usr/local/lib/python3.9/site-packages/tornado/platform/asyncio.py", line 215, in start
    self.asyncio_loop.run_forever()
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
    handle._run()
  File "/usr/local/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.9/site-packages/kubespawner/spawner.py", line 2403, in catch_reflector_start
    sys.exit(1)
SystemExit: 1

Task was destroyed but it is pending!
task: <Task pending name='Task-152' coro=<ResourceReflector.start() running at /usr/local/lib/python3.9/site-packages/kubespawner/reflector.py:383> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f08d30af160>()]> cb=[<TaskWakeupMethWrapper object at 0x7f08d30c4e80>()]>
Task was destroyed but it is pending!
task: <Task pending name='Task-153' coro=<KubeSpawner._start_reflector.<locals>.catch_reflector_start() running at /usr/local/lib/python3.9/site-packages/kubespawner/spawner.py:2400> wait_for=<Task pending name='Task-152' coro=<ResourceReflector.start() done, defined at /usr/local/lib/python3.9/site-packages/kubespawner/reflector.py:370> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f08d30af160>()]> cb=[<TaskWakeupMethWrapper object at 0x7f08d30c4e80>()]>>

This sounds quite strange. The connections between JupyterHub’s hub and proxy pods stay entirely within the same namespace of the k8s cluster, so there shouldn’t be a need for a proxy.

Are your hub and proxy pods on the same or different nodes? Does it make a difference if they are/aren’t?

@manics , thank you for the quick response.

The hub and proxy pods are on the same node, and there are no fancy networking policies in place. There is just one node in the cluster at the moment.

The ideal solution would be for us to get z2jh 2.0 up and running. This is unlikely to be a z2jh issue itself and is more likely related to AKS and aiohttp, but I would be grateful for any insights from the community, especially around how z2jh treats the http_proxy and https_proxy environment variables. I can’t set up a cluster without these, and disabling them just on the hub seems to break authentication.
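When debugging this, it can help to confirm exactly which proxy-related variables the hub process actually inherits, since different libraries read different variants. A quick sketch to run inside the hub pod (e.g. via kubectl exec into the hub deployment):

import os

# Print every proxy-related variable visible to the hub process;
# some libraries only read the lowercase names, others the uppercase ones.
for key, value in sorted(os.environ.items()):
    if "proxy" in key.lower():
        print(f"{key}={value}")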

JupyterHub has been awesome for us as a secure data analytics / trusted research environment for health care data - but a kubernetes upgrade broke our cluster and I have had to rebuild the cluster and reinstall z2jh.

Apologies in advance if my bug reports are not heavy on detail. I am a clinician technologist who has built and managed our AKS cluster and JupyterHub instance for a couple of years and have always been operating at the limits of my capabilities.

Unfortunately there’s a known problem with OAuthenticator

I’ll have a think about possible solutions; in the meantime, if you know what your previous Z2JH version was, you could try reverting to that… not ideal, as it’ll be out of date.

Sounds great… I’m also working with TREs, though not (yet) running JupyterHub. Do you mind if I contact you separately about TREs?

Thanks once again. I will read through the links you have provided.

Regarding TREs, would love to talk to you about what we have - or had, as of a week ago, until I broke it. A short presentation on the topic is here - http://northwest-lsc-tre.surge.sh/#/4 - and I’m easiest to contact through LinkedIn (https://www.linkedin.com/in/vvcb/); we can set up a Teams call.

Previous version (1.2) was working beautifully. Since rebuilding the cluster (with Kubernetes 1.26), that does not work either. Authentication works but things get stuck again - this time the singleuser pod is deployed but its block-cloud-metadata init container is stuck in waiting. If I disable this in the helm chart, things get stuck at the next stage (the singleuser container itself).

I wonder if this may have something to do with how aiohttp handles HTTPS_PROXY (although this variable is set to an http endpoint).

Found the following:

I suspect I may be headed down the wrong rabbit hole here - but aren’t they all interesting?! :slight_smile:

I am thinking about rewriting this section https://github.com/tomplus/kubernetes_asyncio/blob/d236c1cfa9fd8ae87f61adfe9d951239577f0fbf/kubernetes_asyncio/client/rest.py#L53-L65 and injecting it into the hub pod using the extraFiles key. (I should not be allowed to play with this stuff!).
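To make that concrete, a hypothetical shim of this sort could be injected via hub.extraFiles and imported early from the hub config, monkeypatching the client instead of replacing the whole file. This is an untested sketch: RESTClientObject is the class defined in the linked rest.py, and the assumption that scrubbing the proxy variables before the client builds its aiohttp session keeps Kubernetes API calls off the proxy is exactly that, an assumption.

import os

from kubernetes_asyncio.client import rest

_original_init = rest.RESTClientObject.__init__

def _no_proxy_init(self, *args, **kwargs):
    # Remove the proxy variables just before kubernetes_asyncio builds its
    # aiohttp session, so Kubernetes API requests go direct to the API server.
    for var in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        os.environ.pop(var, None)
    _original_init(self, *args, **kwargs)

rest.RESTClientObject.__init__ = _no_proxy_init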

A more detailed error log below (in chronological order):

Loading /usr/local/etc/jupyterhub/secret/values.yaml
No config at /usr/local/etc/jupyterhub/existing-secret/values.yaml
Loading /usr/local/etc/jupyterhub/jupyterhub_config.d config: jupyterhub_config_custom.py
[I 2023-03-11 12:39:26.043 JupyterHub app:2775] Running JupyterHub version 3.0.0
[I 2023-03-11 12:39:26.043 JupyterHub app:2805] Using Authenticator: oauthenticator.azuread.AzureAdOAuthenticator-15.1.0
[I 2023-03-11 12:39:26.043 JupyterHub app:2805] Using Spawner: kubespawner.spawner.KubeSpawner-4.2.0
[I 2023-03-11 12:39:26.043 JupyterHub app:2805] Using Proxy: jupyterhub.proxy.ConfigurableHTTPProxy-3.0.0
[I 2023-03-11 12:39:26.099 JupyterHub app:1934] Not using allowed_users. Any authenticated user will be allowed.
[I 2023-03-11 12:39:26.147 JupyterHub app:2844] Initialized 0 spawners in 0.002 seconds
[W 2023-03-11 12:39:26.148 JupyterHub app:2594] Deleting OAuth client jupyterhub-single-user-name
[I 2023-03-11 12:39:26.163 JupyterHub app:3057] Not starting proxy
[I 2023-03-11 12:39:26.165 JupyterHub app:3093] Hub API listening on http://:8081/landerhub/hub/
[I 2023-03-11 12:39:26.165 JupyterHub app:3095] Private Hub API connect url http://hub:8081/landerhub/hub/
[I 2023-03-11 12:39:26.165 JupyterHub app:3104] Starting managed service jupyterhub-idle-culler
[I 2023-03-11 12:39:26.166 JupyterHub service:385] Starting service 'jupyterhub-idle-culler': ['python3', '-m', 'jupyterhub_idle_culler', '--url=http://localhost:8081/landerhub/hub/api', '--timeout=3600', '--cull-every=600', '--concurrency=10']
[I 2023-03-11 12:39:26.167 JupyterHub service:133] Spawning python3 -m jupyterhub_idle_culler --url=http://localhost:8081/landerhub/hub/api --timeout=3600 --cull-every=600 --concurrency=10
[I 2023-03-11 12:39:26.171 JupyterHub app:3113] Adding external service dask-gateway
[I 2023-03-11 12:39:26.172 JupyterHub app:3162] JupyterHub is now running, internal Hub API at http://hub:8081/landerhub/hub/
[I 2023-03-11 12:39:26.271 JupyterHub log:186] 200 GET /landerhub/hub/api/ (jupyterhub-idle-culler@::1) 11.68ms
[I 2023-03-11 12:39:26.279 JupyterHub log:186] 200 GET /landerhub/hub/api/users?state=[secret] (jupyterhub-idle-culler@::1) 7.42ms
[I 2023-03-11 12:40:19.606 JupyterHub log:186] 302 GET /landerhub/user/single-user-name/api/kernelspecs?1678538419337 -> /landerhub/hub/user/single-user-name/api/kernelspecs?1678538419337 (@10.244.4.1) 0.58ms
[I 2023-03-11 12:40:19.893 JupyterHub log:186] 302 GET /landerhub/hub/user/single-user-name/api/kernelspecs?1678538419337 -> /landerhub/hub/login?next=%2Flanderhub%2Fhub%2Fuser%2Fsingle-user-name%40email.com%2Fapi%2Fkernelspecs%3F1678538419337 (@10.244.4.1) 1.51ms
[I 2023-03-11 12:40:20.214 JupyterHub log:186] 200 GET /landerhub/hub/login?next=%2Flanderhub%2Fhub%2Fuser%2Fsingle-user-name%40email.com%2Fapi%2Fkernelspecs%3F1678538419337 (@10.244.4.1) 23.54ms
[W 2023-03-11 12:40:24.317 JupyterHub log:186] 404 GET /landerhub/hub/api/users/single-user-name/server/progress (single-user-name@10.244.4.1) 15.22ms
[I 2023-03-11 12:41:03.861 JupyterHub log:186] 200 GET /landerhub/hub/spawn-pending/single-user-name (single-user-name@10.244.4.1) 21.28ms
[I 2023-03-11 12:41:07.909 JupyterHub jupyterhub_config_custom:48] Workspace colorectal_2ww_jupyter expired on 2022-12-31. Consider removing it from config.
[I 2023-03-11 12:41:07.916 JupyterHub log:186] 200 GET /landerhub/hub/spawn/single-user-name (single-user-name@10.244.4.1) 11.37ms
[I 2023-03-11 12:41:12.455 JupyterHub provider:651] Creating oauth client jupyterhub-single-user-name
[I 2023-03-11 12:41:12.487 JupyterHub jupyterhub_config_custom:99] Attempting to mount {'volume_mounts': [{'mountPath': '/home/jovyan/landerhub-vc-sd', 'name': 'landerhub-vc-sd', 'readOnly': False}], 'volumes': [{'name': 'landerhub-vc-sd', 'persistentVolumeClaim': {'claimName': 'pvc-landerhub-vc-sd'}}]}...
[I 2023-03-11 12:41:12.487 JupyterHub jupyterhub_config_custom:126] Successfully mounted landerhub-vc-sd to /home/jovyan/landerhub-vc-sd.
[I 2023-03-11 12:41:12.487 JupyterHub jupyterhub_config_custom:159] Successfully mounted landerhub-common to /home/jovyan/shared_readonly.
[I 2023-03-11 12:41:12.488 JupyterHub spawner:2469] Attempting to create pod jupyter-single-user-pod-name, with timeout 3
[I 2023-03-11 12:41:12.489 JupyterHub log:186] 302 POST /landerhub/hub/spawn/single-user-name -> /landerhub/hub/spawn-pending/single-user-name (single-user-name@10.244.4.1) 58.42ms
[I 2023-03-11 12:41:12.581 JupyterHub pages:394] single-user-name is pending spawn
[I 2023-03-11 12:41:12.584 JupyterHub log:186] 200 GET /landerhub/hub/spawn-pending/single-user-name (single-user-name@10.244.4.1) 4.10ms
[I 2023-03-11 12:41:13.836 JupyterHub log:186] 302 GET /landerhub/user/single-user-name/api/sessions?1678538473615 -> /landerhub/hub/user/single-user-name/api/sessions?1678538473615 (@10.244.4.1) 0.58ms
[I 2023-03-11 12:41:14.172 JupyterHub log:186] 302 GET /landerhub/hub/user/single-user-name/api/sessions?1678538473615 -> /landerhub/hub/login?next=%2Flanderhub%2Fhub%2Fuser%2Fsingle-user-name%40email.com%2Fapi%2Fsessions%3F1678538473615 (@10.244.4.1) 1.40ms
[I 2023-03-11 12:41:14.462 JupyterHub log:186] 200 GET /landerhub/hub/login?next=%2Flanderhub%2Fhub%2Fuser%2Fsingle-user-name%40email.com%2Fapi%2Fsessions%3F1678538473615 (@10.244.4.1) 1.26ms
[I 2023-03-11 12:41:15.674 JupyterHub spawner:2469] Attempting to create pod jupyter-single-user-pod-name, with timeout 3
[I 2023-03-11 12:41:18.870 JupyterHub spawner:2469] Attempting to create pod jupyter-single-user-pod-name, with timeout 3
[I 2023-03-11 12:41:22.160 JupyterHub spawner:2469] Attempting to create pod jupyter-single-user-pod-name, with timeout 3
[I 2023-03-11 12:41:25.377 JupyterHub spawner:2469] Attempting to create pod jupyter-single-user-pod-name, with timeout 3
[I 2023-03-11 12:41:29.136 JupyterHub spawner:2469] Attempting to create pod jupyter-single-user-pod-name, with timeout 3
[I 2023-03-11 12:41:36.729 JupyterHub spawner:2469] Attempting to create pod jupyter-single-user-pod-name, with timeout 3
[I 2023-03-11 12:41:40.858 JupyterHub spawner:2469] Attempting to create pod jupyter-single-user-pod-name, with timeout 3
[W 2023-03-11 12:41:43.860 JupyterHub user:824] single-user-name's server failed to start in 900 seconds, giving up.

    Common causes of this timeout, and debugging tips:

    1. Everything is working, but it took too long.
       To fix: increase `Spawner.start_timeout` configuration
       to a number of seconds that is enough for spawners to finish starting.
    2. The server didn't finish starting,
       or it crashed due to a configuration issue.
       Check the single-user server's logs for hints at what needs fixing.

  File "/usr/local/lib/python3.9/site-packages/kubespawner/reflector.py", line 383, in start
[E 2023-03-11 12:42:03.005 JupyterHub reflector:385] Initial list of pods failed
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/kubespawner/reflector.py", line 383, in start
        await self._list_and_update()
      File "/usr/local/lib/python3.9/site-packages/kubespawner/reflector.py", line 228, in _list_and_update
        initial_resources_raw = await list_method(**kwargs)
      File "/usr/local/lib/python3.9/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
        response_data = await self.request(
      File "/usr/local/lib/python3.9/site-packages/kubernetes_asyncio/client/rest.py", line 193, in GET
        return (await self.request("GET", url,
      File "/usr/local/lib/python3.9/site-packages/kubernetes_asyncio/client/rest.py", line 177, in request
        r = await self.pool_manager.request(**args)
      File "/usr/local/lib/python3.9/site-packages/aiohttp/client.py", line 535, in _request
        conn = await self._connector.connect(
      File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 542, in connect
        proto = await self._create_connection(req, traces, timeout)
      File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 905, in _create_connection
        _, proto = await self._create_proxy_connection(req, traces, timeout)
      File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 1288, in _create_proxy_connection
        raise ClientHttpProxyError(
    aiohttp.client_exceptions.ClientHttpProxyError: 503, message='Service Unavailable', url=URL('http://proxy-url:800/')

[E 2023-03-11 12:42:03.019 JupyterHub user:842] Failed to cleanup single-user-name's server that failed to start
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/user.py", line 840, in spawn
        await self.stop(spawner.name)
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/user.py", line 934, in stop
        status = await spawner.poll()
      File "/usr/local/lib/python3.9/site-packages/kubespawner/spawner.py", line 213, in async_method
        await self.pod_reflector.first_load_future
      File "/usr/local/lib/python3.9/site-packages/kubespawner/reflector.py", line 383, in start
        await self._list_and_update()
      File "/usr/local/lib/python3.9/site-packages/kubespawner/reflector.py", line 228, in _list_and_update
        initial_resources_raw = await list_method(**kwargs)
      File "/usr/local/lib/python3.9/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
        response_data = await self.request(
      File "/usr/local/lib/python3.9/site-packages/kubernetes_asyncio/client/rest.py", line 193, in GET
        return (await self.request("GET", url,
      File "/usr/local/lib/python3.9/site-packages/kubernetes_asyncio/client/rest.py", line 177, in request
        r = await self.pool_manager.request(**args)
      File "/usr/local/lib/python3.9/site-packages/aiohttp/client.py", line 535, in _request
        conn = await self._connector.connect(
      File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 542, in connect
        proto = await self._create_connection(req, traces, timeout)
      File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 905, in _create_connection
        _, proto = await self._create_proxy_connection(req, traces, timeout)
      File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 1288, in _create_proxy_connection
        raise ClientHttpProxyError(
    aiohttp.client_exceptions.ClientHttpProxyError: 503, message='Service Unavailable', url=URL('http://proxy-url:800/')

[E 2023-03-11 12:42:03.019 JupyterHub spawner:2402] Reflector for pods failed to start.
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/kubespawner/spawner.py", line 2400, in catch_reflector_start
        await f
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/user.py", line 840, in spawn
        await self.stop(spawner.name)
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/user.py", line 934, in stop
        status = await spawner.poll()
      File "/usr/local/lib/python3.9/site-packages/kubespawner/spawner.py", line 213, in async_method
        await self.pod_reflector.first_load_future
      File "/usr/local/lib/python3.9/site-packages/kubespawner/reflector.py", line 383, in start
        await self._list_and_update()
      File "/usr/local/lib/python3.9/site-packages/kubespawner/reflector.py", line 228, in _list_and_update
        initial_resources_raw = await list_method(**kwargs)
      File "/usr/local/lib/python3.9/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
        response_data = await self.request(
      File "/usr/local/lib/python3.9/site-packages/kubernetes_asyncio/client/rest.py", line 193, in GET
        return (await self.request("GET", url,
      File "/usr/local/lib/python3.9/site-packages/kubernetes_asyncio/client/rest.py", line 177, in request
        r = await self.pool_manager.request(**args)
      File "/usr/local/lib/python3.9/site-packages/aiohttp/client.py", line 535, in _request
        conn = await self._connector.connect(
      File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 542, in connect
        proto = await self._create_connection(req, traces, timeout)
      File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 905, in _create_connection
        _, proto = await self._create_proxy_connection(req, traces, timeout)
      File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 1288, in _create_proxy_connection
        raise ClientHttpProxyError(
    aiohttp.client_exceptions.ClientHttpProxyError: 503, message='Service Unavailable', url=URL('http://proxy-url:800/')

Task was destroyed but it is pending!
task: <Task pending name='Task-239' coro=<shared_client.<locals>.close_client_task() running at /usr/local/lib/python3.9/site-packages/kubespawner/clients.py:58> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f342c406fd0>()]>>
Task exception was never retrieved
future: <Task finished name='Task-241' coro=<KubeSpawner._start_reflector.<locals>.catch_reflector_start() done, defined at /usr/local/lib/python3.9/site-packages/kubespawner/spawner.py:2398> exception=SystemExit(1)>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/jupyterhub/handlers/base.py", line 961, in finish_user_spawn
    await spawn_future
  File "/usr/local/lib/python3.9/site-packages/jupyterhub/user.py", line 850, in spawn
    raise e
  File "/usr/local/lib/python3.9/site-packages/jupyterhub/user.py", line 747, in spawn
    url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
  File "/usr/local/lib/python3.9/site-packages/kubespawner/spawner.py", line 2663, in _start
    await exponential_backoff(
  File "/usr/local/lib/python3.9/site-packages/jupyterhub/utils.py", line 236, in exponential_backoff
    raise asyncio.TimeoutError(fail_message)
asyncio.exceptions.TimeoutError: Could not create pod landerhub-prd/jupyter-single-user-pod-name

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/kubespawner/spawner.py", line 2400, in catch_reflector_start
    await f
  File "/usr/local/lib/python3.9/site-packages/jupyterhub/user.py", line 840, in spawn
    await self.stop(spawner.name)
  File "/usr/local/lib/python3.9/site-packages/jupyterhub/user.py", line 934, in stop
    status = await spawner.poll()
  File "/usr/local/lib/python3.9/site-packages/kubespawner/spawner.py", line 213, in async_method
    await self.pod_reflector.first_load_future
    await self._list_and_update()
  File "/usr/local/lib/python3.9/site-packages/kubespawner/reflector.py", line 228, in _list_and_update
    initial_resources_raw = await list_method(**kwargs)
  File "/usr/local/lib/python3.9/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
    response_data = await self.request(
  File "/usr/local/lib/python3.9/site-packages/kubernetes_asyncio/client/rest.py", line 193, in GET
    return (await self.request("GET", url,
  File "/usr/local/lib/python3.9/site-packages/kubernetes_asyncio/client/rest.py", line 177, in request
    r = await self.pool_manager.request(**args)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/client.py", line 535, in _request
    conn = await self._connector.connect(
  File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 542, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 905, in _create_connection
    _, proto = await self._create_proxy_connection(req, traces, timeout)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 1288, in _create_proxy_connection
    raise ClientHttpProxyError(
aiohttp.client_exceptions.ClientHttpProxyError: 503, message='Service Unavailable', url=URL('http://proxy-url:800/')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/jupyterhub/app.py", line 3313, in launch_instance
    loop.start()
  File "/usr/local/lib/python3.9/site-packages/tornado/platform/asyncio.py", line 215, in start
    self.asyncio_loop.run_forever()
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
    handle._run()
  File "/usr/local/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.9/site-packages/kubespawner/spawner.py", line 2403, in catch_reflector_start
    sys.exit(1)
SystemExit: 1
Task was destroyed but it is pending!
task: <Task pending name='Task-242' coro=<ResourceReflector.start() running at /usr/local/lib/python3.9/site-packages/kubespawner/reflector.py:383> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f342bb297c0>()]> cb=[<TaskWakeupMethWrapper object at 0x7f342bb67520>()]>
Task was destroyed but it is pending!
task: <Task pending name='Task-243' coro=<KubeSpawner._start_reflector.<locals>.catch_reflector_start() running at /usr/local/lib/python3.9/site-packages/kubespawner/spawner.py:2400> wait_for=<Task pending name='Task-242' coro=<ResourceReflector.start() running at /usr/local/lib/python3.9/site-packages/kubespawner/reflector.py:383> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f342bb297c0>()]> cb=[<TaskWakeupMethWrapper object at 0x7f342bb67520>()]>>
Exception ignored in: <coroutine object shared_client.<locals>.close_client_task at 0x7f342bb5f4c0>
RuntimeError: coroutine ignored GeneratorExit
Task was destroyed but it is pending!
task: <Task pending name='Task-305' coro=<RequestHandler._execute() running at /usr/local/lib/python3.9/site-packages/tornado/web.py:1713> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f342baabfd0>()]> cb=[_HandlerDelegate.execute.<locals>.<lambda>() at /usr/local/lib/python3.9/site-packages/tornado/web.py:2361]>

What a way to spend the weekend!

Finally got it working for now and tempted to not touch it again, ever.

Thank you @manics for the clue regarding an issue with OAuthenticator. My problem was the opposite of what was being described in the GitHub issue - i.e. AAD authentication works with https_proxy set correctly, but subsequent steps fail. I suspect I may not have set the no_proxy variable correctly - but I don’t have the courage to mess with it now.

I modified https://github.com/jupyterhub/oauthenticator/blob/730c2a437ac7819719fd466313c4e438c109c7f0/oauthenticator/oauth2.py as follows and injected it into the hub pod to replace /usr/local/lib/python3.9/site-packages/oauthenticator/oauth2.py using the hub.extraFiles key in the helm chart.

Add the following immediately after the imports.

os.environ["https_proxy"] = "http://<proxy-url>@<port>"
AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient")

Modify the async def fetch function as below:

async def fetch(self, req, label="fetching", parse_json=True, **kwargs):
        """Wrapper for http requests

        logs error responses, parses successful JSON responses

        Args:
            req: tornado HTTPRequest
            label (str): label describing what is happening,
                used in log message when the request fails.
            **kwargs: remaining keyword args
                passed to underlying `client.fetch(req, **kwargs)`
        Returns:
            r: parsed JSON response
        """
        try:
            ####################################################
            os.environ["https_proxy"] = "<proxy_url>:port"
            ####################################################
            resp = await self.http_client.fetch(req, **kwargs)
        except HTTPClientError as e:
            if e.response:
                # Log failed response message for debugging purposes
                message = e.response.body.decode("utf8", "replace")
                try:
                    # guess json, reformat for readability
                    json_message = json.loads(message)
                except ValueError:
                    # not json
                    pass
                else:
                    # reformat json log message for readability
                    message = json.dumps(json_message, sort_keys=True, indent=1)
            else:
                # didn't get a response, e.g. connection error
                message = str(e)

            # log url without query params
            url = urlunparse(urlparse(req.url)._replace(query=""))
            app_log.error(f"Error {label} {e.code} {req.method} {url}: {message}")
            raise e
        else:
            if parse_json:
                if resp.body:
                    return json.loads(resp.body.decode("utf8", "replace"))
                else:
                    # empty body is None
                    return None
            else:
                return resp
        finally: 
            ####################################################
            os.environ["https_proxy"] = ""
            ####################################################

I know this is already wrong. But, I went one step further. The following appears in 2 places in that file and I changed them both as below.

req = HTTPRequest(
    url,
    method="POST",
    headers=self.build_token_info_request_headers(),
    body=json.dumps(params),
    validate_cert=self.validate_server_cert,
    proxy_host="<proxy_url>",
    proxy_port=<port>,
)

DON’T DO THIS PEOPLE. I don’t know why it works or which of those hacks got it working. I am not sure what happens when 2 or more people try to login at the same time. Will report back if bad things start happening or my dog starts talking.

I still think that I have missed something important in the no_proxy variable which would make all of the above unnecessary.

Z2JH already defaults to using tornado.curl_httpclient.CurlAsyncHTTPClient:

I’ve opened a PR that enables the tornado proxy config to be passed directly:

Something like

c.OAuthenticator.http_request_kwargs = {
    "proxy_host": "proxy.example.org",
    "proxy_port": 8080,
}
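If that works, the proxy would then apply only to the authenticator’s own outbound requests (passed as arguments to the underlying tornado HTTPRequest), so kubespawner’s in-cluster Kubernetes API calls would stay direct without any proxy environment variables being set.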

@vvcb Are you able to test it (after backing up your current config!)?


:smiley: So, I accidentally ended up somewhere in the right vicinity.

Thank you for working on this on a Sunday afternoon. Happy to test it (:cold_sweat:) but I think it is high time I spun up a second dev cluster to do all of this.

When I first set this all up almost 2 years ago, it was meant as a proof of concept. Through a combination of demand and the extremely slick experience that is z2jh, the PoC turned into prod with little resilience.

Covering the junior doctor strike for the next 3 days - so probably next weekend unless I can squeeze a few hours in the evening.

Thanks once again. Will be good to catch up regarding TREs as well.


@lavanyasiva, how did you delete the environment variables for the spawner? Wondering if there is a simpler solution than what I have implemented at present while we wait for @manics’ PR to go through.

For now, in spawner.py, I deleted the env variables:

class KubeSpawner(Spawner):
    """
    A JupyterHub spawner that spawn pods in a Kubernetes Cluster. Each server
    spawned by a user will have its own KubeSpawner instance.
    """

    # Unset HTTPS_PROXY and HTTP_PROXY so Kubernetes API calls bypass the proxy.
    # (del raises KeyError if a variable is not set; os.environ.pop(var, None)
    # would tolerate that.)
    del os.environ['HTTPS_PROXY']
    del os.environ['HTTP_PROXY']


Awesome. Thanks for the quick response. One more thing for me to test and report back.

HTTPS_PROXY and HTTP_PROXY shouldn’t be set unless you’re explicitly setting them in Z2JH, since everything is in a container.

If you need it set in some places but not others, it’d be useful if someone could list all the different places - other than OAuthenticator, which we already know about.

For our setup, the proxy is set on the nodes in the AKS cluster at cluster creation time and cannot be changed for a node without recreating the cluster.

Without this, the private cluster never spins up: it is deployed in an Azure VNET peered with our on-prem network, with all traffic routed down an ExpressRoute and through our on-prem firewall in our hybrid environment, and without the proxy it is unable to fetch the initial images from MCR (and possibly others) needed for cluster creation. There were also other errors popping up in the VMSS extension.

After successful cluster creation, all deployments inherit these variables by default.

But with Z2JH I am having to unset the proxy variable after authentication to allow user pod creation.

Which is why I think it may be an issue with the no_proxy variable - either I have not set it correctly (I think I have), or Z2JH or a dependency is not respecting it.
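For reference, the sort of no_proxy value that should keep in-cluster traffic off the corporate proxy looks something like the sketch below. The exact entries are assumptions that depend on the cluster and namespace, and libraries differ in how (or whether) they honour no_proxy, particularly CIDR entries:

import os

# Illustrative exclusions: the in-cluster Kubernetes API, z2jh's
# in-namespace services (hub and proxy-api appear in the logs above),
# and the cluster-local DNS suffixes.
no_proxy_entries = [
    "localhost",
    "127.0.0.1",
    "kubernetes.default.svc",
    "hub",
    "proxy-api",
    "proxy-public",
    ".svc",
    ".cluster.local",
]
os.environ["no_proxy"] = os.environ["NO_PROXY"] = ",".join(no_proxy_entries)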

I will get someone to look at our entire AKS networking setup and see what we have to do differently.

But with TREs and hybrid environments becoming more common, I am sure there will be others facing similar conditions.
