JupyterHub on k8s/Azure intermittently times out with no events

Hello - I have a JupyterHub installed on Azure, following the z2jh instructions.

Occasionally (about 50% of the time), when I try to start a server on the hub, I get something like this:

[screenshot of the spawn-pending progress page, with an empty progress bar and no events]

And then nothing happens, until eventually:

[screenshot of the eventual spawn timeout error]

While this is happening, I see in the hub logs something like:

[I 2021-03-21 17:20:42.161 JupyterHub log:174] 302 GET /user/arokem/lab -> /hub/user/arokem/lab (@10.240.0.35) 1.13ms
[E 2021-03-21 17:20:42.184 JupyterHub log:174] 503 GET /hub/user/arokem/lab (arokem@10.240.0.35) 4.68ms
[I 2021-03-21 17:20:43.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.80ms
[I 2021-03-21 17:20:43.573 JupyterHub log:174] 200 GET /hub/spawn/arokem?next=%2Fhub%2Fuser%2Farokem%2Flab (arokem@10.240.0.35) 2.94ms
[W 2021-03-21 17:20:45.639 JupyterHub base:950] User arokem is slow to start (timeout=0)
[I 2021-03-21 17:20:45.642 JupyterHub log:174] 302 POST /hub/spawn/arokem?next=%2Fhub%2Fuser%2Farokem%2Flab -> /hub/user/arokem/lab (arokem@10.240.0.35) 64.32ms
[I 2021-03-21 17:20:45.661 JupyterHub log:174] 303 GET /hub/user/arokem/lab (arokem@10.240.0.35) 2.47ms
[I 2021-03-21 17:20:45.679 JupyterHub pages:347] arokem is pending spawn
[I 2021-03-21 17:20:45.680 JupyterHub log:174] 200 GET /hub/spawn-pending/arokem?next=%2Fhub%2Fuser%2Farokem%2Flab (arokem@10.240.0.35) 2.41ms
[I 2021-03-21 17:20:53.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 1.09ms
[I 2021-03-21 17:21:00.476 JupyterHub proxy:320] Checking routes
[I 2021-03-21 17:21:03.461 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.73ms
[I 2021-03-21 17:21:13.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.93ms
[I 2021-03-21 17:21:23.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 1.11ms
[I 2021-03-21 17:21:33.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.96ms
[I 2021-03-21 17:21:43.463 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 1.56ms
[I 2021-03-21 17:21:53.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 1.03ms
[I 2021-03-21 17:22:00.476 JupyterHub proxy:320] Checking routes
[I 2021-03-21 17:22:03.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.74ms
[I 2021-03-21 17:22:13.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.76ms
[I 2021-03-21 17:22:23.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 1.15ms
[I 2021-03-21 17:22:33.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.99ms
[I 2021-03-21 17:22:43.461 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.73ms
[I 2021-03-21 17:22:53.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 1.20ms
[I 2021-03-21 17:23:00.476 JupyterHub proxy:320] Checking routes
[I 2021-03-21 17:23:03.461 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.72ms
[I 2021-03-21 17:23:13.461 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.78ms
[I 2021-03-21 17:23:23.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 1.17ms
[I 2021-03-21 17:23:33.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 1.13ms
[I 2021-03-21 17:23:43.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 1.13ms
[I 2021-03-21 17:23:53.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.86ms
[I 2021-03-21 17:24:00.477 JupyterHub proxy:320] Checking routes
[I 2021-03-21 17:24:03.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.82ms

This doesn’t look any different from what I see when the server does launch successfully, though. Any ideas on how to debug or fix this?
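In case it’s useful, here is roughly what I have been running to watch the spawn from the Kubernetes side while it hangs (just a sketch; I’m assuming the jhub namespace from the z2jh guide and kubespawner’s default jupyter-<username> pod naming):

# Watch the user pod while the spawn is pending (namespace assumed to be "jhub")
kubectl get pods -n jhub -w

# Inspect the pod's status and events once it appears
kubectl describe pod jupyter-arokem -n jhub

# Recent events in the namespace, oldest first
kubectl get events -n jhub --sort-by='.lastTimestamp'

# Follow the hub logs while retrying the spawn
kubectl logs -n jhub deploy/hub -f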

Thank you!

Maybe related to jupyterhub/kubespawner issue #282 on GitHub (“AKS reliability issue - pending spawn / pending stop - resolved but undocumented fix”)?


Yes. As suggested here, adding the following:

hub:
  extraEnv:
    KUBERNETES_SERVICE_HOST: kubernetes.default.svc.cluster.local

to my hub config seems to resolve the issue. Now that empty progress bar sits there only until the user pod goes from “Init” to “Running” (about 30 seconds in my case), and then the progress bar quickly fills and the lab UI shows up. Since the issue was intermittent, I will keep monitoring this and report back if I see anything funky. Thank you!
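For reference, I applied the change with a plain helm upgrade. This is just a sketch, assuming the release and namespace are both called jhub (as in the z2jh guide), the jupyterhub chart repo has already been added with helm repo add, and the values above are saved in config.yaml:

# Match --version to the chart version you already have installed
helm upgrade jhub jupyterhub/jupyterhub \
  --namespace jhub \
  --version 0.9.0 \
  --values config.yaml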

OK - it looks like this doesn’t solve the problem I am experiencing; it’s happening again. It’s probably related to the linked issue, but the quick fix I mentioned doesn’t work. I’ll continue to investigate and report back if I learn anything useful.

What version of z2jh are you on? This should hopefully have been fixed in v0.11.

How can I tell what version of z2jh I used? It has been a few months since I set it up. I am using version 0.9 of the helm chart.

Ah, the helm chart version is the z2jh version. I think this was fixed in 0.10 or 0.11.
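If it helps, the installed chart version shows up in the CHART column of helm list; assuming the release is in a namespace called jhub, as in the z2jh guide:

# Helm 3: the CHART column shows something like jupyterhub-0.9.0
helm list --namespace jhub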
