JupyterHub on k8s/Azure intermittently times out with no events

Hello - I have a JupyterHub installed on Azure, following the z2jh instructions.

Occasionally (about 50% of the time), when I try to start a server on the hub, I get something like this:

[screenshot of the spawn-pending progress page, with an empty progress bar and no events]

And then nothing happens, until eventually:

[screenshot of the eventual spawn timeout error]

While this is happening, I see in the hub logs something like:

[I 2021-03-21 17:20:42.161 JupyterHub log:174] 302 GET /user/arokem/lab -> /hub/user/arokem/lab (@10.240.0.35) 1.13ms
[E 2021-03-21 17:20:42.184 JupyterHub log:174] 503 GET /hub/user/arokem/lab (arokem@10.240.0.35) 4.68ms
[I 2021-03-21 17:20:43.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.80ms
[I 2021-03-21 17:20:43.573 JupyterHub log:174] 200 GET /hub/spawn/arokem?next=%2Fhub%2Fuser%2Farokem%2Flab (arokem@10.240.0.35) 2.94ms
[W 2021-03-21 17:20:45.639 JupyterHub base:950] User arokem is slow to start (timeout=0)
[I 2021-03-21 17:20:45.642 JupyterHub log:174] 302 POST /hub/spawn/arokem?next=%2Fhub%2Fuser%2Farokem%2Flab -> /hub/user/arokem/lab (arokem@10.240.0.35) 64.32ms
[I 2021-03-21 17:20:45.661 JupyterHub log:174] 303 GET /hub/user/arokem/lab (arokem@10.240.0.35) 2.47ms
[I 2021-03-21 17:20:45.679 JupyterHub pages:347] arokem is pending spawn
[I 2021-03-21 17:20:45.680 JupyterHub log:174] 200 GET /hub/spawn-pending/arokem?next=%2Fhub%2Fuser%2Farokem%2Flab (arokem@10.240.0.35) 2.41ms
[I 2021-03-21 17:20:53.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 1.09ms
[I 2021-03-21 17:21:00.476 JupyterHub proxy:320] Checking routes
[I 2021-03-21 17:21:03.461 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.73ms
[I 2021-03-21 17:21:13.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.93ms
[I 2021-03-21 17:21:23.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 1.11ms
[I 2021-03-21 17:21:33.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.96ms
[I 2021-03-21 17:21:43.463 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 1.56ms
[I 2021-03-21 17:21:53.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 1.03ms
[I 2021-03-21 17:22:00.476 JupyterHub proxy:320] Checking routes
[I 2021-03-21 17:22:03.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.74ms
[I 2021-03-21 17:22:13.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.76ms
[I 2021-03-21 17:22:23.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 1.15ms
[I 2021-03-21 17:22:33.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.99ms
[I 2021-03-21 17:22:43.461 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.73ms
[I 2021-03-21 17:22:53.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 1.20ms
[I 2021-03-21 17:23:00.476 JupyterHub proxy:320] Checking routes
[I 2021-03-21 17:23:03.461 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.72ms
[I 2021-03-21 17:23:13.461 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.78ms
[I 2021-03-21 17:23:23.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 1.17ms
[I 2021-03-21 17:23:33.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 1.13ms
[I 2021-03-21 17:23:43.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 1.13ms
[I 2021-03-21 17:23:53.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.86ms
[I 2021-03-21 17:24:00.477 JupyterHub proxy:320] Checking routes
[I 2021-03-21 17:24:03.462 JupyterHub log:174] 200 GET /hub/health (@10.240.0.4) 0.82ms

This doesn’t look any different from what I see when the server does launch successfully, though. Any ideas on how to debug or fix this?
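In case it’s useful, here is roughly what I have been running to watch the spawn from the Kubernetes side while it hangs (just a sketch; I’m assuming the jhub namespace from the z2jh guide and kubespawner’s default jupyter-<username> pod naming):

# Watch the user pod while the spawn is pending (namespace assumed to be "jhub")
kubectl get pods -n jhub -w

# Inspect the pod's status and events once it appears
kubectl describe pod jupyter-arokem -n jhub

# Recent events in the namespace, oldest first
kubectl get events -n jhub --sort-by='.lastTimestamp'

# Follow the hub logs while retrying the spawn
kubectl logs -n jhub deploy/hub -f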

Thank you!

Maybe related to jupyterhub/kubespawner issue #282 on GitHub (“AKS reliability issue - pending spawn / pending stop - resolved but undocumented fix”)?


Yes. As suggested here, adding the following:

hub:
  extraEnv:
    KUBERNETES_SERVICE_HOST: kubernetes.default.svc.cluster.local

to my hub config seems to resolve the issue. Now that empty progress bar sits there only until the user pod goes from “Init” to “Running” (about 30 seconds in my case), and then the progress bar quickly fills and the lab UI shows up. Since the issue was intermittent, I will keep monitoring this and report back if I see anything funky. Thank you!
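For reference, I applied the change with a plain helm upgrade. This is just a sketch, assuming the release and namespace are both called jhub (as in the z2jh guide), the jupyterhub chart repo has already been added with helm repo add, and the values above are saved in config.yaml:

# Match --version to the chart version you already have installed
helm upgrade jhub jupyterhub/jupyterhub \
  --namespace jhub \
  --version 0.9.0 \
  --values config.yaml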

OK - it looks like this doesn’t solve the problem I am experiencing; it’s happening again. It’s probably related to the linked issue, but the quick fix I mentioned doesn’t work. I’ll continue to investigate and report back if I learn anything useful.

What version of z2jh are you on? This should hopefully have been fixed in v0.11.

How can I tell what version of z2jh I used? It has been a few months since I set it up. I am using version 0.9 of the helm chart.

Ah, the helm chart version is the z2jh version. I think this was fixed in 0.10 or 0.11.
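If it helps, the installed chart version shows up in the CHART column of helm list; assuming the release is in a namespace called jhub, as in the z2jh guide:

# Helm 3: the CHART column shows something like jupyterhub-0.9.0
helm list --namespace jhub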
