TLJH Jupyterhub randomly times out

Hello everyone!

Problem: Our jupyterhub “randomly” times out. We can not reliably reproduce the error. High user numbers force the error more often than not. But sometimes it runs fine with 70 useres / sometimes it crashes with 15.

The jupyterhub service itself seems to become hung up. The server either does not produce requests at all or starts responding with 502 bad gateway. We have looked at the logs and tried to find the root source, but so far are unlucky.

Symptoms

  • Traefik gives a lot of these:
    Sep 27 13:10:34 FBMVJUPYT2 traefik[741]: time="2022-09-27T13:10:34+02:00" level=error msg="vulcand/oxy/forward/websocket: Error dialing \"ourservernamechangedbyme\": websocket: ba>

  • Tornado produces a whole lot of:
    Sep 27 12:54:03 FBMVJUPYT2 bash[4927]: raise MaxRetryError(_pool, url, error or ResponseError(cause))

  • Ressources are not a problem. The server has plenty of unused CPU and RAM

Config

We sit behind our universities firewall. Https enabled, Certificates seem to work fine. Users identify via LDAP, works fine as well. Installation via TLJH is rather new. Jupyterhub is version 1.5.0

We have adjusted the jupyterhub config so that:

c = get_config()
c.JupyterHub.bind_url = 'http://127.0.0.1:8000'

This did improve the situation. But the overall problem still exists.

Compared to our old jupyterhub (3 years) the current one also seems to produce much slower reponse times. We guess it is a proxy issue, but are currently, not sure on how to proceed debugging this issue.

Please let me know, which information is helpful and I will gladly provide more details.

I can share some more details here and truly appreciate any second thoughts on where to look for errors.

Currently, we restart the jupyterhub service every 2min via crontab. I know this is not pretty but usually this takes less than half a second and keeps the hub running most of the time.

The acutal problem remains. Every so often the crontab / systemctl fails to restart the jupyterhub service because it is completly jammed up. I have switched to debug log outputs and this is what we are seeing… Any ideas?


Sep 28 08:17:17 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:17.667 JupyterHub spawner:1179] Polling subprocess every 30s
Sep 28 08:17:18 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:18.476 JupyterHub log:189] 302 GET /hub/spawn/someuser?next=%2Fhub%2Fuser%2Fsomeuser%2Ftree ->>
Sep 28 08:17:18 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:18.532 JupyterHub pages:402] someuser is pending spawn
Sep 28 08:17:18 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:18.533 JupyterHub log:189] 200 GET /hub/spawn-pending/someuser?next=%2Fhub%2Fuser%2Fsomeuser%2>
Sep 28 08:17:19 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:19.362 JupyterHub log:189] 200 GET /hub/api (@127.0.0.1) 1.55ms
Sep 28 08:17:19 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:19.381 JupyterHub base:283] Recording first activity for <APIToken('3ad0...', user='someuser')>
Sep 28 08:17:19 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:19.399 JupyterHub users:778] Activity for user someuser: 2022-09-28T06:17:18.557800Z
Sep 28 08:17:19 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:19.399 JupyterHub users:796] Activity on server someuser/: 2022-09-28T06:17:18.557800Z
Sep 28 08:17:19 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:19.407 JupyterHub log:189] 200 POST /hub/api/users/someuser/activity (someuser@127.0.0.1) 30.9>
Sep 28 08:17:20 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:20.634 JupyterHub utils:225] Server at http://127.0.0.1:37945/user/someuser/ responded with 302
Sep 28 08:17:20 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:20.635 JupyterHub _version:76] jupyterhub and jupyterhub-singleuser both on version 1.5.0
Sep 28 08:17:20 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:20.635 JupyterHub base:909] User someuser took 3.163 seconds to start
Sep 28 08:17:20 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:20.635 JupyterHub proxy:285] Adding user someuser to proxy /user/someuser/ => http://127.0.0.1>
Sep 28 08:17:20 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:20.639 JupyterHub proxy:135] Waiting for /user/someuser to register with traefik
Sep 28 08:17:20 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:20.640 JupyterHub proxy:159] Fetching traefik api http://127.0.0.1:8099/api/providers/file/bac>
Sep 28 08:17:20 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:20.646 JupyterHub proxy:169] 200 GET http://127.0.0.1:8099/api/providers/file/backends
Sep 28 08:17:20 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:20.647 JupyterHub proxy:127] traefik backend__2Fuser_2Fsomeuser not yet in backends
Sep 28 08:17:20 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:20.647 JupyterHub proxy:128] Current traefik backends: {'backend__2F': {'servers': {'server1':>
Sep 28 08:17:20 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:20.783 JupyterHub proxy:159] Fetching traefik api http://127.0.0.1:8099/api/providers/file/bac>
Sep 28 08:17:20 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:20.786 JupyterHub proxy:169] 200 GET http://127.0.0.1:8099/api/providers/file/backends
Sep 28 08:17:20 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:20.786 JupyterHub proxy:159] Fetching traefik api http://127.0.0.1:8099/api/providers/file/fro>
Sep 28 08:17:20 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:20.789 JupyterHub proxy:169] 200 GET http://127.0.0.1:8099/api/providers/file/frontends
Sep 28 08:17:20 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:20.790 JupyterHub users:677] Server someuser is ready
Sep 28 08:17:20 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:20.791 JupyterHub log:189] 200 GET /hub/api/users/someuser/server/progress (someuser@144.222.1>
Sep 28 08:17:20 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:20.987 JupyterHub log:189] 302 GET /hub/spawn-pending/someuser?next=%2Fhub%2Fuser%2Fsomeuser%2>
Sep 28 08:17:21 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:21.017 JupyterHub log:189] 302 GET /hub/user/someuser/tree -> /user/someuser/tree?redirects=1 >
Sep 28 08:17:21 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:21.071 JupyterHub provider:409] Validating client id jupyterhub-user-someuser
Sep 28 08:17:21 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:21.073 JupyterHub provider:484] validate_redirect_uri: client_id=jupyterhub-user-someuser, red>
Sep 28 08:17:21 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:21.075 JupyterHub base:526] Setting cookie jupyterhub-session-id: {'httponly': True, 'secure':>
Sep 28 08:17:21 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:21.077 JupyterHub auth:252] Skipping oauth confirmation for <User(someuser 1/1 running)> acces>
Sep 28 08:17:21 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:21.078 JupyterHub provider:409] Validating client id jupyterhub-user-someuser
Sep 28 08:17:21 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:21.079 JupyterHub provider:484] validate_redirect_uri: client_id=jupyterhub-user-someuser, red>
Sep 28 08:17:21 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:21.081 JupyterHub provider:231] Saving authorization code jupyterhub-user-someuser, Gie..., ()>
Sep 28 08:17:21 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:21.526 JupyterHub log:189] 302 GET /hub/api/oauth2/authorize?client_id=jupyterhub-user-smfakli>
Sep 28 08:17:21 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:21.633 JupyterHub provider:55] authenticate_client <oauthlib.Request SANITIZED>
Sep 28 08:17:21 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:21.640 JupyterHub provider:111] confirm_redirect_uri: client_id=jupyterhub-user-someuser, redi>
Sep 28 08:17:21 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:21.640 JupyterHub provider:334] Saving bearer token {'access_token': 'REDACTED', 'expires_in':>
Sep 28 08:17:21 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:21.684 JupyterHub provider:189] Deleting oauth code Gie... for jupyterhub-user-someuser
Sep 28 08:17:21 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:21.700 JupyterHub log:189] 200 POST /hub/api/oauth2/token (someuser@127.0.0.1) 74.16ms
Sep 28 08:17:21 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:21.741 JupyterHub log:189] 200 GET /hub/api/authorizations/token/[secret] (someuser@127.0.0.1)>
Sep 28 08:17:21 FBMVJUPYT2 python3[61032]: [D 2022-09-28 08:17:21.860 JupyterHub log:189] 304 GET /hub/logo (@144.222.132.39) 1.97ms
Sep 28 08:17:23 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:23.446 JupyterHub log:189] 200 GET /hub/api/ (cull-idle@127.0.0.1) 17.14ms
Sep 28 08:17:23 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:23.453 JupyterHub log:189] 200 GET /hub/api/users?state=[secret] (cull-idle@127.0.0.1) 5.06ms
Sep 28 08:17:27 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:27.213 JupyterHub log:189] 302 GET / -> /hub/ (@144.222.199.103) 2.34ms
Sep 28 08:17:27 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:27.220 JupyterHub log:189] 302 GET /hub/ -> /hub/login?next=%2Fhub%2F (@144.222.199.103) 1.49ms
Sep 28 08:17:27 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:27.230 JupyterHub log:189] 200 GET /hub/login?next=%2Fhub%2F (@144.222.199.103) 4.81ms
Sep 28 08:17:43 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:43.438 JupyterHub log:189] 200 GET /hub/api/ (cull-idle@127.0.0.1) 8.82ms
Sep 28 08:17:43 FBMVJUPYT2 python3[61032]: [I 2022-09-28 08:17:43.445 JupyterHub log:189] 200 GET /hub/api/users?state=[secret] (cull-idle@127.0.0.1) 5.29ms
Sep 28 08:18:01 FBMVJUPYT2 systemd[1]: Stopping jupyterhub.service...
Sep 28 08:19:32 FBMVJUPYT2 systemd[1]: jupyterhub.service: State 'stop-sigterm' timed out. Killing.
Sep 28 08:19:32 FBMVJUPYT2 systemd[1]: jupyterhub.service: Killing process 61032 (python3) with signal SIGKILL.
Sep 28 08:19:32 FBMVJUPYT2 systemd[1]: jupyterhub.service: Main process exited, code=killed, status=9/KILL
Sep 28 08:19:32 FBMVJUPYT2 systemd[1]: jupyterhub.service: Failed with result 'timeout'.
Sep 28 08:19:32 FBMVJUPYT2 systemd[1]: Stopped jupyterhub.service.
Sep 28 08:19:32 FBMVJUPYT2 systemd[1]: Started jupyterhub.service.