JupyterHub: idle_culler errors out with tornado ioloop error

Bug description

[E 250114 23:32:18 ioloop:941] Exception in callback functools.partial(<function cull_idle at 0x7feaec7f3ca0>, url='http://localhost:8081/hub/api', api_token='2f57c9af4794465d92285fcbe4d31b9e', inactive_limit=72000, cull_users=False, remove_named_servers=True, max_age=0, concurrency=10, ssl_enabled=False, internal_certs_location='internal-ssl', cull_admin_users=False, api_page_size=0, cull_default_servers=True, cull_named_servers=True)
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/dist-packages/tornado/ioloop.py", line 939, in _run
        await val
      File "/usr/local/lib/python3.9/dist-packages/jupyterhub_idle_culler/__init__.py", line 171, in cull_idle
        resp = await fetch(HTTPRequest(url=f"{url}/", headers=auth_header))
      File "/usr/local/lib/python3.9/dist-packages/jupyterhub_idle_culler/__init__.py", line 124, in fetch
        return await client.fetch(req)
    tornado.simple_httpclient.HTTPTimeoutError: Timeout during request

These logs show up on the hub pod, and the hub stops responding to any requests.

How to reproduce

Unable to reproduce.
The hub just stops working after we see the error. We didn't see the error when we were on:
jupyterhub-idle-culler==1.3.1
jupyterhub-kubespawner==5.0.0
jupyterhub==4.0.0
tornado==6.4

Actual behaviour

The hub just stops working after we see the error in the logs.

Your personal set up

We have these versions on our JupyterHub K8s pod:
jupyterhub==4.1.6
jupyterhub-idle-culler==1.4.0
jupyterhub-kubespawner==5.0.0
tornado==6.4
python=3.9

values.yaml →
cull:
  adminUsers: false # --cull-admin-users
  removeNamedServers: true # --remove-named-servers
  timeout: 72000 # 20hrs
  every: 600
  concurrency: 10 # --concurrency
  maxAge: 0 # --max-age

Please let me know if you need more info. Thanks!

Since you said the hub stops working after this, I think this is likely a symptom of the hub stopping working, not the cause. If the hub hangs, this is what it would look like in the idle culler. Can you share more logs from the Hub preceding this event?

Hello @minrk ,

" likely this is a symptom of the hub stopping working, not the cause" - this makes sense! We just assumed something was wrong on the culler :confused:

We see this error a few times but the hub recovers:

[D 2025-01-14 17:00:44.419 JupyterHub roles:282] Assigning default role to User username1
[I 2025-01-14 17:00:44.425 JupyterHub roles:239] Adding role user for User: username1
[D 2025-01-14 17:00:44.440 JupyterHub roles:282] Assigning default role to User username1
[D 2025-01-14 17:00:44.454 JupyterHub base:668] Setting cookie jupyterhub-session-id: {'httponly': True, 'secure': True, 'path': '/'}
[D 2025-01-14 17:00:44.454 JupyterHub base:672] Setting cookie for username1: jupyterhub-hub-login
[D 2025-01-14 17:00:44.454 JupyterHub base:668] Setting cookie jupyterhub-hub-login: {'httponly': True, 'secure': True, 'path': '/hub/'}
[I 2025-01-14 17:00:44.454 JupyterHub _xsrf_utils:125] Setting new xsrf cookie for b'yyy=:2b58' {'path': '/hub/'}
[I 2025-01-14 17:00:44.454 JupyterHub base:937] User logged in: username1
[I 2025-01-14 17:00:44.455 JupyterHub log:192] 302 GET /hub/oauth_callback?code=[secret]&state=[secret] -> /hub/ (username1@10.2.13.49) 799.85ms
[D 2025-01-14 17:00:44.563 JupyterHub base:362] Recording first activity for <User(username1 0/1 running)>
[D 2025-01-14 17:00:44.573 JupyterHub user:431] Creating <class 'kubespawner.spawner.KubeSpawner'> for username1:
[I 2025-01-14 17:00:44.575 JupyterHub log:192] 302 GET /hub/ -> /hub/spawn (username1@10.2.13.49) 18.72ms
[D 2025-01-14 17:00:44.677 JupyterHub scopes:884] Checking access to /hub/spawn via scope servers
[D 2025-01-14 17:00:44.677 JupyterHub scopes:697] Argument-based access to /hub/spawn via servers
-----------Auth State -------------------{'access_token': 'h0JGN--6Qd3w7', 'refresh_token': 'v1.-Qo2n-', 'id_token': '..---Hpqn---',...}
[D 2025-01-14 17:00:47.649 JupyterHub pages:210] Serving options form for username1
[D 2025-01-14 17:00:47.654 JupyterHub _xsrf_utils:155] xsrf id mismatch b'4OeZ84c58' != b'49ea86c58'
[I 2025-01-14 17:00:47.654 JupyterHub _xsrf_utils:125] Setting new xsrf cookie for b'49ea864c58' {'path': '/hub/'}
[I 2025-01-14 17:00:47.655 JupyterHub log:192] 200 GET /hub/spawn (username1@10.2.13.49) 2981.70ms
[D 2025-01-14 17:00:47.656 JupyterHub log:192] 200 GET /hub/health (@172.16.4.1) 4.55ms
[D 2025-01-14 17:00:47.656 JupyterHub log:192] 200 GET /hub/health (@172.16.4.1) 4.35ms
[D 2025-01-14 17:00:47.789 JupyterHub log:192] 200 GET /hub/static/css/style.min.css?v=0a0508e2523bfc72e8e56c635fbe1778b63eac079a8364da92a699f0ef51a6f817edf0ba3bf533fd8d8c5ab18ca25cd9bd9d3c546ef56d264724a95575a6fae8 (@10.2.13.49) 1.28ms
[D 2025-01-14 17:00:48.008 JupyterHub log:192] 200 GET /hub/static/components/requirejs/require.js?v=1ff44af658602d913b22fca97c78f98945f47e76dacf9800f32f35350f05e9acda6dc710b8501579076f3980de02f02c97f5994ce1a9864c21865a42262d79ec (@10.2.13.49) 1.22ms
[D 2025-01-14 17:00:48.009 JupyterHub log:192] 200 GET /hub/static/components/jquery/dist/jquery.min.js?v=bf6089ed4698cb8270a8b0c8ad9508ff886a7a842278e98064d5c1790ca3a36d5d69d9f047ef196882554fc104da2c88eb5395f1ee8cf0f3f6ff8869408350fe (@10.2.13.49) 1.03ms
[D 2025-01-14 17:00:48.077 JupyterHub log:192] 200 GET /hub/static/components/bootstrap/dist/js/bootstrap.min.js?v=a014e9acc78d10a0a7a9fbaa29deac6ef17398542d9574b77b40bf446155d210fa43384757e3837da41b025998ebfab4b9b6f094033f9c226392b800df068bce (@10.2.13.49) 0.96ms
[D 2025-01-14 17:00:48.174 JupyterHub log:192] 200 GET /hub/logo (@10.2.13.49) 0.90ms
[D 2025-01-14 17:00:48.353 JupyterHub log:192] 200 GET /hub/static/components/font-awesome/fonts/fontawesome-webfont.woff2?v=4.7.0 (@10.2.13.49) 1.25ms
[D 2025-01-14 17:00:48.468 JupyterHub log:192] 200 GET /hub/static/favicon.ico?v=fde5757cd3892b979919d3b1faa88a410f28829feb5ba22b6cf069f2c6c98675fceef90f932e49b510e74d65c681d5846b943e7f7cc1b41867422f0481085c1f (@10.2.13.49) 1.04ms
[D 2025-01-14 17:00:48.509 JupyterHub base:407] Refreshing auth for username12
[D 2025-01-14 17:00:48.514 JupyterHub scopes:884] Checking access to /hub/api/users/username12/activity via scope users:activity
[D 2025-01-14 17:00:48.514 JupyterHub scopes:697] Argument-based access to /hub/api/users/username12/activity via users:activity
[D 2025-01-14 17:00:48.517 JupyterHub users:903] Not updating activity for <User(username12 1/1 running)>: 2025-01-14T13:02:13.853083Z < 2025-01-14T13:03:30.752000Z
[D 2025-01-14 17:00:48.517 JupyterHub users:924] Not updating server activity on username12/: 2025-01-14T13:02:13.853083Z < 2025-01-14T13:03:30.752000Z
[I 2025-01-14 17:00:48.518 JupyterHub log:192] 200 POST /hub/api/users/username12/activity (username12@172.16.9.8) 19.95ms
[D 2025-01-14 17:00:50.147 JupyterHub scopes:884] Checking access to /hub/spawn via scope servers
[D 2025-01-14 17:00:50.147 JupyterHub scopes:697] Argument-based access to /hub/spawn via servers
[D 2025-01-14 17:00:50.148 JupyterHub pages:258] Triggering spawn with supplied form options for username1
[D 2025-01-14 17:00:50.151 JupyterHub base:1061] Initiating spawn for username1
[D 2025-01-14 17:00:50.151 JupyterHub base:1065] 0 concurrent spawns
[D 2025-01-14 17:00:50.151 JupyterHub base:1070] 231 active servers
[I 2025-01-14 17:00:50.195 JupyterHub provider:660] Creating oauth client jupyterhub-user-username1
[I 2025-01-14 17:00:50.215 JupyterHub log:192] 302 POST /hub/spawn?_xsrf=[secret] -> /hub/spawn-pending/username1?_xsrf=[secret] (username1@10.2.13.49) 73.83ms
-----------Auth State -------------------{'access_token': 'eyxww', 'auth0_user': {'sub': 'waad|dUEg', ...}}
[D 2025-01-14 17:00:50.741 JupyterHub user:797] Calling Spawner.start for username1
[D 2025-01-14 17:00:50.742 JupyterHub spawner:3050] Applying KubeSpawner override for profile 'profile1'
[D 2025-01-14 17:00:50.742 JupyterHub spawner:3062] .. overriding KubeSpawner value image=image:tag
[D 2025-01-14 17:00:50.750 JupyterHub scopes:884] Checking access to /hub/spawn-pending/username1 via scope servers
[D 2025-01-14 17:00:50.750 JupyterHub scopes:697] Argument-based access to /hub/spawn-pending/username1 via servers
[I 2025-01-14 17:00:50.758 JupyterHub spawner:2723] Pod is being modified via modify_pod_hook
---------------------------This is  modify_pod_hook-----------------------------
INFO:org.python_pkg.data_access_granter:Processing xyz as oracle ...
INFO:org.python_pkg.data_access_granter:Will check DB roles: xyz_READ_ROLE,xyz_WRITE_ROLE in DB: DBNAME
INFO:org.python_pkg.data_access_granter:User username1 is in these roles: []
ERROR:org.python_pkg.data_access_granter:ERROR: Oracle query has failed for role pqr. Error message: OperationalError('(cx_Oracle.OperationalError) ORA-01033: ORACLE initialization or shutdown in progress\nProcess ID: 0\nSession ID: 0 Serial number: 0')
ERROR:org.python_pkg.data_access_granter:ERROR: Oracle query has failed for role pqr_pat. Error message: OperationalError('(cx_Oracle.OperationalError) ORA-01033: ORACLE initialization or shutdown in progress\nProcess ID: 0\nSession ID: 0 Serial number: 0')
Current umask: 0o22
New umask: 0o2
[E 250114 17:02:18 ioloop:941] Exception in callback functools.partial(<function cull_idle at 0x7feaec7f3ca0>, url='http://localhost:8081/hub/api', api_token='2f57c9..1b9e', inactive_limit=72000, cull_users=False, remove_named_servers=True, max_age=0, concurrency=10, ssl_enabled=False, internal_certs_location='internal-ssl', cull_admin_users=False, api_page_size=0, cull_default_servers=True, cull_named_servers=True)
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/dist-packages/tornado/ioloop.py", line 939, in _run
        await val
      File "/usr/local/lib/python3.9/dist-packages/jupyterhub_idle_culler/__init__.py", line 171, in cull_idle
        resp = await fetch(HTTPRequest(url=f"{url}/", headers=auth_header))
      File "/usr/local/lib/python3.9/dist-packages/jupyterhub_idle_culler/__init__.py", line 124, in fetch
        return await client.fetch(req)
    tornado.simple_httpclient.HTTPTimeoutError: Timeout during request
Extra Environments are [{'name': 'static_url', 'value': 'https://org.com/static/'}]
[I 2025-01-14 17:02:26.505 JupyterHub spawner:2521] Attempting to create pod jupyter-username1, with timeout 3
[I 2025-01-14 17:02:26.851 JupyterHub pages:399] username1 is pending spawn
[I 2025-01-14 17:02:26.853 JupyterHub log:192] 200 GET /hub/spawn-pending/username1?_xsrf=[secret] (username1@10.2.13.49) 96110.44ms
[D 2025-01-14 17:02:26.911 JupyterHub utils:286] Waiting 1s for server at http://refresher:8081/services/refresher/
[D 2025-01-14 17:02:26.911 JupyterHub proxy:924] Proxy: Fetching GET http://proxy-api:8001/api/routes
[D 2025-01-14 17:02:26.923 JupyterHub base:407] Refreshing auth for username3
[D 2025-01-14 17:02:26.928 JupyterHub scopes:884] Checking access to /hub/api/users/username3/activity via scope users:activity
[D 2025-01-14 17:02:26.928 JupyterHub scopes:697] Argument-based access to /hub/api/users/username3/activity via users:activity
[D 2025-01-14 17:02:26.931 JupyterHub users:903] Not updating activity for <User(username3 0/1 running)>: 2025-01-12T19:00:36.995867Z < 2025-01-13T04:51:15.261682Z
[D 2025-01-14 17:02:26.932 JupyterHub users:924] Not updating server activity on username3/: 2025-01-12T19:00:36.995867Z < 2025-01-13T04:51:15.261682Z
[I 2025-01-14 17:02:26.933 JupyterHub log:192] 200 POST /hub/api/users/username3/activity (username3@172.16.11.54) 76.84ms

When the hub stops working, these are the logs:


[D 2025-01-14 19:59:18.396 JupyterHub scopes:884] Checking access to /hub/api/users/user6/activity via scope users:activity
[D 2025-01-14 19:59:18.396 JupyterHub scopes:697] Argument-based access to /hub/api/users/user6/activity via users:activity
[D 2025-01-14 19:59:18.399 JupyterHub users:903] Not updating activity for <User(user6 1/1 running)>: 2025-01-14T17:01:11.097286Z < 2025-01-14T17:01:11.098000Z
[D 2025-01-14 19:59:18.399 JupyterHub users:924] Not updating server activity on user6/: 2025-01-14T17:01:11.097286Z < 2025-01-14T17:01:11.098000Z
[I 2025-01-14 19:59:18.400 JupyterHub log:192] 200 POST /hub/api/users/user6/activity (user6@172.16.9.22) 21.03ms
[D 2025-01-14 19:59:18.447 JupyterHub utils:286] Waiting 1s for server at http://refresher:8081/services/refresher/
[D 2025-01-14 19:59:18.448 JupyterHub proxy:924] Proxy: Fetching GET http://proxy-api:8001/api/routes
[D 2025-01-14 19:59:18.450 JupyterHub utils:324] Server at http://refresher:8081/services/refresher/ responded in 0.00s
[D 2025-01-14 19:59:18.450 JupyterHub app:2515] External service refresher running at http://refresher:8081/
[D 2025-01-14 19:59:19.267 JupyterHub proxy:393] Checking routes
[D 2025-01-14 19:59:22.263 JupyterHub reflector:362] events watcher timeout
[D 2025-01-14 19:59:22.263 JupyterHub reflector:281] Connecting events watcher
[D 2025-01-14 19:59:24.610 JupyterHub reflector:362] pods watcher timeout
[D 2025-01-14 19:59:24.610 JupyterHub reflector:281] Connecting pods watcher
[D 2025-01-14 19:59:25.778 JupyterHub log:192] 200 GET /hub/health (@172.16.4.1) 1.11ms
[D 2025-01-14 19:59:25.778 JupyterHub log:192] 200 GET /hub/health (@172.16.4.1) 1.18ms
[D 2025-01-14 19:59:31.043 JupyterHub scopes:884] Checking access to /hub/api/users/user4/activity via scope users:activity
[D 2025-01-14 19:59:31.043 JupyterHub scopes:697] Argument-based access to /hub/api/users/user4/activity via users:activity
[D 2025-01-14 19:59:31.046 JupyterHub users:898] Activity for user user4: 2025-01-14T19:59:31.018451Z
[D 2025-01-14 19:59:31.046 JupyterHub users:916] Activity on server user4/: 2025-01-14T19:59:31.018451Z
[I 2025-01-14 19:59:31.053 JupyterHub log:192] 200 POST /hub/api/users/user4/activity (user4@172.16.8.7) 27.42ms
[D 2025-01-14 19:59:31.273 JupyterHub base:407] Refreshing auth for user3
[D 2025-01-14 19:59:31.279 JupyterHub scopes:884] Checking access to /hub/api/users/user3/activity via scope users:activity
[D 2025-01-14 19:59:31.279 JupyterHub scopes:697] Argument-based access to /hub/api/users/user3/activity via users:activity
[D 2025-01-14 19:59:31.281 JupyterHub users:903] Not updating activity for <User(user3 1/1 running)>: 2025-01-14T18:10:50.153264Z < 2025-01-14T18:11:39.683000Z
[D 2025-01-14 19:59:31.281 JupyterHub users:924] Not updating server activity on user3/: 2025-01-14T18:10:50.153264Z < 2025-01-14T18:11:39.683000Z
[I 2025-01-14 19:59:31.283 JupyterHub log:192] 200 POST /hub/api/users/user3/activity (user3@172.16.6.51) 22.32ms
[D 2025-01-14 19:59:32.287 JupyterHub reflector:362] events watcher timeout
[D 2025-01-14 19:59:32.287 JupyterHub reflector:281] Connecting events watcher
[D 2025-01-14 19:59:33.480 JupyterHub base:407] Refreshing auth for user12
[D 2025-01-14 19:59:33.485 JupyterHub scopes:884] Checking access to /hub/api/users/user12/activity via scope users:activity
[D 2025-01-14 19:59:33.485 JupyterHub scopes:697] Argument-based access to /hub/api/users/user12/activity via users:activity
[D 2025-01-14 19:59:33.487 JupyterHub users:903] Not updating activity for <User(user12 1/1 running)>: 2025-01-14T17:05:08.576028Z < 2025-01-14T17:05:08.576028Z
[D 2025-01-14 19:59:33.487 JupyterHub users:924] Not updating server activity on user12/: 2025-01-14T17:05:08.576028Z < 2025-01-14T17:05:08.576028Z
[I 2025-01-14 19:59:33.489 JupyterHub log:192] 200 POST /hub/api/users/user12/activity (user12@172.16.6.11) 22.18ms
[D 2025-01-14 19:59:34.355 JupyterHub scopes:884] Checking access to /hub/spawn/user1 via scope servers
[D 2025-01-14 19:59:34.355 JupyterHub scopes:697] Argument-based access to /hub/spawn/user1 via servers
-----------Auth State -------------------{'access_token': 'ey..U4nw', 'auth0_user': {'sub': 'waad|PDD2rN..JAMK4',..}}
[E 250114 20:02:18 ioloop:941] Exception in callback functools.partial(<function cull_idle at 0x7feaec7f3ca0>, url='http://localhost:8081/hub/api', api_token='2f57c9af4794465d92285fcbe4d31b9e', inactive_limit=72000, cull_users=False, remove_named_servers=True, max_age=0, concurrency=10, ssl_enabled=False, internal_certs_location='internal-ssl', cull_admin_users=False, api_page_size=0, cull_default_servers=True, cull_named_servers=True)
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/dist-packages/tornado/ioloop.py", line 939, in _run
        await val
      File "/usr/local/lib/python3.9/dist-packages/jupyterhub_idle_culler/__init__.py", line 171, in cull_idle
        resp = await fetch(HTTPRequest(url=f"{url}/", headers=auth_header))
      File "/usr/local/lib/python3.9/dist-packages/jupyterhub_idle_culler/__init__.py", line 124, in fetch
        return await client.fetch(req)
    tornado.simple_httpclient.HTTPTimeoutError: Timeout during request

Our hub DB is a cloud-managed Postgres DB.

Please let me know if there’s any other info that you need :slight_smile: Thank you!

I just looked up the events and pods watcher timeouts; it seems like increasing the hub resource requests might help?
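For reference, the change we're considering is roughly this in our values.yaml (a rough sketch; the request/limit numbers below are placeholders we'd still need to tune):

```yaml
hub:
  resources:
    requests:
      cpu: "1"       # placeholder value
      memory: 2Gi    # placeholder value
    limits:
      memory: 2Gi    # placeholder value
```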

Increasing resources might help, I'm not sure. What does your modify_pod_hook do? It looks like this data_access_granter might be where the problem is: it appears to block for about 90 seconds sometimes (from 17:00:50 to 17:02:26). I'd focus attention on that. You might want to make that function async if you can, to avoid blocking all of JupyterHub, or perhaps work on the error handling, since it seems to be the handling of this error that makes it hang:

ERROR:org.python_pkg.data_access_granter:ERROR: Oracle query has failed for role pqr. Error message: OperationalError('(cx_Oracle.OperationalError) ORA-01033: ORACLE initialization or shutdown in progress\nProcess ID: 0\nSession ID: 0 Serial number: 0')
ERROR:org.python_pkg.data_access_granter:ERROR: Oracle query has failed for role pqr_pat. Error message: OperationalError('(cx_Oracle.OperationalError) ORA-01033: ORACLE initialization or shutdown in progress\nProcess ID: 0\nSession ID: 0 Serial number: 0')
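A minimal sketch of what "make that function async" could look like, assuming the blocking Oracle lookup is wrapped in a function such as grant_data_access (a hypothetical name for your existing data_access_granter call) and that your KubeSpawner version awaits modify_pod_hook when it returns a coroutine (worth double-checking on your version):

```python
# jupyterhub_config.py / hub.extraConfig -- sketch only, not a drop-in fix
import asyncio

def grant_data_access(username):
    # placeholder for the existing blocking cx_Oracle role check/grant;
    # ideally also set a connect/statement timeout on the Oracle side
    ...

async def modify_pod_hook(spawner, pod):
    try:
        # run the blocking DB work in a worker thread, with an overall deadline,
        # so the hub's event loop stays responsive while Oracle misbehaves
        await asyncio.wait_for(
            asyncio.to_thread(grant_data_access, spawner.user.name),
            timeout=30,
        )
    except Exception as e:
        # on timeout the worker thread keeps running in the background,
        # but the spawn (and the rest of JupyterHub) is no longer blocked on it
        spawner.log.error("data_access_granter failed or timed out: %s", e)
        # decide here whether to continue the spawn without the grant, or re-raise
    return pod

c.KubeSpawner.modify_pod_hook = modify_pod_hook
```

Offloading to a thread doesn't make the Oracle call itself faster, but it keeps the hub's event loop (and the API that the idle culler and proxy depend on) responsive while the query or its error handling stalls.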