Idle culler not culling

I’m seeing a growing number of old servers (currently up to 22 days old, the age of the cluster). The culler seems to start and run at the expected 10-minute interval, but none of the containers get cleaned up:

[I 2023-09-07 14:41:11.218 JupyterHub app:3189] Starting managed service jupyterhub-idle-culler
[I 2023-09-07 14:41:11.218 JupyterHub service:385] Starting service 'jupyterhub-idle-culler': ['python3', '-m', 'jupyterhub_idle_culler', '--url=http://localhost:8081/hub/api', '--timeout=86400', '--cull-every=600', '--concurrency=10', '--max-age=1209600']
[I 2023-09-07 14:41:11.219 JupyterHub service:133] Spawning python3 -m jupyterhub_idle_culler --url=http://localhost:8081/hub/api --timeout=86400 --cull-every=600 --concurrency=10 --max-age=1209600
[D 2023-09-07 14:41:11.771 JupyterHub base:299] Recording first activity for <APIToken('7ba6...', service='jupyterhub-idle-culler', client_id='jupyterhub')>
[I 2023-09-07 14:41:11.790 JupyterHub log:191] 200 GET /hub/api/ (jupyterhub-idle-culler@127.0.0.1) 24.38ms
[I 2023-09-07 14:41:11.811 JupyterHub log:191] 200 GET /hub/api/users?state=[secret] (jupyterhub-idle-culler@127.0.0.1) 18.11ms
[I 2023-09-07 14:51:11.872 JupyterHub log:191] 200 GET /hub/api/ (jupyterhub-idle-culler@127.0.0.1) 20.24ms
[I 2023-09-07 14:51:11.886 JupyterHub log:191] 200 GET /hub/api/users?state=[secret] (jupyterhub-idle-culler@127.0.0.1) 7.60ms

I added --max-age=1209600 to rule out the possibility that there really were an unusually large number of active connections, but servers older than 14 days are still not culled, so that doesn’t seem to be the case. I’ve also tried lower timeout thresholds, down to 4800 seconds, but that doesn’t clean up servers either.
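For reference, this is roughly how those command-line flags map onto the Z2jh cull block in my values file. The values below are reconstructed from the logged culler invocation rather than copied verbatim from the file:

```yaml
# Reconstructed from the flags in the hub log above, not copied from my
# values file: Z2JH renders these settings into the culler's command line.
cull:
  enabled: true
  # users: true      # when this is on, the culler does remove old users from the DB
  timeout: 86400     # --timeout: cull servers idle for more than 24h
  every: 600         # --cull-every: run the check every 10 minutes
  concurrency: 10    # --concurrency
  maxAge: 1209600    # --max-age: cull servers older than 14 days regardless of activity
```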

When enabled, the culler does clean up old users from the DB, but it never removes running servers.

The hub itself seems to identify that there are many servers that have not been active in the last 24 hours:

[I 2023-09-07 14:41:11.176 JupyterHub metrics:278] Found 28 active users in the last ActiveUserPeriods.twenty_four_hours
[I 2023-09-07 14:41:11.177 JupyterHub metrics:278] Found 43 active users in the last ActiveUserPeriods.seven_days
[I 2023-09-07 14:41:11.178 JupyterHub metrics:278] Found 69 active users in the last ActiveUserPeriods.thirty_days

I’m not seeing any errors from the hub container, which leaves me unsure of where to look next.

I’m running zero-to-jupyterhub 3.0.0, where the hub container has jupyterhub-idle-culler==1.2.1.

Can you try configuring the idle culler to output debug logs? That should provide information on what pods the culler can see.

In addition, can you log in to JupyterHub as an admin and check whether the long-running servers are still listed as running?

Can you also share your Z2jh config with us?

Thank you for your response! Sure enough, the admin panel is not showing the old servers as running. As you divined, I am still seeing them through kubectl. So maybe the culler is working, but the servers aren’t actually shutting down?

Indeed, looking at the logs of the long-running servers now, I’m seeing a lot of:

[E 2023-09-07 21:49:57.798 JupyterHubSingleUser] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/extension.py", line 425, in notify
        await client.fetch(req)
    tornado.httpclient.HTTPClientError: HTTP 403: Forbidden
[E 2023-09-07 21:49:57.799 JupyterHubSingleUser] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/extension.py", line 450, in keep_activity_updated
        await self.notify_activity()
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/singleuser/extension.py", line 432, in notify_activity
        await exponential_backoff(
      File "/srv/paws/lib/python3.10/site-packages/jupyterhub/utils.py", line 237, in exponential_backoff
        raise asyncio.TimeoutError(fail_message)
    asyncio.exceptions.TimeoutError: Failed to notify Hub of activity

The z2jh config is defined in:

With the values file at:

I’m currently running off of GitHub - toolforge/paws at T345838, where I was tinkering with the culler options.

I’m not sure how to enable debug logging for the culler specifically; I don’t see an option for it in GitHub - jupyterhub/jupyterhub-idle-culler: JupyterHub service to cull idle servers and users. I did enable debug for JupyterHub itself in values.yaml (debug.enabled: true) in the branch I’m working off of. I’m not sure that’s still relevant given what you led me to find, but if it is, could you describe how I can enable debug for the culler?
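For completeness, this is the hub-level debug setting I enabled. As far as I can tell it only raises the log level of the hub process itself, not the managed culler service:

```yaml
# Standard Z2JH values: turns on debug logging for the hub container,
# but (as far as I can tell) does not change the idle culler's logging.
debug:
  enabled: true
```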

There’s a suspected bug in Kubespawner related to how the hub tracks pods, which might be fixed by this PR:

It’s merged but not yet released; however, if you’re comfortable rebuilding the hub image to upgrade Kubespawner to the head of the main branch, that’s worth a try.
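Roughly, that would mean building a custom hub image on top of the Z2jh hub image you’re running, with Kubespawner installed from main, and then pointing your values file at it. The registry, name, and tag below are placeholders for your own build:

```yaml
# Hypothetical sketch: a hub image rebuilt on top of the stock Z2JH hub image
# with something like
#   pip install --upgrade git+https://github.com/jupyterhub/kubespawner@main
# The registry, image name, and tag here are placeholders.
hub:
  image:
    name: example-registry.example.org/custom-k8s-hub
    tag: kubespawner-main
```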

Sorry, you’re right! There should be an undocumented logging option, but it’s not configurable in Z2jh. I’ve opened a new issue: