Coincidentally we were just talking about this so good timing on the thread.
I tend to agree that culling old notebook storage for culled users should be separate from the existing cull_idle_servers.py script since, as @betatim said, that can get run pretty aggressively.
To throw a wrinkle into this, we don’t have a PVC per notebook pod, we have a single PVC per environment that is backed by object storage, so blindly deleting that single PVC would be…not good.
The current idea is to create a JupyterHub service similar to the “cull idle users” service. The service would check when a user was last active and remove their PV after a long period of inactivity.
We have our cull-idle service set up to also cull idle users, so this probably wouldn’t work for us, unless I’m missing something. Consider a scenario where the per-notebook culler stops a pod after an hour of inactivity, and the hub-managed cull-idle service deletes the user after, let’s say, 5 days of inactivity. We still might not want to delete their storage for 30 days or longer. My point is that if we’ve culled the user record from the database, the hub no longer knows when that user was last active, so we’d have to work backward from the storage: for each storage entry, check whether the user still exists, and delete the storage only if they don’t.
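
To make the "work backward from the storage" idea concrete, here is a rough sketch of what I have in mind. Everything named here is hypothetical: `StorageEntry`, `find_orphaned_storage`, and the `user_exists` callback are made up for illustration, and how you list per-user storage depends entirely on your backend (in our case it would be prefixes in the object store behind the shared PVC, not separate PVCs). The `user_exists` check could be backed by the hub's REST API (`GET /hub/api/users/{name}` returns a 404 for a user that has been deleted), but I have left that out so the logic stands on its own.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List, NamedTuple, Optional


class StorageEntry(NamedTuple):
    """One user's storage area (hypothetical shape, adapt to your backend)."""
    username: str
    last_modified: datetime  # timezone-aware timestamp of the newest file


def find_orphaned_storage(
    entries: Iterable[StorageEntry],
    user_exists: Callable[[str], bool],
    retention: timedelta = timedelta(days=30),
    now: Optional[datetime] = None,
) -> List[str]:
    """Return usernames whose storage is safe to delete.

    Storage qualifies only if the user no longer exists in the hub AND
    the storage has been untouched for at least `retention`.
    """
    if now is None:
        now = datetime.now(timezone.utc)

    orphans = []
    for entry in entries:
        # User still known to the hub: leave their storage alone.
        if user_exists(entry.username):
            continue
        # User was culled, but keep the data until the grace period passes.
        if now - entry.last_modified >= retention:
            orphans.append(entry.username)
    return orphans
```

The function only reports candidates and never deletes anything itself, which seems like the safer default given that a mistake here is not recoverable. The 30-day default is just the number from my example above, not a recommendation.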