Based on users’ inactivity, we want to run some deprovisioning scripts.
In our case, “inactivity” means that the user did not visit or interact with the JupyterHub. So logins, accessing JupyterHub services, and spawning singleuser servers count as “activity”.
The database table users contains the column last_activity. Can we rely on this information, or does it have another meaning in JupyterHub’s context?
From the code, it looks like activity on single-user servers is also recorded as the user’s activity in the DB. So if you only want to track activity on the “hub” itself, it might not be so easy!
That is precisely what user.last_activity is for, so please go ahead and use it. You can also use the GET /hub/api/users?sort=last_activity endpoint to fetch users, with the longest-idle users first.
What sort of time scale are you thinking? If it’s at least 15 minutes or so, this number should be trustworthy, but configuration can affect that.
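As a rough sketch of how the endpoint mentioned above might be used from a deprovisioning script: the snippet below fetches the user list and picks out users whose last_activity is older than a threshold. The function names, the API URL, and the token are hypothetical placeholders. Treating a missing last_activity as "idle" is an assumption, and pagination is not handled.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp such as '2024-01-01T12:00:00.000000Z'."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def idle_users(users, max_idle, now=None):
    """Return names of users whose last_activity is older than max_idle.

    Users with no recorded activity are treated as idle (an assumption;
    adjust this if never-active accounts should be kept).
    """
    now = now or datetime.now(timezone.utc)
    idle = []
    for user in users:
        last = user.get("last_activity")
        if last is None or now - parse_timestamp(last) > max_idle:
            idle.append(user["name"])
    return idle


def fetch_users(api_url, token):
    """Fetch users from the Hub REST API, sorted by last_activity."""
    request = urllib.request.Request(
        f"{api_url}/users?sort=last_activity",
        headers={"Authorization": f"token {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a hypothetical Hub at http://127.0.0.1:8081/hub/api and a token that has permission to list users, usage could look like `idle_users(fetch_users(api_url, token), timedelta(days=30))`.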
Events that update user.last_activity:
login
spawning the user’s server
any API request to JupyterHub
any cookie-authenticated request to JupyterHub
any successful authentication with a JupyterHub service, including the user’s server
To avoid database thrashing, it is not written on every request; each user’s value is updated at most once per JupyterHub.activity_resolution (default: 30 seconds).
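To illustrate what "at most once per activity_resolution" means in practice, here is a minimal sketch of that kind of throttling check. It is not JupyterHub's actual implementation; the function name and constant are made up for this example, and only the 30-second default comes from the description above.

```python
from datetime import datetime, timedelta, timezone

# Mirrors the default quoted above; the constant name is hypothetical.
ACTIVITY_RESOLUTION = timedelta(seconds=30)


def should_record(last_recorded, now, resolution=ACTIVITY_RESOLUTION):
    """Return True if a new activity timestamp should be written.

    A write happens when nothing has been recorded yet, or when at least
    `resolution` has passed since the last recorded timestamp.
    """
    return last_recorded is None or now - last_recorded >= resolution
```

The practical consequence is that last_activity can lag the user's real activity by up to the resolution, which is negligible for idle timeouts measured in minutes or longer.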
Events that usually update activity, but can vary depending on the configuration of the user’s server environment:
API requests to the user’s server, including opening/saving files, kernel start/stop
cell executions
Activity from the server is not reported immediately, so it should be accurate on the scale of $JUPYTERHUB_ACTIVITY_INTERVAL in the user environment (default: 5 minutes).
Additionally, if the default configurable-http-proxy is used, any traffic to or from the user’s server counts as activity. This is also updated infrequently, governed by JupyterHub.last_activity_interval (default: 5 minutes). That means an idle JupyterLab tab left open will typically register as activity, although whether it does depends on whether the UI polls the server.
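For reference, the three intervals mentioned so far could be set roughly as follows. This is a sketch of a jupyterhub_config.py fragment using the default values quoted above, not a recommendation. `c` is the config object JupyterHub provides when it loads the file, and passing JUPYTERHUB_ACTIVITY_INTERVAL through c.Spawner.environment is an assumption about how your deployment sets environment variables for the user's server.

```python
# jupyterhub_config.py (fragment; `c` is provided by JupyterHub when it loads this file)

# Write a user's last_activity to the database at most this often (seconds).
c.JupyterHub.activity_resolution = 30

# How often the Hub collects activity seen by the proxy (seconds).
c.JupyterHub.last_activity_interval = 300

# How often the user's server reports its own activity to the Hub (seconds).
# Assumption: the variable reaches the user environment via Spawner.environment.
# Merge this with any environment entries you already set.
c.Spawner.environment = {"JUPYTERHUB_ACTIVITY_INTERVAL": "300"}
```

Whatever idle timeout you pick should be comfortably larger than the largest of these values, which is where the "at least 15 minutes or so" rule of thumb comes from.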
Some examples that might not update last_activity:
the user only talks to a particular Service, and that service caches its auth information for longer than the timescale of your idle timeout.