Reliability of User.last_activity

Hey there,

Based on users’ inactivity, we want to run some deprovisioning scripts.
In our case, “inactivity” means that the user did not visit or interact with the JupyterHub. So logins, accessing JupyterHub services, and spawning singleuser servers count as “activity”.

The database table users contains the column last_activity. Can we rely on this information, or does it have another meaning in JupyterHub’s context?

From the code, it looks like activity on single-user servers is also recorded as the user’s activity in the DB. So if you only want to track activity on the “hub” itself, that might not be so easy!


I should have been more precise: That’s totally fine! From a user’s perspective, interacting with a JupyterHub includes accessing their single-user server.

Fair enough! But note that this user activity does not capture long-running kernels or programs running in the terminal.

That is precisely what user.last_activity is for, so please go ahead and use it. You can also use the GET /hub/api/users?sort=last_activity endpoint to fetch users, with the longest-idle users first.
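As a rough sketch of how a deprovisioning script might use that endpoint: the helper names (`parse_timestamp`, `find_inactive_users`, `fetch_users`), the `hub_url` and `token` parameters, and the fallback to the `created` field for users who never had any activity are all assumptions for illustration, not something JupyterHub prescribes. `fetch_users` reads a single page only and does not handle pagination.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp such as '2023-01-01T12:00:00.000000Z'."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Treat timestamps without an offset as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def find_inactive_users(users, max_idle, now=None):
    """Return the names of users whose last activity is older than max_idle."""
    now = now or datetime.now(timezone.utc)
    inactive = []
    for user in users:
        # Assumption: fall back to 'created' for users with no recorded activity.
        stamp = user.get("last_activity") or user.get("created")
        if stamp is None:
            continue  # nothing to judge by; leave for manual review
        if now - parse_timestamp(stamp) > max_idle:
            inactive.append(user["name"])
    return inactive


def fetch_users(hub_url, token):
    """Fetch one page of users sorted by last_activity (no pagination handling)."""
    request = urllib.request.Request(
        f"{hub_url}/hub/api/users?sort=last_activity",
        headers={"Authorization": f"token {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A script would then call something like `find_inactive_users(fetch_users(hub_url, token), timedelta(days=365))` and review the resulting names before deprovisioning anything.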

What sort of time scale are you thinking? If it’s at least 15 minutes or so, this number should be trustworthy, but configuration can affect that.

Events that update user.last_activity:

  • login
  • spawning the user’s server
  • any API request to JupyterHub
  • any cookie-authenticated request to JupyterHub
  • any successful authentication with a JupyterHub service, including the user’s server

To avoid database thrashing, it’s not updated on every request, but at most once every JupyterHub.activity_resolution (default: 30 seconds) per user.

Events that usually update activity, but can vary depending on the configuration of the user’s server environment:

  • API requests to the user’s server, including opening/saving files, kernel start/stop
  • cell executions

Activity from the server is not reported immediately, so it should be accurate on the scale of $JUPYTERHUB_ACTIVITY_INTERVAL in the user environment (default: 5 minutes).

Additionally, if the default configurable-http-proxy is used, any traffic to/from the user’s server is considered activity. This is also infrequently updated, and governed by JupyterHub.last_activity_interval (default: 5 minutes). That means leaving an idle JupyterLab tab open will typically register as activity (whether the client polls, and so triggers activity, depends on the UI).
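For reference, the three intervals mentioned above could be set in jupyterhub_config.py roughly like this. This is only a sketch showing the default values; passing JUPYTERHUB_ACTIVITY_INTERVAL through Spawner.environment is an assumption about how you would get it into the user environment, so check it against your spawner.

```python
# jupyterhub_config.py (fragment; values shown are the defaults, in seconds)
c = get_config()  # noqa

# Minimum time between last_activity writes per user on the Hub side.
c.JupyterHub.activity_resolution = 30

# How often the Hub polls the proxy for traffic-based activity.
c.JupyterHub.last_activity_interval = 300

# Assumption: the single-user server picks this up from its environment
# and uses it as the interval for reporting activity back to the Hub.
c.Spawner.environment = {"JUPYTERHUB_ACTIVITY_INTERVAL": "300"}
```

With an idle threshold of days or longer, none of these should need changing.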

Some examples that might not update last_activity:

  • the user only talks to a particular Service, and that service caches its auth information for longer than the timescale of your idle timeout.

Thank you for the detailed information. I’m looking at an inactivity period of 365 days, so it should work just fine, thanks!