[Feature request] More useful metrics collected by JupyterHub

ktaletsk · June 16, 2022, 11:50pm

Hi everyone, I would like to open a feature request and help develop it.

As a developer and maintainer of our organization's JupyterHub, I am interested in collecting more useful data about our Hub, data that can justify the development and cloud costs dedicated to the project. So far I have found two very useful metrics collected by JupyterHub: the total number of registered users and the number of currently running servers. I would also like to know, for example, the number of currently active (logged-in) users, the distribution of server lifetimes, and more. I can also see that the roadmap includes plans for more resource tracking inside servers.

In that vein, I would like to ask the community whether someone is already working on this and, if not, for some pointers to get me started. In particular, I would be glad for any recommendations on how to start implementing a metric for the number of currently active users.
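One possible starting point that needs no changes to JupyterHub itself: poll the Hub's REST API (`GET /hub/api/users` with an API token) and count users whose `last_activity` falls within a time window. The sketch below only does the counting; it assumes the response is a list of user dicts with an ISO 8601 `last_activity` field, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def parse_hub_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp such as those returned by the JupyterHub REST API."""
    if not value:
        return None
    # Swap a trailing 'Z' for '+00:00' so fromisoformat also works before Python 3.11.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Treat timestamps without an offset as UTC so comparisons never mix naive/aware values.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def count_active_users(
    users: Iterable[dict], window: timedelta, now: Optional[datetime] = None
) -> int:
    """Count users whose last_activity lies within `window` before `now`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - window
    active = 0
    for user in users:
        last = parse_hub_timestamp(user.get("last_activity"))
        if last is not None and last >= cutoff:
            active += 1
    return active
```

Running this periodically (e.g. with a 1-hour or 24-hour window) and exporting the result as a gauge would give a rough "active users" series. On larger hubs the users endpoint may be paginated, so check the REST API docs for your JupyterHub version.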

manics · June 17, 2022, 12:33pm

JupyterHub is very flexible and highly customisable, so some of those stats depend on what’s offered by the underlying platform. JupyterHub could gather some standard metrics, but for now I think the easiest solution is to leverage your base infrastructure.

For example, if you’re using Kubernetes you can use Prometheus/Grafana to build dashboards visualising the current pods per user, resource consumption, storage, etc., in much more detail than would be possible with a generic set of JupyterHub metrics. If you’re using JupyterHub on HPC you should be able to get stats from your cluster management system.

ktaletsk · June 17, 2022, 3:57pm

@manics thanks for the reply! If I'm using KubeSpawner on k8s with the Prometheus/Grafana stack, how would I get a gauge for the number of unique active users, given that a user can run multiple servers at a time? The only approach I can think of is running a regexp on pod names and counting the unique usernames in the list of Jupyter pods.
If someone has solved this problem, I would be interested to know the approach.
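Rather than parsing pod names with a regexp, a less fragile option may be to read pod labels. The sketch below assumes KubeSpawner labels single-user pods with `component=singleuser-server` and `hub.jupyter.org/username=<escaped name>` (verify with `kubectl get pods --show-labels` on your cluster), and uses the official `kubernetes` Python client. The helper names and the namespace you pass in are hypothetical.

```python
from typing import Iterable, List, Mapping, Set

# Assumed KubeSpawner labels; check them on your own cluster first.
USER_LABEL = "hub.jupyter.org/username"
SERVER_SELECTOR = "component=singleuser-server"


def unique_usernames(pod_labels: Iterable[Mapping[str, str]]) -> Set[str]:
    """Collect distinct usernames, so a user with several servers counts once."""
    names = set()
    for labels in pod_labels:
        name = labels.get(USER_LABEL)
        if name:
            names.add(name)
    return names


def fetch_user_pod_labels(namespace: str) -> List[Mapping[str, str]]:
    """Read the labels of running single-user pods via the kubernetes client."""
    # Imported here so unique_usernames() can be used without the client installed.
    from kubernetes import client, config  # pip install kubernetes

    config.load_kube_config()  # use config.load_incluster_config() when running in-cluster
    pods = client.CoreV1Api().list_namespaced_pod(
        namespace, label_selector=SERVER_SELECTOR
    )
    return [
        pod.metadata.labels or {}
        for pod in pods.items
        if pod.status.phase == "Running"
    ]
```

Usage would be something like `len(unique_usernames(fetch_user_pod_labels("my-jhub-namespace")))`, with the namespace being whatever your Hub runs in.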

manics · June 17, 2022, 8:06pm

Yes, going through the pod metrics and matching, grouping, and aggregating by labels will get you the most useful information, especially as you’ll see the resource usage of each pod.
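As a sketch of the label-grouping idea in Prometheus itself: if kube-state-metrics exports pod labels through `kube_pod_labels`, a user is counted once no matter how many servers they run by first grouping by the username label and then counting the groups. This assumes kube-state-metrics v2 is started with the KubeSpawner labels allowlisted (e.g. `--metric-labels-allowlist=pods=[component,hub.jupyter.org/username]`), which sanitises `hub.jupyter.org/username` into `label_hub_jupyter_org_username`. The Python helpers are hypothetical and only wrap Prometheus's `/api/v1/query` HTTP endpoint.

```python
import json
import urllib.parse
import urllib.request

# Count distinct usernames among single-user pods known to kube-state-metrics.
ACTIVE_USERS_QUERY = (
    "count(count by (label_hub_jupyter_org_username) "
    '(kube_pod_labels{label_component="singleuser-server"}))'
)


def parse_instant_scalar(payload: dict) -> float:
    """Extract a single number from a Prometheus instant-query response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    result = payload["data"]["result"]
    if not result:  # count() over an empty set returns no series at all
        return 0.0
    # Each vector sample looks like {"metric": {...}, "value": [timestamp, "value"]}.
    return float(result[0]["value"][1])


def query_active_users(prometheus_url: str) -> float:
    """Run ACTIVE_USERS_QUERY against a Prometheus server, e.g. http://prometheus:9090."""
    params = urllib.parse.urlencode({"query": ACTIVE_USERS_QUERY})
    with urllib.request.urlopen(f"{prometheus_url}/api/v1/query?{params}", timeout=10) as resp:
        return parse_instant_scalar(json.load(resp))
```

The same PromQL expression can be pasted directly into a Grafana panel. Note that `kube_pod_labels` covers every pod the API server knows about, including pending ones, so you may want to join it with `kube_pod_status_phase` if you only care about running servers.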

There are some example Grafana dashboards you could extend in
