This topic: a feature exploration
This topic is meant to be a collaborative exploration of what could make sense to develop in order to provide usage related insights for the funders, administrators, and users of a JupyterHub deployment! Let’s embark on a balancing act to arrive at a viable feature suggestion.
Value of data based insight
JupyterHub admins often need to demonstrate the value and estimate the cost of their JupyterHub deployment; they could benefit from having accessible information about its usage.
I believe that the typical administrator of a JupyterHub deployment currently has only a vague perception of how it is used, and that users mostly know only whether they have a server running right now. Could we provide significant value by surfacing more information about the collective usage to administrators and the individual usage to individual users?
Let’s consider an example where an institution funds a JupyterHub deployment.
- If the institution had a measurable indication of the cost and value the deployment provides, it could help them motivate its continued funding and development, allocate costs appropriately, and help administrators optimize it further.
- If individual users were better informed about their own usage and its implications on costs and such, they would likely gain the agency to use the resources appropriately, which would benefit everyone.
Related terminology
- Monitoring is typically about tracking the current status of various metrics, such as the number of currently running users. Its main purpose is technical and concerns current operations.
- Events can be emitted and recorded to track discrete events, such as a spawn of a user pod.
- Key Performance Indicators (KPIs) are values that typically act as a statistical summary over a recent period of time. For a JupyterHub these could be weekly/monthly active users, where an active user is a user that has started a server at least once, or weekly/monthly regular users, where a regular user is a user that has been active for 8 hours a week on average.
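As a sketch of how such a KPI could be computed, here is a minimal example. The records and usernames are invented for illustration; in practice they could be derived from recorded spawn events:

```python
from datetime import datetime, timedelta

# Hypothetical records of server starts: (username, start time) pairs,
# e.g. as could be derived from recorded spawn events.
starts = [
    ("anna", datetime(2020, 5, 4, 9, 0)),
    ("anna", datetime(2020, 5, 4, 14, 0)),
    ("ben", datetime(2020, 5, 28, 10, 0)),
    ("cara", datetime(2020, 3, 1, 8, 0)),  # outside the 30-day window
]

def active_users(starts, now, window=timedelta(days=30)):
    """Users that started a server at least once within the window."""
    return {user for user, when in starts if now - when <= window}

now = datetime(2020, 6, 1)
mau = active_users(starts, now)                      # monthly active users
dau = active_users(starts, now, timedelta(days=1))   # daily active users
print(sorted(mau))  # anna and ben; cara's start is too old
```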
Feature inspiration
I have added some reference examples of related features; edit this wiki post to add more.
JupyterHub’s admin dashboard
JupyterHub currently provides a snapshot of the total number of users in the JupyterHub database and the total number of currently running servers. It is also possible to list users based on their latest activity.
GitLab’s admin dashboard
A GitLab deployment provides some snapshot indicators of its usage as well as the latest projects/users/groups, which give an indication of activity.
Discourse’s reports
Discourse takes it a bit further by presenting a dashboard with an opinionated selection of metrics, shown as graphs together with suggestive help text about them. It also presents a wide range of reports that can be exported either manually or through a REST API.
Provided info about DAU/MAU when hovering the question mark
“Number of members that logged in in the last day divided by number of members that logged in in the last month – returns a % which indicates community ‘stickiness’. Aim for >30%.”
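The quoted formula is simple enough to sketch directly; the numbers below are made up for illustration:

```python
def stickiness(dau: int, mau: int) -> float:
    """DAU/MAU ratio as a percentage; higher means members return more often."""
    return 100 * dau / mau if mau else 0.0

# Say 120 members logged in during the last day and 300 in the last month:
print(f"{stickiness(120, 300):.0f}%")  # 40%, above the suggested 30% aim
```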
Grafana
Grafana is a tool dedicated to presenting dashboards of metrics. Grafana typically relies on something like Prometheus, which repeatedly polls various services (example: https://hub.mybinder.org/hub/metrics) to build up a time series of their status. Prometheus can then provide historic information to Grafana, which uses it to render dashboards with different graphs.
Grafana allows dashboards to be exported as JSON objects and also makes it possible to publish these dashboard descriptions on grafana.com/grafana/dashboards.
This has allowed admins of the mybinder.org deployment to create dashboards backed by data collected by Prometheus; these dashboards with graphs are publicly available at https://grafana.mybinder.org.
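To illustrate the mechanics, here is a rough sketch of what Prometheus does on each scrape: fetch the metrics text from the endpoint and parse the sample lines. The metric name below is illustrative and not necessarily an exact name JupyterHub exposes:

```python
# Sketch of parsing the Prometheus text exposition format, as returned by an
# endpoint like /hub/metrics. Prometheus does this on every scrape interval
# to build up a time series; here we parse a single hard-coded scrape.
def parse_metrics(text):
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples

example_scrape = """\
# HELP jupyterhub_running_servers Number of currently running single-user servers
# TYPE jupyterhub_running_servers gauge
jupyterhub_running_servers 42
"""
print(parse_metrics(example_scrape))
```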
Here is an example from a graph in the Node Activity dashboard. From this, an administrator could learn that they had configured more user placeholders than seem needed. User placeholders are a feature of Z2JH deployments, acting as seat warmers to ensure users don’t end up waiting for nodes to start.
Here is another graph that provides fine-grained insights about the usage of the deployment, but the data presented comes from the Kubernetes API rather than from JupyterHub, so the dashboard definition, which can be exported as JSON, is tightly coupled to Z2JH.
Since JupyterHub exposes a /hub/metrics endpoint (thanks to @GeorgianaElena and others) reporting, for example, the total number of currently spawned servers, we could define a Grafana dashboard with common metrics independent of the kind of JupyterHub deployment.
Events with jupyter_telemetry
mybinder.org collects data and makes it available at https://archive.analytics.mybinder.org/. The published data describes user launches on mybinder.org. This is enabled by BinderHub itself rather than being specific to mybinder.org. It works by logging these events using code that has since been extracted into the jupyter_telemetry package.
@yuvipanda provides a relevant discussion about the difference between events and metrics, which I quote below. Note that Grafana presents metrics above, while @choldgraf tweeted an analysis based on events.
BinderHub also exposes prometheus metrics. These are pre-aggregated, and extremely limited in scope. They can efficiently answer questions like ‘how many launches happened in the last hour?’ but not questions like ‘how many times was this repo launched in the last 6 months?’. Events are discrete and can be aggregated in many ways during analysis. Metrics are aggregated at source, and this limits what can be done with them during analysis. Metrics are mostly operational, while events are for analytics.
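To make the distinction concrete, here is a small sketch with invented launch events: one question that a pre-aggregated metric can answer at the source, and one that needs the discrete event records:

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical discrete launch events, as jupyter_telemetry could record them.
events = [
    {"timestamp": datetime(2020, 6, 1, 11, 30), "repo": "jupyterlab/jupyterlab"},
    {"timestamp": datetime(2020, 6, 1, 11, 45), "repo": "ipython/ipython"},
    {"timestamp": datetime(2020, 2, 10, 9, 0), "repo": "ipython/ipython"},
]
now = datetime(2020, 6, 1, 12, 0)

# A metric-style question: can be pre-aggregated at the source as a counter.
launches_last_hour = sum(
    1 for e in events if now - e["timestamp"] <= timedelta(hours=1)
)

# An event-style question: needs the discrete records to answer.
per_repo = Counter(
    e["repo"] for e in events if now - e["timestamp"] <= timedelta(days=180)
)
print(launches_last_hour, per_repo.most_common())
```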
Website analytics with Matomo
As I understand it, Matomo allows website usage to be recorded and presented, like Google Analytics but open source. The mybinder.org deployment of BinderHub runs it alongside BinderHub, and it can help track how many visitors arrive at mybinder.org and how they navigate various subpages, including navigation that may not trigger additional requests to the BinderHub backend.
Feature ideas
I list some feature ideas below; edit this wiki post to add yours or edit existing entries.
/metrics endpoint additions
We already expose a /metrics endpoint on JupyterHub, but what metrics do we currently expose there, and what metrics do we want to add?
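Any addition would need to follow the Prometheus text exposition format. Here is a stdlib-only sketch of what one new sample would look like; in JupyterHub itself this would be done with the prometheus_client library, and the metric name below is a hypothetical addition, not an existing one:

```python
# Render a single gauge in the Prometheus text exposition format,
# i.e. HELP and TYPE comment lines followed by a "name value" sample line.
def render_gauge(name, help_text, value):
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name} {value}\n"
    )

print(render_gauge(
    "jupyterhub_weekly_active_users",  # hypothetical new metric
    "Users that started a server in the last 7 days",
    17,
))
```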
Grafana dashboard
Since we have a /metrics endpoint, it would pair very well with a Grafana dashboard that defines the graphs, assuming such data has been collected by Prometheus over a period of time.
KPI reports for admins
Just like Discourse provides a predefined set of Key Performance Indicators (KPIs) such as DAU/MAU, I think it could be good to define some that are general enough for all JupyterHub deployments.
To get KPI reports, I think we need to:
- Define a set of KPIs to expose
- What would an admin want to see? (Operational insights)
- What would an investor want to see? (Usage / value / outcome insights)
- Enable collection of relevant data
JupyterHub is an extensible system where Spawners, Authenticators, and Proxies are base classes that can be overridden. Some KPIs may require us to add something to these base classes and implement additional logic in derivative classes like KubeSpawner.
- Collect and store relevant data
We need to collect and store the raw data so we can analyze it later.
- Process data into a KPI
We need to be able to process the data into a KPI.
- Publish through a web UI and/or API (+ notebook)
We need to be able to expose the KPIs, either directly from the JupyterHub web UI, through a built-in JupyterHub REST API, or through a JupyterHub service.
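As a sketch of the "process data into a KPI" step, assume raw server sessions (user, start, stop) have already been collected and stored; the sessions and the 8-hour threshold below are illustrative:

```python
from datetime import datetime, timedelta

# Hypothetical stored sessions: (user, server start, server stop).
sessions = [
    ("anna", datetime(2020, 5, 25, 9), datetime(2020, 5, 25, 19)),  # 10 h
    ("ben",  datetime(2020, 5, 26, 9), datetime(2020, 5, 26, 11)),  # 2 h
]

def regular_users(sessions, week_start, min_hours=8):
    """Users active at least min_hours during the week starting at week_start."""
    hours = {}
    for user, start, stop in sessions:
        if week_start <= start < week_start + timedelta(days=7):
            hours[user] = hours.get(user, 0) + (stop - start).total_seconds() / 3600
    return sorted(u for u, h in hours.items() if h >= min_hours)

print(regular_users(sessions, datetime(2020, 5, 25)))  # -> ['anna']
```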
Usage reports for users
This regards the idea of providing usage reports to individual users. What information would users benefit from getting?
Events server (a sink for jupyter_telemetry)
We could provide a JupyterHub service (internal, external, or either) that acts as a sink receiving events from various sources. This could become a common place to send events, which could then be exposed for analysis.
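A minimal sketch of such a sink, using only the standard library: an HTTP service that accepts POSTed JSON events and keeps them for later analysis. A real JupyterHub service would add authentication, event schemas, and durable storage; the event payload here is invented:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # in-memory store; a real sink would persist events

class EventSink(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON event body and record it.
        length = int(self.headers["Content-Length"])
        received.append(json.loads(self.rfile.read(length)))
        self.send_response(204)  # accepted, no response body
        self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), EventSink)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A source (e.g. a jupyter_telemetry handler) could then POST events:
event = json.dumps({"action": "server-start", "user": "anna"}).encode()
urllib.request.urlopen(
    urllib.request.Request(f"http://127.0.0.1:{server.server_port}", data=event),
    timeout=5,
)
server.shutdown()
print(received)
```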
Summary and questions to you!
The JupyterHub ecosystem has some mechanisms to provide insights about its usage, such as JupyterHub’s /metrics endpoint and the jupyter_telemetry package, but we could likely benefit from some more pieces. I hope that we can define a feature valuable enough to develop, sustainable to maintain, and with early adopters ready to dogfood it during development.
- What feature do you think could make sense to develop?
- What insights would you benefit from as a project funder, administrator, or user?