I would like to monitor JupyterHub using the Prometheus monitoring solution.
I read through the "Monitoring" page of the JupyterHub documentation and enabled the monitoring metrics.
However, I have a question about GPU metrics.
In the JupyterLab dashboard, we have a GPU resource usage tab.
I tried to find the GPU metrics shown in the image through the Prometheus endpoint, but they don't seem to be exposed. I only see the normal CPU and memory usage metrics:
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
# HELP total_memory_usage counter for total memory usage
# TYPE total_memory_usage gauge
# HELP max_memory_usage counter for max memory usage
# TYPE max_memory_usage gauge
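A quick way to check which metric families an endpoint actually exposes is to read only the "# HELP" / "# TYPE" comment lines of the Prometheus text format. This is a minimal sketch: the keyword list ("gpu", "nvidia") is just a guess at what GPU metric names might contain, not names any particular exporter is known to use.

```python
import re

# Matches "# HELP <name> <text>" and "# TYPE <name> <type>" lines
_META = re.compile(r"^# (HELP|TYPE) (\S+)(?: (.*))?$")


def metric_families(exposition: str) -> dict[str, str]:
    """Map each metric name to its declared TYPE ('' if only HELP was seen)."""
    families: dict[str, str] = {}
    for line in exposition.splitlines():
        m = _META.match(line.strip())
        if not m:
            continue  # sample lines and any stray output are skipped
        kind, name, rest = m.groups()
        if kind == "TYPE":
            families[name] = (rest or "").strip()
        else:
            families.setdefault(name, "")
    return families


def find_matching(families, keywords=("gpu", "nvidia")):
    """Return metric names containing any of the (assumed) keywords."""
    return sorted(n for n in families if any(k in n.lower() for k in keywords))
```

For example, save the output of `curl -s <metrics URL>` to a file (`-s` also suppresses the progress meter that got mixed into the output above), pass its contents to `metric_families`, and an empty result from `find_matching` means no GPU-looking metrics are exposed.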
So, do I need to enable some option to expose GPU metrics through the Prometheus endpoint, or are these metrics currently not exposed?
The JupyterHub metrics describe the performance of the JupyterHub server, e.g. the number of users and the resource usage of the hub process. They don't include any metrics for your single-user servers; those need to be scraped separately.
If the metrics you need aren't included, you may need to install some extensions, or perhaps even write one. The GPU dashboard in your screenshot isn't part of standard JupyterLab, so you must've installed some customisations already.
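Scraping a single-user server separately could look roughly like the sketch below. It assumes the default JupyterHub URL layout (`<hub>/user/<name>/`), that the single-user server serves `/metrics` (Jupyter Server does, and by default requires authentication, see `ServerApp.authenticate_prometheus`), and that you have a JupyterHub API token allowed to access that user's server. The function names are hypothetical helpers, not part of any Jupyter API.

```python
import urllib.parse
import urllib.request


def singleuser_metrics_request(hub_url: str, username: str, token: str) -> urllib.request.Request:
    """Build a request for one user's /metrics endpoint (URL layout assumed)."""
    base = hub_url.rstrip("/")
    # Escape the username for the URL path; safe="@~" is assumed to match Hub's escaping
    name = urllib.parse.quote(username, safe="@~")
    url = f"{base}/user/{name}/metrics"
    # JupyterHub API tokens are sent as "Authorization: token <token>"
    return urllib.request.Request(url, headers={"Authorization": f"token {token}"})


def fetch_metrics(req: urllib.request.Request, timeout: float = 10.0) -> str:
    """Perform the HTTP request and return the metrics text."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

In a real Prometheus setup you would normally configure a scrape job with the same URL and bearer/authorization settings instead of polling by hand; this is only useful for checking what an individual server exposes.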
Thanks for your response!
As for the GPU metrics in my screenshot, I need to investigate which extension the dashboard comes from.
Could you please give me some pointers about the metrics listed in the documentation (the "List of Prometheus Metrics" page of the JupyterHub documentation)?
The metrics I gathered don't seem to match the metric names in the documentation. Here are some of the metrics I gathered:
# HELP terminal_currently_running_total counter for how many terminals are running
# TYPE terminal_currently_running_total gauge
I cannot see the metric name “terminal_currently_running_total” in the documentation. Conversely, I cannot find the metric name “jupyterhub_active_users” in the list above.
JupyterHub and Jupyter Server/JupyterLab are separate components. JupyterHub is designed to manage multiple Jupyter servers (Lab/Notebook) for multiple users, including handling logins and creating, running and destroying Jupyter servers. JupyterLab/Jupyter Server is what users use to run notebooks.
Based on your post, I assume you're only interested in the metrics for Jupyter Server? If so, you can ignore the JupyterHub documentation.
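As a rough heuristic for telling the two apart: the metrics documented for the hub carry a `jupyterhub_` prefix (e.g. `jupyterhub_active_users`), while metrics like `terminal_currently_running_total` come from the single-user side. Generic names such as `process_*` can show up on either endpoint, so this sketch only flags prefixed names as hub metrics; it is an illustration, not an official classification.

```python
def split_by_component(names):
    """Split metric names into (jupyterhub_-prefixed, everything else)."""
    hub, other = [], []
    for name in sorted(names):
        # Only the "jupyterhub_" prefix is treated as a reliable hub marker
        (hub if name.startswith("jupyterhub_") else other).append(name)
    return hub, other
```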