I have a GPU-enabled Kubernetes cluster in which I can run pods that actually use the GPU. This has been verified via GPU test pods, ML workflows that use the GPU, etc.
In other words, I'm confident the cluster is configured properly: when running a GPU-enabled workload, I can monitor GPU activity, processes, etc. via nvidia-smi.
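For reference, this is the kind of check I used to confirm the device plugin is advertising the GPU to the scheduler (a minimal sketch, assuming the official kubernetes Python client is installed and kubeconfig points at the cluster):

from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when run inside a pod
v1 = client.CoreV1Api()
for node in v1.list_node().items:
    # the NVIDIA device plugin advertises GPUs as the extended resource nvidia.com/gpu
    gpus = node.status.allocatable.get("nvidia.com/gpu", "0")
    print(f"{node.metadata.name}: {gpus} allocatable nvidia.com/gpu")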
I have JupyterHub installed on the cluster, and I've followed the instructions in the Zero to JupyterHub (z2jh) docs, as shown below.
The problem is, when I launch a notebook and run basic tests such as:
!pip install torch
import torch
torch.cuda.is_available()
It returns False.
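To narrow things down, I also ran a few extra checks in the same notebook (a minimal sketch; the inline comments note what I'd expect each line to show):

import torch

print(torch.__version__)          # a "+cpu" suffix would indicate a CPU-only wheel
print(torch.version.cuda)         # None if this torch build has no CUDA support
print(torch.cuda.is_available())  # False in my case
print(torch.cuda.device_count())  # 0 in my case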
This particular cluster has one NVIDIA GPU. When I launch a notebook server and request a GPU, the server spins up, but the above tests fail.
If I try to launch a second notebook server while the first is running, I see an error stating that no GPUs are available. That is expected, since the cluster has only one GPU and it's already allocated to the first server.
Therefore, based on the above, I believe my jupyterhub-config.yaml is accurate: the GPU resource limit is clearly being requested and enforced by the scheduler.
Once again, the problem is that basic commands within the notebook to detect a GPU return False.
Note: I've tried restarting the notebook kernel, to no avail.
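In case it helps, these are the checks I can run from a notebook cell to see whether the container itself can reach the driver (a sketch; the env var and path are the standard NVIDIA container-runtime ones and may differ on other images):

import os, shutil, subprocess

print(os.environ.get("NVIDIA_VISIBLE_DEVICES"))  # normally set by the NVIDIA container runtime
print(shutil.which("nvidia-smi"))                # None if the binary isn't in the image
print(os.path.exists("/proc/driver/nvidia"))     # True when the host driver is exposed
if shutil.which("nvidia-smi"):
    print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)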
I’m likely missing something simple and any help is GREATLY appreciated.
Once again, the Kubernetes cluster itself is configured properly, since my other GPU-using pods work fine… it's just the notebook that doesn't.
This is what I used from the Zero to JupyterHub guide:
singleuser:
  profileList:
    - display_name: "GPU Server"
      description: "Spawns a notebook server with access to a GPU"
      kubespawner_override:
        extra_resource_limits:
          nvidia.com/gpu: "1"
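And to double-check that the override actually lands on the spawned pod, I can inspect its resource limits (a sketch; the "jhub" namespace and the "jupyter-" pod-name prefix are z2jh defaults and are assumptions, adjust to your deployment):

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()
for pod in v1.list_namespaced_pod("jhub").items:   # assumed namespace
    if pod.metadata.name.startswith("jupyter-"):   # default singleuser pod prefix
        for c in pod.spec.containers:
            print(pod.metadata.name, c.resources.limits)

If the override is being applied, the notebook container's limits dict should include nvidia.com/gpu: "1".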