Cannot deploy NVIDIA GPU on JupyterHub

I installed an NVIDIA GPU and MicroK8s 1.29 on Ubuntu 24.04.

administer@ultimate-force:~$ nvidia-smi
Thu Mar 20 11:42:47 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120                Driver Version: 550.120        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     Off |   00000000:C1:00.0 Off |                  N/A |
|  0%   31C    P8             11W /  280W |      16MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2171      G   /usr/lib/xorg/Xorg                              9MiB |
|    0   N/A  N/A      4131      G   /usr/bin/gnome-shell                            3MiB |
+-----------------------------------------------------------------------------------------+
administer@ultimate-force:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0

But I could not enable the GPU with: microk8s enable gpu. So I installed the GPU operator with Helm instead:

administer@ultimate-force:~$ microk8s helm3 install gpu-operator -n gpu-operator --create-namespace nvidia/gpu-operator --set driver.enabled=false --set toolkit.env[0].name=CONTAINERD_CONFIG --set toolkit.env[0].value=/var/snap/microk8s/current/args/containerd-template.toml --set toolkit.env[1].name=CONTAINERD_SOCKET --set toolkit.env[1].value=/var/snap/microk8s/common/run/containerd.sock --set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS --set toolkit.env[2].value=nvidia --set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT --set-string toolkit.env[3].value=true
NAME: gpu-operator
LAST DEPLOYED: Wed Mar 19 13:10:20 2025
NAMESPACE: gpu-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None 

The GPU and CUDA work fine when I check:

administer@ultimate-force:~$ microk8s kubectl logs -n gpu-operator -l app=nvidia-operator-validator -c nvidia-operator-validator
all validations are successful 
administer@ultimate-force:~$ microk8s kubectl apply -f - <<EOF
> apiVersion: v1
> kind: Pod
> metadata:
>   name: cuda-vector-add
> spec:
>   restartPolicy: OnFailure
>   containers:
>     - name: cuda-vector-add
>       image: "k8s.gcr.io/cuda-vector-add:v0.1"
>       resources:
>         limits:
>           nvidia.com/gpu: 1
> EOF
pod/cuda-vector-add created 
administer@ultimate-force:~$ microk8s kubectl logs cuda-vector-add
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done 

After I deployed JupyterHub successfully, I can sign in to JupyterHub and install TensorFlow with:

python3 -m pip install 'tensorflow[and-cuda]'

But I cannot import TensorFlow from python3:

$ python3
Python 3.12.8 (main, Jan 14 2025, 02:29:13) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2025-03-20 09:31:17.182480: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1742463077.197775      74 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1742463077.202298      74 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1742463077.214892      74 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1742463077.214919      74 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1742463077.214922      74 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1742463077.214925      74 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2025-03-20 09:31:17.218726: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. 

Any idea?

It looks like the GPU setup for your JupyterHub single-user server is configured correctly, so this is more likely a TensorFlow installation issue. I've never had great success with just pip installing TensorFlow. You may want to look into their conda packages or, even better, use their Docker images as the base for your notebook image.
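
Before reinstalling, it may be worth checking whether the import really failed. The output in the question contains E/W log lines but no Python traceback, so the import may have completed and only printed warnings. A minimal sketch of such a check, run inside the notebook container, is below. gpu_report is a hypothetical helper name I made up for illustration; tf.config.list_physical_devices is the TensorFlow call it relies on.

```python
import importlib


def gpu_report(module_name="tensorflow"):
    """Try to import TensorFlow and list the GPUs it can see.

    gpu_report is a hypothetical helper for illustration only.
    """
    try:
        tf = importlib.import_module(module_name)
    except ImportError as exc:
        # The import genuinely failed: report the reason instead of crashing.
        return {"imported": False, "error": str(exc), "gpus": []}

    # An empty list here means TensorFlow imported but found no usable GPU.
    gpus = [device.name for device in tf.config.list_physical_devices("GPU")]
    return {"imported": True, "version": tf.__version__, "gpus": gpus}


if __name__ == "__main__":
    print(gpu_report())
```

If "imported" is True but "gpus" is empty, the problem is probably in how the notebook pod gets access to the GPU or the CUDA libraries rather than in the import itself.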
