TensorFlow + Conda: GPU visible on the command line, but not in Jupyter Notebook

Hi,
after several other attempts, I have set up TensorFlow and JupyterHub along the lines of the guide "JupyterHub with JupyterLab Install using Conda".
As this is a prototype system (with an NVIDIA RTX 3080) for a GPU server (with 2x A100 GPUs), GPU support within Jupyter notebooks is essential. However, it is not working. After setting the environment variables, the remaining error is:
E tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected

In my view, the problem boils down to the following: on the command line, in the conda environment tensorflow2-gpu2, both the CPU and the GPU are detected:

(tensorflow2-gpu2) admin-nb@jupyter-test:~$ python3
Python 3.9.12 (main, Jun  1 2022, 11:38:51) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.python.client import device_lib
2022-09-01 13:55:16.780984: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
>>> device_lib.list_local_devices()
2022-09-01 13:55:18.653413: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-01 13:55:18.675498: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3792740000 Hz
2022-09-01 13:55:18.676053: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555a2f539da0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-09-01 13:55:18.676082: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-09-01 13:55:18.677208: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2022-09-01 13:55:19.591403: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555a2f953d60 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-09-01 13:55:19.591426: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA GeForce RTX 3080, Compute Capability 8.6
2022-09-01 13:55:19.591638: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:05:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.71GHz coreCount: 68 deviceMemorySize: 9.78GiB deviceMemoryBandwidth: 707.88GiB/s
2022-09-01 13:55:19.591658: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2022-09-01 13:55:19.592499: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2022-09-01 13:55:19.592518: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2022-09-01 13:55:19.593350: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2022-09-01 13:55:19.593483: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2022-09-01 13:55:19.594213: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2022-09-01 13:55:19.594603: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2022-09-01 13:55:19.596076: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2022-09-01 13:55:19.596269: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2022-09-01 13:55:19.596287: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2022-09-01 13:55:20.194900: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-09-01 13:55:20.194921: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2022-09-01 13:55:20.194925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2022-09-01 13:55:20.195255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0 with 9070 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:05:00.0, compute capability: 8.6)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 16710234344042336047
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 14476011734076499730
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 16669855104248591829
physical_device_desc: "device: XLA_GPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 9510955264
locality {
  bus_id: 1
  links {
  }
}
incarnation: 11115347933796639039
physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:05:00.0, compute capability: 8.6"
]

In a notebook on JupyterHub, which uses the same conda environment, only the CPU is detected:

from tensorflow.python.client import device_lib
device_lib.list_local_devices()
[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 11793224412206947809,
 name: "/device:XLA_CPU:0"
 device_type: "XLA_CPU"
 memory_limit: 17179869184
 locality {
 }
 incarnation: 15586750839120726384
 physical_device_desc: "device: XLA_CPU device"]

Any ideas? Any help is appreciated!

To answer my own question: I restarted jupyterhub.service, and now TensorFlow sees the GPU:

import tensorflow
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
2022-09-01 15:17:29.577700: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2022-09-01 15:17:30.110398: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-01 15:17:30.131498: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3792740000 Hz
2022-09-01 15:17:30.131854: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55edf71784c0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-09-01 15:17:30.131864: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-09-01 15:17:30.132936: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2022-09-01 15:17:31.064536: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55edf78a8f60 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-09-01 15:17:31.064552: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA GeForce RTX 3080, Compute Capability 8.6
2022-09-01 15:17:31.064746: I tensorflow/core/common_runtime/gpu/gpu_devi
[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 9645536461093391315,
 name: "/device:XLA_CPU:0"
 device_type: "XLA_CPU"
 memory_limit: 17179869184
 locality {
 }
 incarnation: 3860495059288939179
 physical_device_desc: "device: XLA_CPU device",
 name: "/device:XLA_GPU:0"
 device_type: "XLA_GPU"
 memory_limit: 17179869184
 locality {
 }
 incarnation: 3135992680368892007
 physical_device_desc: "device: XLA_GPU device",
 name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 9510955264
 locality {
   bus_id: 1
   links {
   }
 }
 incarnation: 10857906967441964380
 physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:05:00.0, compute capability: 8.6"]

Even better: it actually uses the GPU, with a 10-fold speedup compared to the CPU.

The only remaining problem is that I have to set these variables in each notebook:

%env XLA_FLAGS=--xla_gpu_cuda_data_dir=/opt/conda/miniconda3/envs/tf/lib/
%env TF_XLA_FLAGS=--tf_xla_enable_xla_devices

Perhaps you could set these flags in the first cell of a jupyterlab-template?

Since this is JupyterHub, you can set those environment variables via c.Spawner.environment, so that all single-user servers are launched with them.
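For example, in jupyterhub_config.py, something along these lines should work (the CUDA data dir path is the one from the post above; adjust it to your installation):

```python
# jupyterhub_config.py -- sketch only; the conda env path below is
# taken from the post above and must be adapted to your installation.
c.Spawner.environment = {
    "XLA_FLAGS": "--xla_gpu_cuda_data_dir=/opt/conda/miniconda3/envs/tf/lib/",
    "TF_XLA_FLAGS": "--tf_xla_enable_xla_devices",
}
```

After changing the config, restart jupyterhub.service (as you already discovered, running single-user servers only pick up new environment variables on restart).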

Alternatively, you can use an entrypoint script like this one to source the standard shell profile files, etc., before launching the server.
