How to share a GPU with multiple pods? Insufficient nvidia.com/gpu

Hello Community,

My JupyterHub runs on Kubernetes and I use the NVIDIA/k8s-device-plugin so that the pods can access the GPU.

I have the following problem, or maybe I’m misunderstanding something.
In the config for the GPU profile I have set extra_resource_limits:

extra_resource_limits:
    nvidia.com/gpu: '1'
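
For context, the surrounding part of the profile looks roughly like this (a trimmed sketch assuming the Zero to JupyterHub profileList format; the profile name and description are placeholders):

singleuser:
  profileList:
    - display_name: "GPU server"
      description: "Notebook server with access to one GPU"
      kubespawner_override:
        extra_resource_limits:
          nvidia.com/gpu: '1'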

However, two or more users cannot use the GPU at the same time, and the other pods get the message “Insufficient nvidia.com/gpu”.

Is it possible for multiple users to use the GPU at the same time?
If so, how?

Thank you very much for your help.

According to the K8s docs:

You can specify GPU limits without specifying requests, because Kubernetes will use the limit as the request value by default.

So it sounds like the limit is also treated as the request.
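
With that limit in place, each single-user pod claims a whole advertised GPU device. Roughly what the spawned pod’s container resources end up as (illustrative only; the container name and image are placeholders):

containers:
  - name: notebook
    image: jupyter/base-notebook
    resources:
      limits:
        nvidia.com/gpu: '1'   # the request defaults to the same value, so the
                              # scheduler reserves one whole advertised GPU per pod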

GPU sharing isn’t straightforward. This video clarifies where things are currently and what is planned for the future: https://youtu.be/Q2GuTUO170w?si=cmuG1rl_0WfM5eDq

Check 16 minutes and 15 seconds in for the relevant part about GPU sharing.

Thanks for the tip. I found the time-slicing feature; I just had to enable it in the NVIDIA/k8s-device-plugin.

If anyone has the same problem, this is the config for the plugin that I’m using now. replicas can be set to a different number.

version: v1
flags:
  migStrategy: "none"
  failOnInitError: true
  nvidiaDriverRoot: "/"
  plugin:
    passDeviceSpecs: false
    deviceListStrategy: "envvar"
    deviceIDStrategy: "uuid"
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 20    # each physical GPU is advertised as 20 schedulable nvidia.com/gpu devices
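
One way to hand this config to the plugin is to wrap it (trimmed here to the sharing part) in a ConfigMap and point the plugin’s Helm chart at it. The names below are placeholders, and I’m assuming the chart’s config.name value here, so check the plugin’s docs for your deployment method:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-config    # placeholder name
  namespace: nvidia-device-plugin      # adjust to wherever the plugin runs
data:
  config: |
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 20

Once the plugin picks this up and restarts, the node should advertise a capacity of nvidia.com/gpu: 20, so up to 20 pods requesting one GPU each can be scheduled on a single physical card.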

Thank you for sharing this @LeafLikeApple!!

If a user is currently the sole user of the actual GPU, do you know if that user gets to use all time slices and gets ~full performance during that time?


Unfortunately, I don’t know for sure, and I can’t test it because I don’t have the experience to put the GPU under a targeted load.
There are two sharing modes (time-slicing and MPS) in the plugin.

"In the case of time-slicing, CUDA time-slicing is used to allow workloads sharing a GPU to interleave with each other. However, nothing special is done to isolate workloads that are granted replicas from the same underlying GPU, and each workload has access to the GPU memory and runs in the same fault-domain as of all the others (meaning if one workload crashes, they all do).

In the case of MPS, a control daemon is used to manage access to the shared GPU. In contrast to time-slicing, MPS does space partitioning and allows memory and compute resources to be explicitly partitioned and enforces these limits per workload."

Which mode to use probably depends on the scenario. We don’t have many GPU users here yet, so I assume time slicing is the better fit in this case.
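
For reference, if someone wants to try MPS instead, my understanding is that in recent plugin versions the sharing section changes shape roughly like this (untested on my side, so treat it as a sketch):

version: v1
sharing:
  mps:
    resources:
    - name: nvidia.com/gpu
      replicas: 10    # with MPS, my understanding is that memory and compute are split evenly across the replicas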

I’m interested in this as well. Does anyone have more information on this?