Hello, I’m using Azure and I’m noticing that two users are being placed on the same node, which has a single GPU. This is causing one of them to run out of GPU memory. Is there a way to force one user per node so they do not share GPUs?
It isn’t possible to do that as directly as “hey please 1 user per node!”, but it is possible to say “hey please let this user have exactly one GPU” or “hey let this user have ~8 CPU no matter what”. If the node only has ~8 CPU, no second user will fit on it.
When using GPUs, I’ve typically requested a GPU explicitly. It has looked like this, though the details will depend on how GPUs are exposed in Azure.
    singleuser:
      extraResource:
        limits:
          nvidia.com/gpu: "1"
Yikes, that naming is quite convoluted: KubeSpawner adjusted a name inherited from the k8s-unaware Spawner base class, then z2jh adjusted that again, and the native k8s specification is something different still.
See https://zero-to-jupyterhub.readthedocs.io/en/latest/resources/reference.html#singleuser-extraresource for details on configuring such resource requests/limits. I suggest specifying the limit only, since it implies that you want a single GPU requested as well as limited.
Note though that this has worked on GCP where they have nvidia based GPUs. I have no idea about Azure.
Thanks @consideRatio. Just to clarify for those who might be having the same issue with Azure: in my experience, setting limits actually means “give me a GPU”, but it does not guarantee that the GPU will be used by just a single user. You have to do this indirectly by setting a minimum amount of CPU or RAM that is greater than half of a node’s capacity. For example, if you use an NC6 VM, which comes with a K80 and 6 vCPUs, then you should change your config file so that each user is guaranteed at least 4 vCPUs. Two such users cannot fit on one node, so each user is forced onto their own VM and hence gets their own GPU.
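A minimal sketch of what that could look like in a z2jh config.yaml, assuming the NC6 example above (6 vCPUs per node) and the chart's `singleuser.cpu.guarantee` setting. The value 4 is illustrative: it needs to be more than half of the node's CPU, but still small enough to fit in what the node can actually allocate once system pods are accounted for.

```yaml
singleuser:
  cpu:
    # Guarantee more than half of the node's 6 vCPUs so that two user pods
    # can never be scheduled onto the same node (illustrative value).
    guarantee: 4
  extraResource:
    limits:
      # Still ask for the GPU explicitly, as in the snippet earlier in the thread.
      nvidia.com/gpu: "1"
```

The same idea should work with `singleuser.memory.guarantee` instead, if RAM is the easier resource to reason about for your VM size.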
Hope this helps as well.