GPUs cannot be shared, but GPUs must be shared

We are using the gpushare-scheduler-extender from Alibaba in our Z2JH cluster.

It works reasonably well and allows oversubscription of GPU memory. However, it relies on nvidia-docker2, which does not (yet) support NVIDIA's MPS (Multi-Process Service) and therefore can't enforce any limits on resource usage.

My workaround is a periodic Kubernetes job that terminates any pod exceeding its quota. nvidia-smi reports the PID of every process running on each GPU, and each PID can be traced back to its Kubernetes pod through the cgroup names in /proc/<pid>/cgroup (see also Heptio Labs' pid2pod).
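A minimal sketch of the detection half of such a job might look like the following. It assumes the job runs on the GPU node with the host PID namespace (hostPID: true), so the PIDs nvidia-smi reports exist under /proc; that the pod UID can be found in the kubelet's cgroup path (both the cgroupfs and systemd cgroup driver layouts are matched); and that per-pod quotas in MiB are supplied by the caller as a hypothetical `quotas_mib` dict keyed by pod UID. Looking up pod names and actually deleting the offending pods through the Kubernetes API is left out.

```python
import re
import subprocess
from collections import defaultdict

# Kubelet cgroup paths embed the pod UID, e.g.
#   cgroupfs: /kubepods/burstable/pod<uid>/<container-id>
#   systemd:  /kubepods.slice/.../kubepods-burstable-pod<uid_with_underscores>.slice/...
POD_UID_RE = re.compile(
    r"pod([0-9a-fA-F]{8}[-_][0-9a-fA-F]{4}[-_][0-9a-fA-F]{4}"
    r"[-_][0-9a-fA-F]{4}[-_][0-9a-fA-F]{12})"
)


def parse_compute_apps(csv_text):
    """Parse 'pid, used_memory' lines into {pid: MiB}, summing across GPUs."""
    usage = {}
    for line in csv_text.strip().splitlines():
        pid, mem = (field.strip() for field in line.split(","))
        usage[int(pid)] = usage.get(int(pid), 0) + int(mem)
    return usage


def query_gpu_usage():
    """Ask nvidia-smi for the GPU memory used by every compute process."""
    out = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_compute_apps(out)


def pod_uid_from_cgroup(cgroup_text):
    """Return the pod UID (dash-separated) found in a /proc/<pid>/cgroup dump."""
    match = POD_UID_RE.search(cgroup_text)
    return match.group(1).replace("_", "-") if match else None


def read_cgroup(pid):
    with open(f"/proc/{pid}/cgroup") as f:
        return f.read()


def find_violations(gpu_usage_mib, quotas_mib, read_cgroup=read_cgroup):
    """Return sorted UIDs of pods whose total GPU memory exceeds their quota.

    Pods without an entry in quotas_mib get a quota of 0, so any GPU use
    by a pod that requested no GPU memory counts as a violation.
    """
    per_pod = defaultdict(int)
    for pid, mib in gpu_usage_mib.items():
        try:
            uid = pod_uid_from_cgroup(read_cgroup(pid))
        except FileNotFoundError:
            continue  # process exited after nvidia-smi listed it
        if uid is not None:  # skip processes that don't belong to a pod
            per_pod[uid] += mib
    return sorted(uid for uid, used in per_pod.items()
                  if used > quotas_mib.get(uid, 0))
```

A periodic job would call `find_violations(query_gpu_usage(), quotas_mib)` and delete the returned pods. Keeping the parsing separate from the nvidia-smi call makes the logic easy to test off a GPU node.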

However, implementing this is probably not worth the time, since Alibaba is already working on MPS support.

Alibaba has also published a Medium article about the implementation that gives some more background.