GPUs can not be shared, but GPUs must be shared

This was originally going to be a lightning talk, but I can’t be there so I think that writing is more efficient.

Some resources are easy to share: CPUs can be oversubscribed easily, and memory can be oversubscribed (though not over-used). But some resources are different, in particular GPUs. On a cluster they are extremely expensive, and they cannot be shared securely in a multi-tenant environment (if we’re wrong about this, please correct us).

For months people asked us for GPUs, but we couldn’t efficiently put them into Jupyter - the efficiency of interactive use would be too low. For research maybe we could solve this by throwing money at it, but for a 500-person course that’s not possible. Eventually, one of my colleagues came up with the catchphrase: “GPUs can not be shared, but they must be shared”. It made me realize that we have to think far outside the box in order to tackle this problem. GPUs can not be shared, so we have to redefine what sharing even means.

So normal CPU-style multiprocess timesharing doesn’t work. What are the other options?

We can share GPUs at the level of notebooks: some sort of batch queue where entire notebooks are sent to the GPU to run. Or we can share GPUs at the level of cells: try to send single cells to the GPU through some sort of batch/sharing system. Or is there something else?

One option is to share GPUs by whole notebooks: make a system to submit whole notebooks to a queue. It runs the notebook top to bottom, saves the output (and maybe the state), and you can non-interactively examine the results and iterate. Huge disadvantage: it’s non-interactive. But this is not so different from interactively debugging on a dedicated machine and then batch-running once the code works, so it’s a model similar to what people are used to. I’m working on an extension to make this easier (and, incidentally, to make notebooks more like scripts): https://github.com/NordicHPC/nbscript
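To make the notebook-level option concrete, here is a rough sketch of what a submission wrapper could look like on a Slurm cluster. The function, partition name, and file-naming convention are all illustrative assumptions on my part (nbscript is the actual attempt at doing this properly) — the point is just that the whole pipeline is a one-liner around existing tools:

```python
def notebook_batch_command(notebook, gpu_partition="gpu"):
    """Build a Slurm submission that runs a notebook top to bottom on a
    GPU node and saves the executed copy next to the original.
    (Hypothetical sketch: partition/flags depend on your cluster.)"""
    executed = notebook.replace(".ipynb", ".executed.ipynb")
    # `jupyter nbconvert --execute` runs all cells and writes a new
    # notebook with the outputs filled in.
    inner = ("jupyter nbconvert --to notebook --execute "
             f"--output {executed} {notebook}")
    # Ask Slurm for one GPU and wrap the nbconvert call in the job.
    return ["sbatch", f"--partition={gpu_partition}", "--gres=gpu:1",
            f"--wrap={inner}"]
```

The user keeps editing interactively on a CPU-only kernel and only pays for the GPU while the queued run is actually executing.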

The other option I can think of is cell-level execution: a cell magic that serializes all notebook state, sends it to a remote execution environment (some sort of batch queue here, hopefully so fast that it’s not visible to users), runs the cell, re-serializes the state, brings it back, and re-loads the new state into the notebook. Advantages: convenient, almost transparent. Disadvantages: not all state can be serialized. A colleague made a proof of concept: https://github.com/murhum1/remote-notebook-cells (I haven’t tested it myself). The lightning talk by Russell Neches is also relevant here: github: ryneches/SummonUndead (can only post two links, sorry)
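As a rough sketch of what such a magic has to do under the hood — using stdlib `pickle` as a stand-in serializer and an in-process `exec` as a stand-in for the remote queue; the function names are made up for illustration — the round trip looks like this, and the filtering step is exactly where the “not all state can be serialized” disadvantage bites:

```python
import pickle

def _is_picklable(value):
    """Can this object survive the serialization round trip?"""
    try:
        pickle.dumps(value)
        return True
    except Exception:
        return False

def run_cell_remotely(cell_source, namespace):
    """Round-trip one cell the way a remote-execution magic would:
    serialize the notebook state, run the cell 'elsewhere', and merge
    the resulting state back in.  (Illustrative sketch only.)"""
    # 1. Serialize only the picklable part of the state.  Open files,
    #    sockets, and many library objects are silently dropped here --
    #    this is the core limitation of the approach.
    payload = pickle.dumps({k: v for k, v in namespace.items()
                            if _is_picklable(v)})

    # 2. "Remote" side -- in a real system this runs on the GPU node.
    remote_ns = pickle.loads(payload)
    exec(cell_source, remote_ns)
    remote_ns.pop("__builtins__", None)  # added by exec(); don't ship back
    result = pickle.dumps({k: v for k, v in remote_ns.items()
                           if _is_picklable(v)})

    # 3. Notebook side: load the new state over the old one.
    namespace.update(pickle.loads(result))
```

A real implementation would want something stronger than `pickle` (the proof of concept above uses a similar idea), but the shape of the problem is the same.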

I heard a rumor from Open Source Summit in Edinburgh about some way to dynamically add GPUs to containers and remove them again. This would allow some sort of API (a context manager or cell magic) to grab the GPU just for the cells that need it. I haven’t seen the container API to do this, and I wonder whether there will be problems getting code to actually release the GPU.

Are there any more options? None are that good. What do you do?

Other options: throw money at the problem (or let users throw money at it), buy slower, cheaper GPUs for interactive use, or allow full GPUs but with a really short idle-culling time.


I don’t think this is very feasible, but one way for this to work well would be for libraries to acknowledge the problem and acquire/release GPUs on a per-call basis much more proactively. A GPU API that allows being kicked off the GPU by someone else and being seamlessly restored as needed (as the OS does all the time with CPU state) would go a long way. I have no idea how to do this in practice, but code blocks like:

with gpu:
    melt_glaciers_with_machine_learning()

would ensure that the GPU is not claimed while the notebook is idle, allowing better multi-tenant behavior.
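As a sketch of what such a claim-only-when-needed API could look like — all names here are hypothetical, and a semaphore stands in for whatever mechanism actually attaches and detaches the GPU (a container API, a scheduler call, …):

```python
import contextlib
import threading

class GpuPool:
    """Hypothetical sketch of the 'hold a GPU only inside the block'
    idea.  A counting semaphore models a pool of N physical GPUs."""

    def __init__(self, n_gpus):
        self._sem = threading.BoundedSemaphore(n_gpus)

    @contextlib.contextmanager
    def gpu(self, timeout=60):
        # Block until a GPU is free, then release it the moment the
        # with-block exits -- so an idle notebook never holds one.
        if not self._sem.acquire(timeout=timeout):
            raise RuntimeError("no GPU became free in time")
        try:
            yield
        finally:
            self._sem.release()
```

With `pool = GpuPool(4)`, the user-facing code stays exactly as simple as the `with gpu:` block above; the hard, unsolved part is restoring GPU state after being kicked off, which this sketch sidesteps entirely.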

Implementing the same multi-tenant “it’s okay to kick me out, as long as you put me back” at a lower level would make it work more like OS context switches, but I don’t know who has the expertise for doing something like that.