I tried to configure dockerspawner.SwarmSpawner to use NVIDIA GPUs, but it didn't work. I would like to confirm whether this feature is supported.

Proposed change

Use NVIDIA GPUs in dockerspawner.SwarmSpawner

Alternative options

Some parameters in the Docker API:

Low-level API — Docker SDK for Python 6.1.3 documentation

create_service(task_template, name=None, labels=None, mode=None, update_config=None, networks=None, endpoint_config=None, endpoint_spec=None, rollback_config=None)
class TaskTemplate(container_spec, resources=None, restart_policy=None, placement=None, log_driver=None, networks=None, force_update=None)
  • resources (Resources) – Resource requirements which apply to each individual container created as part of the service.
class Resources(cpu_limit=None, mem_limit=None, cpu_reservation=None, mem_reservation=None, generic_resources=None)
Configures resource allocation for containers when made part of a ContainerSpec.

Parameters:

  • cpu_limit (int) – CPU limit in units of 10^9 CPU shares.
  • mem_limit (int) – Memory limit in Bytes.
  • cpu_reservation (int) – CPU reservation in units of 10^9 CPU shares.
  • mem_reservation (int) – Memory reservation in Bytes.
  • generic_resources (dict or list) – Node level generic resources, for example a GPU, using the following format: { resource_name: resource_value }. Alternatively, a list of resource specifications as defined by the Engine API.
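To make the two accepted shapes of generic_resources concrete, here is a minimal sketch of how the dict form maps onto the Engine API list form, as far as I understand it: integer values become a DiscreteResourceSpec and string values become a NamedResourceSpec. The helper name to_generic_resource_specs is hypothetical and is not part of the Docker SDK; it only illustrates the format.

```python
def to_generic_resource_specs(resources: dict) -> list:
    """Hypothetical helper: expand {name: value} into Engine API style specs.

    Assumption: ints describe a count of a resource (DiscreteResourceSpec),
    strings describe a specific named resource (NamedResourceSpec).
    """
    specs = []
    for kind, value in resources.items():
        if isinstance(value, bool):
            # bool is a subclass of int in Python; reject it explicitly
            raise TypeError(f"unsupported value for {kind!r}: {value!r}")
        if isinstance(value, int):
            spec_type = "DiscreteResourceSpec"
        elif isinstance(value, str):
            spec_type = "NamedResourceSpec"
        else:
            raise TypeError(f"unsupported value for {kind!r}: {value!r}")
        specs.append({spec_type: {"Kind": kind, "Value": value}})
    return specs


# The dict form used in this thread, requesting one GPU:
gpu_specs = to_generic_resource_specs({"gpu": 1})
```

Either shape should be acceptable as the generic_resources argument of Resources, according to the documentation quoted above.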

Who would use this feature?

Anyone who uses GPUs in a cluster

(Optional): Suggest a solution

For example, the tutorial could include some sample configurations.

SwarmSpawner supports several additional arguments, including extra_resources_spec. Is that what you’re trying to configure?
https://jupyterhub-dockerspawner.readthedocs.io/en/latest/api/index.html#swarmspawner

If you need to delve into the details, the source code for the spawners is in dockerspawner/dockerspawner at main · jupyterhub/dockerspawner · GitHub

If you figure out whether it’s possible or not, please share your solution here so the rest of the community can benefit!

I have never got extra_resources_spec to work, but I don’t understand why.

c.SwarmSpawner.extra_resources_spec.update({
        "generic_resources": {"gpu":1}
})

Everything looks great, but the container never starts. docker service inspect shows that the requirement is in place, but the container is stuck at no nodes available.

I can start a TensorFlow service with exactly the same requirement and it starts just fine. docker service inspect says that the requirements are identical to those of the service that JupyterHub tried to start.

Has anyone had luck with extra_resources_spec?

Oh, it looks like a bug in docker swarm: docker service create doesn't work when network and generic-resource are both attached · Issue #44378 · moby/moby · GitHub
You can’t specify a generic-resource and a network at the same time. I’ll either have to wait until it’s fixed or figure out how to run without specifying the network.
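For anyone wanting to check whether one of their own services has this combination, here is a rough sketch that scans the JSON printed by docker service inspect. It assumes the reservation is reported under Spec.TaskTemplate.Resources.Reservations.GenericResources and the attached networks under Spec.TaskTemplate.Networks; the function name and the sample data are made up for illustration.

```python
import json


def has_generic_resource_and_network(inspect_output: str) -> bool:
    """Return True if any inspected service both reserves a generic
    resource and is attached to at least one network.

    Assumption: inspect_output is the JSON array printed by
    `docker service inspect <service>`.
    """
    for service in json.loads(inspect_output):
        template = service.get("Spec", {}).get("TaskTemplate", {})
        resources = template.get("Resources") or {}
        reservations = resources.get("Reservations") or {}
        generic = reservations.get("GenericResources") or []
        networks = template.get("Networks") or []
        if generic and networks:
            return True
    return False


# Made-up sample resembling a service with a GPU reservation and a network.
SAMPLE_WITH_BOTH = json.dumps([{
    "Spec": {
        "TaskTemplate": {
            "Resources": {
                "Reservations": {
                    "GenericResources": [
                        {"DiscreteResourceSpec": {"Kind": "gpu", "Value": 1}}
                    ]
                }
            },
            "Networks": [{"Target": "example-network-id"}],
        }
    }
}])

# Same reservation, but no network attached.
SAMPLE_GPU_ONLY = json.dumps([{
    "Spec": {
        "TaskTemplate": {
            "Resources": {
                "Reservations": {
                    "GenericResources": [
                        {"DiscreteResourceSpec": {"Kind": "gpu", "Value": 1}}
                    ]
                }
            }
        }
    }
}])
```

A service for which this returns True matches the situation described in the moby issue. That alone does not prove the bug is the cause, but it would explain why a standalone service without a network starts while the one created by JupyterHub does not.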
