GPU doesn't work with extraConfig

Hi everyone!

I'm trying to set up a JupyterHub where only some users are allowed to access GPUs.

Here is my config.yaml:


proxy:
  service:
    loadBalancerIP: 10.0.0.150
singleuser:
  image:
    name: jupyter/tensorflow-notebook
    tag: latest
  defaultUrl: "/lab"

### THIS WORKS ####
#  extraEnv:
#    JUPYTERHUB_SINGLEUSER_APP: "jupyter_server.serverapp.ServerApp"
#  profileList:
#    - display_name: "1 GPU Server"
#      description: "Notebook server with access to 1 GPU"
#      kubespawner_override:
#        extra_resource_limits:
#          nvidia.com/gpu: "1"
#    - display_name: "2 GPU Server"
#      description: "Notebook server with access to 2 GPUs"
#      kubespawner_override:
#        extra_resource_limits:
#          nvidia.com/gpu: "2"
#    - display_name: "CPU Server"
#      description: "Notebook server with only CPU"
#      default: true
###############################
hub:
   extraConfig:
     auth: |
        c.Authenticator.auto_login = True
        c.GenericOAuthenticator.client_id = "client"
        c.GenericOAuthenticator.client_secret = "secret"
        c.GenericOAuthenticator.oauth_callback_url = "https://myurl/hub/oauth_callback"
        c.GenericOAuthenticator.authorize_url = "https://myurl/auth/realms/master/protocol/openid-connect/auth"
        c.GenericOAuthenticator.token_url = "https://myurl/auth/realms/master/protocol/openid-connect/token"
        c.GenericOAuthenticator.userdata_url = "https://myurl/auth/realms/master/protocol/openid-connect/userinfo"
        c.GenericOAuthenticator.login_service = "keycloak"
        c.GenericOAuthenticator.username_key = "preferred_username"
        c.GenericOAuthenticator.userdata_params = {"state": "state"}
        c.JupyterHub.authenticator_class = "generic-oauth"
     options_form: |
        async def dynamic_options_form(self):

            acl = {
                "gpu" : ["user1"],
                "cpu" : ["user2"]
            }

            self.profile_list = [
                {
                    'default': True,
                    'display_name': 'CPU server',
                    'description': 'Basic CPU server.',
                },
            ]

            username = self.user.name
            if username in acl["gpu"]:
                self.profile_list.extend([
                    {
                        'display_name': '1 GPU',
                        'default': True,
                        'description': 'Notebook server with access to 1 GPU',
                        'kubespawner_override': { 'extra_resource_limits': {"nvidia.com/gpu": "1"} },
                    }
                ])
            return self._options_form_default()
        c.KubeSpawner.options_form = dynamic_options_form
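
For completeness, the same per-user ACL could presumably also be expressed as a callable profile_list instead of overriding options_form. This is only an untested sketch (it assumes the KubeSpawner shipped with this chart accepts a callable profile_list; the extraConfig key name "profiles" and the usernames are just examples):

hub:
  extraConfig:
    profiles: |
      def per_user_profiles(spawner):
          # Same ACL as above; usernames are examples.
          acl = {
              "gpu": ["user1"],
              "cpu": ["user2"],
          }
          profiles = [
              {
                  "default": True,
                  "display_name": "CPU server",
                  "description": "Basic CPU server.",
              },
          ]
          # Only users in the "gpu" list get the GPU profile.
          if spawner.user.name in acl["gpu"]:
              profiles.append(
                  {
                      "display_name": "1 GPU",
                      "description": "Notebook server with access to 1 GPU",
                      "kubespawner_override": {
                          "extra_resource_limits": {"nvidia.com/gpu": "1"}
                      },
                  }
              )
          return profiles

      c.KubeSpawner.profile_list = per_user_profiles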

With the options_form config above, the GPU profile is correctly displayed for user1 and not for user2, so no problem there.
But when I try to launch the GPU notebook server, I get:

Server requested

2022-02-08T09:36:05.947872Z [Normal] Successfully assigned colabia/jupyter-yannf to colabia-gpu01

2022-02-08T09:36:06Z [Normal] Container image "jupyterhub/k8s-network-tools:1.2.0" already present on machine

2022-02-08T09:36:06Z [Normal] Created container block-cloud-metadata

2022-02-08T09:36:06Z [Normal] Started container block-cloud-metadata

2022-02-08T09:36:07Z [Normal] Container image "jupyter/tensorflow-notebook:latest" already present on machine

2022-02-08T09:36:07Z [Normal] Created container notebook

2022-02-08T09:36:08Z [Warning] Error: failed to create containerd task: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli.real: initialization error: nvml error: driver/library version mismatch: unknown

2022-02-08T09:36:09Z [Warning] Back-off restarting failed container

When I try the first config (with extraEnv and the static profileList), everything works like a charm.

Just to be sure, the nodes do expose GPUs:

kubectl get nodes -o=custom-columns=NAME:.metadata.name,GPUs:.status.capacity.'nvidia\.com/gpu'
NAME                       GPUs
colabia-gpu01              3
colabia-gpu02              3

What am I doing wrong here?

Thanks! :slight_smile:

OK, edit: it seems the Docker image changed during the night and broke the NVIDIA setup.
I rebooted all the nodes and now everything works fine!

Sorry for the disturbance.

I'll leave this here just in case it's useful for someone.
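
For anyone who hits the same "driver/library version mismatch" error: a quick sanity check is a throwaway pod that requests one GPU and just runs nvidia-smi on the suspect node. This is only a sketch; the pod name, image tag and pinned node name are examples and may need adjusting for your cluster:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test             # example name
spec:
  restartPolicy: Never
  nodeName: colabia-gpu01          # pin to the node you want to check
  containers:
    - name: nvidia-smi
      image: nvidia/cuda:11.0-base   # any CUDA base image should do
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: "1"

If the node's GPU stack is healthy, the pod completes and its logs show the usual nvidia-smi table; otherwise it fails with the same nvml error as above.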