GPU doesn't work with extra_config

YFrendo · February 8, 2022, 9:54am

Hi everyone!

I’m trying to set up a JupyterHub that restricts GPU access to certain users.

Here is my config.yaml:


proxy:
  service:
    loadBalancerIP: 10.0.0.150
singleuser:
  image:
    name: jupyter/tensorflow-notebook
    tag: latest
  defaultUrl: "/lab"

### THIS WORKS ####
#  extraEnv:
#    JUPYTERHUB_SINGLEUSER_APP: "jupyter_server.serverapp.ServerApp"
#  profileList:
#    - display_name: "1 GPU Server"
#      description: "Notebook server with access to 1 GPU"
#      kubespawner_override:
#        extra_resource_limits:
#          nvidia.com/gpu: "1"
#    - display_name: "2 GPU Server"
#      description: "Notebook server with access to 2 GPU"
#      kubespawner_override:
#        extra_resource_limits:
#          nvidia.com/gpu: "2"
#    - display_name: "CPU Server"
#      description: "Notebook server with only CPU"
#      default: true
###############################
hub:
   extraConfig:
     auth: |
        c.Authenticator.auto_login = True
        c.GenericOAuthenticator.client_id = "client"
        c.GenericOAuthenticator.client_secret = "secret"
        c.GenericOAuthenticator.oauth_callback_url = "https://myurl/hub/oauth_callback"
        c.GenericOAuthenticator.authorize_url = "https://myurl/auth/realms/master/protocol/openid-connect/auth"
        c.GenericOAuthenticator.token_url = "https://myurl/auth/realms/master/protocol/openid-connect/token"
        c.GenericOAuthenticator.userdata_url = "https://myurl/auth/realms/master/protocol/openid-connect/userinfo"
        c.GenericOAuthenticator.login_service = "keycloak"
        c.GenericOAuthenticator.username_key = "preferred_username"
        c.GenericOAuthenticator.userdata_params.state = "state"
        c.JupyterHub.authenticator_class = "generic-oauth"
     options_form: |
        async def dynamic_options_form(self):

            acl = {
                "gpu" : ["user1"],
                "cpu" : ["user2"]
            }

            self.profile_list = [
                {
                    'default': True,
                    'display_name': 'CPU server',
                    'description': 'Basic CPU server.',
                },
            ]

            username = self.user.name
            if username in acl["gpu"]:
                self.profile_list.extend([
                    {
                        'display_name': '1 GPU',
                        'default': True,
                        'description': 'Notebook server with access to 1 GPU',
                        'kubespawner_override': { 'extra_resource_limits': {"nvidia.com/gpu": "1"} },
                    }
                ])
            return self._options_form_default()
        c.KubeSpawner.options_form = dynamic_options_form
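
The per-user profile logic in the options_form snippet above can also be written as a small pure function, which makes it easy to check outside the hub. This is only a minimal sketch: the name build_profile_list and the module-level ACL are hypothetical, and unlike the config above it marks only the CPU profile as default so that exactly one profile is preselected.

```python
from typing import Dict, List

# Hypothetical ACL mirroring the one in the config above.
ACL: Dict[str, List[str]] = {"gpu": ["user1"], "cpu": ["user2"]}


def build_profile_list(username: str, acl: Dict[str, List[str]]) -> List[dict]:
    """Return the KubeSpawner profiles that this user is allowed to see."""
    # Every user gets the CPU profile, marked as the single default.
    profiles = [
        {
            "display_name": "CPU server",
            "description": "Basic CPU server.",
            "default": True,
        },
    ]
    # Only users listed under "gpu" also get the GPU profile.
    if username in acl.get("gpu", []):
        profiles.append(
            {
                "display_name": "1 GPU",
                "description": "Notebook server with access to 1 GPU",
                "kubespawner_override": {
                    "extra_resource_limits": {"nvidia.com/gpu": "1"},
                },
            }
        )
    return profiles


async def dynamic_options_form(spawner):
    # Same pattern as in the config above: set the profiles for this user,
    # then let KubeSpawner render its default profile form.
    spawner.profile_list = build_profile_list(spawner.user.name, ACL)
    return spawner._options_form_default()

# In hub.extraConfig this would be followed by:
# c.KubeSpawner.options_form = dynamic_options_form
```

Calling build_profile_list("user1", ACL) returns both profiles, while build_profile_list("user2", ACL) returns only the CPU one.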

The GPU profile is correctly displayed for user1 and not for user2, no problem there.
But when I try to launch the GPU notebook:

Server requested

2022-02-08T09:36:05.947872Z [Normal] Successfully assigned colabia/jupyter-yannf to colabia-gpu01

2022-02-08T09:36:06Z [Normal] Container image "jupyterhub/k8s-network-tools:1.2.0" already present on machine

2022-02-08T09:36:06Z [Normal] Created container block-cloud-metadata

2022-02-08T09:36:06Z [Normal] Started container block-cloud-metadata

2022-02-08T09:36:07Z [Normal] Container image "jupyter/tensorflow-notebook:latest" already present on machine

2022-02-08T09:36:07Z [Normal] Created container notebook

2022-02-08T09:36:08Z [Warning] Error: failed to create containerd task: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli.real: initialization error: nvml error: driver/library version mismatch: unknown

2022-02-08T09:36:09Z [Warning] Back-off restarting failed container

When I use the first config (the commented-out profileList with extraEnv), everything works like a charm.

Just to be sure:

kubectl get nodes -o=custom-columns=NAME:.metadata.name,GPUs:.status.capacity.'nvidia\.com/gpu'
NAME                       GPUs
colabia-gpu01              3
colabia-gpu02              3

What am I doing wrong here?

Thanks !

YFrendo · February 8, 2022, 11:12am

OK, edit:
It seems the Docker image changed during the night and broke NVIDIA.
I rebooted all the nodes and everything works well!

Sorry for the disturbance.

I’ll leave this here in case it is useful for someone.
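
For anyone hitting the same "driver/library version mismatch" event: that message generally indicates that the NVIDIA kernel module loaded on the node no longer matches the user-space driver libraries, which is consistent with a reboot fixing it. Below is a rough sketch of a check that could be run on a GPU node before rebooting. The function names are hypothetical, and it assumes nvidia-smi is installed on the node and reports the same NVML error text as the pod event above.

```python
import shutil
import subprocess

# Text taken from the pod event above, compared case-insensitively.
MISMATCH_MARKER = "driver/library version mismatch"


def has_version_mismatch(output: str) -> bool:
    """Return True if the given tool output reports the NVML mismatch."""
    return MISMATCH_MARKER in output.lower()


def check_node() -> str:
    """Run nvidia-smi on this node and summarise the result."""
    if shutil.which("nvidia-smi") is None:
        return "nvidia-smi not found on this node"
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    combined = result.stdout + result.stderr
    if has_version_mismatch(combined):
        return "mismatch: reload the NVIDIA kernel modules or reboot the node"
    if result.returncode != 0:
        return "nvidia-smi failed for another reason"
    return "ok"
```

Running print(check_node()) on each GPU node (colabia-gpu01 and colabia-gpu02 here) should show whether the node itself is affected, independent of the JupyterHub config.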
