Jupyter spawner cannot spawn pods with NVIDIA GPU


I created a node with GPU. Overall, I configured everything correctly: when I simply create a static resource pod and add the limit nvidia.com/gpu: 1, everything works as expected.

But when I try to do the same thing in the profileList of JupyterHub (add GPU request/limit), I get the following error in describe:

   extra_resource_requests:
     nvidia.com/gpu: 1

Warning  FailedScheduling  2m14s  default-scheduler  0/8 nodes are available: 1 Insufficient nvidia.com/gpu. 
preemption: 0/8 nodes are available: 1 No preemption victims found for incoming pod, 7 Preemption is not helpful for scheduling.

To clarify: the resources are definitely there — as I already mentioned, creating pods directly works fine. This behavior only appears when the pod is spawned through JupyterHub.

P.S. I already tried the obvious thing — switched from jupyter-scheduler to default-scheduler, it didn’t help.

Hey there, KubeSpawner does not have an extra_resource_requests option; it has extra_resource_guarantees and extra_resource_limits instead. Setting one of those may resolve your issue.

Hi! Yeah, sorry, my fault! I meant that I'm trying to do this:

- display_name: "ML with GPU"
  description: "Ubuntu 22.04"
  kubespawner_override:
    scheduler_name: default-scheduler
    image: some_image_with_nvidia_smi
    extra_resource_limits:
      nvidia.com/gpu: 1


Hmm, that's odd. Did you take a look at the "Customizing User Resources" page of the Zero to JupyterHub with Kubernetes documentation and verify that the command to check GPU availability succeeds?

Yes! The command to check GPU availability succeeds:

kubectl get nodes -o=custom-columns=NAME:.metadata.name,GPUs:.status.capacity.'nvidia.com/gpu'
NAME       GPUs
gpu-node   2
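Worth noting: the scheduler looks at a node's allocatable resources, not its capacity, so it can help to compare both. A small sketch (the backslash-escaped dot is how custom-columns handles resource names containing dots):

```shell
# Compare capacity vs allocatable; the scheduler only considers allocatable,
# which the NVIDIA device plugin must advertise on the node.
kubectl get nodes -o custom-columns="NAME:.metadata.name,CAP:.status.capacity.nvidia\.com/gpu,ALLOC:.status.allocatable.nvidia\.com/gpu"
```

If allocatable shows <none> or 0 on the GPU node, the device plugin is not advertising the GPUs there.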

Can you show us the full YAML for the pod?

Full YAML of the test pod:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:13.0.0-base-ubuntu22.04
    command: ["nvidia-smi", "-l", "5"]
    resources:
      limits:
        nvidia.com/gpu: 1

Sorry, I meant the full YAML of the failing jupyter pod (kubectl get pod jupyter-…. -oyaml)

apiVersion: v1
kind: Pod
metadata:
  annotations:
    hub.jupyter.org/jupyterhub-version: 5.4.3
    hub.jupyter.org/kubespawner-version: 7.0.0
    hub.jupyter.org/username: ikol006
  creationTimestamp: "2026-01-22T03:06:14Z"
  labels:
    app: jupyterhub
    app.kubernetes.io/component: singleuser-server
    app.kubernetes.io/instance: jupyterhub
    app.kubernetes.io/managed-by: kubespawner
    app.kubernetes.io/name: jupyterhub
    chart: jupyterhub-4.3.2
    component: singleuser-server
    helm.sh/chart: jupyterhub-4.3.2
    hub.jupyter.org/network-access-hub: "true"
    hub.jupyter.org/servername: ""
    hub.jupyter.org/username: ikol006
    release: jupyterhub
  name: jupyter-ikol006
  namespace: jupyterhub
  resourceVersion: "930369902"
  uid: ccc01307-09bb-4b2a-b870-88c82764d874
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: hub.jupyter.org/node-purpose
            operator: In
            values:
            - user
        weight: 100
  automountServiceAccountToken: false
  containers:
  - args:
    - jupyterhub-singleuser
    env:
    - name: JPY_API_TOKEN
      value: 66sdfe0e9e7af47a91b43205aa6dc8c4
    - name: JUPYTERHUB_ACTIVITY_URL
      value: http://hub:8081/hub/api/users/ikol006/activity
    - name: JUPYTERHUB_ADMIN_ACCESS
      value: "1"
    - name: JUPYTERHUB_API_TOKEN
      value: 66sdfe0e9e7af47a91b43205aa6dc8c4
    - name: JUPYTERHUB_API_URL
      value: http://hub:8081/hub/api
    - name: JUPYTERHUB_BASE_URL
      value: /
    - name: JUPYTERHUB_CLIENT_ID
      value: jupyterhub-user-ikol006
    - name: JUPYTERHUB_COOKIE_HOST_PREFIX_ENABLED
      value: "0"
    - name: JUPYTERHUB_DEBUG
      value: "1"
    - name: JUPYTERHUB_HOST
    - name: JUPYTERHUB_OAUTH_ACCESS_SCOPES
      value: '["access:servers!server=ikol006/", "access:servers!user=ikol006"]'
    - name: JUPYTERHUB_OAUTH_CALLBACK_URL
      value: /user/ikol006/oauth_callback
    - name: JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES
      value: '[]'
    - name: JUPYTERHUB_OAUTH_SCOPES
      value: '["access:servers!server=ikol006/", "access:servers!user=ikol006"]'
    - name: JUPYTERHUB_PUBLIC_HUB_URL
    - name: JUPYTERHUB_PUBLIC_URL
    - name: JUPYTERHUB_SERVER_NAME
    - name: JUPYTERHUB_SERVICE_PREFIX
      value: /user/ikol006/
    - name: JUPYTERHUB_SERVICE_URL
      value: http://0.0.0.0:8888/user/ikol006/
    - name: JUPYTERHUB_USER
      value: ikol006
    - name: JUPYTER_IMAGE
      value: ml-with-gpu-notebook:x86_64-ubuntu-22.04
    - name: JUPYTER_IMAGE_SPEC
      value: ml-with-gpu-notebook:x86_64-ubuntu-22.04
    - name: MEM_GUARANTEE
      value: "1073741824"
    - name: PIP_INDEX_URL
      value: https://nexus.local/repository/pypi-proxy/simple
    - name: PIP_TIMEOUT
      value: "60"
    - name: PIP_TRUSTED_HOST
      value: nexus.local
    - name: SSL_CERT_FILE
      value: /etc/ssl/certs/ca-certificates.crt
    image: ml-with-gpu-notebook:x86_64-ubuntu-22.04
    imagePullPolicy: Always
    lifecycle:
      postStart:
        exec:
          command:
          - sh
          - -c
          - |
            cp /etc/ssl/certs/ca-certificates.crt /opt/conda/lib/python3.11/site-packages/certifi/cacert.pem
    name: notebook
    ports:
    - containerPort: 8888
      name: notebook-port
      protocol: TCP
    resources:
      limits:
        nvidia.com/gpu: "1"
      requests:
        memory: "1073741824"
        nvidia.com/gpu: "1"
    securityContext:
      allowPrivilegeEscalation: false
      runAsUser: 1000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/ssl/certs/ca-certificates.crt
      name: files
      subPath: ca-certificates.crt
    - mountPath: /home/jovyan
      name: volume-ikol006
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
  - command:
    - iptables
    - --append
    - OUTPUT
    - --protocol
    - tcp
    - --destination
    - 111.127.222.127
    - --destination-port
    - "80"
    - --jump
    - DROP
    image: quay.io/jupyterhub/k8s-network-tools:4.3.2
    imagePullPolicy: IfNotPresent
    name: block-cloud-metadata
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
      privileged: true
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  preemptionPolicy: PreemptLowerPriority
  priority: 1000
  priorityClassName: develop
  restartPolicy: OnFailure
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 100
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: hub.jupyter.org/dedicated
    operator: Equal
    value: user
  - effect: NoSchedule
    key: hub.jupyter.org_dedicated
    operator: Equal
    value: user
  - effect: NoSchedule
    key: nvidia.com/gpu
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - name: files
    secret:
      defaultMode: 420
      items:
      - key: ca-certificates.crt
        mode: 420
        path: ca-certificates.crt
      secretName: singleuser
  - name: volume-ikol006
    persistentVolumeClaim:
      claimName: claim-ikol006
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2026-01-22T03:06:15Z"
    message: '0/8 nodes are available: 1 Insufficient nvidia.com/gpu. preemption:
      0/8 nodes are available: 1 No preemption victims found for incoming pod, 7 Preemption
      is not helpful for scheduling..'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: Burstable

Can you try creating your gpu-test pod with these requests as well as limits?


From JupyterHub, or just as a standalone resource pod?

The test pod that you said was working in post #7 of this thread.

Try creating that pod immediately after your JupyterHub pod fails to spawn

The test pod can be created under any circumstances, even right after a Jupyter pod fails to spawn or crashes.

Creating a test pod with only requests and no limits doesn't work:

The Pod “gpu-test” is invalid: spec.containers[0].resources.limits: Required value: Limit must be set for non overcommitable resources

If I specify only limits in the test pod, the requests section still appears in the manifest obtained via kubectl get pods -o yaml.
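That matches how extended resources behave: they are non-overcommittable, so when only the limit is set, the API server copies it into requests. A quick way to confirm (a sketch; assumes the gpu-test pod from earlier is still present):

```shell
# Show the resources the API server actually stored for the container;
# requests will mirror limits for nvidia.com/gpu even if you omitted them.
kubectl get pod gpu-test -o jsonpath='{.spec.containers[0].resources}'
```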

I recall there being some subtlety around this: declare only the nvidia limit, not the request, and see if it works.

Yes, that’s correct — I specify only limits, and requests are added automatically in the pod manifest.


This will be quite tedious, but can you manually (not via JupyterHub) create a pod based on the YAML above, but:

  • Change args to jupyterlab, which should allow the pod to run as a standalone JupyterLab server
  • Delete the fields managed by Kubernetes (e.g. creationTimestamp, status, etc.)

Presumably that should fail with the same error as in your first post (Insufficient nvidia.com/gpu). Then try removing fields until you eventually approach your working example above.
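One way to make the first step less tedious is to strip the server-managed fields mechanically before editing. A sketch using yq v4 (an assumption about your tooling; adjust the field list as needed):

```shell
# Dump the failing pod, drop fields owned by the API server,
# and save a clean copy for bisection.
kubectl get pod jupyter-ikol006 -n jupyterhub -o yaml \
  | yq 'del(.metadata.creationTimestamp, .metadata.resourceVersion,
            .metadata.uid, .metadata.managedFields, .status)' \
  > jupyter-pod-clean.yaml
```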


I removed the server-managed metadata such as status, creationTimestamp, and so on, and also removed the PVC attachment. After that, the pod started on the required node where GPU access is available.

So in the end, the manifest looked like this:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    hub.jupyter.org/jupyterhub-version: 5.4.3
    hub.jupyter.org/kubespawner-version: 7.0.0
    hub.jupyter.org/username: ikol006
  labels:
    app: jupyterhub
    app.kubernetes.io/component: singleuser-server
    app.kubernetes.io/instance: jupyterhub
    app.kubernetes.io/managed-by: kubespawner
    app.kubernetes.io/name: jupyterhub
    chart: jupyterhub-4.3.2
    component: singleuser-server
    helm.sh/chart: jupyterhub-4.3.2
    hub.jupyter.org/network-access-hub: "true"
    hub.jupyter.org/servername: ""
    hub.jupyter.org/username: ikol006
    release: jupyterhub
  name: jupyter-ikol006
  namespace: test
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: hub.jupyter.org/node-purpose
            operator: In
            values:
            - user
        weight: 100
  automountServiceAccountToken: false
  containers:
  - args:
    - jupyterhub-singleuser
    env:
    - name: JPY_API_TOKEN
      value: Laiv5Jie0Fie9Ue3auf5ohnge
    - name: JUPYTERHUB_ACTIVITY_URL
      value: http://hub:8081/hub/api/users/ikol006/activity
    - name: JUPYTERHUB_ADMIN_ACCESS
      value: "1"
    - name: JUPYTERHUB_API_TOKEN
      value: Laiv5Jie0Fie9Ue3auf5ohnge
    - name: JUPYTERHUB_API_URL
      value: http://hub:8081/hub/api
    - name: JUPYTERHUB_BASE_URL
      value: /
    - name: JUPYTERHUB_CLIENT_ID
      value: jupyterhub-user-ikol006
    - name: JUPYTERHUB_COOKIE_HOST_PREFIX_ENABLED
      value: "0"
    - name: JUPYTERHUB_DEBUG
      value: "1"
    - name: JUPYTERHUB_HOST
    - name: JUPYTERHUB_OAUTH_ACCESS_SCOPES
      value: '["access:servers!server=ikol006/", "access:servers!user=ikol006"]'
    - name: JUPYTERHUB_OAUTH_CALLBACK_URL
      value: /user/ikol006/oauth_callback
    - name: JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES
      value: '[]'
    - name: JUPYTERHUB_OAUTH_SCOPES
      value: '["access:servers!server=ikol006/", "access:servers!user=ikol006"]'
    - name: JUPYTERHUB_PUBLIC_HUB_URL
    - name: JUPYTERHUB_PUBLIC_URL
    - name: JUPYTERHUB_SERVER_NAME
    - name: JUPYTERHUB_SERVICE_PREFIX
      value: /user/ikol006/
    - name: JUPYTERHUB_SERVICE_URL
      value: http://0.0.0.0:8888/user/ikol006/
    - name: JUPYTERHUB_USER
      value: ikol006
    - name: JUPYTER_IMAGE
      value: registry.mycompany.com/internals/jupyterhub/ml-with-gpu-notebook:x86_64-ubuntu-22.04
    - name: JUPYTER_IMAGE_SPEC
      value: registry.mycompany.com/internals/jupyterhub/ml-with-gpu-notebook:x86_64-ubuntu-22.04
    - name: MEM_GUARANTEE
      value: "1073741824"
    - name: PIP_INDEX_URL
      value: https://repo.mycompany.com/repository/pypi-proxy/simple
    - name: PIP_TIMEOUT
      value: "60"
    - name: PIP_TRUSTED_HOST
      value: repo.mycompany.com
    - name: SSL_CERT_FILE
      value: /etc/ssl/certs/ca-certificates.crt
    image: jupyterhub/ml-with-gpu-notebook:x86_64-ubuntu-22.04
    imagePullPolicy: Always
    lifecycle:
      postStart:
        exec:
          command:
          - sh
          - -c
          - |
            cp /etc/ssl/certs/ca-certificates.crt /opt/conda/lib/python3.11/site-packages/certifi/cacert.pem
    name: notebook
    ports:
    - containerPort: 8888
      name: notebook-port
      protocol: TCP
    resources:
      limits:
        nvidia.com/gpu: "1"
      requests:
        memory: "1073741824"
        nvidia.com/gpu: "1"
    securityContext:
      allowPrivilegeEscalation: false
      runAsUser: 1000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    # volumeMounts:
    # - mountPath: /etc/ssl/certs/ca-certificates.crt
    #   name: files
    #   subPath: ca-certificates.crt
    # - mountPath: /home/jovyan
    #   name: volume-ikol006
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
  - command:
    - iptables
    - --append
    - OUTPUT
    - --protocol
    - tcp
    - --destination
    - 111.222.333.444
    - --destination-port
    - "80"
    - --jump
    - DROP
    image: jupyterhub/k8s-network-tools:4.3.2
    imagePullPolicy: IfNotPresent
    name: block-cloud-metadata
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
      privileged: true
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  preemptionPolicy: PreemptLowerPriority
  priority: 1000
  priorityClassName: develop
  restartPolicy: OnFailure
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 100
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: hub.jupyter.org/dedicated
    operator: Equal
    value: user
  - effect: NoSchedule
    key: hub.jupyter.org_dedicated
    operator: Equal
    value: user
  - effect: NoSchedule
    key: nvidia.com/gpu
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  # volumes:
  # - name: files
  #   secret:
  #     defaultMode: 420
  #     items:
  #     - key: ca-certificates.crt
  #       mode: 420
  #       path: ca-certificates.crt
  #     secretName: singleuser
  # - name: volume-ikol006
  #   persistentVolumeClaim:
  #     claimName: claim-ikol006

What storage provider are you using? Are persistent volumes always mountable from all nodes?


Is there a daemonset resource related to storage that runs on many nodes, but not on the GPU node because it doesn't tolerate a taint? Then make it tolerate the taint and try again!
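To check that quickly, you could compare the GPU node's taints against the toleration keys declared by every daemonset. A sketch (assumes jq is available and the node is named gpu-node, as in the earlier output):

```shell
# Taints on the GPU node
kubectl get node gpu-node -o jsonpath='{.spec.taints}'

# Toleration keys declared by each daemonset in the cluster;
# "*" marks a keyless (match-everything) toleration.
kubectl get ds -A -o json | jq -r '
  .items[]
  | [.metadata.namespace, .metadata.name,
     ([.spec.template.spec.tolerations[]?.key] | map(. // "*") | join(","))]
  | @tsv'
```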

I’m using the vSphere CSI driver, but I removed the PVC from the manifest since I didn’t create any PVCs for the pod.
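If the PVC turns out to be the blocker, it may still be worth verifying that the vSphere CSI node plugin (which also runs as a daemonset) is registered on the GPU node, since a missing driver there would make volumes unmountable from that node. A sketch:

```shell
# List which CSI drivers are registered on each node; the vSphere driver
# (csi.vsphere.vmware.com) should appear on the GPU node as well.
kubectl get csinodes -o custom-columns=NAME:.metadata.name,DRIVERS:".spec.drivers[*].name"
```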