Dear all,
I would like to thank you for the software you have produced and maintained.
I have a “bare-metal” cluster that I set up with microk8s by following the JupyterHub guide. It consists of two nodes: thinkpad-node (the master node, where JupyterHub runs) and gpu-server (a machine with a GPU).
I created the following profileList:
profileList:
  - display_name: "thinkpad-node"
    description: "thinkpad-node running on the laptop"
    default: true
    kubespawner_override:
      image: ghcr.io/mbi-div-b/mbi-div-b-notebook-base
      node_selector: {'node-role.kubernetes.io/cpu': 'cpu'}
  - display_name: "gpu-server"
    description: "gpu-node running on the real computer in the rack"
    kubespawner_override:
      image: ghcr.io/mbi-div-b/mbi-div-b-notebook-cuda
      #node_selector: {'name': 'gpu-server'}
      node_selector: {'node-role.kubernetes.io/gpu': 'gpu'}
Both nodes carry the labels used in these selectors (the full node descriptions are at the end of this post). The gpu-server profile works as expected, but the thinkpad-node profile fails to spawn with the following error:
2023-06-28T12:59:24.169862Z [Warning] 0/2 nodes are available: 1 node(s) didn't match Pod's node affinity/selector. preemption: 0/2 nodes are available: 1 No preemption victims found for incoming pod, 1 Preemption is not helpful for scheduling..
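As a sanity check on the selector itself, the nodeSelector rule (every key/value pair in the selector must be present in the node's labels) can be sketched in Python. This is a simplified model, not the scheduler's actual code; the function names `matches_node_selector` and `eligible_nodes` are made up for illustration, and the label dictionaries are only a subset of the labels listed at the end of this post.

```python
# Simplified model of nodeSelector evaluation: a node passes only if every
# key/value pair in the selector appears in the node's labels.
def matches_node_selector(node_labels: dict, node_selector: dict) -> bool:
    return all(node_labels.get(key) == value for key, value in node_selector.items())


def eligible_nodes(nodes: dict, node_selector: dict) -> list:
    """Return the names of the nodes whose labels satisfy the selector."""
    return [
        name
        for name, labels in nodes.items()
        if matches_node_selector(labels, node_selector)
    ]


# Subset of the labels reported by `kubectl describe node` in this post.
nodes = {
    "thinkpad-node": {
        "kubernetes.io/hostname": "thinkpad-node",
        "name": "thinkpad-node",
        "node-role.kubernetes.io/cpu": "cpu",
    },
    "gpu-server": {
        "kubernetes.io/hostname": "gpu-server",
        "name": "gpu-server",
        "node-role.kubernetes.io/gpu": "gpu",
    },
}

# Selector used by the thinkpad-node profile.
cpu_selector = {"node-role.kubernetes.io/cpu": "cpu"}

print(eligible_nodes(nodes, cpu_selector))  # ['thinkpad-node']
```

With the labels as posted, the cpu selector excludes only gpu-server, which would account for the "1 node(s) didn't match Pod's node affinity/selector" part of the message. It does not by itself explain why thinkpad-node was rejected; the message as quoted gives a reason for only one of the two nodes.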
I suspected that the NoSchedule taint on the master node was the problem, but I have removed it. I also thought this might be a general issue with my k8s setup, but I was able to create a pause pod (a dummy, hello-world style pod) and have it scheduled on thinkpad-node with the following YAML config:
apiVersion: v1
kind: Pod
metadata:
  name: pause-thinkpad
spec:
  containers:
    - name: pause
      image: registry.k8s.io/pause:2.0
  nodeSelector:
    node-role.kubernetes.io/cpu: cpu
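Since this test pod schedules but the spawned user pod does not, one way to narrow things down is to compare the scheduling-related fields of the two pod specs. The sketch below assumes both specs are available as Python dicts (for example, loaded from the output of `kubectl get pod <pod-name> -n <namespace> -o json`). The `user_pod` shown here is HYPOTHETICAL and only illustrates the comparison; it is not the spec of the real pending pod, and the helper names are made up for this example.

```python
# Pod spec fields that commonly influence where a pod can be scheduled.
SCHEDULING_FIELDS = ("nodeSelector", "nodeName", "affinity", "tolerations", "volumes")


def scheduling_view(pod: dict) -> dict:
    """Extract only the scheduling-relevant parts of a pod manifest."""
    spec = pod.get("spec", {})
    view = {field: spec.get(field) for field in SCHEDULING_FIELDS}
    # Resource requests are declared per container.
    view["requests"] = [
        container.get("resources", {}).get("requests")
        for container in spec.get("containers", [])
    ]
    return view


def scheduling_diff(pod_a: dict, pod_b: dict) -> dict:
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    a, b = scheduling_view(pod_a), scheduling_view(pod_b)
    return {field: (a[field], b[field]) for field in a if a[field] != b[field]}


# The pause pod from this post, written as a dict.
pause_pod = {
    "spec": {
        "containers": [{"name": "pause", "image": "registry.k8s.io/pause:2.0"}],
        "nodeSelector": {"node-role.kubernetes.io/cpu": "cpu"},
    }
}

# HYPOTHETICAL user pod: the memory request and the volume claim are
# invented values for illustration, not taken from the real cluster.
user_pod = {
    "spec": {
        "containers": [
            {
                "name": "notebook",
                "image": "ghcr.io/mbi-div-b/mbi-div-b-notebook-base",
                "resources": {"requests": {"memory": "1Gi"}},
            }
        ],
        "nodeSelector": {"node-role.kubernetes.io/cpu": "cpu"},
        "volumes": [
            {"name": "home", "persistentVolumeClaim": {"claimName": "claim-example"}}
        ],
    }
}

for field, (pause_value, user_value) in scheduling_diff(pause_pod, user_pod).items():
    print(f"{field}: pause={pause_value!r} user={user_value!r}")
```

Any field that shows up in the diff (volumes, resource requests, affinity, tolerations) is something the pause pod does not exercise, so it is a candidate for further checking on the real pending pod.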
Here are the descriptions of both nodes:
lunin@thinkpad-node:~/k8s-jhub-configs$ microk8s kubectl describe node thinkpad-node
Name: thinkpad-node
Roles: cpu
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
feature.node.kubernetes.io/cpu-cpuid.ADX=true
feature.node.kubernetes.io/cpu-cpuid.AESNI=true
feature.node.kubernetes.io/cpu-cpuid.AVX=true
feature.node.kubernetes.io/cpu-cpuid.AVX2=true
feature.node.kubernetes.io/cpu-cpuid.FMA3=true
feature.node.kubernetes.io/cpu-cpuid.IBPB=true
feature.node.kubernetes.io/cpu-cpuid.MPX=true
feature.node.kubernetes.io/cpu-cpuid.RTM_ALWAYS_ABORT=true
feature.node.kubernetes.io/cpu-cpuid.STIBP=true
feature.node.kubernetes.io/cpu-cpuid.VMX=true
feature.node.kubernetes.io/cpu-cstate.enabled=true
feature.node.kubernetes.io/cpu-hardware_multithreading=true
feature.node.kubernetes.io/cpu-pstate.scaling_governor=powersave
feature.node.kubernetes.io/cpu-pstate.status=active
feature.node.kubernetes.io/cpu-pstate.turbo=true
feature.node.kubernetes.io/kernel-config.NO_HZ=true
feature.node.kubernetes.io/kernel-config.NO_HZ_IDLE=true
feature.node.kubernetes.io/kernel-version.full=5.15.0-75-generic
feature.node.kubernetes.io/kernel-version.major=5
feature.node.kubernetes.io/kernel-version.minor=15
feature.node.kubernetes.io/kernel-version.revision=0
feature.node.kubernetes.io/pci-8086.present=true
feature.node.kubernetes.io/storage-nonrotationaldisk=true
feature.node.kubernetes.io/system-os_release.ID=ubuntu
feature.node.kubernetes.io/system-os_release.VERSION_ID=22.04
feature.node.kubernetes.io/system-os_release.VERSION_ID.major=22
feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=04
feature.node.kubernetes.io/usb-ef_04f2_b604.present=true
feature.node.kubernetes.io/usb-ff_06cb_009a.present=true
kubernetes.io/arch=amd64
kubernetes.io/hostname=thinkpad-node
kubernetes.io/os=linux
microk8s.io/cluster=true
name=thinkpad-node
node-role.kubernetes.io/cpu=cpu
node.kubernetes.io/microk8s-controlplane=microk8s-controlplane
topology.cstor.openebs.io/nodeName=thinkpad-node
topology.jiva.openebs.io/nodeName=thinkpad-node
Annotations: csi.volume.kubernetes.io/nodeid: {"cstor.csi.openebs.io":"thinkpad-node","jiva.csi.openebs.io":"thinkpad-node"}
nfd.node.kubernetes.io/extended-resources:
nfd.node.kubernetes.io/feature-labels:
cpu-cpuid.ADX,cpu-cpuid.AESNI,cpu-cpuid.AVX,cpu-cpuid.AVX2,cpu-cpuid.FMA3,cpu-cpuid.IBPB,cpu-cpuid.MPX,cpu-cpuid.RTM_ALWAYS_ABORT,cpu-cpui...
nfd.node.kubernetes.io/master.version: v0.10.1
nfd.node.kubernetes.io/worker.version: v0.10.1
node.alpha.kubernetes.io/ttl: 0
projectcalico.org/IPv4Address: 10.6.17.62/22
projectcalico.org/IPv4VXLANTunnelAddr: 10.1.102.192
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 20 Jun 2023 12:18:43 +0000
Taints: <none>
Unschedulable: false
lunin@thinkpad-node:~/k8s-jhub-configs$ microk8s kubectl describe node gpu-server
Name: gpu-server
Roles: gpu
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
feature.node.kubernetes.io/cpu-cpuid.AESNI=true
feature.node.kubernetes.io/cpu-cpuid.AVX=true
feature.node.kubernetes.io/cpu-cpuid.AVXSLOW=true
feature.node.kubernetes.io/cpu-cpuid.CPBOOST=true
feature.node.kubernetes.io/cpu-cpuid.FMA4=true
feature.node.kubernetes.io/cpu-cpuid.IBS=true
feature.node.kubernetes.io/cpu-cpuid.IBSBRNTRGT=true
feature.node.kubernetes.io/cpu-cpuid.IBSFETCHSAM=true
feature.node.kubernetes.io/cpu-cpuid.IBSFFV=true
feature.node.kubernetes.io/cpu-cpuid.IBSOPCNT=true
feature.node.kubernetes.io/cpu-cpuid.IBSOPCNTEXT=true
feature.node.kubernetes.io/cpu-cpuid.IBSOPSAM=true
feature.node.kubernetes.io/cpu-cpuid.IBSRDWROPCNT=true
feature.node.kubernetes.io/cpu-cpuid.IBSRIPINVALIDCHK=true
feature.node.kubernetes.io/cpu-cpuid.SSE4A=true
feature.node.kubernetes.io/cpu-cpuid.XOP=true
feature.node.kubernetes.io/cpu-hardware_multithreading=true
feature.node.kubernetes.io/kernel-config.NO_HZ=true
feature.node.kubernetes.io/kernel-config.NO_HZ_IDLE=true
feature.node.kubernetes.io/kernel-version.full=5.15.0-75-generic
feature.node.kubernetes.io/kernel-version.major=5
feature.node.kubernetes.io/kernel-version.minor=15
feature.node.kubernetes.io/kernel-version.revision=0
feature.node.kubernetes.io/pci-10de.present=true
feature.node.kubernetes.io/pci-10ec.present=true
feature.node.kubernetes.io/storage-nonrotationaldisk=true
feature.node.kubernetes.io/system-os_release.ID=ubuntu
feature.node.kubernetes.io/system-os_release.VERSION_ID=22.04
feature.node.kubernetes.io/system-os_release.VERSION_ID.major=22
feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=04
kubernetes.io/arch=amd64
kubernetes.io/hostname=gpu-server
kubernetes.io/os=linux
microk8s.io/cluster=true
name=gpu-server
node-role.kubernetes.io/gpu=gpu
node.kubernetes.io/microk8s-worker=microk8s-worker
nvidia.com/cuda.driver.major=515
nvidia.com/cuda.driver.minor=43
nvidia.com/cuda.driver.rev=04
nvidia.com/cuda.runtime.major=11
nvidia.com/cuda.runtime.minor=7
nvidia.com/gfd.timestamp=1687266579
nvidia.com/gpu.compute.major=7
nvidia.com/gpu.compute.minor=5
nvidia.com/gpu.count=1
nvidia.com/gpu.deploy.container-toolkit=true
nvidia.com/gpu.deploy.dcgm=true
nvidia.com/gpu.deploy.dcgm-exporter=true
nvidia.com/gpu.deploy.device-plugin=true
nvidia.com/gpu.deploy.driver=pre-installed
nvidia.com/gpu.deploy.gpu-feature-discovery=true
nvidia.com/gpu.deploy.node-status-exporter=true
nvidia.com/gpu.deploy.operator-validator=true
nvidia.com/gpu.family=turing
nvidia.com/gpu.machine=System-Product-Name
nvidia.com/gpu.memory=8192
nvidia.com/gpu.present=true
nvidia.com/gpu.product=NVIDIA-GeForce-RTX-2070
nvidia.com/gpu.replicas=1
nvidia.com/mig.capable=false
nvidia.com/mig.strategy=single
topology.cstor.openebs.io/nodeName=gpu-server
topology.jiva.openebs.io/nodeName=gpu-server
Annotations: csi.volume.kubernetes.io/nodeid: {"cstor.csi.openebs.io":"gpu-server","jiva.csi.openebs.io":"gpu-server"}
nfd.node.kubernetes.io/extended-resources:
nfd.node.kubernetes.io/feature-labels:
cpu-cpuid.AESNI,cpu-cpuid.AVX,cpu-cpuid.AVXSLOW,cpu-cpuid.CPBOOST,cpu-cpuid.FMA4,cpu-cpuid.IBS,cpu-cpuid.IBSBRNTRGT,cpu-cpuid.IBSFETCHSAM,...
nfd.node.kubernetes.io/worker.version: v0.10.1
node.alpha.kubernetes.io/ttl: 0
projectcalico.org/IPv4Address: 10.6.5.53/24
projectcalico.org/IPv4VXLANTunnelAddr: 10.1.246.192
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 20 Jun 2023 12:22:54 +0000
Taints: <none>
Unschedulable: false