I am using a custom GPU image and attempting to provision a pod onto a GPU node.
Ideally, I would like to force the GPU pod onto the GPU node pool while keeping the node affinity that pins the other user pods to just the user pool.
I have been able to get this to work with userPods.nodeAffinity.matchNodePurpose = prefer, along with setting a kubespawner_override in the singleuser.profileList entry for the GPU image:
node_selector:
  node_pool: gpu-pool
tolerations:
  - key: nvidia.com/gpu
    operator: Equal
    value: present
    effect: NoSchedule
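For context, the full profile entry looks roughly like the sketch below; the display name and image reference are placeholders rather than my actual values.

singleuser:
  profileList:
    - display_name: "GPU server"                      # placeholder name
      description: "Custom GPU image on the GPU node pool"
      kubespawner_override:
        image: my-registry/my-gpu-image:latest        # placeholder for my custom GPU image
        node_selector:
          node_pool: gpu-pool
        tolerations:
          - key: nvidia.com/gpu
            operator: Equal
            value: present
            effect: NoSchedule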
The issue I've come across is that when I set scheduling.userPods.nodeAffinity.matchNodePurpose = required, the GPU pods receive that requirement too, so they don't end up being assigned to the GPU pool. This most likely has to do with the rule in the Kubernetes node affinity documentation: if you specify multiple matchExpressions associated with nodeSelectorTerms, then the pod can be scheduled onto a node only if all matchExpressions are satisfied.
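To illustrate my reading of that rule (this is a sketch, not output from my cluster): if both requirements land as matchExpressions under a single nodeSelectorTerm, the pod can only be scheduled onto a node that carries both labels, which no node in either of my pools does.

requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
    - matchExpressions:                    # both expressions must match the same node
        - key: hub.jupyter.org/node-purpose
          operator: In
          values:
            - user
        - key: nvidia.com/gpu
          operator: In
          values:
            - present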
I have also passed in c.kubespawner.node_affinity_required, and doing so produced the following affinity in the rendered spec of my GPU pod:
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - preference:
            matchExpressions:
              - key: hub.jupyter.org/node-purpose
                operator: In
                values:
                  - user
          weight: 100
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu
                operator: In
                values:
                  - present
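For reference, I set that roughly along these lines through hub.extraConfig; this is a sketch of the kind of config I mean (the gpuAffinity key is just a label I chose, assuming a chart version where extraConfig takes named snippets), not my exact values file.

hub:
  extraConfig:
    gpuAffinity: |
      # Ask KubeSpawner to require nodes labelled nvidia.com/gpu=present
      c.KubeSpawner.node_affinity_required = [
          {
              "matchExpressions": [
                  {"key": "nvidia.com/gpu", "operator": "In", "values": ["present"]}
              ]
          }
      ]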
How can I keep the default userPods.nodeAffinity.matchNodePurpose = prefer affinity from being applied to this GPU pod while retaining it for the other pods (user-placeholder, hook-image-puller, continuous-image-puller, jupyter-)?