I’m having trouble understanding how to debug a cluster that refuses to schedule user pods on more than 3 nodes. It’s 3 every time. At first I thought it was because I asked for 3 placeholders, but asking for 5 didn’t seem to help. Here are the relevant (I think…) parts of the config.yaml (this is on us-east1-c). I created the user pool with:
gcloud beta container node-pools create jhub-cpu-pool --machine-type n1-standard-2 --num-nodes 1 --enable-autoscaling --min-nodes 1 --max-nodes 34 --node-labels hub.jupyter.org/node-purpose=user --node-taints hub.jupyter.org_dedicated=user:NoSchedule --zone us-east1-c --cluster univai-jhub
prePuller:
  continuous:
    enabled: true
scheduling:
  userScheduler:
    enabled: true
  podPriority:
    enabled: true
  userPlaceholder:
    enabled: true
    replicas: 5
  userPods:
    nodeAffinity:
      matchNodePurpose: require
singleuser:
  image:
    # Get the latest image tag at:
    # https://hub.docker.com/r/jupyter/datascience-notebook/tags/
    # Inspect the Dockerfile at:
    # https://github.com/jupyter/docker-stacks/tree/master/datascience-notebook/Dockerfile
    name: gcr.io/univai-jupyterhub/tensorflow_pytorch_cpu
    tag: ae6c90f96f5c025cc66317b27bac8722db7a4097
  defaultUrl: "/lab"
  storage:
    capacity: 4Gi
  memory:
    guarantee: 1G
    limit: 3.5G
  cpu:
    guarantee: 0.5
    limit: 2
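
For reference, here is a rough back-of-the-envelope sketch of how many user pods (or placeholders) I would expect to fit on a single n1-standard-2 node given the guarantees above. The machine type has 2 vCPUs and 7.5 GB of memory, but the allocatable figures used below (1.9 CPU, 5.5 GB) are my own guesses rather than values read from the cluster, and system pods on each node would reduce them further. It only estimates per-node packing, so it may or may not have anything to do with the node count stopping at 3.

```python
import math


def pods_per_node(alloc_cpu, alloc_mem_gb, cpu_guarantee, mem_guarantee_gb):
    """Pods that fit on one node, limited by whichever resource runs out first."""
    by_cpu = math.floor(alloc_cpu / cpu_guarantee)
    by_mem = math.floor(alloc_mem_gb / mem_guarantee_gb)
    return min(by_cpu, by_mem)


# Guarantees taken from the singleuser section of config.yaml.
CPU_GUARANTEE = 0.5
MEM_GUARANTEE_GB = 1.0

# ASSUMPTION: allocatable resources on an n1-standard-2 node after
# system reservations. These are guesses, not measured values.
ASSUMED_ALLOC_CPU = 1.9
ASSUMED_ALLOC_MEM_GB = 5.5

# Placeholder replicas from scheduling.userPlaceholder.replicas.
PLACEHOLDER_REPLICAS = 5

per_node = pods_per_node(
    ASSUMED_ALLOC_CPU, ASSUMED_ALLOC_MEM_GB, CPU_GUARANTEE, MEM_GUARANTEE_GB
)
# Nodes needed just to hold the placeholders, under the assumptions above.
placeholder_nodes = math.ceil(PLACEHOLDER_REPLICAS / per_node)

print(f"pods per node: {per_node}")
print(f"nodes needed for placeholders: {placeholder_nodes}")
```

Under these assumed numbers CPU is the limiting resource, so changing the CPU guarantee would shift the result more than changing the memory guarantee.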
Who ordered the number 3? It’s quite peculiar. I thought it was CPU quotas, but my quota for CPUs (all zones) is 64!
I would appreciate any thoughts on how I might figure out what’s happening. The error message is
which makes perfect sense, since 2 nodes belong to the non-user part of the cluster and won’t be scheduled on. The other message seems to reflect the inability of user pods to be scheduled on more than 3 nodes…
EDIT: I should add that I have one more cluster running in the same GCP project, but it’s not occupying more than 4 nodes…
Also, how could I artificially add “fake” user pods to debug?