tony
September 1, 2023, 2:16pm
1
Hello All,
I need help figuring out why my deployment is sending all pods to the same node. I tested the cluster by deploying 28 nginx pods and they were evenly spread across the cluster. I tested both v2.0.0 and v3.0.1. My config is:
debug:
  enabled: true
scheduling:
  userScheduler:
    enabled: true
  podPriority:
    enabled: true
  userPlaceholder:
    enabled: true
    replicas: 4
  userPods:
    nodeAffinity:
      matchNodePurpose: require
cull:
  enabled: true
  timeout: 3600
  every: 300
prePuller:
  continuous:
    enabled: false
  hook:
    enabled: false
I also tested by changing various scheduling settings. Any ideas?
Thanks!
manics
September 1, 2023, 4:37pm
2
The Z2JH user scheduler tries to pack as many user pods as possible onto the smallest number of nodes so that the cluster can autoscale down:
Try disabling it to use the default K8s scheduler.
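For example, something along these lines in your config.yaml (a minimal sketch; merge it into your existing scheduling block):

scheduling:
  userScheduler:
    enabled: false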
tony
September 1, 2023, 6:36pm
3
Hi Manics,
Thank you for the link. I have tried every combination of those settings that I can see. I even deleted the deployment between changes. Any other ideas, or information I can post?
Thank you!
Tony
tony
September 1, 2023, 7:23pm
4
I also wanted to add that the documentation seems to say I can use this:
singleuser:
  schedulerStrategy: spread
But when upgrading I get this error:
main.newUpgradeCmd.func2
helm.sh/helm/v3/cmd/helm/upgrade.go:209
github.com/spf13/cobra.(*Command).execute
github.com/spf13/cobra@v1.6.1/command.go:916
github.com/spf13/cobra.(*Command).ExecuteC
github.com/spf13/cobra@v1.6.1/command.go:1044
github.com/spf13/cobra.(*Command).Execute
github.com/spf13/cobra@v1.6.1/command.go:968
main.main
helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
runtime/proc.go:250
runtime.goexit
runtime/asm_amd64.s:1598
manics
September 1, 2023, 9:54pm
5
Which documentation are you looking at? This isn't mentioned in the Optimizations page of the Zero to JupyterHub with Kubernetes documentation.
Have you tried looking at your K8s logs and events for your singleuser and other (nginx) pods and comparing them? There may be clues as to why K8s has chosen particular nodes.
Can you show us the output of kubectl get pod <podname> -o yaml
for a singleuser pod and an nginx pod for comparison?
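For example (pod name and namespace are placeholders):

kubectl describe pod <podname> -n <namespace>
kubectl get events -n <namespace> --sort-by=.lastTimestamp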
tony
September 1, 2023, 10:26pm
6
Thank you again for looking at this.
Regarding the “schedulerStrategy: spread”, in desperation I was reading this:
https://test-zerotojh.readthedocs.io/en/edit-awseks/optimization.html
The non-working (singleuser) pod:
apiVersion: v1
kind: Pod
metadata:
annotations:
cni.projectcalico.org/containerID: 3308882e8c3234a7032b8f4687113888429e1470fa13313ab5f2688ef3d22cac
cni.projectcalico.org/podIP: 10.244.4.172/32
cni.projectcalico.org/podIPs: 10.244.4.172/32
hub.jupyter.org/username: tony_cricelli
creationTimestamp: "2023-09-01T21:39:04Z"
labels:
app: jupyterhub
chart: jupyterhub-2.0.0
component: singleuser-server
heritage: jupyterhub
hub.jupyter.org/network-access-hub: "true"
hub.jupyter.org/servername: ""
hub.jupyter.org/username: tony-5fcricelli
release: ugba88
name: jupyter-tony-5fcricelli
namespace: ugba88
resourceVersion: "268926"
uid: 7cf7ad9e-c2ca-4661-a7de-f3750e7e56ea
spec:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- preference:
matchExpressions:
- key: hub.jupyter.org/node-purpose
operator: In
values:
- user
weight: 100
automountServiceAccountToken: false
containers:
- env:
- name: CPU_GUARANTEE
value: "0.5"
- name: CPU_LIMIT
value: "2.0"
- name: JPY_API_TOKEN
value: 0fc02e85cf6747da85e243be5d634d04
- name: JUPYTERHUB_ACTIVITY_URL
value: http://hub:8081/hub/api/users/tony_cricelli/activity
- name: JUPYTERHUB_ADMIN_ACCESS
value: "1"
- name: JUPYTERHUB_API_TOKEN
value: 0fc02e85cf6747da85e243be5d634d04
- name: JUPYTERHUB_API_URL
value: http://hub:8081/hub/api
- name: JUPYTERHUB_BASE_URL
value: /
- name: JUPYTERHUB_CLIENT_ID
value: jupyterhub-user-tony_cricelli
- name: JUPYTERHUB_DEBUG
value: "1"
- name: JUPYTERHUB_DEFAULT_URL
value: /tree/
- name: JUPYTERHUB_HOST
- name: JUPYTERHUB_OAUTH_ACCESS_SCOPES
value: '["access:servers!server=tony_cricelli/", "access:servers!user=tony_cricelli"]'
- name: JUPYTERHUB_OAUTH_CALLBACK_URL
value: /user/tony_cricelli/oauth_callback
- name: JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES
value: '[]'
- name: JUPYTERHUB_OAUTH_SCOPES
value: '["access:servers!server=tony_cricelli/", "access:servers!user=tony_cricelli"]'
- name: JUPYTERHUB_SERVER_NAME
- name: JUPYTERHUB_SERVICE_PREFIX
value: /user/tony_cricelli/
- name: JUPYTERHUB_SERVICE_URL
value: http://0.0.0.0:8888/user/tony_cricelli/
- name: JUPYTERHUB_SINGLEUSER_APP
value: notebook.notebookapp.NotebookApp
- name: JUPYTERHUB_USER
value: tony_cricelli
- name: JUPYTER_IMAGE
value: montereytony/ugba88:jup8-23-fall-v16
- name: JUPYTER_IMAGE_SPEC
value: montereytony/ugba88:jup8-23-fall-v16
- name: MEM_GUARANTEE
value: "1073741824"
- name: MEM_LIMIT
value: "6442450944"
image: montereytony/ugba88:jup8-23-fall-v16
imagePullPolicy: IfNotPresent
lifecycle:
postStart:
exec:
command:
- sh
- -c
- |
mkdir -p my-work; # /bin/sh /tmp/fixer.sh
name: notebook
ports:
- containerPort: 8888
name: notebook-port
protocol: TCP
resources:
limits:
cpu: "2"
memory: "6442450944"
requests:
cpu: 500m
memory: "1073741824"
securityContext:
allowPrivilegeEscalation: true
runAsUser: 1000
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /home/jovyan
name: home
subPath: homes/tony-5fcricelli
- mountPath: /home/jovyan/shared
name: jupyterhub-shared
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
initContainers:
- command:
- iptables
- -A
- OUTPUT
- -d
- 169.254.169.254
- -j
- DROP
image: jupyterhub/k8s-network-tools:2.0.0
imagePullPolicy: IfNotPresent
name: block-cloud-metadata
resources: {}
securityContext:
capabilities:
add:
- NET_ADMIN
privileged: true
runAsUser: 0
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
nodeName: jup5
preemptionPolicy: PreemptLowerPriority
priority: 0
priorityClassName: ugba88-default-priority
restartPolicy: OnFailure
schedulerName: default-scheduler
securityContext:
fsGroup: 100
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoSchedule
key: hub.jupyter.org/dedicated
operator: Equal
value: user
- effect: NoSchedule
key: hub.jupyter.org_dedicated
operator: Equal
value: user
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: home
persistentVolumeClaim:
claimName: ugba88-pvc
- name: jupyterhub-shared
persistentVolumeClaim:
claimName: ugba88-shared-pvc
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2023-09-01T21:39:05Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2023-09-01T21:39:06Z"
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2023-09-01T21:39:06Z"
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2023-09-01T21:39:04Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: containerd://c143d4357d2f2871c594f9a003ea902d14287e8f7d250d1ef356f2c7ea9cafbf
image: docker.io/montereytony/ugba88:jup8-23-fall-v16
imageID: docker.io/montereytony/ugba88@sha256:021792c506eb22f4c3560ca1e2e1994814f6dc925a7413ff7400a1884ec424a8
lastState: {}
name: notebook
ready: true
restartCount: 0
started: true
state:
running:
startedAt: "2023-09-01T21:39:05Z"
hostIP: 192.168.2.145
initContainerStatuses:
- containerID: containerd://0d07504ddd47b4256c9d84f357d0deaa6e21016364165c2b57c5d65c3502cd39
image: docker.io/jupyterhub/k8s-network-tools:2.0.0
imageID: docker.io/jupyterhub/k8s-network-tools@sha256:ab4172a025721495c0c65bd2a6165a6cd625bae39e0e5231c06e149c2ffc5dab
lastState: {}
name: block-cloud-metadata
ready: true
restartCount: 0
state:
terminated:
containerID: containerd://0d07504ddd47b4256c9d84f357d0deaa6e21016364165c2b57c5d65c3502cd39
exitCode: 0
finishedAt: "2023-09-01T21:39:04Z"
reason: Completed
startedAt: "2023-09-01T21:39:04Z"
phase: Running
podIP: 10.244.4.172
podIPs:
- ip: 10.244.4.172
qosClass: Burstable
startTime: "2023-09-01T21:39:04Z"
I did label my worker nodes with hub.jupyter.org/node-purpose=user
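For reference, the labels can be listed with, for example:

kubectl get nodes -L hub.jupyter.org/node-purpose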
Here is the working deployment where the pods are evenly distributed across the nodes:
apiVersion: v1
kind: Pod
metadata:
annotations:
cni.projectcalico.org/containerID: e48b4342343612406f065d682ae87b34d9dff64a8090c33df7bb8080a787c2bf
cni.projectcalico.org/podIP: 10.244.4.209/32
cni.projectcalico.org/podIPs: 10.244.4.209/32
creationTimestamp: "2023-09-01T22:10:56Z"
generateName: nginx-deployment-6595874d85-
labels:
app: nginx
pod-template-hash: 6595874d85
name: nginx-deployment-6595874d85-44sg6
namespace: ugba88
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: nginx-deployment-6595874d85
uid: 980f0bdc-f270-41df-9b73-058994c5402b
resourceVersion: "274383"
uid: 58063796-8bbb-4a8b-a40d-b5c58bb231f9
spec:
containers:
- image: nginx:1.14.2
imagePullPolicy: IfNotPresent
name: nginx
ports:
- containerPort: 80
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-8zl52
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: jup5
preemptionPolicy: PreemptLowerPriority
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: kube-api-access-8zl52
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2023-09-01T22:10:56Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2023-09-01T22:11:02Z"
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2023-09-01T22:11:02Z"
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2023-09-01T22:10:56Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: containerd://a647ead1c00fe93e4bb2656edd2da2c39a6bc803468119c5df5cbca8cfd760b3
image: docker.io/library/nginx:1.14.2
imageID: docker.io/library/nginx@sha256:f7988fb6c02e0ce69257d9bd9cf37ae20a60f1df7563c3a2a6abe24160306b8d
lastState: {}
name: nginx
ready: true
restartCount: 0
started: true
state:
running:
startedAt: "2023-09-01T22:11:02Z"
hostIP: 192.168.2.145
phase: Running
podIP: 10.244.4.209
podIPs:
- ip: 10.244.4.209
qosClass: BestEffort
startTime: "2023-09-01T22:10:56Z"
Here is a pending pod that should have been scheduled on a different node:
apiVersion: v1
kind: Pod
metadata:
annotations:
hub.jupyter.org/username: xxxxxxxxx
creationTimestamp: "2023-09-01T22:19:53Z"
labels:
app: jupyterhub
chart: jupyterhub-2.0.0
component: singleuser-server
heritage: jupyterhub
hub.jupyter.org/network-access-hub: "true"
hub.jupyter.org/servername: ""
hub.jupyter.org/username: xxxxxxxxx
release: ugba88
name: jupyter-xxxxxxxx
namespace: ugba88
resourceVersion: "275786"
uid: 62c3019b-18ce-4a62-b907-1c2fbff34551
spec:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- preference:
matchExpressions:
- key: hub.jupyter.org/node-purpose
operator: In
values:
- user
weight: 100
automountServiceAccountToken: false
containers:
- env:
- name: CPU_GUARANTEE
value: "0.5"
- name: CPU_LIMIT
value: "2.0"
- name: JPY_API_TOKEN
value: fb7020912758439dbc9964babfde146b
- name: JUPYTERHUB_ACTIVITY_URL
value: http://hub:8081/hub/api/users/xxxxxxxxx/activity
- name: JUPYTERHUB_ADMIN_ACCESS
value: "1"
- name: JUPYTERHUB_API_TOKEN
value: fb7020912758439dbc9964babfde146b
- name: JUPYTERHUB_API_URL
value: http://hub:8081/hub/api
- name: JUPYTERHUB_BASE_URL
value: /
- name: JUPYTERHUB_CLIENT_ID
value: jupyterhub-user-sarikapasumarthy
- name: JUPYTERHUB_DEBUG
value: "1"
- name: JUPYTERHUB_DEFAULT_URL
value: /tree/
- name: JUPYTERHUB_HOST
- name: JUPYTERHUB_OAUTH_ACCESS_SCOPES
value: '["access:servers!server=xxxxxxx/", "access:servers!user=xxxxxxxxx"]'
- name: JUPYTERHUB_OAUTH_CALLBACK_URL
value: /user/xxxxxxx/oauth_callback
- name: JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES
value: '[]'
- name: JUPYTERHUB_OAUTH_SCOPES
value: '["access:servers!server=xxxxxxx/", "access:servers!user=xxxxxxx"]'
- name: JUPYTERHUB_SERVER_NAME
- name: JUPYTERHUB_SERVICE_PREFIX
value: /user/xxxxxx/
- name: JUPYTERHUB_SERVICE_URL
value: http://0.0.0.0:8888/user/xxxxxx/
- name: JUPYTERHUB_SINGLEUSER_APP
value: notebook.notebookapp.NotebookApp
- name: JUPYTERHUB_USER
value: xxxxx
- name: JUPYTER_IMAGE
value: montereytony/ugba88:jup8-23-fall-v16
- name: JUPYTER_IMAGE_SPEC
value: montereytony/ugba88:jup8-23-fall-v16
- name: MEM_GUARANTEE
value: "1073741824"
- name: MEM_LIMIT
value: "6442450944"
image: montereytony/ugba88:jup8-23-fall-v16
imagePullPolicy: IfNotPresent
lifecycle:
postStart:
exec:
command:
- sh
- -c
- |
mkdir -p my-work; # /bin/sh /tmp/fixer.sh
name: notebook
ports:
- containerPort: 8888
name: notebook-port
protocol: TCP
resources:
limits:
cpu: "2"
memory: "6442450944"
requests:
cpu: 500m
memory: "1073741824"
securityContext:
allowPrivilegeEscalation: true
runAsUser: 1000
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /home/jovyan
name: home
subPath: homes/xxxxxx
- mountPath: /home/jovyan/shared
name: jupyterhub-shared
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
initContainers:
- command:
- iptables
- -A
- OUTPUT
- -d
- 169.254.169.254
- -j
- DROP
image: jupyterhub/k8s-network-tools:2.0.0
imagePullPolicy: IfNotPresent
name: block-cloud-metadata
resources: {}
securityContext:
capabilities:
add:
- NET_ADMIN
privileged: true
runAsUser: 0
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
preemptionPolicy: PreemptLowerPriority
priority: 0
priorityClassName: ugba88-default-priority
restartPolicy: OnFailure
schedulerName: default-scheduler
securityContext:
fsGroup: 100
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoSchedule
key: hub.jupyter.org/dedicated
operator: Equal
value: user
- effect: NoSchedule
key: hub.jupyter.org_dedicated
operator: Equal
value: user
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: home
persistentVolumeClaim:
claimName: ugba88-pvc
- name: jupyterhub-shared
persistentVolumeClaim:
claimName: ugba88-shared-pvc
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2023-09-01T22:19:53Z"
message: '0/8 nodes are available: 1 Insufficient cpu, 1 node(s) had untolerated
taint {node-role.kubernetes.io/master: }, 6 node(s) had volume node affinity
conflict. preemption: 0/8 nodes are available: 1 No preemption victims found
for incoming pod, 7 Preemption is not helpful for scheduling.'
reason: Unschedulable
status: "False"
type: PodScheduled
phase: Pending
qosClass: Burstable
All pods are in the same namespace.
Here is the output of kubectl get pods, which shows the JupyterHub pods all on jup5 and the nginx pods spread across the other nodes:
jupyter-xxnran 1/1 Running 0 31m 10.244.4.199 jup5 <none> <none>
jupyter-xxuntsier 1/1 Running 0 31m 10.244.4.174 jup5 <none> <none>
jupyter-xxora05 1/1 Running 0 31m 10.244.4.200 jup5 <none> <none>
jupyter-siyabudddev 1/1 Running 0 31m 10.244.4.204 jup5 <none> <none>
jupyter-xxcratesj-2xxsorio 1/1 Running 0 31m 10.244.4.185 jup5 <none> <none>
jupyter-xxota 1/1 Running 0 31m 10.244.4.183 jup5 <none> <none>
jupyter-xxny-5fcxxcelli 1/1 Running 0 32m 10.244.4.172 jup5 <none> <none>
jupyter-xxnsh2004 1/1 Running 0 33m 10.244.4.171 jup5 <none> <none>
jupyter-xxxinia-2exu 1/1 Running 0 31m 10.244.4.208 jup5 <none> <none>
jupyter-xxxxyons10 1/1 Running 0 31m 10.244.4.195 jup5 <none> <none>
jupyter-xxxxisura 1/1 Running 0 31m 10.244.4.201 jup5 <none> <none>
jupyter-xxxxanli 1/1 Running 0 31m 10.244.4.198 jup5 <none> <none>
nginx-deployment-6595874d85-44sg6 1/1 Running 0 11s 10.244.4.209 jup5 <none> <none>
nginx-deployment-6595874d85-56r7x 1/1 Running 0 11s 10.244.5.71 jup6 <none> <none>
nginx-deployment-6595874d85-6bjtj 1/1 Running 0 12s 10.244.2.28 jup3 <none> <none>
nginx-deployment-6595874d85-6d78p 1/1 Running 0 11s 10.244.1.37 jup2 <none> <none>
nginx-deployment-6595874d85-77nwc 1/1 Running 0 11s 10.244.1.38 jup2 <none> <none>
nginx-deployment-6595874d85-7r8l7 1/1 Running 0 11s 10.244.5.68 jup6 <none> <none>
nginx-deployment-6595874d85-7sqqh 1/1 Running 0 11s 10.244.1.36 jup2 <none> <none>
nginx-deployment-6595874d85-88v5r 1/1 Running 0 11s 10.244.6.21 jup7 <none> <none>
nginx-deployment-6595874d85-8m79n 1/1 Running 0 11s 10.244.3.35 jup4 <none> <none>
nginx-deployment-6595874d85-9zwsw 1/1 Running 0 11s 10.244.7.31 jup9 <none> <none>
nginx-deployment-6595874d85-g2mpc 1/1 Running 0 11s 10.244.5.70 jup6 <none> <none>
nginx-deployment-6595874d85-gndmr 1/1 Running 0 11s 10.244.3.34 jup4 <none> <none>
nginx-deployment-6595874d85-gplrg 1/1 Running 0 11s 10.244.7.32 jup9 <none> <none>
nginx-deployment-6595874d85-kztvb 1/1 Running 0 11s 10.244.2.30 jup3 <none> <none>
nginx-deployment-6595874d85-mgx4p 1/1 Running 0 12s 10.244.6.19 jup7 <none> <none>
nginx-deployment-6595874d85-msqsl 1/1 Running 0 11s 10.244.4.210 jup5 <none> <none>
nginx-deployment-6595874d85-rbsbp 1/1 Running 0 11s 10.244.2.29 jup3 <none> <none>
nginx-deployment-6595874d85-rcvkw 1/1 Running 0 12s 10.244.7.30 jup9 <none> <none>
nginx-deployment-6595874d85-rk9bj 1/1 Running 0 11s 10.244.3.33 jup4 <none> <none>
nginx-deployment-6595874d85-tp9kw 1/1 Running 0 11s 10.244.6.20 jup7 <none> <none>
nginx-deployment-6595874d85-vdxpr 1/1 Running 0 11s 10.244.5.69 jup6 <none> <none>
proxy-6bc5f57fd7-9g6nf 1/1 Running 0 34m 10.244.3.32 jup4 <none> <none>
manics
September 3, 2023, 3:27pm
7
How many nodes are labelled with this?
The default value, scheduling.userPods.nodeAffinity.matchNodePurpose="prefer",
should mean the pods are spread over all nodes with that label. Try setting it to ignore, or alternatively remove the hub.jupyter.org/node-purpose=user label from your node(s).
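For example, in config.yaml (a sketch; merge with your existing values):

scheduling:
  userPods:
    nodeAffinity:
      matchNodePurpose: ignore

or, to remove the label (node name is a placeholder):

kubectl label node <nodename> hub.jupyter.org/node-purpose-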
tony:
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2023-09-01T22:19:53Z"
message: '0/8 nodes are available: 1 Insufficient cpu, 1 node(s) had untolerated
taint {node-role.kubernetes.io/master: }, 6 node(s) had volume node affinity
conflict. preemption: 0/8 nodes are available: 1 No preemption victims found
for incoming pod, 7 Preemption is not helpful for scheduling.'
This might also be a problem: how is your dynamic storage set up? Some storage controllers create volumes that are tied to a single node.
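For example, you could check whether the bound PVs carry a node affinity (names are placeholders):

kubectl get pvc -n <namespace>
kubectl describe pv <pv-name>

and look for a Node Affinity section in the describe output.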
tony
September 3, 2023, 8:14pm
8
I have 7 nodes with that label. I just tested with ignore and also with removing hub.jupyter.org/node-purpose=user, with no joy. I think you are onto something with the storage. I will test that next; it is pretty much the only thing I have not looked at. Thanks again!
tony
September 4, 2023, 12:46am
9
I think you are correct that it is the storage, but I am not able to figure it out. First I tried:
singleuser:
  storage:
    type: none
I also tried:
singleuser:
  storage:
    dynamic:
      storageClass: ugba88-2-sc
Each time I started up 150 users, and they always went to the same node.
I have NFS-mounted common storage on all the nodes, so I assumed that since the storage is “local” I could just point to it.
I defined a storage class:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ugba88-2-sc
provisioner: kubernetes.io/no-provisioner
parameters:
  # The path to the local storage on the node.
  local: /mnt/ist/jhub-stor/2023/fall/ugba88/
Then I defined a PV:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ugba88-2-pv
spec:
  capacity:
    storage: 300Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: ugba88-2-sc
  local:
    path: /mnt/ist/jhub-stor/2023-fall/ugba88
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - jup2
          - jup3
          - jup4
          - jup5
          - jup6
          - jup7
          - jup8
          - jup9
and I defined a PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ugba88-2-pvc
spec:
  storageClassName: ugba88-2-sc
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
Is there a better way to do this? Many years ago I was able to use a hostPath directory in the JupyterHub YAML, which was less complicated.
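For example, would an NFS-backed PV/PVC along these lines be more appropriate (server and path are placeholders, and this is only a sketch)? It has no nodeAffinity and allows ReadWriteMany:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: ugba88-2-pv
spec:
  capacity:
    storage: 300Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ugba88-2-sc
  nfs:
    server: <nfs-server>
    path: <exported-path>
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ugba88-2-pvc
spec:
  storageClassName: ugba88-2-sc
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Gi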
tony
September 4, 2023, 4:54pm
10
I think the problem may be related to the access modes. I am going to destroy everything and rebuild, making sure the access modes on the storage are all ReadWriteMany.
tony
September 5, 2023, 1:17pm
11
This issue is solved. Thank you for the advice and suggestions. I made the following changes:
Redefined the StorageClass, PV, and PVCs, and made sure all storage was set to ReadWriteMany
Removed the node label hub.jupyter.org/node-purpose=user
Changed my config to:
scheduling:
  userScheduler:
    enabled: false
  userPods:
    nodeAffinity:
      # matchNodePurpose valid options:
      # - ignore
      # - prefer (the default)
      # - require
      matchNodePurpose: ignore
  corePods:
    nodeAffinity:
      matchNodePurpose: ignore
I then did a “start all” in the control panel, and 150 user servers were launched and spread evenly across my nodes.
Thank you again!
Tony