userPlaceholder Pods are stuck in pending state using Z2JK v1.2.0

My userPlaceholder pods are stuck in Pending status, and I am not able to get them scheduled on any of the nodes in the node pool I designated for single-user pods (e.g., Jupyter notebook server pods).
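
The scheduler's reason for leaving a pod in Pending normally shows up in the pod's events; for reference, a minimal way to pull them (standard kubectl commands, with pod names following the StatefulSet naming shown below):

```
kubectl -n apl-edai-ml-ops-dev1 describe pod user-placeholder-0
kubectl -n apl-edai-ml-ops-dev1 get events --sort-by=.lastTimestamp | grep -i user-placeholder
```

In my case the describe output showed no scheduling events at all, which becomes relevant below.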

Here is a snippet of the values.yaml file. Any help is greatly appreciated.

```
userPlaceholder:
  enabled: true
  image:
    name: artifactory.us.bank-dns.com:5000/edai/data-science/jupyter/v1.20/usb/k8s.gcr.io/pause
    # tags can be updated by inspecting the output of the command:
    # gcloud container images list-tags k8s.gcr.io/pause --sort-by=~tags
    #
    # If you update this, also update prePuller.pause.image.tag
    tag: "3.5-develop-20220204151423"
    pullPolicy:
    pullSecrets: []
  replicas: 2
  containerSecurityContext:
    runAsUser: 65534 # nobody user
    runAsGroup: 65534 # nobody group
    allowPrivilegeEscalation: false
  resources: {}
corePods:
  tolerations:
    - key: usb-edai-dedicated
      operator: Equal
      value: mlops-control-plane
      effect: NoSchedule
  nodeAffinity:
    matchNodePurpose: prefer
userPods:
  tolerations:
    - effect: NoSchedule
      key: hub.jupyter.org_dedicated
      operator: Equal
      value: user
    - effect: NoSchedule
      key: hub.jupyter.org/dedicated
      operator: Equal
      value: user
  nodeAffinity:
    matchNodePurpose: prefer
```
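
Since the rendered pods get a nodeSelector of usb-edai-dedicated: datascience-jupyter-user (see the StatefulSet below), the labels and taints on the designated node pool can be cross-checked like this (standard kubectl; `<node-name>` is a placeholder):

```
kubectl get nodes -L usb-edai-dedicated,hub.jupyter.org/node-purpose
kubectl describe node <node-name> | grep -i -A 3 taints
```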

Here is a snippet of the generated userPlaceholder StatefulSet.

```
apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"StatefulSet","metadata":{"annotations":{},"labels":{"app":"jupyterhub","chart":"jupyterhub-1.2.0","component":"user-placeholder","heritage":"Helm","release":"z2jk-v1.2.0-dev"},"name":"user-placeholder","namespace":"apl-edai-ml-ops-dev1"},"spec":{"podManagementPolicy":"Parallel","replicas":2,"selector":{"matchLabels":{"app":"jupyterhub","component":"user-placeholder","release":"z2jk-v1.2.0-dev"}},"serviceName":"user-placeholder","template":{"metadata":{"labels":{"app":"jupyterhub","component":"user-placeholder","release":"z2jk-v1.2.0-dev"}},"spec":{"affinity":{"nodeAffinity":{"preferredDuringSchedulingIgnoredDuringExecution":[{"preference":{"matchExpressions":[{"key":"hub.jupyter.org/node-purpose","operator":"In","values":["user"]}]},"weight":100}]}},"automountServiceAccountToken":false,"containers":[{"image":"artifactory.us.bank-dns.com:5000/edai/data-science/jupyter/v1.20/usb/k8s.gcr.io/pause:3.5-develop-20220204151423","name":"pause","resources":{"limits":{"cpu":2,"memory":"8G"},"requests":{"cpu":1,"memory":"2G"}},"securityContext":{"allowPrivilegeEscalation":false,"runAsGroup":65534,"runAsUser":65534}}],"imagePullSecrets":[{"name":"usb-artifactory-prod1"}],"nodeSelector":{"usb-edai-dedicated":"datascience-jupyter-user"},"schedulerName":"z2jk-v1.2.0-dev-user-scheduler","terminationGracePeriodSeconds":0,"tolerations":[{"effect":"NoSchedule","key":"hub.jupyter.org_dedicated","operator":"Equal","value":"user"},{"effect":"NoSchedule","key":"hub.jupyter.org/dedicated","operator":"Equal","value":"user"},{"effect":"NoSchedule","key":"usb-edai-dedicated","operator":"Equal","value":"datascience-jupyter-user"}]}}}}
  creationTimestamp: "2022-02-09T19:04:53Z"
  generation: 3
  labels:
    app: jupyterhub
    chart: jupyterhub-1.2.0
    component: user-placeholder
    heritage: Helm
    release: z2jk-v1.2.0-dev
  name: user-placeholder
  namespace: apl-edai-ml-ops-dev1
  resourceVersion: "241309308"
  uid: e08cafa0-d59b-4863-9535-de07afea5f79
spec:
  podManagementPolicy: Parallel
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: jupyterhub
      component: user-placeholder
      release: z2jk-v1.2.0-dev
  serviceName: user-placeholder
  template:
    metadata:
      annotations:
        cattle.io/timestamp: "2022-02-09T19:12:49Z"
      creationTimestamp: null
      labels:
        app: jupyterhub
        component: user-placeholder
        release: z2jk-v1.2.0-dev
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              - key: hub.jupyter.org/node-purpose
                operator: In
                values:
                - user
            weight: 100
      automountServiceAccountToken: false
      containers:
      - image: artifactory.us.bank-dns.com:5000/edai/data-science/jupyter/v1.20/usb/k8s.gcr.io/pause:3.5-develop-20220204151423
        imagePullPolicy: IfNotPresent
        name: pause
        resources:
          limits:
            cpu: "2"
            memory: 8G
          requests:
            cpu: "1"
            memory: 2G
        securityContext:
          allowPrivilegeEscalation: false
          runAsGroup: 65534
          runAsUser: 65534
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: usb-artifactory-prod1
      nodeSelector:
        usb-edai-dedicated: datascience-jupyter-user
      restartPolicy: Always
      schedulerName: z2jk-v1.2.0-dev-user-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 0
      tolerations:
      - effect: NoSchedule
        key: hub.jupyter.org_dedicated
        operator: Equal
        value: user
      - effect: NoSchedule
        key: hub.jupyter.org/dedicated
        operator: Equal
        value: user
      - effect: NoSchedule
        key: usb-edai-dedicated
        operator: Equal
        value: datascience-jupyter-user
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
status:
  collisionCount: 0
  currentReplicas: 1
  currentRevision: user-placeholder-8687f59746
  observedGeneration: 3
  replicas: 2
  updateRevision: user-placeholder-755cfd58b5
  updatedReplicas: 1
```
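
One detail worth noting in this manifest: the pods are bound by the chart's own scheduler (schedulerName: z2jk-v1.2.0-dev-user-scheduler), so if that scheduler's pods are unhealthy, nothing assigned to it will ever be scheduled. Its state can be checked with (label selector and deployment name assume the chart's defaults):

```
kubectl -n apl-edai-ml-ops-dev1 get pods -l component=user-scheduler
kubectl -n apl-edai-ml-ops-dev1 logs deploy/user-scheduler --tail=50
```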

This seems like a bug somewhere in the Z2JK v1.2.0 Helm charts.

I am able to run the Z2JK v1.2.0 Docker images with a legacy Helm chart (from Z2JK v0.10.6).

I am going to open a bug/issue on the Z2JK GitHub site.

I figured out what the issue was.

The namespace where Z2JK is running has Istio sidecar injection turned on. (That is an organizational requirement.)

Istio sidecar injection was therefore also enabled for the user-scheduler pods. This left the user-scheduler pods in an unhealthy state, which in turn prevented the user-placeholder pods and the single-user pods from being scheduled.
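
One way to confirm injection is happening is to look for the istio-proxy container on the scheduler pods (standard kubectl and Istio conventions; the label selector assumes the chart's defaults):

```
kubectl get namespace apl-edai-ml-ops-dev1 --show-labels
kubectl -n apl-edai-ml-ops-dev1 get pods -l component=user-scheduler \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[*].name}{"\n"}{end}'
```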

I corrected the problem by adding the following annotation to the user-scheduler Deployment Helm template.

```
sidecar.istio.io/inject: {{ .Values.custom.userScheduler.enableIstoSideCar | quote }}
```
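
For context, the annotation needs to land in the pod template metadata of the Deployment, since Istio's injector reads pod annotations rather than Deployment annotations. A minimal sketch of the placement, assuming the structure of my modified chart template:

```
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: {{ .Values.custom.userScheduler.enableIstoSideCar | quote }}
```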

And by adding the following custom values to the values.yaml file:

```
custom:
  userScheduler:
    enableIstoSideCar: false
```
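
With the value set to false, the `| quote` pipeline renders the annotation as the quoted string the Istio injector expects:

```
annotations:
  sidecar.istio.io/inject: "false"
```

Once this was deployed, the user-scheduler pods started without the sidecar, and the user-placeholder and single-user pods were scheduled normally.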