Hi @all,
For a few days now I have been facing this issue when launching user pods: Spawn failed: Could not create pod pod-name/jupyter-user. Sometimes it works, but most of the time it doesn't, and it's worrying me.
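From what I understand, the spawn error itself doesn't say why the pod couldn't be created; the underlying reason should show up in the hub pod's logs (kubectl logs on the hub pod) or in the events of the failed user pod (kubectl describe / kubectl get events). If it helps, I can redeploy with the chart's debug logging enabled to get more verbose hub output, which as far as I can tell is just this in the values:

debug:
  # assumption: this only raises the hub's log level to DEBUG, as the chart's
  # default values file suggests
  enabled: true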
Here is my configuration:
# fullnameOverride and nameOverride distinguishes blank strings, null values,
# and non-blank strings. For more details, see the configuration reference.
fullnameOverride: "jupyter"
nameOverride: "jupyter"
# custom can contain anything you want to pass to the hub pod, as all passed
# Helm template values will be made available there.
custom: {}
# imagePullSecret is configuration to create a k8s Secret that Helm chart's pods
# can get credentials from to pull their images.
imagePullSecret:
create: false
automaticReferenceInjection: true
registry: #xxxxxxxxxxxxxxxxxx
username: #kubernetes
password: #xxxxxxxxxxxxx
email: #xxxxxxxxxxxxx
# imagePullSecrets is configuration to reference the k8s Secret resources the
# Helm chart's pods can get credentials from to pull their images.
imagePullSecrets: []
# hub relates to the hub pod, responsible for running JupyterHub, its
# configured Authenticator class, its Spawner class KubeSpawner, and its
# configured Proxy class ConfigurableHTTPProxy. KubeSpawner creates the user
# pods, and ConfigurableHTTPProxy speaks with the actual ConfigurableHTTPProxy
# server in the proxy pod.
hub:
config:
Authenticator:
admin_users:
- chrys
JupyterHub:
admin_access: true
authenticator_class: dummy
service:
type: ClusterIP
annotations: {}
ports:
nodePort:
extraPorts: []
loadBalancerIP:
baseUrl: /
cookieSecret:
initContainers: []
fsGid: 1000
nodeSelector: {}
tolerations: []
concurrentSpawnLimit: 64
consecutiveFailureLimit: 5
activeServerLimit:
deploymentStrategy:
## type: Recreate
## - sqlite-pvc backed hubs require the Recreate deployment strategy as a
## typical PVC storage can only be bound to one pod at a time.
## - JupyterHub isn't designed to support being run in parallel. More work
## needs to be done in JupyterHub itself before a fully highly available
## (HA) deployment of JupyterHub on k8s is possible.
type: Recreate
db:
type: sqlite-pvc
upgrade:
pvc:
annotations: {}
selector: {}
accessModes:
- ReadWriteOnce
storage: 1Gi
subPath:
storageClassName:
url:
password:
labels: {}
annotations: {cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'}
command: []
args: []
extraConfig: {}
extraFiles: {}
extraEnv: {}
extraContainers: []
extraVolumes: []
extraVolumeMounts: []
image:
name: jupyterhub/k8s-hub
tag: "1.2.0"
pullPolicy:
pullSecrets: []
resources: {}
containerSecurityContext:
runAsUser: 1000
runAsGroup: 1000
allowPrivilegeEscalation: false
lifecycle: {}
services: {}
pdb:
enabled: false
maxUnavailable:
minAvailable: 1
networkPolicy:
enabled: true
ingress: []
## egress for JupyterHub already includes Kubernetes internal DNS and access
## to the proxy. It can be restricted further, but make sure to allow access
## to the Kubernetes API server, whose address couldn't be pinned ahead of
## time.
##
## ref: https://stackoverflow.com/a/59016417/2220152
egress:
- to:
- ipBlock:
cidr: 0.0.0.0/0
interNamespaceAccessLabels: ignore
allowedIngressPorts: []
allowNamedServers: false
namedServerLimitPerUser:
authenticatePrometheus:
redirectToServer:
shutdownOnLogout:
templatePaths: []
templateVars: {}
livenessProbe:
# The livenessProbe's aim is to give JupyterHub sufficient time to start up,
# but still restart it if it becomes unresponsive for ~5 min.
enabled: true
initialDelaySeconds: 300
periodSeconds: 10
failureThreshold: 30
timeoutSeconds: 3
readinessProbe:
# The readinessProbe's aim is to provide a successful startup indication,
# but after that never become unready before the livenessProbe fails and
# restarts it if needed. Becoming unready after startup serves no purpose,
# as there is no other pod to fall back to in our non-HA deployment.
enabled: true
initialDelaySeconds: 0
periodSeconds: 2
failureThreshold: 1000
timeoutSeconds: 1
existingSecret:
serviceAccount:
annotations: {}
extraPodSpec: {}
rbac:
enabled: true
# proxy relates to the proxy pod, the proxy-public service, and the autohttps
# pod and proxy-http service.
proxy:
secretToken:
annotations: {cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'}
deploymentStrategy:
## type: Recreate
## - JupyterHub's interaction with the CHP proxy becomes a lot more robust
## with this configuration. To understand this, consider that JupyterHub
## during startup will interact a lot with the k8s service to reach a
## ready proxy pod. If, during a helm upgrade, the hub pod restarts while
## the proxy pod is doing a rolling update, the hub pod could start a
## sequence of interactions with the old proxy pod and finish that sequence
## against the new proxy pod. As CHP proxy pods carry individual state, this
## is very error prone. One observed outcome of not using the Recreate
## strategy is that the hub pod deleted user pods it considered unreachable,
## because it had only configured the old proxy pod, not the new one, before
## trying to reach them.
type: Recreate
## rollingUpdate:
## - WARNING:
## This is required to be set explicitly blank! If it isn't, k8s will keep
## any old values under rollingUpdate, the Deployment becomes invalid, and a
## helm upgrade would fail with an error like this:
##
## UPGRADE FAILED
## Error: Deployment.apps "proxy" is invalid: spec.strategy.rollingUpdate: Forbidden: may not be specified when strategy `type` is 'Recreate'
## Error: UPGRADE FAILED: Deployment.apps "proxy" is invalid: spec.strategy.rollingUpdate: Forbidden: may not be specified when strategy `type` is 'Recreate'
rollingUpdate:
# service relates to the proxy-public service
service:
type: ClusterIP
labels: {}
annotations: {cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'}
nodePorts:
http:
https:
disableHttpPort: false
extraPorts: []
loadBalancerIP:
loadBalancerSourceRanges: []
# chp relates to the proxy pod, which is responsible for routing traffic based
# on dynamic configuration sent from JupyterHub to CHP's REST API.
chp:
containerSecurityContext:
runAsUser: 65534 # nobody user
runAsGroup: 65534 # nobody group
allowPrivilegeEscalation: false
image:
name: jupyterhub/configurable-http-proxy
tag: 4.5.0 # https://github.com/jupyterhub/configurable-http-proxy/releases
pullPolicy:
pullSecrets: []
extraCommandLineFlags: []
livenessProbe:
enabled: true
initialDelaySeconds: 60
periodSeconds: 10
readinessProbe:
enabled: true
initialDelaySeconds: 0
periodSeconds: 2
failureThreshold: 1000
resources: {}
defaultTarget:
errorTarget:
extraEnv: {}
nodeSelector: {}
tolerations: []
networkPolicy:
enabled: true
ingress: []
egress:
- to:
- ipBlock:
cidr: 0.0.0.0/0
interNamespaceAccessLabels: ignore
allowedIngressPorts: [http, https]
pdb:
enabled: false
maxUnavailable:
minAvailable: 1
extraPodSpec: {}
# traefik relates to the autohttps pod, which is responsible for TLS
# termination when proxy.https.type=letsencrypt.
traefik:
containerSecurityContext:
runAsUser: 65534 # nobody user
runAsGroup: 65534 # nobody group
allowPrivilegeEscalation: false
image:
name: traefik
tag: v2.4.11 # ref: https://hub.docker.com/_/traefik?tab=tags
pullPolicy:
pullSecrets: []
hsts:
includeSubdomains: false
preload: false
maxAge: 15724800 # About 6 months
resources: {}
labels: {}
extraEnv: {}
extraVolumes: []
extraVolumeMounts: []
extraStaticConfig: {}
extraDynamicConfig: {}
nodeSelector: {}
tolerations: []
extraPorts: []
networkPolicy:
enabled: true
ingress: []
egress:
- to:
- ipBlock:
cidr: 0.0.0.0/0
interNamespaceAccessLabels: ignore
allowedIngressPorts: [http, https]
pdb:
enabled: false
maxUnavailable:
minAvailable: 1
serviceAccount: #
annotations: {}
extraPodSpec: {}
secretSync:
containerSecurityContext:
runAsUser: 65534 # nobody user
runAsGroup: 65534 # nobody group
allowPrivilegeEscalation: false
image:
name: jupyterhub/k8s-secret-sync
tag: "1.2.0"
pullPolicy:
pullSecrets: []
resources: {}
labels: {}
https:
enabled: false
type: letsencrypt
#type: letsencrypt, manual, offload, secret
letsencrypt:
contactEmail:
# Specify custom server here (https://acme-staging-v02.api.letsencrypt.org/directory) to hit staging LE
acmeServer: https://acme-v02.api.letsencrypt.org/directory
manual:
key:
cert:
secret:
name:
key: tls.key
crt: tls.crt
hosts: []
# singleuser relates to the configuration of KubeSpawner which runs in the hub
# pod, and its spawning of user pods such as jupyter-myusername.
singleuser:
podNameTemplate:
extraTolerations: []
nodeSelector: {}
extraNodeAffinity:
required: []
preferred: []
extraPodAffinity:
required: []
preferred: []
extraPodAntiAffinity:
required: []
preferred: []
networkTools:
image:
name: jupyterhub/k8s-network-tools
tag: "1.2.0"
pullPolicy:
pullSecrets: []
cloudMetadata:
# blockWithIptables set to true will append a privileged initContainer that
# uses iptables to block the sensitive metadata server at the provided ip.
blockWithIptables: true
ip: 169.254.169.254
networkPolicy:
enabled: true
ingress: []
egress:
# Required egress to communicate with the hub and DNS servers will be
# added to these egress rules.
#
# This default rule explicitly allows all outbound traffic from singleuser
# pods, except to a typical IP used to return metadata that can be used by
# someone with malicious intent.
- to:
- ipBlock:
cidr: 0.0.0.0/0
except:
- 169.254.169.254/32
interNamespaceAccessLabels: ignore
allowedIngressPorts: []
events: true
extraAnnotations: {}
extraLabels:
hub.jupyter.org/network-access-hub: "true"
extraFiles: {}
extraEnv:
JUPYTERHUB_SINGLEUSER_APP: "jupyter_server.serverapp.ServerApp"
lifecycleHooks:
postStart:
exec:
command: ["/bin/sh", "-c", "echo Hello from overridden handler > $HOME/message"]
initContainers: []
extraContainers: []
uid: 1000
fsGid: 100
serviceAccountName:
storage:
type: dynamic
extraLabels: {}
extraVolumes: []
extraVolumeMounts: []
static:
pvcName:
subPath: "{username}"
capacity: 20Gi
#homeMountPath: /home/jovyan
homeMountPath: /home/{username}
dynamic:
storageClass:
pvcNameTemplate: claim-{username}{servername}
volumeNameTemplate: volume-{username}{servername}
storageAccessModes: [ReadWriteOnce]
image:
name: jupyterhub/k8s-singleuser-sample
tag: "1.2.0"
pullPolicy:
pullSecrets:
- "gitlab-docker-registry"
startTimeout: 300
cpu:
limit:
guarantee:
memory:
limit:
guarantee:
extraResource:
limits: {}
guarantees: {}
cmd: jupyterhub-singleuser
defaultUrl: "/lab"
extraPodConfig: {}
profileList:
- display_name: "Default environment"
description: "Base JupyterLab image"
default: true
- display_name: "Wiremind mantis"
description: "Wiremind image"
kubespawner_override:
image: docker.wiremind.fr/wiremind/data-science/mantis/cpu-jupyter/master:latest
# - display_name: "Wiremind custom"
# description: "Wiremind custom image"
# kubespawner_override:
# image: docker.wiremind.fr/wiremind/data-science/mantis/cpu-jupyter/custom:latest
# scheduling relates to the user-scheduler pods and user-placeholder pods.
scheduling:
userScheduler:
enabled: true
replicas: 2
logLevel: 4
# plugins ref: https://kubernetes.io/docs/reference/scheduling/config/#scheduling-plugins-1
plugins:
score:
disabled:
- name: SelectorSpread
- name: TaintToleration
- name: PodTopologySpread
- name: NodeResourcesBalancedAllocation
- name: NodeResourcesLeastAllocated
# Plugins are disabled here so they can be enabled again with a different
# weight without causing an error.
- name: NodePreferAvoidPods
- name: NodeAffinity
- name: InterPodAffinity
- name: ImageLocality
enabled:
- name: NodePreferAvoidPods
weight: 161051
- name: NodeAffinity
weight: 14631
- name: InterPodAffinity
weight: 1331
- name: NodeResourcesMostAllocated
weight: 121
- name: ImageLocality
weight: 11
containerSecurityContext:
runAsUser: 65534 # nobody user
runAsGroup: 65534 # nobody group
allowPrivilegeEscalation: false
image:
# IMPORTANT: Bumping the minor version of this binary should go hand in
# hand with an inspection of the user-scheduler's RBAC resources
# that we have forked.
name: k8s.gcr.io/kube-scheduler
tag: v1.19.13 # ref: https://github.com/kubernetes/website/blob/7a5dc9eb8d2df8c41f52280c3010dbc2bce19523/content/en/releases/patch-releases.md
pullPolicy:
pullSecrets: []
nodeSelector: {}
tolerations: []
pdb:
enabled: true
maxUnavailable: 1
minAvailable:
resources: {}
serviceAccount:
annotations: {}
extraPodSpec: {}
podPriority:
enabled: false
globalDefault: false
defaultPriority: 0
userPlaceholderPriority: -10
userPlaceholder:
enabled: true
image:
name: k8s.gcr.io/pause
# tags can be updated by inspecting the output of the command:
# gcloud container images list-tags k8s.gcr.io/pause --sort-by=~tags
#
# If you update this, also update prePuller.pause.image.tag
tag: "3.5"
pullPolicy:
pullSecrets: []
replicas: 0
containerSecurityContext:
runAsUser: 65534 # nobody user
runAsGroup: 65534 # nobody group
allowPrivilegeEscalation: false
resources: {}
corePods:
tolerations:
- key: hub.jupyter.org/dedicated
operator: Equal
value: core
effect: NoSchedule
- key: hub.jupyter.org_dedicated
operator: Equal
value: core
effect: NoSchedule
nodeAffinity:
matchNodePurpose: prefer
userPods:
tolerations:
- key: hub.jupyter.org/dedicated
operator: Equal
value: user
effect: NoSchedule
- key: hub.jupyter.org_dedicated
operator: Equal
value: user
effect: NoSchedule
nodeAffinity:
matchNodePurpose: prefer
# prePuller relates to the hook|continuous-image-puller DaemonSets
prePuller:
annotations: {cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'}
resources: {}
containerSecurityContext:
runAsUser: 65534 # nobody user
runAsGroup: 65534 # nobody group
allowPrivilegeEscalation: false
extraTolerations: []
# hook relates to the hook-image-awaiter Job and hook-image-puller DaemonSet
hook:
enabled: true
pullOnlyOnChanges: true
# image and the configuration below relate to the hook-image-awaiter Job
image:
name: jupyterhub/k8s-image-awaiter
tag: "1.2.0"
pullPolicy:
pullSecrets: []
containerSecurityContext:
runAsUser: 65534 # nobody user
runAsGroup: 65534 # nobody group
allowPrivilegeEscalation: false
podSchedulingWaitDuration: 10
nodeSelector: {}
tolerations: []
resources: {}
serviceAccount:
annotations: {}
continuous:
enabled: true
pullProfileListImages: true
extraImages: {}
pause:
containerSecurityContext:
runAsUser: 65534 # nobody user
runAsGroup: 65534 # nobody group
allowPrivilegeEscalation: false
image:
name: k8s.gcr.io/pause
# tags can be updated by inspecting the output of the command:
# gcloud container images list-tags k8s.gcr.io/pause --sort-by=~tags
#
# If you update this, also update scheduling.userPlaceholder.image.tag
tag: "3.5"
pullPolicy:
pullSecrets: []
ingress:
enabled: false
annotations: {cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'}
hosts: []
pathSuffix:
pathType: Prefix
tls: []
# cull relates to the jupyterhub-idle-culler service, responsible for evicting
# inactive singleuser pods.
#
# The configuration below, except for enabled, corresponds to command-line flags
# for jupyterhub-idle-culler as documented here:
# https://github.com/jupyterhub/jupyterhub-idle-culler#as-a-standalone-script
#
cull:
enabled: true
users: false # --cull-users
removeNamedServers: false # --remove-named-servers
timeout: 3600 # --timeout
every: 600 # --cull-every
concurrency: 10 # --concurrency
maxAge: 0 # --max-age
debug:
enabled: false
global:
safeToShowValues: false
Here is my issue: