A bug in z2jh 3.0 (the JupyterHub Helm chart), which uses KubeSpawner 6.0, causes running users to be disrupted whenever the hub pod starts up, for example when upgrading to z2jh 3.0 or when re-configuring the chart in a way that restarts the pod.
Deployment impacted
This bug was introduced in the 3.0.0 release and is patched in the 3.1.0 release. The bug was also part of the 3.0.0-alpha.1 pre-release and of development releases 3.0.0-0.dev.git.6133.hbfc583f8 and later.
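To check which chart version a deployment runs, and therefore whether it is affected, listing the Helm releases should be enough (with <namespace> as a placeholder for your JupyterHub’s namespace):
# Shows the installed releases and their chart versions
helm list --namespace <namespace>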
Symptoms
Users’ perspective
From a JupyterHub user’s perspective, the disruption looks like being redirected from /user/<username>, where they typically work, back to /hub, where, depending on the JupyterHub’s configuration, they are either prompted to start a server again or a server is started for them automatically.
Admins’ perspective
Whenever the hub pod restarts with a bug-affected version, you may see that the /hub/admin panel reports user servers as stopped even though you can see running user server pods in Kubernetes.
Inactive user servers are normally stopped automatically by jupyterhub-idle-culler, which is enabled by default in z2jh (cull.enabled), but that won’t happen in this case since JupyterHub already considers the servers stopped.
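One way to spot the mismatch is to compare what the /hub/admin panel reports against the user server pods Kubernetes reports. A rough sketch, assuming the chart’s default labels and with <namespace> as a placeholder for your JupyterHub’s namespace:
# List running user server pods and compare against what /hub/admin reports;
# component=singleuser-server is the label KubeSpawner puts on user server pods by default
kubectl get pods --namespace <namespace> -l component=singleuser-server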
Cleaning up orphaned user server pods
If you have been using z2jh 3.0.0-alpha.1 to 3.0.3, you should check for orphaned user server pods that JupyterHub doesn’t consider running.
Using @minrk’s Python script
@minrk has written a Python script, found in this gist, to clean up user servers. It can be run by a user with administrative access to both Kubernetes and JupyterHub itself.
From a computer with Python and kubectl configured with access to the Kubernetes cluster where the JupyterHub is installed, do the following:
- Download Min’s script from kube_orphans.py · GitHub
- Visit https://your-jupyterhub.example.org/hub/token and request a token with a short-lived access duration
- Set the environment variable JUPYTERHUB_API_TOKEN to the token from the previous step. Note that the permissions needed include reading information about all users, which admin users have but non-admin users don’t (see the token check sketched after this list).
- Configure and verify access to the Kubernetes cluster where the JupyterHub is running
- Run the script and pass it the URL to your hub and the k8s namespace via the --namespace flag
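To sanity-check that the token has the needed access before running the script, a direct request to JupyterHub’s REST API should list all users. A minimal sketch, assuming your hub is reachable at https://your-jupyterhub.example.org and the token is stored in JUPYTERHUB_API_TOKEN:
# Should return a JSON list of all users if the token has admin-level read access
curl -H "Authorization: token $JUPYTERHUB_API_TOKEN" \
  https://your-jupyterhub.example.org/hub/api/users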
Practically, on a Mac or Linux computer, the whole procedure can look like this:
# 1. Download script
wget https://gist.githubusercontent.com/minrk/e15653520847746e643a6ca5e48d3949/raw/1698d0bd6949b16e0f99c84c305c84aa667c5e7f/kube_orphans.py
# 2. Request an API token from /hub/token
# 3. Set environment variable for use by script
export JUPYTERHUB_API_TOKEN=1234567890abcdef1234567890abcdef
# 4. Verify you can work against the k8s cluster and it seems to be the right namespace
kubectl get all --namespace <namespace>
# 5. Run the script
python kube_orphans.py --namespace <namespace> https://your-jupyterhub.example.org
The script should now have printed information and a kubectl delete pod command listing all orphaned pods. Copy it, add --namespace <namespace> to it, and then run it to delete all the orphaned pods detected by the script.
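For illustration, the final command could end up looking something like this, where the pod names are hypothetical and should be taken from the script’s output:
kubectl delete pod --namespace <namespace> jupyter-alice jupyter-bob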
Using a helm config
I’ve adjusted Min’s script to run inside the hub pod via a JupyterHub chart config file, and to not prompt before deleting the detected orphaned servers on startup. This can be useful if you manage several JupyterHubs with shared configuration files, for example.
# 1. Download JupyterHub chart config addition
wget https://gist.githubusercontent.com/consideRatio/7b5b8e65f0e90b3c56b5eff3a4038560/raw/fa9b314d78e85ea335847b3d38d698afa1173366/cleanup-service.values.yaml
# 2. Verify that the chart config file is nested correctly; it's made to work
# assuming the jupyterhub chart isn't a chart dependency. If you have a
# helm chart that in turn depends on the jupyterhub chart, you need to
# nest the configuration (see the nesting sketch after these steps).
# 3. Perform a chart upgrade referencing the chart config addition
helm upgrade <...> --values cleanup-service.values.yaml
# 4. Get the hub pod's logs
kubectl logs deploy/hub
# 5. Look for log lines like these
# INFO:/tmp/cleanup-orphaned-pods.py:Found 1 active user servers according to JupyterHub
# INFO:/tmp/cleanup-orphaned-pods.py:Found 1 active user server pods according to Kubernetes
# INFO:/tmp/cleanup-orphaned-pods.py:0 user server pods are orphaned
# INFO:/tmp/cleanup-orphaned-pods.py:Cleanup of orphaned pods complete.
# 6. Perform a chart upgrade without the cleanup service
helm upgrade <...>
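For reference on the nesting mentioned in step 2: if the jupyterhub chart is installed as a dependency of your own chart, Helm expects the jupyterhub chart's values to be nested under the dependency's name. A rough sketch, assuming the dependency is named jupyterhub; the actual keys to nest are whatever cleanup-service.values.yaml defines at its top level:
# values file for your parent chart
jupyterhub:
  # paste the top-level keys from cleanup-service.values.yaml here,
  # indented one level under "jupyterhub"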