Hi folks, I raised a GitHub issue (https://github.com/jupyterhub/jupyterhub/issues/3317) about a problem we’re seeing running JupyterHub on our AKS cluster. Any help would be greatly appreciated!
Issue description
Single-user pods sit in the Terminating state for many hours rather than actually terminating after users leave work in the evening. Users don’t appear to stop their JupyterHub session when they finish for the day; they generally just close their tab and lock their workstations.
Expected behaviour
It looks like the Termination Grace Period defaults to 1 second, so I wouldn’t expect these workloads to keep running or hang once they start Terminating.
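For reference, that value can be confirmed straight from the pod spec with a standard jsonpath query (the pod name here is a placeholder):
kubectl get pod jupyter-<username> -n jupyter -o jsonpath='{.spec.terminationGracePeriodSeconds}'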
Actual behaviour
The pods hang for many hours, which seems to cause problems when users start work the next day and try to log in.
How to reproduce
I’m unsure how to reproduce this, but I’m happy to supply logs/configs. It happens pretty much every night on our clusters.
When looking at the running pods every morning we see something similar to this:
kubectl get pod -n jupyter
NAME READY STATUS RESTARTS AGE
continuous-image-puller-2xlbd 1/1 Running 0 2d12h
continuous-image-puller-mb667 1/1 Running 0 2d12h
continuous-image-puller-zmtcf 1/1 Running 0 2d12h
hub-97b844b97-8xq9j 1/1 Running 0 2d12h
jupyter-<username> 0/1 Terminating 0 23h
jupyter-<username> 0/1 Terminating 0 2d
jupyter-<username> 1/1 Running 0 47h
jupyter-<username> 1/1 Running 0 27m
jupyter-<username> 0/1 Terminating 0 40h
jupyter-<username> 1/1 Running 0 46h
jupyter-<username> 0/1 Terminating 0 41h
jupyter-<username> 1/1 Running 0 45m
jupyter-<username> 1/1 Running 0 24h
proxy-97d9d8f67-8qzbv 1/1 Running 0 2d12h
user-placeholder-0 1/1 Running 0 7d12h
user-placeholder-1 1/1 Running 0 7d12h
user-placeholder-2 1/1 Running 0 7d12h
user-placeholder-3 1/1 Running 0 7d12h
user-placeholder-4 1/1 Running 0 7d12h
user-placeholder-5 1/1 Running 0 7d12h
user-placeholder-6 1/1 Running 0 7d12h
user-placeholder-7 1/1 Running 0 7d12h
user-placeholder-8 1/1 Running 0 7d12h
user-placeholder-9 1/1 Running 0 7d12h
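A quick way to pick out just the stuck pods (this assumes jq is available; a pod shows as Terminating once metadata.deletionTimestamp is set):
kubectl get pods -n jupyter -o json | jq -r '.items[] | select(.metadata.deletionTimestamp != null) | .metadata.name'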
If I describe one of the affected pods, I can see that this one, for example, has been stuck Terminating for 30 hours and is still hanging around according to Kubernetes.
kubectl describe pod jupyter-<username> -n jupyter
Name: jupyter-<username>
Namespace: jupyter
Priority: 0
Priority Class Name: jupyterhub-default-priority
Node: aks-jhubusers-21446219-vmss000009/172.21.5.5
Start Time: Tue, 22 Dec 2020 16:04:25 +0000
Labels: app=jupyterhub
chart=jupyterhub-0.9.0
component=singleuser-server
heritage=jupyterhub
hub.jupyter.org/network-access-hub=true
release=jupyterhub
Annotations: hub.jupyter.org/username: <username>
Status: Terminating (lasts 30h)
Termination Grace Period: 1s
IP: 10.244.7.59
IPs:
IP: 10.244.7.59
Init Containers:
block-cloud-metadata:
Container ID: docker://20cb54e39c9622468bca46d236ca04510c9a1e3502978e446147c078506343bf
Image: <registry>/jupyterhub/k8s-network-tools:0.9.0
Image ID: docker-pullable://<registry>/jupyterhub/k8s-network-tools@sha256:120056e52fef309132697d405683a91ce6e2e484b67b7fd4bd7ca5e7d1937a34
Port: <none>
Host Port: <none>
Command:
iptables
-A
OUTPUT
-d
169.254.169.254
-j
DROP
State: Terminated
Exit Code: 0
Started: Mon, 01 Jan 0001 00:00:00 +0000
Finished: Mon, 01 Jan 0001 00:00:00 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts: <none>
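In case it points at a finalizer or node problem keeping these pods around, here is roughly what I can check on an affected pod and its node (standard kubectl; names taken from the output above):
kubectl get pod jupyter-<username> -n jupyter -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
kubectl describe node aks-jhubusers-21446219-vmss000009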
Your personal set up
- Kubernetes Version: AKS (Azure Kubernetes Service) 1.17.11
- JupyterHub Version: 2.1.4
- SingleUser image: jupyterhub/k8s-singleuser-sample:0.9.0
Configuration
Happy to share this but I’m not sure there’s anything too helpful inside and it’s very large…
Logs
Attempting to retrieve logs from the singleuser pods while they’re in the Terminating state returns the following:
kubectl logs jupyter-<username> -n jupyter
Error from server (BadRequest): container "notebook" in pod "jupyter-<username>" is terminated
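The --previous flag might be worth trying too, although I’m not sure it returns anything for a pod in this state:
kubectl logs jupyter-<username> -n jupyter -c notebook --previous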
Happy to post any hub container logs, although these are very verbose and it’s difficult to pinpoint a timeframe.
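If a rough window helps, I can at least trim them with kubectl’s --since/--since-time flags, e.g. (hub pod name taken from the listing above; the window is illustrative):
kubectl logs hub-97b844b97-8xq9j -n jupyter --timestamps --since=12h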