Hi folks, I raised a GitHub issue (https://github.com/jupyterhub/jupyterhub/issues/3317) about a particular problem we’re seeing running JupyterHub on our AKS cluster. Any help would be greatly appreciated!
Single-user pods sit in Terminating for a long time (many hours) rather than actually terminating when users leave work in the evening. Users don’t appear to stop their JupyterHub sessions when they finish for the day; they generally just close their tab and lock their workstations.
It looks like the Termination Grace Period defaults to 1s, so I wouldn’t expect these workloads to continue running or hang in Terminating.
They sit hanging for many hours, which seems to cause issues when users start work the next day and try to log in.
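In case it’s useful, the configured grace period can also be read straight off an affected pod with something like the following (the pod name here is a placeholder):

```
# Check the termination grace period configured on a stuck pod
kubectl get pod jupyter-<username> -n jupyter \
  -o jsonpath='{.spec.terminationGracePeriodSeconds}'
```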
How to reproduce
I’m unsure how to reproduce this, but I’m happy to supply logs/configs - it happens pretty much every night on our clusters.
When looking at the running pods every morning we see something similar to this:
```
kubectl get pod -n jupyter
NAME                            READY   STATUS        RESTARTS   AGE
continuous-image-puller-2xlbd   1/1     Running       0          2d12h
continuous-image-puller-mb667   1/1     Running       0          2d12h
continuous-image-puller-zmtcf   1/1     Running       0          2d12h
hub-97b844b97-8xq9j             1/1     Running       0          2d12h
jupyter-<username>              0/1     Terminating   0          23h
jupyter-<username>              0/1     Terminating   0          2d
jupyter-<username>              1/1     Running       0          47h
jupyter-<username>              1/1     Running       0          27m
jupyter-<username>              0/1     Terminating   0          40h
jupyter-<username>              1/1     Running       0          46h
jupyter-<username>              0/1     Terminating   0          41h
jupyter-<username>              1/1     Running       0          45m
jupyter-<username>              1/1     Running       0          24h
proxy-97d9d8f67-8qzbv           1/1     Running       0          2d12h
user-placeholder-0              1/1     Running       0          7d12h
user-placeholder-1              1/1     Running       0          7d12h
user-placeholder-2              1/1     Running       0          7d12h
user-placeholder-3              1/1     Running       0          7d12h
user-placeholder-4              1/1     Running       0          7d12h
user-placeholder-5              1/1     Running       0          7d12h
user-placeholder-6              1/1     Running       0          7d12h
user-placeholder-7              1/1     Running       0          7d12h
user-placeholder-8              1/1     Running       0          7d12h
user-placeholder-9              1/1     Running       0          7d12h
```
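As an aside, I assume a force delete along these lines would clear the stuck pods manually, though it obviously wouldn’t address the root cause (pod name is a placeholder):

```
# Manual workaround only: force-remove a stuck pod from the API server
kubectl delete pod jupyter-<username> -n jupyter --grace-period=0 --force
```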
If I describe one of the affected pods, I can see that this one, for example, has been stuck Terminating for 30 hours and is still hanging around according to Kubernetes.
```
kubectl describe pod jupyter-<username> -n jupyter
Name:                      jupyter-<username>
Namespace:                 jupyter
Priority:                  0
Priority Class Name:       jupyterhub-default-priority
Node:                      aks-jhubusers-21446219-vmss000009/172.21.5.5
Start Time:                Tue, 22 Dec 2020 16:04:25 +0000
Labels:                    app=jupyterhub
                           chart=jupyterhub-0.9.0
                           component=singleuser-server
                           heritage=jupyterhub
                           hub.jupyter.org/network-access-hub=true
                           release=jupyterhub
Annotations:               hub.jupyter.org/username: <username>
Status:                    Terminating (lasts 30h)
Termination Grace Period:  1s
IP:                        10.244.7.59
IPs:
  IP:  10.244.7.59
Init Containers:
  block-cloud-metadata:
    Container ID:   docker://20cb54e39c9622468bca46d236ca04510c9a1e3502978e446147c078506343bf
    Image:          <registry>/jupyterhub/k8s-network-tools:0.9.0
    Image ID:       docker-pullable://<registry>/jupyterhub/k8s-network-tools@sha256:120056e52fef309132697d405683a91ce6e2e484b67b7fd4bd7ca5e7d1937a34
    Port:           <none>
    Host Port:      <none>
    Command:        iptables -A OUTPUT -d 169.254.169.254 -j DROP
    State:          Terminated
      Exit Code:    0
      Started:      Mon, 01 Jan 0001 00:00:00 +0000
      Finished:     Mon, 01 Jan 0001 00:00:00 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:         <none>
```
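In case it helps with diagnosis, I can also pull the deletion timestamp and any finalizers off a stuck pod, e.g. (pod name is a placeholder):

```
# Inspect deletion timestamp and finalizers on a stuck pod
kubectl get pod jupyter-<username> -n jupyter \
  -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
```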
Your personal set up
- Kubernetes Version: AKS (Azure Kubernetes Service) 1.17.11
- JupyterHub Version: 2.1.4
- SingleUser image: jupyterhub/k8s-singleuser-sample:0.9.0
Happy to share our full config, but I’m not sure there’s anything too helpful in it and it’s very large…
Attempting to retrieve logs from the singleuser pods while they’re in the Terminating state returns the following:
```
kubectl logs jupyter-<username> -n jupyter
Error from server (BadRequest): container "notebook" in pod "jupyter-<username>" is terminated
```
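I haven’t confirmed whether anything is still retrievable from the terminated container in these pods, but I can also try something like:

```
# Attempt to fetch logs from the terminated container instance
kubectl logs jupyter-<username> -n jupyter --previous
```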
Happy to post any hub container logs, although obviously these are very verbose and it’s difficult to pinpoint a timeframe.
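If a narrower slice would help, I can pull hub logs for a specific window and user with something along these lines (hub pod name taken from the listing above, username is a placeholder):

```
# Grab the last 12 hours of hub logs and filter for one user
kubectl logs hub-97b844b97-8xq9j -n jupyter --since=12h | grep <username>
```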