Singleuser containers stuck Terminating

Hi folks, I raised a GitHub issue (https://github.com/jupyterhub/jupyterhub/issues/3317) about a particular problem we’re seeing when running JupyterHub on our AKS cluster. Any help would be greatly appreciated!

Issue description

Single-user pods sit in the Terminating state for many hours instead of actually terminating after users leave work in the evening. Users don’t appear to stop their JupyterHub sessions when they finish for the day; they generally just close the browser tab and lock their workstations.

Expected behaviour

The Termination Grace Period appears to default to 1 second (it shows as 1s in the describe output below), so I wouldn’t expect these pods to keep running or hang in Terminating.
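As far as I understand the Kubernetes docs, the grace period only bounds the wait between the kubelet sending SIGTERM and SIGKILL to the containers. The pod object itself is removed only after the kubelet on that node reports the containers as stopped and metadata.finalizers is empty, so a 1s grace period on its own may not keep Terminating short. A minimal sketch for printing those fields for one pod, assuming kubectl is configured for the cluster and the jupyter namespace; show_termination_fields is a hypothetical helper name:

```shell
# Hypothetical helper: print the fields that govern how long a pod stays Terminating.
# Assumes kubectl is on PATH and pointed at the cluster; namespace defaults to jupyter.
show_termination_fields() {
  local pod="$1"
  local ns="${2:-jupyter}"
  # GRACE: spec.terminationGracePeriodSeconds (SIGTERM -> SIGKILL window)
  # DELETED: metadata.deletionTimestamp (set when deletion was requested)
  # FINALIZERS: anything that must be cleared before the object is removed
  # NODE: the node whose kubelet has to confirm the containers stopped
  kubectl get pod "$pod" -n "$ns" -o custom-columns=NAME:.metadata.name,GRACE:.spec.terminationGracePeriodSeconds,DELETED:.metadata.deletionTimestamp,FINALIZERS:.metadata.finalizers,NODE:.spec.nodeName
}

# Usage (quote the name so the shell doesn't treat < and > as redirections):
#   show_termination_fields 'jupyter-<username>'
```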

Actual behaviour

They hang in Terminating for many hours, which seems to cause problems when users start work the next day and try to log in.

How to reproduce

I’m not sure how to reproduce this, but I’m happy to supply logs and configs. It seems to happen pretty much every night on our clusters.

When we list the pods each morning, we see something like this:

```
kubectl get pod -n jupyter
NAME                                   READY   STATUS        RESTARTS   AGE
continuous-image-puller-2xlbd          1/1     Running       0          2d12h
continuous-image-puller-mb667          1/1     Running       0          2d12h
continuous-image-puller-zmtcf          1/1     Running       0          2d12h
hub-97b844b97-8xq9j                    1/1     Running       0          2d12h
jupyter-<username>                     0/1     Terminating   0          23h
jupyter-<username>                     0/1     Terminating   0          2d
jupyter-<username>                     1/1     Running       0          47h
jupyter-<username>                     1/1     Running       0          27m
jupyter-<username>                     0/1     Terminating   0          40h
jupyter-<username>                     1/1     Running       0          46h
jupyter-<username>                     0/1     Terminating   0          41h
jupyter-<username>                     1/1     Running       0          45m
jupyter-<username>                     1/1     Running       0          24h
proxy-97d9d8f67-8qzbv                  1/1     Running       0          2d12h
user-placeholder-0                     1/1     Running       0          7d12h
user-placeholder-1                     1/1     Running       0          7d12h
user-placeholder-2                     1/1     Running       0          7d12h
user-placeholder-3                     1/1     Running       0          7d12h
user-placeholder-4                     1/1     Running       0          7d12h
user-placeholder-5                     1/1     Running       0          7d12h
user-placeholder-6                     1/1     Running       0          7d12h
user-placeholder-7                     1/1     Running       0          7d12h
user-placeholder-8                     1/1     Running       0          7d12h
user-placeholder-9                     1/1     Running       0          7d12h
```
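To pick the stuck pods out of a listing like the one above, a small filter can key on the STATUS column instead of grepping whole lines. A minimal sketch, assuming the default kubectl get pod column layout (NAME, READY, STATUS, RESTARTS, AGE); terminating_pods is a hypothetical helper name:

```shell
# Hypothetical helper: read `kubectl get pod` output on stdin and print the
# NAME and AGE of every pod whose STATUS column is "Terminating".
# NR > 1 skips the header row; AGE is taken from the last field ($NF) so the
# filter still works if RESTARTS contains spaces, e.g. "2 (5m ago)".
terminating_pods() {
  awk 'NR > 1 && $3 == "Terminating" { print $1, $NF }'
}

# Usage against the cluster:
#   kubectl get pod -n jupyter | terminating_pods
```

Run against the listing above, it would print the four jupyter-<username> pods with ages 23h, 2d, 40h and 41h.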

If I describe one of the affected pods, I can see that this one, for example, has been stuck Terminating for 30 hours and is still present according to Kubernetes:

```
kubectl describe pod jupyter-<username> -n jupyter
Name:                      jupyter-<username>
Namespace:                 jupyter
Priority:                  0
Priority Class Name:       jupyterhub-default-priority
Node:                      aks-jhubusers-21446219-vmss000009/172.21.5.5
Start Time:                Tue, 22 Dec 2020 16:04:25 +0000
Labels:                    app=jupyterhub
                           chart=jupyterhub-0.9.0
                           component=singleuser-server
                           heritage=jupyterhub
                           hub.jupyter.org/network-access-hub=true
                           release=jupyterhub
Annotations:               hub.jupyter.org/username: <username>
Status:                    Terminating (lasts 30h)
Termination Grace Period:  1s
IP:                        10.244.7.59
IPs:
  IP:  10.244.7.59
Init Containers:
  block-cloud-metadata:
    Container ID:  docker://20cb54e39c9622468bca46d236ca04510c9a1e3502978e446147c078506343bf
    Image:         <registry>/jupyterhub/k8s-network-tools:0.9.0
    Image ID:      docker-pullable://<registry>/jupyterhub/k8s-network-tools@sha256:120056e52fef309132697d405683a91ce6e2e484b67b7fd4bd7ca5e7d1937a34
    Port:          <none>
    Host Port:     <none>
    Command:
      iptables
      -A
      OUTPUT
      -d
      169.254.169.254
      -j
      DROP
    State:          Terminated
      Exit Code:    0
      Started:      Mon, 01 Jan 0001 00:00:00 +0000
      Finished:     Mon, 01 Jan 0001 00:00:00 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:         <none>
```
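The describe output shows which node the pod was scheduled on (aks-jhubusers-21446219-vmss000009). If the kubelet on that node can’t report the containers as stopped, the pod object may linger in Terminating, so checking the node’s status and the events recorded for the pod could help narrow things down. A minimal sketch, assuming kubectl access to the cluster and the jupyter namespace; inspect_stuck_pod is a hypothetical helper name:

```shell
# Hypothetical helper: for a stuck pod, show its node, that node's status,
# and the events recorded for the pod. Namespace defaults to jupyter.
inspect_stuck_pod() {
  local pod="$1"
  local ns="${2:-jupyter}"
  local node
  # spec.nodeName is the value kubectl describe reports on the "Node:" line.
  node="$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.nodeName}')"
  echo "pod ${pod} is on node ${node}"
  # STATUS NotReady here would suggest the kubelet can't confirm the shutdown.
  kubectl get node "$node"
  # Only events whose involvedObject is this pod.
  kubectl get events -n "$ns" --field-selector "involvedObject.name=${pod}"
}

# Usage (quote the name so < and > aren't parsed as redirections):
#   inspect_stuck_pod 'jupyter-<username>'
```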

Your personal set up

  • Kubernetes Version: AKS (Azure Kubernetes Service) 1.17.11
  • JupyterHub Version: 2.1.4
  • SingleUser image: jupyterhub/k8s-singleuser-sample:0.9.0

Configuration

Happy to share this, but I’m not sure there’s anything particularly helpful in it, and it’s very large.

Logs

Attempting to retrieve logs from the singleuser pods while they’re in the Terminating state returns the following:

```
kubectl logs jupyter-<username> -n jupyter
Error from server (BadRequest): container "notebook" in pod "jupyter-<username>" is terminated
```

I’m happy to post hub container logs too, although they are very verbose and it’s difficult to pinpoint the relevant timeframe.
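To narrow down the hub logs, kubectl logs accepts --since-time (an RFC3339 timestamp) to limit output to a start time, and the result can be filtered for one username. A minimal sketch, assuming the hub runs as a Deployment named hub in the jupyter namespace (consistent with the hub-97b844b97-8xq9j pod name above, but not confirmed); hub_logs_for_user is a hypothetical helper name:

```shell
# Hypothetical helper: print hub log lines that mention one user, starting
# from a given time. Assumes a Deployment named "hub"; namespace defaults to jupyter.
hub_logs_for_user() {
  local user="$1"
  local since="$2"   # RFC3339, e.g. 2020-12-22T16:00:00Z
  local ns="${3:-jupyter}"
  # -F matches the username literally; -- stops option parsing before it.
  kubectl logs deploy/hub -n "$ns" --since-time="$since" | grep -F -- "$user"
}

# Usage (quote the placeholder so the shell leaves < and > alone):
#   hub_logs_for_user '<username>' 2020-12-22T16:00:00Z
```

Starting from the time a pod entered Terminating (the deletionTimestamp) keeps the excerpt short enough to post.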

Bumping this post…