Proxy loses track of singleuser servers after k8s restarts them


#1

Hey, over at PAWS we had a weird issue. We use NFS for storage, and an NFS outage caused the pods to error out.
K8s restarted the user pods, but the chp proxy kept pointing to the old pods. Soon new users were able to use JupyterHub, but the ones that had active servers got 503s.
Restarting the user servers solved it, but restarting the hub or the proxy did not.
Am I wrong to expect that those routes should be updated, or do we have a bug?


#2

JupyterHub assumes that user servers don’t move while they are running. I think KubeSpawner.poll() only checks that the pod exists; it doesn’t check whether the URL changed. I suspect the best way to enforce this is to set restartPolicy: Never, so that pods are never restarted by Kubernetes in a new location. I think we need to figure out exactly why Kubernetes restarted the pods with a new URL, since that shouldn’t happen. If they had to be restarted, they should have been left as terminated for JupyterHub to deal with.
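
To make the missing check concrete, here is a rough sketch of how you could spot the stale routes yourself. It is not what KubeSpawner or the hub does. It assumes you have already fetched the proxy's routing table (the proxy's REST API returns a JSON object mapping each route prefix to an entry with a "target" URL) and that you have looked up the current pod IP for each user route, e.g. from kubectl. The find_stale_routes helper, the pod_ips mapping, and all the addresses below are made up for illustration.

```python
from urllib.parse import urlsplit


def find_stale_routes(routes, pod_ips):
    """Return route prefixes whose proxy target host differs from the pod's current IP.

    routes:  routing table, e.g. {"/user/alice/": {"target": "http://10.0.0.5:8888"}}
    pod_ips: hypothetical mapping of route prefix -> current pod IP
    """
    stale = []
    for prefix, info in routes.items():
        if prefix not in pod_ips:
            # Not a user route we have a pod for (e.g. the default route to the hub).
            continue
        target_host = urlsplit(info["target"]).hostname
        if target_host != pod_ips[prefix]:
            stale.append(prefix)
    return sorted(stale)


# Illustrative data only: bob's pod was restarted and came back on a new IP.
routes = {
    "/": {"target": "http://10.0.0.2:8081"},
    "/user/alice/": {"target": "http://10.0.0.5:8888"},
    "/user/bob/": {"target": "http://10.0.0.6:8888"},
}
pod_ips = {"/user/alice/": "10.0.0.5", "/user/bob/": "10.0.0.9"}

print(find_stale_routes(routes, pod_ips))  # ['/user/bob/']
```

Any prefix this reports is a server that moved underneath the proxy, which matches the symptom of 503s for users with active servers while new users are fine.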


#3

Thanks for the reply. It seems this should be part of the z2jh chart (maybe it is, we’re using 0.6.0). I’ll create a task and keep this in mind for the next update.


#4

Ah, if you’re on 0.6 there’s a very good chance this is already fixed in the chart/kubespawner since then. I wouldn’t spend time debugging until you can reproduce it with current versions of things.