We have a JupyterHub running on Kubernetes, and the pods seem to die 3-4 times a day on a regular basis. It is not related to inactivity, as a pod often dies while I am in the middle of something. Please note that I don't use JupyterHub directly; I connect to the JupyterHub pod using VSCode. One reason I thought could cause this behavior: JupyterHub sees no activity when I connect to my pod directly through VSCode. To test that, I ran an infinite loop in a notebook in JupyterHub. It seems to make things slightly better but doesn't resolve the problem fully. I would also like to understand whether this is expected behavior.
Any help would be appreciated.
Have you tried enabling debug logging and checking the logs for the hub and singleuser-server in the lead-up to the pod terminating? What version of Z2JH are you using, and can you show us your config? Do you have any monitoring of your K8s cluster to show resource usage?
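For reference, whoever maintains the deployment can turn on debug logging through the chart's values file. A minimal sketch, assuming the standard Z2JH Helm chart (the release/namespace names in the upgrade command are placeholders):

```yaml
# values.yaml (Z2JH Helm chart)
debug:
  enabled: true   # turns on debug-level logging for the hub and related components

# Apply with something like (release and namespace names are assumptions):
#   helm upgrade jupyterhub jupyterhub/jupyterhub -n jupyterhub --values values.yaml
```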
Thanks. I haven't given it a try, as I am not the maintainer of the hub, but I can suggest it to the maintainers. I'm not sure how to check the version of Z2JH, but I have jupyterhub==2.3.1. I will get back to you with the config after discussing it with the maintainers. JupyterHub is deployed on a Google Kubernetes Engine cluster, so we have some monitoring of resource usage.
But off the top of your head, what could lead to such behavior? Is it excess memory usage? I read about the culling mechanism here (Pod containing spawned server dies regularly · Issue #1430 · jupyterhub/zero-to-jupyterhub-k8s · GitHub). Not sure if that's the issue, but we have tried increasing the timeout to 3-4 hours.
It could be a lot of things! Do you see the same problem when using JupyterLab/notebook directly in a browser, without VSCode?
@manics Sorry for the late reply. Here is the error from the pod's logs:
[W 2023-07-13 10:43:58.900 SingleUserNotebookApp zmqhandlers:227] WebSocket ping timeout after 119991 ms.
There are many pods; is the issue that the pod named “hub-…” dies, or that a user pod named “jupyter-…” dies?
If it's a user pod, then I suspect the jupyterhub-idle-culler could be involved; you would see that in the logs of the “hub-…” named pod.
Including logs from the “hub-…” named pod from when the issue has shown up would be relevant.
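As a sketch of how to collect those logs with kubectl (the `jupyterhub` namespace and the `jupyter-username` pod name are placeholders for your actual values):

```shell
# Find the hub pod; the Z2JH chart labels it with component=hub
kubectl get pods -n jupyterhub -l component=hub

# Tail the hub pod's logs; --timestamps helps correlate entries
# with the time a user pod died
kubectl logs -n jupyterhub -l component=hub --timestamps --tail=500

# Logs for a specific user pod (replace with the actual pod name);
# --previous shows logs from the last terminated container, useful
# after a pod has already died and restarted
kubectl logs -n jupyterhub jupyter-username --previous --timestamps
```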
You can disable the jupyterhub-idle-culler running in the hub pod by setting the chart's “cull.enabled” option; see Configuration Reference — Zero to JupyterHub with Kubernetes documentation.
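For illustration, a minimal values-file fragment for the Z2JH chart. You can either disable culling entirely or keep it enabled and raise the idle timeout (the 4-hour value here is just an example):

```yaml
# values.yaml (Z2JH Helm chart)
cull:
  enabled: false      # disable the idle culler entirely
  # or, keeping it enabled, raise the idle timeout instead:
  # enabled: true
  # timeout: 14400    # seconds of inactivity before culling (example: 4 hours)
```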