Hey there,
We have been running JupyterHub for several months and have just noticed the following warnings, some of them with long (>5s) durations:
[W 2025-10-14 11:57:17.713 JupyterHub metrics:404] Event loop was unresponsive for at least 1.33s!
[W 2025-10-14 11:59:09.961 JupyterHub metrics:404] Event loop was unresponsive for at least 7.73s!
[W 2025-10-14 11:59:13.949 JupyterHub metrics:404] Event loop was unresponsive for at least 3.94s!
[W 2025-10-14 12:01:43.139 JupyterHub metrics:404] Event loop was unresponsive for at least 1.32s!
[W 2025-10-14 12:17:24.924 JupyterHub metrics:404] Event loop was unresponsive for at least 5.59s!
[W 2025-10-14 12:25:53.979 JupyterHub metrics:404] Event loop was unresponsive for at least 8.69s!
[W 2025-10-14 12:25:57.142 JupyterHub metrics:404] Event loop was unresponsive for at least 3.11s!
[W 2025-10-14 12:26:17.224 JupyterHub metrics:404] Event loop was unresponsive for at least 1.01s!
[W 2025-10-14 12:26:19.054 JupyterHub metrics:404] Event loop was unresponsive for at least 1.78s!
[W 2025-10-14 12:42:19.750 JupyterHub metrics:404] Event loop was unresponsive for at least 6.51s!
[W 2025-10-14 12:42:25.384 JupyterHub metrics:404] Event loop was unresponsive for at least 5.58s!
[W 2025-10-14 12:42:32.950 JupyterHub metrics:404] Event loop was unresponsive for at least 7.52s!
[W 2025-10-14 13:06:49.393 JupyterHub metrics:404] Event loop was unresponsive for at least 18.45s!
[W 2025-10-14 13:06:57.693 JupyterHub metrics:404] Event loop was unresponsive for at least 4.69s!
[W 2025-10-14 13:07:00.573 JupyterHub metrics:404] Event loop was unresponsive for at least 2.83s!
When the durations reached 40 to 120 seconds, the hub crashed twice (reason: Error, exit code: 137). We have already tried to investigate the logs, but several other errors occurred at the same time (e.g., [W 2025-10-14 11:34:21.951 JupyterHub proxy:944] api_request to the proxy failed with status code 599, retrying..., API requests from the culler timed out, hub-managed services take several seconds to respond, …). These are most likely consequences of the unresponsive event loop, which makes it hard to find the underlying cause.
So what does it mean for the event loop to be unresponsive, and which factors influence its responsiveness?
We also see increased hub response latency while the warnings appear, but we don’t know whether that is a cause or an effect.
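To make the question concrete, here is a minimal Python sketch of how we currently picture such a warning arising. This is an assumption on our side, not JupyterHub's actual implementation: a periodic asyncio task sleeps for a fixed interval and reports how much later than expected it woke up. The names `monitor_lag` and `blocking_work`, the interval, and the threshold are all hypothetical.

```python
import asyncio
import time


async def monitor_lag(interval, threshold, lags):
    """Sleep for `interval` seconds and record how late each wake-up was."""
    while True:
        start = time.perf_counter()
        await asyncio.sleep(interval)
        # Extra time beyond the requested sleep means the loop was busy.
        lag = time.perf_counter() - start - interval
        if lag > threshold:
            lags.append(lag)
            print(f"Event loop was unresponsive for at least {lag:.2f}s!")


async def blocking_work(seconds):
    """Hypothetical culprit: a synchronous call inside a coroutine.

    time.sleep() never yields to the event loop, so every other task
    (including the monitor) is stalled until it returns.
    """
    time.sleep(seconds)


async def main():
    lags = []
    monitor = asyncio.create_task(monitor_lag(0.05, 0.1, lags))
    await asyncio.sleep(0.1)   # let the monitor start its cycle
    await blocking_work(0.5)   # stall the loop for about half a second
    await asyncio.sleep(0.2)   # give the monitor a chance to wake up and report
    monitor.cancel()
    try:
        await monitor
    except asyncio.CancelledError:
        pass
    return lags


if __name__ == "__main__":
    asyncio.run(main())
```

If this picture is roughly right, anything that runs synchronously on the hub's loop for a long time would produce the warning and also delay every other request, which would match the latency we observe. Please correct us if the actual mechanism differs.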
Best regards,
Paul