Unresponsive event loop leads to crash

Hey there,

After running JupyterHub for several months, we recently noticed the following warnings, some with high (>5s) durations:

[W 2025-10-14 11:57:17.713 JupyterHub metrics:404] Event loop was unresponsive for at least 1.33s!
[W 2025-10-14 11:59:09.961 JupyterHub metrics:404] Event loop was unresponsive for at least 7.73s!
[W 2025-10-14 11:59:13.949 JupyterHub metrics:404] Event loop was unresponsive for at least 3.94s!
[W 2025-10-14 12:01:43.139 JupyterHub metrics:404] Event loop was unresponsive for at least 1.32s!
[W 2025-10-14 12:17:24.924 JupyterHub metrics:404] Event loop was unresponsive for at least 5.59s!
[W 2025-10-14 12:25:53.979 JupyterHub metrics:404] Event loop was unresponsive for at least 8.69s!
[W 2025-10-14 12:25:57.142 JupyterHub metrics:404] Event loop was unresponsive for at least 3.11s!
[W 2025-10-14 12:26:17.224 JupyterHub metrics:404] Event loop was unresponsive for at least 1.01s!
[W 2025-10-14 12:26:19.054 JupyterHub metrics:404] Event loop was unresponsive for at least 1.78s!
[W 2025-10-14 12:42:19.750 JupyterHub metrics:404] Event loop was unresponsive for at least 6.51s!
[W 2025-10-14 12:42:25.384 JupyterHub metrics:404] Event loop was unresponsive for at least 5.58s!
[W 2025-10-14 12:42:32.950 JupyterHub metrics:404] Event loop was unresponsive for at least 7.52s!
[W 2025-10-14 13:06:49.393 JupyterHub metrics:404] Event loop was unresponsive for at least 18.45s!
[W 2025-10-14 13:06:57.693 JupyterHub metrics:404] Event loop was unresponsive for at least 4.69s!
[W 2025-10-14 13:07:00.573 JupyterHub metrics:404] Event loop was unresponsive for at least 2.83s!

Due to longer unresponsive periods (40 to 120 seconds), the hub crashed twice (reason: Error, exit code: 137, i.e. SIGKILL, which often points to an OOM kill). We have already tried to investigate the logs. However, multiple errors occurred around the same time (e.g., [W 2025-10-14 11:34:21.951 JupyterHub proxy:944] api_request to the proxy failed with status code 599, retrying..., API requests from the culler timed out, hub-managed services took several seconds to respond, …), most likely all symptoms of the unresponsive event loop, so it is hard to isolate the root cause.

So what does it mean that the event loop is unresponsive? And which factors influence its responsiveness?
We also see increased hub response latency during the warnings, but we don't know whether that is a cause or a consequence.

Best regards,
Paul

Hello @Paul2708

I remember this sort of message from our JupyterHub deployment at my previous job. Which proxy variant are you using: CHP or Traefik? In our case, this behaviour appeared after a few months of uninterrupted running of the JupyterHub and proxy services. CHP has a known memory leak issue, and I also noticed a couple of times that CHP ended up with too many open file descriptors due to unclosed sockets.

An unresponsive event loop means the loop is blocked by a synchronous function call: while that call runs, no other coroutine can make progress. In a regular application the culprit can be a function doing heavy computational work, but that is unlikely for JupyterHub, as the hub itself has no compute-intensive tasks. I assume the blocking is caused by socket-related work here.
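To illustrate the mechanism, here is a minimal plain-asyncio sketch (not JupyterHub's actual implementation; judging by the metrics:404 prefix in your logs, the real check lives in JupyterHub's metrics module). A watchdog coroutine repeatedly sleeps for a short interval and measures how late it wakes up; any synchronous call that hogs the loop delays the wake-up:

```python
import asyncio
import time


async def watchdog(interval: float = 0.25, threshold: float = 1.0) -> None:
    """Report when the loop was blocked for longer than `threshold` seconds."""
    while True:
        start = time.monotonic()
        await asyncio.sleep(interval)
        # On an idle loop, lag is near zero; if a blocking call ran in the
        # meantime, the sleep resumes late and lag grows accordingly.
        lag = time.monotonic() - start - interval
        if lag > threshold:
            print(f"Event loop was unresponsive for at least {lag:.2f}s!")


def blocking_call() -> None:
    # Stands in for any synchronous work (heavy computation, a socket read
    # that never yields, ...). asyncio is single-threaded and cooperative,
    # so nothing else runs until this returns.
    time.sleep(3)


async def main() -> None:
    task = asyncio.create_task(watchdog())
    await asyncio.sleep(0.5)  # let the watchdog settle
    blocking_call()           # stalls the loop for ~3s
    await asyncio.sleep(0.5)  # give the watchdog a chance to report
    task.cancel()


asyncio.run(main())
```

Note that the watchdog only measures the symptom; the blocking call itself can be anywhere in the process, which is why these warnings are so hard to attribute to a cause.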

I recommend looking into the memory usage and open file descriptors of both JupyterHub and CHP the next time this problem occurs!
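If it helps, here is a quick sketch of such a check using psutil (assumptions on my side: psutil is installed, and the command-line matching below fits your deployment; CHP typically runs as a node process with configurable-http-proxy in its command line):

```python
import psutil

# Substrings matched against each process command line; adjust as needed.
WATCHED = ("jupyterhub", "configurable-http-proxy")

for proc in psutil.process_iter(["pid", "name", "cmdline"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    if not any(needle in cmdline for needle in WATCHED):
        continue
    try:
        rss_mib = proc.memory_info().rss / 1024 ** 2  # resident memory
        open_fds = proc.num_fds()                     # Unix only
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        continue
    print(f"pid={proc.info['pid']} rss={rss_mib:.1f}MiB fds={open_fds} cmd={cmdline[:60]}")
```

Run it periodically and a leak shows up as steady growth of rss/fds between snapshots; lsof -p <pid> then tells you what the descriptors actually are.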


Thank you for sharing your experience! Indeed, we are using CHP as the proxy.

I just stumbled upon Memory leak in proxy? · Issue #388 · jupyterhub/configurable-http-proxy · GitHub, so bumping the proxy version may already solve the issue. Since the problem does not occur reliably, I'll get back to this thread if the issue shows up again after updating the proxy.
