Socket hang up issue during jupyterhub load test

Team,

Bug description

I have been getting socket hang-up errors when launching JupyterHub during a load test. JupyterHub launches without issues for up to 4 users at a time, but when we increase the load to 8 users, we see socket hang-up errors.

Attaching the error:
06:12:12.515 [ConfigProxy] error: 503 GET /jhub/hub/health socket hang up
06:12:16.966 [ConfigProxy] error: 503 GET /jhub/hub/health socket hang up
06:12:17.549 [ConfigProxy] error: 503 GET /jhub/hub/spawn-pending/jhubuser14_5758 socket hang up
06:12:17.561 [ConfigProxy] error: 503 GET /jhub/user/jhubuser16_5758/oauth_callback socket hang up
06:12:17.629 [ConfigProxy] error: 503 GET /jhub/hub/spawn-pending/jhubuser13_5758 socket hang up
06:12:18.214 [ConfigProxy] error: 503 GET /jhub/hub/api/oauth2/authorize socket hang up

[E 2022-09-21 06:13:31.749 SingleUserLabApp zmqstream:447] Uncaught exception in ZMQStream callback
Traceback (most recent call last):
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/zmq/eventloop/zmqstream.py", line 444, in _run_callback
    callback(*args, **kwargs)
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/zmq/eventloop/zmqstream.py", line 203, in <lambda>
    self.on_recv(lambda msg: callback(self, msg), copy=copy)
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/jupyter_server/services/kernels/handlers.py", line 552, in _on_zmq_reply
    super(ZMQChannelsHandler, self)._on_zmq_reply(stream, msg)
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/jupyter_server/base/zmqhandlers.py", line 243, in _on_zmq_reply
    self.write_message(msg, binary=isinstance(msg, bytes))
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/tornado/websocket.py", line 337, in write_message
    raise WebSocketClosedError()
tornado.websocket.WebSocketClosedError
[E 2022-09-21 06:13:31.849 SingleUserLabApp zmqstream:474] Uncaught exception in zmqstream callback
Traceback (most recent call last):
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/zmq/eventloop/zmqstream.py", line 462, in _handle_events
    self._handle_recv()
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/zmq/eventloop/zmqstream.py", line 492, in _handle_recv
    self._run_callback(callback, msg)
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/zmq/eventloop/zmqstream.py", line 444, in _run_callback
    callback(*args, **kwargs)
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/zmq/eventloop/zmqstream.py", line 203, in <lambda>
    self.on_recv(lambda msg: callback(self, msg), copy=copy)
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/jupyter_server/services/kernels/handlers.py", line 552, in _on_zmq_reply
    super(ZMQChannelsHandler, self)._on_zmq_reply(stream, msg)
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/jupyter_server/base/zmqhandlers.py", line 243, in _on_zmq_reply
    self.write_message(msg, binary=isinstance(msg, bytes))
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/tornado/websocket.py", line 337, in write_message
    raise WebSocketClosedError()
tornado.websocket.WebSocketClosedError
[E 2022-09-21 06:13:31.850 SingleUserLabApp ioloop:761] Exception in callback functools.partial(<function ZMQStream._update_handler.<locals>.<lambda> at 0x7f80b4f1e8c0>)
Traceback (most recent call last):
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/zmq/eventloop/zmqstream.py", line 548, in <lambda>
    self.io_loop.add_callback(lambda : self._handle_events(self.socket, 0))
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/zmq/eventloop/zmqstream.py", line 462, in _handle_events
    self._handle_recv()
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/zmq/eventloop/zmqstream.py", line 492, in _handle_recv
    self._run_callback(callback, msg)
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/zmq/eventloop/zmqstream.py", line 444, in _run_callback
    callback(*args, **kwargs)
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/zmq/eventloop/zmqstream.py", line 203, in <lambda>
    self.on_recv(lambda msg: callback(self, msg), copy=copy)
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/jupyter_server/services/kernels/handlers.py", line 552, in _on_zmq_reply
    super(ZMQChannelsHandler, self)._on_zmq_reply(stream, msg)
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/jupyter_server/base/zmqhandlers.py", line 243, in _on_zmq_reply
    self.write_message(msg, binary=isinstance(msg, bytes))
  File "/opt/conda/envs/jhub/lib/python3.7/site-packages/tornado/websocket.py", line 337, in write_message
    raise WebSocketClosedError()
tornado.websocket.WebSocketClosedError

Expected behavior

We should be able to launch JupyterHub for 100+ users.

Version:

  • jupyterhub=1.4.1
  • configurable-http-proxy=4.3.2
  • jupyterlab=3.0.14
  • python=3.7.10

Hi! Could you try upgrading to the latest version of all packages, in particular jupyterhub 3.0.0, configurable-http-proxy 4.5.3, jupyterlab 3.5.0, and all their dependencies?

If you still see the errors, could you turn on debug logging in JupyterHub and tell us about your JupyterHub setup, ideally in enough detail for someone to reproduce it if necessary? Thanks!
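
In case it helps, here is a minimal sketch of what turning on debug logging could look like in jupyterhub_config.py. It assumes the default configurable-http-proxy is in use and that the file is loaded by JupyterHub itself (which is what provides `c`); adjust it to your own setup.

```python
# jupyterhub_config.py (fragment, not a standalone script)
# `c` is the config object JupyterHub provides when it loads this file.

# Verbose logging from the Hub process itself.
c.JupyterHub.log_level = "DEBUG"

# Ask spawned single-user servers to log at debug level too.
c.Spawner.debug = True

# Debug logging from configurable-http-proxy (assumes the default proxy is used).
c.ConfigurableHTTPProxy.debug = True
```

The hub, proxy, and single-user logs covering the same time window as the 503 errors would be the most useful thing to share.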

To echo the above (and I’m hardly the expert @manics is): all these versions do seem rather outdated for performance testing, and if there is a real problem with them, it’s unlikely that it would be fixed in .x releases for those old versions. Free software doesn’t mean free updates on old versions forever.

jupyterhub=1.4.1

No telling what’s happened along the way, but I doubt the team has been spinning their wheels in the ensuing two major versions.

configurable-http-proxy=4.3.2

Trying jupyterhub-traefik-proxy, which is built on the Go-based Traefik, instead of the Node.js-based configurable-http-proxy may be worth a look.

python=3.7.10

Upgrading to a more recent base Python would also give you some performance benefits; Python 3.11 in particular claims some notable improvements. More importantly, a 3.7-based install will be out of security updates in less than a year.

jupyterlab=3.0.14

This one is actually debatable, but probably irrelevant to your current use case, unless your load test uses browsers in the loop. One thing you can ensure is either not having nodejs installed at all, or configuring jupyterlab not to look for it with build_available.

@bollwyvl can you elaborate on what you meant by "configuring jupyterlab not to look for it with build_available"?

Sorry, it’s camelCase, not snake_case:
https://jupyterlab.readthedocs.io/en/stable/user/directories.html#disabling-rebuild-checks
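
For what it's worth, here is a rough sketch of setting that from Python, assuming the settings live in a page_config.json inside a labconfig directory as the linked page describes. The helper name `disable_rebuild_checks` is made up for illustration, and the `buildCheck` key is from my recollection of that page, so check both keys against the docs for your JupyterLab version.

```python
import json
from pathlib import Path


def disable_rebuild_checks(labconfig_dir: Path) -> Path:
    """Merge the rebuild-check settings into page_config.json and return its path.

    Hypothetical helper: the key names follow the linked JupyterLab docs as I
    remember them; verify them for your JupyterLab version.
    """
    labconfig_dir.mkdir(parents=True, exist_ok=True)
    config_path = labconfig_dir / "page_config.json"

    # Keep any settings that are already in the file.
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))

    config["buildAvailable"] = False  # camelCase, not build_available
    config["buildCheck"] = False      # assumption: companion key from the same docs page

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config_path


# Example call (the directory is an assumption; use your environment's labconfig dir):
# import sys
# disable_rebuild_checks(Path(sys.prefix) / "etc" / "jupyter" / "labconfig")
```

Merging into the existing file rather than overwriting it keeps any other page config entries you may already have.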