Z2jh configurable-http-proxy blocking responses from remote kernel manager

Hi all, thanks for the great work on this project.
I am encountering issues using configurable-http-proxy (CHP) as a reverse proxy in a z2jh deployment.
The high-level symptom is that singleuser pods cannot connect via websocket to kernels managed by a Jupyter Kernel Gateway running outside the cluster on a separate EC2 instance.

I suspect this is an instance of Cannot proxy websocket from Gtk Broadway server · Issue #244 · jupyterhub/configurable-http-proxy · GitHub, possibly caused by CHP's handling of Upgrade headers.
The kernel gateway sends a 101 response with the Upgrade: websocket header to upgrade the connection with the client in the singleuser pod, and I suspect that CHP is blocking this response due to CORS, causing the remote kernel to appear disconnected.
In the kernel gateway logs, we see it receiving the request from the client, responding with a 101 to upgrade the connection, and sending a series of "msg_type": "kernel_info_request" messages, as expected.

[I 240603 04:08:31 web:2348] 101 GET /api/kernels/00033da8-c158-45e3-ae84-525eab0459e7/channels (<z2jh_public_ip>) 1.72ms
[D 2024-06-03 04:08:31.795 KernelGatewayApp] Opening websocket /api/kernels/00033da8-c158-45e3-ae84-525eab0459e7/channels
[I 2024-06-03 04:08:31.795 KernelGatewayApp] Connecting to kernel 00033da8-c158-45e3-ae84-525eab0459e7.
[D 2024-06-03 04:08:31.795 KernelGatewayApp] Getting buffer for 00033da8-c158-45e3-ae84-525eab0459e7
[D 2024-06-03 04:08:31.795 KernelGatewayApp] Connecting to: tcp://127.0.0.1:59901
[D 2024-06-03 04:08:31.796 KernelGatewayApp] Connecting to: tcp://127.0.0.1:49513
[D 2024-06-03 04:08:31.802 KernelGatewayApp] Connecting to: tcp://127.0.0.1:35339
[D 2024-06-03 04:08:31.803 KernelGatewayApp] Connecting to: tcp://127.0.0.1:42987
[D 2024-06-03 04:08:31.803 KernelGatewayApp] Connecting to: tcp://127.0.0.1:49513
[D 2024-06-03 04:08:31.804 KernelGatewayApp] Connecting to: tcp://127.0.0.1:35339
[D 2024-06-03 04:08:31.805 KernelGatewayApp] Nudge: attempt 1 on kernel 00033da8-c158-45e3-ae84-525eab0459e7
[D 2024-06-03 04:08:31.815 KernelGatewayApp] Nudge: shell info reply received: 00033da8-c158-45e3-ae84-525eab0459e7
[D 2024-06-03 04:08:31.815 KernelGatewayApp] Nudge: resolving shell future: 00033da8-c158-45e3-ae84-525eab0459e7
[D 2024-06-03 04:08:31.815 KernelGatewayApp] Nudge: control info reply received: 00033da8-c158-45e3-ae84-525eab0459e7
[D 2024-06-03 04:08:31.816 KernelGatewayApp] activity on 00033da8-c158-45e3-ae84-525eab0459e7: status (busy)
[D 2024-06-03 04:08:31.823 KernelGatewayApp] Nudge: IOPub received: 00033da8-c158-45e3-ae84-525eab0459e7
[D 2024-06-03 04:08:31.823 KernelGatewayApp] Nudge: resolving iopub future: 00033da8-c158-45e3-ae84-525eab0459e7
[D 2024-06-03 04:08:31.825 KernelGatewayApp] activity on 00033da8-c158-45e3-ae84-525eab0459e7: status (idle)
[D 2024-06-03 04:08:31.828 KernelGatewayApp] activity on 00033da8-c158-45e3-ae84-525eab0459e7: status (busy)
[D 2024-06-03 04:08:31.831 KernelGatewayApp] activity on 00033da8-c158-45e3-ae84-525eab0459e7: status (idle)
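
For reference, the 101 upgrade exchange the gateway performs can also be checked by hand with curl from inside the cluster. This is only a sketch (the kernel id is a placeholder for one returned by GET /api/kernels on the gateway); a healthy gateway answers the handshake with 101 Switching Protocols:

$ curl -i -N \
    -H "Connection: Upgrade" \
    -H "Upgrade: websocket" \
    -H "Sec-WebSocket-Version: 13" \
    -H "Sec-WebSocket-Key: $(openssl rand -base64 16)" \
    http://172.20.xxx.xx:80/api/kernels/<kernel_id>/channels
# expected first response line: HTTP/1.1 101 Switching Protocols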

In the CHP logs, I see what I presume are the kernel_info_request WS messages being proxied to the singleuser pod on port 8888.

04:08:27.427 [ConfigProxy] debug: PROXY WS /user/ioc-user/api/kernels/00033da8-c158-45e3-ae84-525eab0459e7/channels to http://100.96.13.209:8888
04:08:27.434 [ConfigProxy] debug: PROXY WEB /user/ioc-user/api/kernelspecs to http://100.96.13.209:8888
04:08:27.498 [ConfigProxy] debug: PROXY WEB /user/ioc-user/api/contents to http://100.96.13.209:8888
04:08:27.521 [ConfigProxy] debug: PROXY WS /user/ioc-user/api/kernels/00033da8-c158-45e3-ae84-525eab0459e7/channels to http://100.96.13.209:8888
04:08:27.619 [ConfigProxy] debug: PROXY WS /user/ioc-user/api/kernels/00033da8-c158-45e3-ae84-525eab0459e7/channels to http://100.96.13.209:8888
04:08:27.966 [ConfigProxy] debug: PROXY WEB /user/ioc-user/lab/api/workspaces/auto-n to http://100.96.13.209:8888

However, the client does not appear to receive messages such as kernel status updates, and the Lab frontend shows the kernel as "Disconnected". The Lab client can talk to the kernel gateway over HTTP without issue (start, stop, and list remote kernels; none of these requests appear in the CHP logs).
The kernel gateway also works fine outside the z2jh context: I can start a Lab process on my workstation and use the gateway (both WS and HTTP) without issue.
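
That workstation test is nothing fancy; it amounts to roughly the following (a sketch, pointing Lab's gateway client at the same EC2-hosted kernel gateway):

$ jupyter lab --GatewayClient.url=http://172.20.xxx.xx:80

With this, kernels started through the gateway show up in Lab and connect normally.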

Also of note: I can successfully connect to the websocket directly with a tool like websocat from a shell session inside the singleuser pod (via kubectl exec):

$ kubectl -n jhub-test exec --stdin --tty jupyter-ioc-2duser -- /bin/bash
(base) jovyan@jupyter-ioc-2duser:~$ wget https://github.com/vi/websocat/releases/download/v1.13.0/websocat.x86_64-unknown-linux-musl && chmod +x websocat.x86_64-unknown-linux-musl
# ...
2024-06-04 20:16:26 (104 MB/s) - ‘websocat.x86_64-unknown-linux-musl’ saved [7245864/7245864]
(base) jovyan@jupyter-ioc-2duser:~$ ./websocat.x86_64-unknown-linux-musl ws://172.20.xxx.xx:80/api/kernels/fc823348-e831-421c-96cd-952e66bfadb5/channels
{"header": {"msg_id": "abeefa6f-6c21f132c653009c8f69966b_4416_63", "msg_type": "status", "username": "username", "session": "abeefa6f-6c21f132c653009c8f69966b", "date": "2024-06-04T20:17:56.859588Z", "version": "5.3"}, "msg_id": "abeefa6f-6c21f132c653009c8f69966b_4416_63", "msg_type": "status", "parent_header": {"msg_id": "1ef06c03-fa94b85f953c04de64481974_6_0", "msg_type": "kernel_info_request", "username": "username", "session": "1ef06c03-fa94b85f953c04de64481974", "date": "2024-06-04T20:17:56.857324Z", "version": "5.3"}, "metadata": {}, "content": {"execution_state": "idle"}, "buffers": [], "channel": "iopub"}
{"header": {"msg_id": "abeefa6f-6c21f132c653009c8f69966b_4416_64", "msg_type": "status", "username": "username", "session": "abeefa6f-6c21f132c653009c8f69966b", "date": "2024-06-04T20:17:56.869675Z", "version": "5.3"}, "msg_id": "abeefa6f-6c21f132c653009c8f69966b_4416_64", "msg_type": "status", "parent_header": {"msg_id": "1ef06c03-fa94b85f953c04de64481974_6_1", "msg_type": "kernel_info_request", "username": "username", "session": "1ef06c03-fa94b85f953c04de64481974", "date": "2024-06-04T20:17:56.857469Z", "version": "5.3"}, "metadata": {}, "content": {"execution_state": "busy"}, "buffers": [], "channel": "iopub"}
{"header": {"msg_id": "abeefa6f-6c21f132c653009c8f69966b_4416_66", "msg_type": "status", "username": "username", "session": "abeefa6f-6c21f132c653009c8f69966b", "date": "2024-06-04T20:17:56.870617Z", "version": "5.3"}, "msg_id": "abeefa6f-6c21f132c653009c8f69966b_4416_66", "msg_type": "status", "parent_header": {"msg_id": "1ef06c03-fa94b85f953c04de64481974_6_1", "msg_type": "kernel_info_request", "username": "username", "session": "1ef06c03-fa94b85f953c04de64481974", "date": "2024-06-04T20:17:56.857469Z", "version": "5.3"}, "metadata": {}, "content": {"execution_state": "idle"}, "buffers": [], "channel": "iopub"}
^C

This bypasses CHP entirely and rules out NetworkPolicy and AWS security group issues; none of the websocat traffic appears in the CHP logs.

I spent a few days scratching my head until I discovered a promising lead related to CORS.

Possibly Related CORS Issue

I also tried adding a CHP route that forwards to the kernel gateway, as a convenience so that singleuser pods could set --GatewayClient.url without hardcoding the gateway IP.
Incidentally, I found that CHP appears to block HTTP responses from the kernel gateway when the gateway lives on a different origin.
If I add a route pointing at the kernel gateway (which serves its API at http://172.20.xxx.xx:80/api) from a different origin, requests through that route come back with a 404; this is not the case when the kernel gateway is on the same origin as CHP, e.g. in a docker compose stack.
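
For clarity, the end state I was aiming for (hypothetical, since it never worked) was for singleuser servers to point at the CHP route instead of the raw EC2 address, something like:

$ jupyter lab --GatewayClient.url=http://proxy-public/kernel_gateway

The route additions and test requests below were all run from inside the proxy pod.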

$ kubectl -n jhub-test exec -it ${PROXY_POD_NAME} -- \
  curl -v -H 'Authorization: token xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' \
  -X POST -d '{"target":"http://172.20.xxx.xx:80"}' \
  http://proxy-api:8001/api/routes/kernel_gateway
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 100.67.197.172:8001...
* Connected to proxy-api (100.67.197.172) port 8001
> POST /api/routes/kernel_gateway HTTP/1.1
> Host: proxy-api:8001
> User-Agent: curl/8.4.0
> Accept: */*
> Authorization: token xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> Content-Length: 36
> Content-Type: application/x-www-form-urlencoded
>
< HTTP/1.1 201 Created
< Date: Wed, 05 Jun 2024 15:38:15 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
< Transfer-Encoding: chunked
<
* Connection #0 to host proxy-api left intact

$ kubectl -n jhub-test exec -it ${PROXY_POD_NAME} -- \
  curl -v -H 'Authorization: token xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' \
  http://proxy-api:8001/api/routes
*   Trying 100.67.197.172:8001...
* Connected to proxy-api (100.67.197.172) port 8001
> GET /api/routes HTTP/1.1
> Host: proxy-api:8001
> User-Agent: curl/8.4.0
> Accept: */*
> Authorization: token xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Wed, 05 Jun 2024 15:54:13 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
< Transfer-Encoding: chunked
<
* Connection #0 to host proxy-api left intact
{"/":{"hub":true,"target":"http://hub:8081","jupyterhub":true,"last_activity":"2024-06-04T19:53:45.136Z"},"/kernel_gateway":{"target":"http://172.20.xxx.xx:80","last_activity":"2024-06-05T15:38:15.574Z"}}

$ kubectl -n jhub-test exec -it ${PROXY_POD_NAME} -- \
  curl -v \
  http://172.20.xxx.xx:80/api
*   Trying 172.20.xxx.xx:80...
* Connected to 172.20.xxx.xx (172.20.xxx.xx) port 80
> GET /api HTTP/1.1
> Host: 172.20.xxx.xx
> User-Agent: curl/8.4.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: TornadoServer/6.4
< Content-Type: application/json
< Date: Wed, 05 Jun 2024 15:38:21 GMT
< X-Content-Type-Options: nosniff
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Credentials: true
< Access-Control-Allow-Headers: *
< Access-Control-Allow-Methods: *
< Access-Control-Expose-Headers: *
< Etag: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
< Content-Length: 21
< Set-Cookie: username-172-20-xxx-xx=2|1:0|10:xxxxxxxxxx|22:username-172-20-xxx-xx|144:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx; expires=Fri, 05 Jul 2024 15:38:21 GMT; HttpOnly; Path=/
<
* Connection #0 to host 172.20.xxx.xx left intact
{"version": "2.13.0"}

$ kubectl -n jhub-test exec -it ${PROXY_POD_NAME} -- \
  curl -v \
  http://proxy-public:80/kernel_gateway
*   Trying 100.69.59.1:80...
* Connected to proxy-public (100.69.59.1) port 80
> GET /kernel_gateway HTTP/1.1
> Host: proxy-public
> User-Agent: curl/8.4.0
> Accept: */*
>
< HTTP/1.1 404 Not Found
< server: TornadoServer/6.4
< content-type: application/json
< date: Wed, 05 Jun 2024 15:38:36 GMT
< content-length: 38
< connection: keep-alive
<
* Connection #0 to host proxy-public left intact
{"reason": "Not Found", "message": ""}

$ kubectl -n jhub-test exec -it ${PROXY_POD_NAME} -- \
  curl -v -H 'Authorization: token xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' \
  http://proxy-api:8001/api/routes/kernel_gateway
*   Trying 100.67.197.172:8001...
* Connected to proxy-api (100.67.197.172) port 8001
> GET /api/routes/kernel_gateway HTTP/1.1
> Host: proxy-api:8001
> User-Agent: curl/8.4.0
> Accept: */*
> Authorization: token xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Wed, 05 Jun 2024 15:42:09 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
< Transfer-Encoding: chunked
<
* Connection #0 to host proxy-api left intact
{"target":"http://172.20.xxx.xx:80","last_activity":"2024-06-05T15:38:15.574Z"}

$ kubectl -n jhub-test exec -it ${PROXY_POD_NAME} -- \
  curl -v \
  http://proxy-public:80/
*   Trying 100.69.59.1:80...
* Connected to proxy-public (100.69.59.1) port 80
> GET / HTTP/1.1
> Host: proxy-public
> User-Agent: curl/8.4.0
> Accept: */*
>
< HTTP/1.1 302 Found
< server: TornadoServer/6.3.3
< content-type: text/html
< date: Wed, 05 Jun 2024 15:53:43 GMT
< disable_check_xsrf: True
< access-control-allow-origin: *
< x-jupyterhub-version: 4.0.2
< access-control-allow-headers: accept, content-type, authorization
< location: /hub/
< content-length: 0
< connection: keep-alive
<
* Connection #0 to host proxy-public left intact

On the same origin (e.g. in a docker compose stack), the kernel gateway can communicate with a Lab client without issue and the extra CHP route works without 404 responses, which is why I think these two issues are related.

I discovered that this is an instance of websocket subprotocol negotiation when using gateway · Issue #1310 · jupyter-server/jupyter_server · GitHub and is not related to CORS after all. The error was fixed by applying the workaround described in that issue: --GatewayWebSocketConnection.kernel_ws_protocol="".
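
For anyone hitting the same thing: the workaround just needs to reach the singleuser server's command line (or an equivalent jupyter_server config file). A sketch of what that looks like when launching Lab directly:

$ jupyter lab \
    --GatewayClient.url=http://172.20.xxx.xx:80 \
    --GatewayWebSocketConnection.kernel_ws_protocol=""

After that change, the Lab frontend connects to the remote kernels as expected.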
