JupyterHub proxy fails after kernel crash on single-user pod

Hi all,

We have a JupyterHub service running on a Kubernetes cluster. We are using a lightly modified version of the z2jh setup and it has been working great :slight_smile:

Unfortunately, yesterday one of our users was running a task that caused his Python kernel to crash. After this crash his pod became unusable, due to an error in the proxy (I believe). After the kernel crashes, the notebook says it will restart the kernel (but the kernel is actually never restarted), and then we see this error in the proxy logs:

07:12:45.988 [ConfigProxy] info: 200 GET /api/routes
07:13:45.989 [ConfigProxy] info: 200 GET /api/routes
07:14:20.080 [ConfigProxy] debug: PROXY WEB /user/g02557/api/terminals?1624605140065 to http://192.168.70.223:8888
07:14:23.057 [ConfigProxy] error: 503 GET /user/g02557/api/kernels/13b909a5-4cf6-4746-8d22-c662dbf5f081/channels?session_id=c1c1a78e-4720-4592-9423-027f1c7f6347 Error: connect ETIMEDOUT 192.168.70.223:8888
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1145:16) {
errno: 'ETIMEDOUT',
code: 'ETIMEDOUT',
syscall: 'connect',
address: '192.168.70.223',
port: 8888
}
07:14:23.066 [ConfigProxy] error: Uncaught Exception: write after end
07:14:23.067 [ConfigProxy] error: Error [ERR_STREAM_WRITE_AFTER_END]: write after end
at writeAfterEnd (_stream_writable.js:266:14)
at Socket.Writable.write (_stream_writable.js:315:5)
at IncomingMessage.<anonymous> (/srv/configurable-http-proxy/lib/configproxy.js:458:30)
at IncomingMessage.emit (events.js:314:20)
at IncomingMessage.Readable.read (_stream_readable.js:508:10)
at flow (_stream_readable.js:1008:34)
at resume (_stream_readable.js:989:3)
at processTicksAndRejections (internal/process/task_queues.js:84:21)
07:14:23.067 [ConfigProxy] error: Uncaught Exception: write after end
07:14:23.067 [ConfigProxy] error: Error [ERR_STREAM_WRITE_AFTER_END]: write after end
at writeAfterEnd (_stream_writable.js:266:14)
at Socket.Writable.write (_stream_writable.js:315:5)
at IncomingMessage.<anonymous> (/srv/configurable-http-proxy/lib/configproxy.js:458:30)
at IncomingMessage.emit (events.js:314:20)
at addChunk (_stream_readable.js:298:12)
at readableAddChunk (_stream_readable.js:273:9)
at IncomingMessage.Readable.push (_stream_readable.js:214:10)
at HTTPParser.parserOnBody (_http_common.js:135:24)
at Socket.socketOnData (_http_client.js:475:22)
at Socket.emit (events.js:314:20)

After this error occurs, the user can't access his pod (it becomes unresponsive), and requests eventually return a "Bad Gateway" error. We then need to delete the user's pod and the proxy pod for the setup to function again.
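
For reference, this is roughly how we recover today (a sketch, assuming the default z2jh namespace jhub and that the single-user pod follows the usual jupyter-<username> naming; adjust to your release):

# delete the affected user's single-user pod
kubectl delete pod jupyter-g02557 -n jhub
# delete the proxy pod so its deployment recreates it
kubectl delete pod -n jhub -l component=proxy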

And I can't seem to figure out why this happens :confused:
Any suggestions, or has anyone experienced the same issue?

It sounds like the problem is in your user's pod: if it has failed, the proxy won't be able to connect to it and will therefore return an error. Have you looked at the logs for the user's pod?
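
In case it's useful, something along these lines should surface them (pod name and namespace are assumptions based on the z2jh defaults, so substitute your own):

# logs from the affected single-user server pod
kubectl logs -n jhub jupyter-g02557
# pod events, which often show OOM kills or container restarts
kubectl describe pod -n jhub jupyter-g02557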

Sorry for the delayed response @manics. I don't see any issues in the user pod's logs :confused:
However, when I tried this morning the kernel restart was successful, so I couldn't reproduce the error.
I will continue to test this and update this question if I see the issue again :slight_smile: