We are running a Z2JH-like setup on EKS with JupyterHub 1.5 and JupyterLab 3.0.16. Each user gets their own k8s node, and they often run large research jobs in Lab.
Users have been hitting an issue where the logs show "Starting buffering for <kernel_id>", followed by "Websocket closed", then "Shutting down <n> kernels" and "received signal 15, stopping". This happens only occasionally, but almost every week for a number of users, and we can't figure out the root cause. Any ideas on how to resolve this so that JupyterLab instances do not die randomly?
Here we can see that at 10:43 things look fine, and at 17:10 it starts logging "Starting buffering" for the kernels and then closes the websocket, seemingly at random. This has been causing problems for our users, so any help is greatly appreciated.
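
In case it helps anyone line up timestamps, here is a rough sketch of how the four messages could be pulled out of a saved single-user server log. The function name, the labels, and the regular expressions are our own assumptions, based only on the message fragments quoted in this post; the exact wording and order in your logs may differ.

```python
import re
from typing import Iterable, List, Tuple

# Patterns built from the message fragments quoted in this post.
# The labels on the left are hypothetical names chosen for illustration.
PATTERNS = {
    "buffering": re.compile(r"Starting buffering for \S+"),
    "websocket_closed": re.compile(r"Websocket closed"),
    "kernels_shutdown": re.compile(r"Shutting down \d+ kernels?"),
    "sigterm": re.compile(r"received signal 15, stopping"),
}


def tag_shutdown_events(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line_number, label, text) for every line matching a pattern.

    Lines are numbered from 1. No ordering between the messages is assumed,
    so the result simply lists the matches in the order they appear.
    """
    events = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                events.append((number, label, line.rstrip("\n")))
                break  # at most one label per log line
    return events
```

For example, with a log saved to a file (the name `lab.log` is just a placeholder), `tag_shutdown_events(open("lab.log"))` returns the matching lines with their line numbers, which makes it easier to see how close together the buffering, websocket, and signal 15 messages are.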