We are running a Z2JH-like setup on EKS with JupyterHub 1.5 and JupyterLab 3.0.16. Each user gets their own k8s node and often runs large research jobs in Lab.
Our users have been hitting an issue where we see the following sequence in the logs:
"Starting buffering for <kernel_id>", followed by
"Websocket closed", and then
"Shutting down <n> kernels", and finally
"received signal 15, stopping". This does not happen constantly, but it hits a number of users almost every week, and we can't figure out the exact root cause. Any ideas on how to resolve this so that JupyterLab instances do not die randomly?
Here we can see that at 10:43 things look fine, and then at 17:10, for some reason, it starts logging
"Starting buffering" for the kernels and then closes the websocket, seemingly at random. This has been causing problems for our users, so any help is greatly appreciated.
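
In case it helps anyone line up the same pattern in their own logs, here is a minimal sketch of how the four messages above could be pulled out of a single-user server log in order. The regexes only assume the message fragments quoted in this post; the function name, the labels, and the sample lines (including their timestamp prefix) are hypothetical and made up for illustration, not copied from our real logs.

```python
import re

# The four messages quoted in this post. Only these fragments come from our
# logs; anything around them (timestamps, logger names) is NOT assumed here.
PATTERNS = [
    ("buffering", re.compile(r"Starting buffering for \S+")),
    ("ws_closed", re.compile(r"Websocket closed", re.IGNORECASE)),
    ("kernel_shutdown", re.compile(r"Shutting down \d+ kernels?")),
    ("sigterm", re.compile(r"received signal 15, stopping", re.IGNORECASE)),
]


def extract_timeline(lines):
    """Return (line_number, label) for every line matching one of PATTERNS.

    Line numbers are 1-based. The result keeps the order of the log, so the
    lead-up to each "received signal 15" line can be read off directly.
    """
    timeline = []
    for lineno, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS:
            if pattern.search(line):
                timeline.append((lineno, label))
                break  # one label per line is enough
    return timeline


# Hypothetical sample lines, invented for illustration only.
SAMPLE_LOG = [
    "[I 10:43:01] Kernel started",
    "[I 17:10:12] Starting buffering for abc-123:def-456",
    "[W 17:10:12] Websocket closed",
    "[I 17:10:15] Shutting down 2 kernels",
    "[C 17:10:15] received signal 15, stopping",
]

if __name__ == "__main__":
    for lineno, label in extract_timeline(SAMPLE_LOG):
        print(lineno, label)
```

For a real log, the same function can be fed an open file object (for example `extract_timeline(open(path))`, where `path` is wherever the pod logs were saved). Since signal 15 is SIGTERM, the line numbers around each `sigterm` entry are probably the ones worth comparing against whatever on the hub or cluster side might have asked the pod to stop at that time.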