Hello. I am running The Littlest JupyterHub (TLJH) for my class, hosted on Google Cloud. Usually JupyterHub works fine. However, sometimes when a student has an error in their code, their kernel gets stuck, and this eventually takes down the entire Google Cloud instance where I am hosting JupyterHub. Is there a way to prevent this from happening?
Sorry if the question is naive. I am a relative novice with JupyterHub.
Yes: the usual approach is to set per-user CPU and memory limits so a single runaway kernel can't exhaust the whole machine. Sizing those limits isn't in any way trivial, but here is one example based on a 4 CPU / 32 GB machine.
As a lower bound on what users need on average, I think you can plan for about 0.05 CPU / 250 MB per user, which would let roughly 80 users share the same 4 CPU / 32 GB machine (80 × 0.05 CPU = 4 CPU, 80 × 250 MB = 20 GB).
As an upper limit on what any individual user is allowed to consume, I think you can go with 1 CPU and 1 GB of memory. No single user can then drain all the CPU or memory, and a user who exceeds the memory limit gets an "OOMKilled" notification (or similar) for their own kernel instead of everyone waiting while the entire machine runs out of memory.
I suggest:
a preliminary 1 CPU / 1 GB limit for each user (see the tljh-config sketch after this list)
to set up TLJH on machines with a relatively high amount of memory compared to CPU, such as the 1:8 CPU-to-GB ratio of a 4 CPU / 32 GB machine, since users typically run out of memory before they run out of CPU
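For reference, here is a minimal sketch of how those limits are set with the tljh-config command line tool on the TLJH server; the 1 CPU / 1 GB values are just the starting point suggested above, so adjust them for your class:

```bash
# Cap each user's server at 1 GB of memory and 1 CPU
sudo tljh-config set limits.memory 1G
sudo tljh-config set limits.cpu 1

# Reload so the hub picks up the new configuration;
# user servers get the limits the next time they start
sudo tljh-config reload
```

You can check what is currently applied with `sudo tljh-config show`.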