TLJH is awesome. It solves one of my persistent problems: needing a way to get up and running in a commercial cloud environment quickly and consistently. So, I have an instance of TLJH installed in GCP. Installation was a breeze. Thanks for the detailed instructions!

But the server keeps timing out. That is, if I don’t interact with the UI on a regular basis (e.g., I let a long-running process of 60 minutes or so run its course while I do something else), it times out and shuts the server down. When I come back, I get an error message and have to refresh and restart the server.

Which brings me to the questions in my title: why would TLJH ever time out? I can understand why you would want a time-out on a system that scales up and down (e.g., on Kubernetes), but in this case the computational resources are fixed, so time-outs should never really be needed (or am I missing something?). Alternatively, is there a way to configure it so that it never times out? As always, thanks for all your work!
Hi,
Unless you changed it, the default behaviour in TLJH is to cull idle servers after 10 minutes. You can check whether this is happening by inspecting /var/log/syslog. A check for idle servers happens every 60 seconds (by default), as shown by messages of the type:
200 GET /hub/api/users (cull-idle@127.0.0.1) 21.34ms
and the culling itself, after 600 seconds:
cull_idle_servers:154] Culling server (inactive for 00:10:35)
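To pull just those messages out of the log, something like the following might help. It is only a rough sketch: show_cull_events is a helper name I made up, the search patterns are copied from the two messages above, and it assumes your log is at /var/log/syslog unless you pass another path.

```shell
# Hypothetical helper: print the most recent cull-related lines from a log file.
# The patterns match the two message types quoted above; adjust them if your
# log lines look different.
show_cull_events() {
  log_file="${1:-/var/log/syslog}"   # default path is an assumption
  grep -E 'cull-idle|Culling server' "$log_file" | tail -n 20
}
```

Calling show_cull_events with no argument reads /var/log/syslog, which may need root permissions on your machine.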
I’m not sure exactly how “inactive” is defined: it seems that servers can be culled even if the kernel is active, as long as it is not writing to standard out. This behaviour doesn’t seem to be 100% reproducible, though, so perhaps it depends on other factors. If someone could clarify, that would be great.
You can turn off culling through the tljh-config config tool, as described here:
http://tljh.jupyter.org/en/latest/topic/idle-culler.html
i.e.,
sudo tljh-config set services.cull.enabled False
sudo tljh-config reload
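If you would rather keep culling but give long-running jobs more room, raising the timeout may be an alternative to switching it off. A sketch, assuming services.cull.timeout is the relevant setting (in seconds, per the page linked above); the 7200 is just an example value for two hours:

```shell
# Keep the idle culler enabled, but wait 2 hours (7200 s) before culling.
# The value is illustrative; pick whatever exceeds your longest unattended run.
sudo tljh-config set services.cull.timeout 7200
sudo tljh-config reload

# Print the current configuration to confirm the change took effect.
sudo tljh-config show
```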
Thanks! That makes sense. I guess that page also answers my original question: you want to cull idle servers so that different users of your TLJH won’t be hogging resources from each other. The default behavior assumes a somewhat different use-case than mine (naturally), which is that many different users will be logging into the hub at different times.