Prevent idle culler from culling active kernels?

I have a JupyterHub deployed using Kubernetes on Google Cloud.

If possible, I would like to configure the idle culler so that:

  1. User servers with browser activity within the past 60 minutes are not culled
  2. User servers running code (i.e. with active Python or R kernels) are not culled, even if the browser tab is closed
  3. If there is no browser activity and no kernel activity for more than 60 minutes, the user’s server is shut down

The aim is to allow users to have long running simulations (up to several days), including allowing them to close their browser tab and shutdown their PC, but have their code continue to run on the Hub. However, because my Hub also offers access to more powerful (i.e. expensive) machine types, I also want to make sure that genuinely idle servers are culled, so we are not being charged unnecessarily.

Based on my reading of the documentation here, it sounds as though this should be possible just by setting

  cull:
    enabled: true
    timeout: 3600

at the top level of config.yaml.

The docs say:

To help jupyterhub-idle-culler cull user servers, you should consider configuring the user servers’ kernel manager to cull idle kernels that would otherwise make the user servers report themselves as active which is part of what jupyterhub-idle-culler considers.
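For reference, the kernel culling those docs refer to is configured on the single-user server (i.e. inside the user image), not on the Hub. A minimal sketch of a `` fragment, assuming a notebook-server image — the exact traitlet class may differ between notebook and jupyter-server versions, and the values here are illustrative:

```python
# in the user image
# Cull kernels that have reported no activity for an hour,
# checking every five minutes.
c.MappingKernelManager.cull_idle_timeout = 3600
c.MappingKernelManager.cull_interval = 300

# Leave busy kernels (i.e. kernels currently running code) alone —
# this is the default, shown here for clarity.
c.MappingKernelManager.cull_busy = False

# By default, kernels with an open client connection are not culled either;
# set this to True if you want them culled regardless.
c.MappingKernelManager.cull_connected = False
```

With idle kernels culled on the server side, the server stops reporting itself as active, and jupyterhub-idle-culler can then shut it down.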

This makes me think that, by default, active kernels should not be culled, even if there is no browser activity. However, if I enter the following code in a Jupyter notebook, set it running, and close the browser tab, the code runs successfully for one hour and then the server gets shut down. In other words, it seems the idle culler does not consider kernel activity by default?

import datetime as dt
import time

fpath = "test.txt"
sleep = 60
while True:
    with open(fpath, "a") as file:
        now ="%d-%m-%Y %H:%M:%S")
        text = f"Still running at {now}.\n"
        file.write(text)
    time.sleep(sleep)

Can anyone suggest a combination of settings that will cull genuinely idle servers (i.e. no browser activity and no kernel activity), but otherwise leave them alone, please?

(As an aside, I am aware that there are issues with notebook output “disappearing” or becoming “disconnected” when users close their browser tab and then try to reconnect from a new tab later. This is not my issue here, since all outputs, plots etc. are being saved to disk. Right now, I’m just looking for a way to prevent the culler from shutting down actively running code, without turning off culling completely).

Thank you! :slight_smile:

If I am right, when there is no active open browser tab, your server will be culled even if you have a long-running kernel. So your second use case

User servers running code (i.e. with active Python or R kernels) are not culled, even if the browser tab is closed

won’t work. For cases 1 and 3, your config should be good!

Thanks. Yeah, getting option (2) working along with (1) and (3) is the issue I’m hoping to solve.

This feels like a fairly typical use case (i.e. allow code to run, but cull genuinely inactive servers), so I’m hoping it’s possible with the right config.

The alternative is to turn off the culler completely, but then we’ll need some other system in place to ensure users don’t leave expensive machines running doing nothing.
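For a roll-your-own monitor along those lines, the core check is just comparing each server’s `last_activity` timestamp (which the Hub exposes via its REST API, e.g. `GET /hub/api/users` with an admin token) against a timeout. A sketch of that comparison only — the `is_idle` helper and the 60-minute figure are illustrative, not part of any JupyterHub API, and the API call itself is left out:

```python
from datetime import datetime, timezone

def is_idle(last_activity_iso: str, timeout_s: int, now: datetime) -> bool:
    """Return True if the last reported activity is older than timeout_s.

    last_activity_iso: an ISO-8601 timestamp like "2024-01-01T00:00:00Z",
    as found in the Hub's /hub/api/users payload.
    """
    # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z",
    # so normalise it to an explicit UTC offset first.
    last = datetime.fromisoformat(last_activity_iso.replace("Z", "+00:00"))
    return (now - last).total_seconds() > timeout_s

# Example: two hours after the last activity, with a 60-minute timeout.
now = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
print(is_idle("2024-01-01T00:00:00Z", 3600, now))  # True
print(is_idle("2024-01-01T01:30:00Z", 3600, now))  # False
```

A standalone script could run this check on a schedule and stop servers via the Hub API — but that is essentially reimplementing jupyterhub-idle-culler, which is why getting the built-in culler configured correctly is preferable.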

If your use case is to have a long-running kernel, yeah, that might need some custom stuff. Check out jupyter-slurm-provisioner, which was made in the context of the Slurm HPC scheduler. The idea is that you can run your kernel elsewhere, which decouples it from the machine where the JupyterLab server is running. In that case, you can safely cull your inactive servers and still ensure that your kernel keeps running.

Even better, you can spawn your JupyterLab servers on less expensive nodes and launch the long-running simulations elsewhere using such a kernel. The downside is that you would have to develop such a kernel provisioner for Kubernetes.
