Which is the correct way to cull idle kernels and notebooks?

After reading the docs, I found two ways to cull idle notebooks and kernels.
One is:

# shutdown the server after no activity for 20 minutes
c.NotebookApp.shutdown_no_activity_timeout = 20 * 60
# shutdown kernels after no activity for 10 minutes
c.MappingKernelManager.cull_idle_timeout = 10 * 60
# Check every minute
c.MappingKernelManager.cull_interval = 60
# cull connected (e.g. Browser tab open)
c.MappingKernelManager.cull_connected = True

The other:

import sys  # needed for sys.executable below

c.JupyterHub.services = [
    {
        'name': 'idle-culler',
        'admin': True,
        'command': [
            sys.executable,
            '-m', 'jupyterhub_idle_culler',
            '--timeout=600'
        ],
    }
]

I have tried the first way, but neither kernels nor notebooks get culled. Is there anything wrong with the first method?

So far I have only seen the second approach. I am not sure whether that means anything specific. Have you tried it out?

The first approach alone does not work. But when both are set, an idle notebook without any open browser tab can be culled.
I will check whether an idle notebook with a connected client (browser tab open) can be culled.

The idle-culler service should be enough. To keep your config clean, I guess you can remove the first approach.

I found the first approach in the official JupyterHub documentation: Configuring user environments — JupyterHub 1.3.0 documentation
It should be the right configuration. Now I am a little confused.

Well, the idle-culler is an independent GitHub project which is often integrated in a JupyterHub installation. Depending on the version, you must install it (e.g. via pip) yourself. It uses the JupyterHub REST API and it runs in a separate process. Therefore, it does not have access to the notebook server configuration. It also sends REST API commands to shut down an idle server. If this solution works, you don’t need an additional attempt that tries to do the same thing.

The other option is to use the built-in functionality of the notebook server that JupyterHub launches. This requires a recent enough version (notebook ≥ 5.4). I guess this is a newer approach - for me, it looks like a transition from one approach to the other is happening. Maybe you can do some research about that?

Anyways, both solutions try to solve the same thing. If one solution is working for you now, you don’t need the second approach as well. That is why I said you can safely remove it.

While the two aim in general to solve the same category of problem (wasted resources), they have different metrics available and different levels of action to take.

The notebook-environment configuration should in general produce better, more fine-grained results because it can cull unused kernels and is aware of things like idle/busy or connected status. This lets the notebook server make more intelligent choices like “shut down a kernel if it’s been idle for 5 minutes BUT not if there’s an open tab currently connected to it and/or it’s in the middle of running a long computation”. Then, finally, it can shut down the server itself if there have been no API requests and no running kernels for NotebookApp.shutdown_no_activity_timeout. This can be better, because it makes it easier for users to keep their kernels and/or servers running without being inappropriately culled (see various discussions on mybinder.org about how the current culling logic is deleting sessions that people feel like they are still using).
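The kernel-level rule described above can be sketched as a small pure function. This is an illustration of the decision, not the notebook server's actual code: the function name and argument layout are made up for the example, and only the three option names (cull_idle_timeout, cull_connected, cull_busy) come from the configuration discussed in this thread.

```python
from datetime import datetime, timedelta, timezone


def should_cull_kernel(last_activity, execution_state, connections, now,
                       cull_idle_timeout, cull_connected=False, cull_busy=False):
    """Return True if a kernel counts as cullable under the rule sketched above.

    Hypothetical helper for illustration; it only mirrors the described behaviour.
    """
    if cull_idle_timeout <= 0:
        return False  # a timeout of 0 (or less) disables kernel culling
    # The kernel must have been inactive for longer than the timeout...
    idle_long_enough = (now - last_activity) > timedelta(seconds=cull_idle_timeout)
    # ...a busy kernel is spared unless cull_busy is set...
    busy_ok = cull_busy or execution_state != "busy"
    # ...and a kernel with open connections is spared unless cull_connected is set.
    connections_ok = cull_connected or connections == 0
    return idle_long_enough and busy_ok and connections_ok


now = datetime(2021, 3, 11, 12, 0, tzinfo=timezone.utc)
fifteen_minutes_ago = now - timedelta(minutes=15)
# An idle kernel with one open browser tab survives unless cull_connected=True.
print(should_cull_kernel(fifteen_minutes_ago, "idle", 1, now, 600))
print(should_cull_kernel(fifteen_minutes_ago, "idle", 1, now, 600, cull_connected=True))
```

With cull_connected and cull_busy left at False, a short timeout is fairly safe, because an attached tab or a running computation keeps the kernel alive.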

Critically, there is a shortcoming in the internal culling logic, which is that terminal activity is not measured. Open terminals are always considered active and never register as idle. If you leave a terminal running, the internal culler will never shut down the server itself. This should be considered a missing feature we need to implement.

The external jupyterhub culler has much less granular information to act on: only whether there has been network traffic to the server, as measured by the proxy. It can also only shut down the whole server. It can simply measure “has anybody talked to me in the last X minutes?”, with no information about what operations were taken, whether anything is running, etc. It’s vulnerable to false activity registered by left-open tabs. The external culler is also insensitive to the user’s environment: we wouldn’t want users on mybinder.org to set their own cull parameters, which they could if the internal culler were our only mechanism.
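For contrast, the external check boils down to comparing one timestamp per server against a cutoff. The sketch below is not the jupyterhub-idle-culler source. It assumes user models shaped like the Hub REST API's user list (a "servers" mapping whose entries carry an ISO 8601 "last_activity"), and the helper names are invented for the example.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp such as '2021-03-11T11:00:00.000000Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def servers_to_stop(users, now, timeout_seconds):
    """Return (user name, server name) pairs with no activity within the timeout.

    Hypothetical helper: the only input is last_activity, which is why a
    left-open tab that keeps polling makes a server look active.
    """
    cutoff = now - timedelta(seconds=timeout_seconds)
    stale = []
    for user in users:
        for server_name, server in user.get("servers", {}).items():
            last = server.get("last_activity")
            if last is not None and parse_timestamp(last) < cutoff:
                stale.append((user["name"], server_name))
    return stale


users = [
    {"name": "alice", "servers": {"": {"last_activity": "2021-03-11T11:00:00.000000Z"}}},
    {"name": "bob", "servers": {"": {"last_activity": "2021-03-11T11:55:00.000000Z"}}},
]
now = datetime(2021, 3, 11, 12, 0, tzinfo=timezone.utc)
print(servers_to_stop(users, now, timeout_seconds=1800))
```

Nothing in this check knows about kernels, busy state, or terminals, which is the granularity difference described above.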

I would consider it a best practice in general to use both of these, because they can both be fooled in different ways. How exactly you configure them will depend on your relationship with your users and computational resources. Generally, I would usually say that the internal culler should have shorter timeouts because it can be smarter, especially if cull_connected and cull_busy are False. This is both because it’s less likely to have a false positive for shutdown, and because losing a kernel is less disruptive than losing a whole notebook server (no notebook data loss, only kernel state). Then the outer jupyterhub culler can make a more coarse-grained timeout (say, 1 hour).

On mybinder.org, we use the internal culler to be more aggressive, to prevent left-open websocket connections from preventing shutdown. I’d have to do some digging to find how often each culler is responsible for a given pod’s shutdown.

If the first method isn’t working at all, I’d first add --debug to the single-user launch command and make sure that the configuration is being loaded in the first place. Then you might be able to dig into why it doesn’t think things are idle if they should be.
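Besides --debug, it can help to look at what the single-user server itself reports for each kernel. The sketch below assumes the server's REST API is reachable at a base URL you supply and that you have a valid API token; both are placeholders, and the helper names are made up for the example. It reads the kernel list from /api/kernels and computes how long each kernel has been idle.

```python
import json
import urllib.request
from datetime import datetime, timezone


def fetch_kernels(base_url, token):
    """GET <base_url>/api/kernels from a running single-user server."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/kernels",
        headers={"Authorization": "token " + token},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def idle_report(kernels, now):
    """Return (kernel id, execution state, connections, idle seconds) per kernel."""
    rows = []
    for kernel in kernels:
        last = datetime.fromisoformat(kernel["last_activity"].replace("Z", "+00:00"))
        idle_seconds = (now - last).total_seconds()
        rows.append((kernel["id"], kernel["execution_state"],
                     kernel["connections"], idle_seconds))
    return rows


# Usage against a live server (URL and token are placeholders):
# kernels = fetch_kernels("http://127.0.0.1:8888/user/alice", "<api token>")
# for row in idle_report(kernels, datetime.now(timezone.utc)):
#     print(row)
```

If the idle time keeps resetting or connections never drops to zero, that points at why the kernel is not considered idle, rather than at the configuration not being loaded.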

Thank you very much for the clarification and much deeper insights than I could provide here.

Thank you both for throwing such an amount of light.

Hi folks - this is Jagane Sundar of InfinStor. I’m working on our free cloud hosted Jupyterhub as a Service offering and this issue of idle detect has popped up. Our spawner spins up a single cloud VM for each jupyterlab user and shuts it down upon idle. We want aggressive idle detect since cloud VMs, especially ones with a GPU, are expensive. Note that we run jupyterlab inside a container inside a VM (1-1 relationship between container and VM), and we preserve the contents of the container and VM by not removing the container and by stopping the VM instead of terminating it. This means that the contents of the jupyterlab such as notebooks, downloaded data etc. are not lost when jupyter server is stopped.

We are using the following settings in the jupyter notebook server:

c.ServerApp.shutdown_no_activity_timeout = 10 * 60
c.MappingKernelManager.cull_idle_timeout = 10 * 60
c.MappingKernelManager.cull_interval = 1 * 60
c.MappingKernelManager.cull_connected = True

That is to say, cull idle kernels after 10 minutes of inactivity, and stop the Jupyter server once it has been idle for 10 minutes after the last kernel was terminated. In addition, on the jupyterhub side, we have the following configuration snippet:

import sys  # needed for sys.executable below

c.JupyterHub.services = [
    {
        'name': 'idle-culler',
        'admin': True,
        'command': [sys.executable, '-m', 'jupyterhub_idle_culler', '--timeout=1800'],
    }
]

This jupyterhub idle-culler is meant to be a fallback and stops jupyterlabs after 30 minutes of idle time.

Now, here’s what I’m observing: unless I’m completely mistaken, the javascript code in the browser sends a poll of some kind whenever the jupyterlab browser tab is visible. Hence the idle culler never kicks in if the browser tab is visible. Am I correct in this observation?

I would like to configure it such that only the following are considered activity:

  1. mouse click in the jupyterlab browser tab, not mouse movement
  2. keystrokes in the jupyterlab browser tab
  3. one or more kernels doing computation work

Is there a way for me to configure the jupyterlab/hub in this manner?

Cheers, and thank you folks for creating this awesome piece of software.

Some more details. It would appear that the browser javascript is polling the following URLs:

  • GET /api/contents
  • GET /api/kernels
  • GET /api/sessions
  • GET /api/terminals

Do you know the difference between c.ServerApp.shutdown_no_activity_timeout and c.NotebookApp.shutdown_no_activity_timeout?

I actually did not know of the existence of c.NotebookApp.shutdown_no_activity_timeout

I will experiment with this now and report later.

OK, so when I started the notebook server with

c.NotebookApp.shutdown_no_activity_timeout

I got the following message

[W 2021-03-11 10:22:14.623 LabApp] 'shutdown_no_activity_timeout' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.

So I think that

c.ServerApp.shutdown_no_activity_timeout

is the one to use

Does the idle culler work with open terminal sessions (with no active processes)? I have tested with both JupyterHub 1.5 and 2.2.1, and the idle culler works if there is an open browser tab with no terminal, but not if there is an open browser session with a terminal. I saw there was a PR related to this post. Thanks!

Working now. Please disregard. I added TerminalManager to the existing singleuser helm chart config, next to MappingKernelManager. Timeout and interval values are really low for testing.

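singleuser:
  extraFiles:
    # Assumption: the entry name was missing from the posted snippet;
    # any name works here, it only labels the file entry.
    jupyter_notebook_config.json:
      mountPath: /etc/jupyter/jupyter_notebook_config.json
      data:
        TerminalManager:
          # cull terminals after 30 seconds without activity (test value)
          cull_inactive_timeout: 30
          cull_interval: 60
        MappingKernelManager:
          # cull kernels after 30 seconds without activity (test value)
          cull_idle_timeout: 30
          cull_interval: 60
          cull_connected: true
          cull_busy: false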

@jagane I came across this post as I’ve noticed some strange behavior with my team’s latest installation of JupyterLab (v. 3.4.3) and a notebook’s detection of an idle kernel.

I have experienced many situations where JupyterLab has labeled the kernel as idle when it is actively busy.

This can be detected by correlating the running Python processes with the Jupyter notebook kernel IDs.

I’m not sure if this will impact you, but it sounded relevant, since in your scenario you’d terminate my VM when it was actively busy (but mislabeled as idle).
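One way to do that correlation is sketched below. It assumes kernels are started by ipykernel with its default connection file name (kernel-<id>.json on the command line) and that you feed it lines in the format printed by `ps -eo pid,pcpu,args`. The helper names and the CPU threshold are made up for the example, and the %CPU column from ps is only a rough signal.

```python
import re

# ipykernel is normally launched with "-f .../kernel-<kernel id>.json"
KERNEL_FILE = re.compile(r"kernel-([0-9a-f-]+)\.json")


def kernel_id_from_cmdline(cmdline):
    """Extract the kernel ID from a process command line, or return None."""
    match = KERNEL_FILE.search(cmdline)
    return match.group(1) if match else None


def busy_kernel_processes(ps_lines, cpu_threshold=50.0):
    """Map kernel ID -> (pid, cpu percent) for kernel processes above the threshold.

    ps_lines: lines shaped like "PID %CPU COMMAND...".
    """
    found = {}
    for line in ps_lines:
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, cpu, args = parts
        try:
            pid_value, cpu_value = int(pid), float(cpu)
        except ValueError:
            continue  # header line or unexpected format
        kernel_id = kernel_id_from_cmdline(args)
        if kernel_id is not None and cpu_value >= cpu_threshold:
            found[kernel_id] = (pid_value, cpu_value)
    return found


# Usage on a live machine:
# import subprocess
# out = subprocess.run(["ps", "-eo", "pid,pcpu,args"], capture_output=True, text=True)
# print(busy_kernel_processes(out.stdout.splitlines()))
```

Any kernel ID returned here that the server's kernel list reports as idle is a candidate for the mislabeling described above.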