JupyterHub doesn't kill processes and threads when notebooks are closed or users log out

Hello,

on our new JupyterHub instance we noticed that user processes and threads (sessions) don’t stop when notebooks are closed or when the user logs out. Each time a notebook is opened (an R notebook, for example), a new process and several threads are spawned. Is it the intended behaviour that JupyterHub doesn’t clean up user processes?

We expect many active users on our system and would like to keep the process space from filling up with thousands of processes and threads, whether or not they consume resources.

Is there a safe way to clean up these unused processes?

Cheers
frank

Hi Frank - welcome to the Jupyter community discourse forum!

Because computations can take a long time (hours, sometimes days) the underlying kernel (and its associated ports) remain allocated even when the notebook is closed - this is by design. However, because of that requirement, it is difficult to distinguish users who intentionally leave their kernel process running from those who forget to shut down the kernel. To address these circumstances, culling can be configured based on inactivity (or idleness). In JupyterHub configurations, culling can be configured at two levels.

Level 1. The Notebook server can cull Notebook kernels after some period of inactivity. There are also options for whether a kernel should be culled while a client is still connected to it, or even while it is busy (i.e., a cell is executing). The culling polling period is also configurable. See the configuration options of the MappingKernelManager class.
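As a rough sketch of Level 1, the culling traits on MappingKernelManager could be set in the single-user server's config file (typically jupyter_notebook_config.py; the file location and every value below are assumptions to adapt, not recommendations):

```python
# jupyter_notebook_config.py (assumed location; values are illustrative only)
c = get_config()  # noqa: F821  (injected by Jupyter when it loads this file)

# Shut down a kernel after 2 hours without activity (0 disables culling).
c.MappingKernelManager.cull_idle_timeout = 2 * 60 * 60

# Check for idle kernels every 5 minutes.
c.MappingKernelManager.cull_interval = 5 * 60

# Also cull kernels that still have a client (e.g. an open browser tab) attached.
c.MappingKernelManager.cull_connected = True

# Leave kernels alone while a cell is still executing.
c.MappingKernelManager.cull_busy = False
```

With cull_busy left at False, a long-running cell keeps its kernel alive; the idle timeout only starts to matter once execution has finished.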

Level 2. JupyterHub provides the ability to cull Notebook servers. This is accomplished via an external cull_idle_service. I don’t think this service takes into account whether a given kernel of the Notebook server being checked is busy or not, so a cell executing for 10 hours may still appear as a Notebook server that’s been idle for 10 hours.
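A minimal sketch of Level 2, assuming the culler is installed as the separate jupyterhub-idle-culler package and registered as a Hub-managed service in jupyterhub_config.py. The service and role names and the one-hour timeout are placeholders, and the load_roles section only applies to JupyterHub 2.0 or later; older setups that run the cull_idle_servers.py example script will look different:

```python
# jupyterhub_config.py (sketch; names and timeout are placeholders)
import sys

c = get_config()  # noqa: F821  (injected by JupyterHub when it loads this file)

# Permissions the culler needs (JupyterHub >= 2.0).
c.JupyterHub.load_roles = [
    {
        "name": "jupyterhub-idle-culler-role",
        "scopes": [
            "list:users",
            "read:users:activity",
            "read:servers",
            "delete:servers",
        ],
        "services": ["jupyterhub-idle-culler-service"],
    }
]

# Run the culler as a service started by the Hub.
c.JupyterHub.services = [
    {
        "name": "jupyterhub-idle-culler-service",
        "command": [
            sys.executable,
            "-m",
            "jupyterhub_idle_culler",
            "--timeout=3600",  # stop servers idle for more than one hour
        ],
    }
]
```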

In either case, you should consider the longest period that you, as an administrator, are willing to let a given cell run to completion (plus some period for possible analysis) and be sure to set the inactivity setting(s) greater than that.

If you want to unconditionally shut down inactive servers (regardless of busy state or not) after some period, then you’d only need to configure the cull_idle_service since stopping a Notebook server will also shut down any active kernels.


Hello Kevin,

thank you for the welcome and for the detailed, informative and most helpful answer. We will definitely go with option one and are still discussing whether option two could be an option, too (pun not intended… :) ).

Cheers, frank


Your detailed theoretical explanation is completely missing the practical bit. What actual actions should I take in the command line or in the UI in order to actually get rid of python3 processes left after abandoning sagemath notebooks? They are unkillable, not even with -9, because something apparently keeps restarting them immediately!

Hi @rulatir. Killing the kernel process directly will indeed lead the auto-restarter to conclude that the kernel has died, and it will then restart the kernel (by design). Shutting down the notebook server instance should take any kernels it is managing down with it. If the running kernel is displayed in the Notebook/Lab UI, you could issue a shutdown request from there. Alternatively, you could issue a DELETE request against the /api/kernels/<kernel_id> endpoint if you know the kernel’s id (and the token for the notebook server instance).
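For the DELETE route, here is a small sketch using only the Python standard library. The function names, the example base URL and the placeholder id and token are all hypothetical; under JupyterHub the base URL would normally include the user's prefix (something like /user/<name>), which is an assumption to check against your own deployment:

```python
import urllib.request


def build_kernel_shutdown_request(base_url, kernel_id, token):
    """Build (but do not send) a DELETE request for a single kernel."""
    url = f"{base_url.rstrip('/')}/api/kernels/{kernel_id}"
    return urllib.request.Request(
        url,
        method="DELETE",
        # Token authentication header accepted by the notebook server REST API.
        headers={"Authorization": f"token {token}"},
    )


def shutdown_kernel(base_url, kernel_id, token):
    """Send the DELETE request and return the HTTP status code."""
    request = build_kernel_shutdown_request(base_url, kernel_id, token)
    with urllib.request.urlopen(request) as response:
        return response.status


# Example call (placeholders, not real values):
# shutdown_kernel("http://localhost:8888", "<kernel_id>", "<token>")
```

Because the request goes through the server's kernel manager rather than killing the process from outside, the auto-restarter should not bring the kernel back.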

To configure culling, at either level, you’ll need to consult the links I included 3 years ago. I’m not familiar with configuring notebook servers spawned within Hub, so perhaps someone from that community can assist if you find the documentation insufficient.
