How to prevent a multi-user JupyterHub server from being OOM-killed by a specific user's kernel?

This question can be viewed as a specific instance of this general question about avoiding out-of-memory issues affecting other users' jobs on multi-user servers.

In my case, I have a JupyterHub server on an Ubuntu 22.04 machine, but it frequently gets killed by someone running a computationally intensive (R/Julia/Python?) kernel:

$ systemctl status jupyterhub
× jupyterhub.service - Jupyterhub
     Loaded: loaded (/lib/systemd/system/jupyterhub.service; enabled; vendor preset: enabled)
     Active: failed (Result: oom-kill) since Mon 2025-08-04 06:50:18 CEST; 2h 23min ago
    Process: 1860085 ExecStart=/usr/local/bin/jupyterhub -f /etc/jupyterhub/jupyterhub_config.py (code=exited, status=0/SUCCESS)
   Main PID: 1860085 (code=exited, status=0/SUCCESS)
        CPU: 1d 22h 22min 51.877s

Aug 04 06:50:07 ncy-beta-compserver jupyterhub[1860085]: [I 2025-08-04 06:50:07.577 JupyterHub log:191] 200 POST /jupyter/hub/api/users/baue/activity (baue@127.0.0.1) 197.06ms
Aug 04 06:50:07 ncy-beta-compserver jupyterhub[1860085]: [I 2025-08-04 06:50:07.611 JupyterHub log:191] 200 POST /jupyter/hub/api/users/guinard/activity (guinard@127.0.0.1) 31.87ms
Aug 04 06:50:07 ncy-beta-compserver jupyterhub[1860085]: [C 2025-08-04 06:50:07.611 JupyterHub app:3336] Received signal SIGTERM, initiating shutdown...
Aug 04 06:50:07 ncy-beta-compserver jupyterhub[1860085]: [I 2025-08-04 06:50:07.611 JupyterHub app:2976] Cleaning up 1 services...
Aug 04 06:50:07 ncy-beta-compserver jupyterhub[1860085]: [I 2025-08-04 06:50:07.612 JupyterHub app:2981] Cleaning up single-user servers...
Aug 04 06:50:07 ncy-beta-compserver jupyterhub[1860085]: [I 2025-08-04 06:50:07.612 JupyterHub proxy:820] Cleaning up proxy[1860088]...
Aug 04 06:50:07 ncy-beta-compserver jupyterhub[1869547]: [I 2025-08-04 06:50:07.612 ServerApp] Interrupted...
Aug 04 06:50:10 ncy-beta-compserver jupyterhub[1860085]: [I 2025-08-04 06:50:10.044 JupyterHub app:3013] ...done
Aug 04 06:50:18 ncy-beta-compserver systemd[1]: jupyterhub.service: Failed with result 'oom-kill'.
Aug 04 06:50:18 ncy-beta-compserver systemd[1]: jupyterhub.service: Consumed 1d 22h 22min 51.877s CPU time.

When that happens, I need to restart the server with sudo systemctl restart jupyterhub.

How can I:

  1. Find the “offending” user/script (Jupyter kernel) (see the sketch just after this list)
  2. Limit the total memory used by a single user
  3. Set the priority of the JupyterHub process higher and the individual notebooks lower, so that the notebooks are killed first
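
For point 1, my understanding is that the kernel log records what the OOM killer did; something along these lines (the grep pattern is a guess, and the UID only appears on reasonably recent kernels) should identify the killed process and its owner:

# Kernel OOM-killer messages from the current boot
journalctl -k -b | grep -iE 'out of memory|oom-kill|killed process'

# Map a UID from those messages back to a username (1000 is an example)
getent passwd 1000 | cut -d: -f1

# Live per-cgroup memory/CPU view, to spot a heavy user before the next OOM
sudo systemd-cgtop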

The machine has 256 GB of RAM, and I have created the file /etc/systemd/system/user-.slice.d/50-memory.conf with:

[Slice]
MemoryHigh=128G
MemoryMax=196G
CPUQuota=1200%
CPUWeight=20

But this obviously is not working…
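
To double-check whether the drop-in is picked up at all, I believe something like this can be used after a sudo systemctl daemon-reload (1000 is just an example UID):

systemctl show user-1000.slice -p MemoryHigh -p MemoryMax -p CPUQuotaPerSecUSec -p CPUWeight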

(crossref on ServerFault)

Try using a spawner that supports resource limits, for example systemdspawner or dockerspawner.
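
A minimal sketch of what the systemdspawner route looks like in jupyterhub_config.py, assuming the jupyterhub-systemdspawner package is installed (the limit values are placeholders, pick what fits your machine):

# jupyterhub_config.py
c.JupyterHub.spawner_class = 'systemd'
# Per-user memory cap, enforced by systemd on that user's single-user server
c.SystemdSpawner.mem_limit = '16G'
# Per-user CPU cap, in cores
c.SystemdSpawner.cpu_limit = 4.0

With this, it is each user's own server that gets throttled or killed when it exceeds the limit, not the hub.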

In addition to what @manics has suggested, the systemd conf that you posted is not going to help your JupyterHub service. That conf sets limits on every user logged into the server. You will have to set those limits on the JupyterHub systemd service unit itself. You can check the current limits with systemctl show jupyterhub.service and override them by placing a file in /lib/systemd/system/jupyterhub.service.d/override.conf.
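
For example, a drop-in along these lines (the values are placeholders; you can also create it with sudo systemctl edit jupyterhub.service, which puts it under /etc/systemd/system/jupyterhub.service.d/ and reloads systemd for you):

# jupyterhub.service.d/override.conf
[Service]
# Caps for the whole service; note that with LocalProcessSpawner the single-user
# servers are children of the hub, so this covers hub + all notebooks together
MemoryHigh=200G
MemoryMax=230G
# More negative = less likely to be picked by the kernel OOM killer
OOMScoreAdjust=-500

If you write the file by hand, run sudo systemctl daemon-reload and restart jupyterhub afterwards.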

Thank you.

Indeed, my spawner is currently c.JupyterHub.spawner_class = 'jupyterhub.spawner.LocalProcessSpawner' and I am a bit afraid to change it.

Glad to see I can set resource limits using the “classical” spawner. Running the command `systemctl show jupyterhub.service` shows tons of options… which one do I need to set in /lib/systemd/system/jupyterhub.service.d/override.conf in order to limit a user (or a kernel?) to, say, 190 GB of RAM? The same settings I put in the [Slice] block?

I see that there is also an OOMScoreAdjust=-500 option… if I increase this value, will the OOM killer tend to favour killing the individual notebooks (which is what I would want)?

Whatever I put in that override configuration file, or directly in the /lib/systemd/system/jupyterhub.service file, I still have:

MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryLimit=infinity

when I then run the show command… Also, I think that OOMScoreAdjust refers to the server itself rather than to the individual kernels…

If you don't use a spawner like systemdspawner or dockerspawner, there is a possibility of one user taking up all the CPU and memory and effectively knocking all other users off their servers. Conversely, if you use the systemd or docker spawners, there won't be any need to raise resource limits on JupyterHub itself, as the hub takes very few resources. It is systemd/docker that will supervise the resource usage of the individual single-user servers and kill them if they exceed their limits, without impacting JupyterHub or the other single-user servers.
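
For example, with DockerSpawner the per-user cap is expressed directly in jupyterhub_config.py (a sketch assuming dockerspawner and Docker are installed; the image name and limit are placeholders):

# jupyterhub_config.py
c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'
c.DockerSpawner.image = 'quay.io/jupyter/scipy-notebook'  # example image
c.DockerSpawner.mem_limit = '16G'  # per-container memory cap enforced by Docker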
