This question can be viewed as a specific instance of this general question on avoiding out-of-memory issues caused by other users' jobs on multi-user servers.
In my case, I have a JupyterHub server on an Ubuntu 22.04 machine, but it very often gets killed when someone runs a computationally intensive kernel (R, Julia, or Python; I don't know which):
$ systemctl status jupyterhub
× jupyterhub.service - Jupyterhub
Loaded: loaded (/lib/systemd/system/jupyterhub.service; enabled; vendor preset: enabled)
Active: failed (Result: oom-kill) since Mon 2025-08-04 06:50:18 CEST; 2h 23min ago
Process: 1860085 ExecStart=/usr/local/bin/jupyterhub -f /etc/jupyterhub/jupyterhub_config.py (code=exited, status=0/SUCCESS)
Main PID: 1860085 (code=exited, status=0/SUCCESS)
CPU: 1d 22h 22min 51.877s
Aug 04 06:50:07 ncy-beta-compserver jupyterhub[1860085]: [I 2025-08-04 06:50:07.577 JupyterHub log:191] 200 POST /jupyter/hub/api/users/baue/activity (baue@127.0.0.1) 197.06ms
Aug 04 06:50:07 ncy-beta-compserver jupyterhub[1860085]: [I 2025-08-04 06:50:07.611 JupyterHub log:191] 200 POST /jupyter/hub/api/users/guinard/activity (guinard@127.0.0.1) 31.87ms
Aug 04 06:50:07 ncy-beta-compserver jupyterhub[1860085]: [C 2025-08-04 06:50:07.611 JupyterHub app:3336] Received signal SIGTERM, initiating shutdown...
Aug 04 06:50:07 ncy-beta-compserver jupyterhub[1860085]: [I 2025-08-04 06:50:07.611 JupyterHub app:2976] Cleaning up 1 services...
Aug 04 06:50:07 ncy-beta-compserver jupyterhub[1860085]: [I 2025-08-04 06:50:07.612 JupyterHub app:2981] Cleaning up single-user servers...
Aug 04 06:50:07 ncy-beta-compserver jupyterhub[1860085]: [I 2025-08-04 06:50:07.612 JupyterHub proxy:820] Cleaning up proxy[1860088]...
Aug 04 06:50:07 ncy-beta-compserver jupyterhub[1869547]: [I 2025-08-04 06:50:07.612 ServerApp] Interrupted...
Aug 04 06:50:10 ncy-beta-compserver jupyterhub[1860085]: [I 2025-08-04 06:50:10.044 JupyterHub app:3013] ...done
Aug 04 06:50:18 ncy-beta-compserver systemd[1]: jupyterhub.service: Failed with result 'oom-kill'.
Aug 04 06:50:18 ncy-beta-compserver systemd[1]: jupyterhub.service: Consumed 1d 22h 22min 51.877s CPU time.
When that happens, I need to restart the server with sudo systemctl restart jupyterhub.
How can I:
- Find the “offending” user/script (Jupyter kernel)? (see the OOM-log sketch right after this list)
- Limit the total memory used by a single user
- Set the priority of the JupyterHub process higher and the individual notebooks lower, so that the notebooks are killed first
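Regarding the first point, I assume the kernel's OOM-killer messages around the time of the crash would at least name the killed process and the UID/cgroup it belonged to, with something like:

# Kernel messages around the crash time; the "oom" / "Killed process" lines
# should name the victim process and the cgroup it was running in
journalctl -k --since "2025-08-04 06:40" --until "2025-08-04 07:00" | grep -iE 'oom|killed process'

# Roughly equivalent without journald
sudo dmesg -T | grep -iE 'oom|killed process'

But I don't know whether this is a reliable way to map the OOM kill back to a specific Jupyter kernel and user.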
The machine has 256 GB of RAM, and I have created the file /etc/systemd/system/user-.slice.d/50-memory.conf with:
[Slice]
MemoryHigh=128G
MemoryMax=196G
CPUQuota=1200%
CPUWeight=20
But this obviously is not working…
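In case it matters: I assume the drop-in only takes effect after a daemon reload, and that the effective limits on a given user slice can be checked along these lines (UID 1000 below is just an example):

# Pick up the new drop-in
sudo systemctl daemon-reload

# Inspect the effective values on one user's slice (UID 1000 is just an example)
systemctl show user-1000.slice -p MemoryHigh -p MemoryMax -p CPUQuotaPerSecUSec -p CPUWeight

I am also not sure the single-user notebook servers even run inside the user-UID.slice cgroups; in the log above the ServerApp lines appear under jupyterhub.service, which might explain why the limits seem to have no effect.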
(cross-posted on Server Fault)