Hi,
We have a cluster where we run JupyterHub. We offer users the option to either submit Jupyter as a Slurm job, for which we use ProfilesSpawner, or to use a local development server, which runs natively on the node. However, we have observed that the JupyterHub instance on the node does not obey the systemd limits we set.
I checked with the systemd checker script available in systemdspawner, and it reported memory and CPU limiting as enabled, which should be the case (it is enabled as a per-user slice).
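For reference, this is the kind of direct check I have in mind for the per-user unit. It is only a minimal sketch: it assumes systemdspawner's default unit name template jupyter-{USERNAME}-singleuser and cgroup v2 property names.

# Minimal sketch (assumptions: default systemdspawner unit name template
# "jupyter-{USERNAME}-singleuser" and cgroup v2 property names).
import getpass
import subprocess

unit = f"jupyter-{getpass.getuser()}-singleuser.service"
result = subprocess.run(
    ["systemctl", "show", unit, "-p", "MemoryMax", "-p", "CPUQuotaPerSecUSec"],
    capture_output=True, text=True,
)
print(result.stdout)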
The config is as follows:
import batchspawner
c.JupyterHub.cleanup_servers = False
c.Authenticator.allow_all = True
c.Spawner.env_keep = ['PATH', 'PYTHONPATH', 'CONDA_ROOT', 'CONDA_DEFAULT_ENV', 'VIRTUAL_ENV', 'LANG', 'LC_ALL', 'JUPYTERHUB_SINGLEUSER_APP']
c.Spawner.start_timeout = 120
c.JupyterHub.spawner_class = 'wrapspawner.ProfilesSpawner'
c.Spawner.cmd = ['jupyter-labhub']
c.Spawner.http_timeout = 120
c.SlurmSpawner.batch_script = '''#!/bin/bash
#SBATCH --output={{homedir}}/Jupyter/jupyterhub_slurmspawner_%j.log
#SBATCH --job-name=jupyterhub
#SBATCH --chdir={{homedir}}
#SBATCH --export={{keepvars}}
#SBATCH --constraint=bookworm
#SBATCH --get-user-env=L
{% if partition %}#SBATCH --partition={{partition}}
{% endif %}{% if runtime %}#SBATCH --time={{runtime}}
{% endif %}{% if memory %}#SBATCH --mem={{memory}}
{% endif %}{% if gres %}#SBATCH --gres={{gres}}
{% endif %}{% if nprocs %}#SBATCH --cpus-per-task={{nprocs}}
{% endif %}{% if reservation%}#SBATCH --reservation={{reservation}}
{% endif %}{% if options %}#SBATCH {{options}}{% endif %}
set -euo pipefail
trap 'echo SIGTERM received' TERM
module load git
module load jupyterhub/1.1
{{prologue}}
which jupyterhub-singleuser
{{cmd}}
echo "jupyterhub-singleuser ended gracefully"
{{epilogue}}
'''
### SystemdSpawner config
c.SystemdSpawner.mem_limit = '16G'
c.SystemdSpawner.cpu_limit = 4.0
c.SystemdSpawner.disable_user_sudo = True
c.ProfilesSpawner.ip = '0.0.0.0'
c.ProfilesSpawner.profiles = [
('Local server - Use it !*ONLY FOR DEVELOPMENT*! 16GB RAM, 8 CPUs', 'local_limited', 'systemdspawner.SystemdSpawner', {'ip':'0.0.0.0', 'limits':{'mem_limit':'16G', 'cpu_limit':'4.0'}}),
('mycluster - 1 CPU core, 4GB RAM, No GPU, 8 hours', 'mycluster1c4gb0gpu8h', 'batchspawner.SlurmSpawner', dict(req_nprocs='1', req_partition='default', req_runtime='8:00:00', req_memory='4G', req_gpu='0')),
('mycluster - 8 CPU core, 20GB RAM, No GPU, 48 hours', 'cluster8c20gb0gpu48h', 'batchspawner.SlurmSpawner', dict(req_nprocs='8', req_partition='default', req_runtime='48:00:00', req_memory='20G', req_gpu='0')),
('mycluster - 16 CPU core, 32GB RAM, No GPU, 48 hours', 'cluster8c32gb0gpu48h', 'batchspawner.SlurmSpawner', dict(req_nprocs='16', req_partition='default', req_runtime='48:00:00', req_memory='32G', req_gpu='0')),
('mycluster - 4 CPU core, 60GB RAM, No GPU, 48 hours', 'mycluster4c60gb0gpu48h', 'batchspawner.SlurmSpawner', dict(req_nprocs='4', req_partition='default', req_runtime='48:00:00', req_memory='60G', req_gpu='0')),
]
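As far as I understand, systemdspawner should translate mem_limit/cpu_limit into MemoryMax/CPUQuota on the user's transient unit. A quick sanity check that can be run from inside a locally spawned session (again only a minimal sketch) is to look at which cgroup the server process actually belongs to:

# Minimal sketch: print which cgroup (and therefore which systemd unit) the
# current process belongs to; if the local server was spawned by
# systemdspawner, its unit name should appear in this path.
from pathlib import Path

print(Path("/proc/self/cgroup").read_text())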
I don’t see anything in the logs apart from a few errors:
[I 2024-11-13 15:18:26.905 JupyterHub app:3352] Running JupyterHub version 5.2.0
[I 2024-11-13 15:18:26.905 JupyterHub app:3382] Using Authenticator: jupyterhub.auth.PAMAuthenticator-5.2.0
[I 2024-11-13 15:18:26.905 JupyterHub app:3382] Using Spawner: wrapspawner.wrapspawner.ProfilesSpawner
[I 2024-11-13 15:18:26.905 JupyterHub app:3382] Using Proxy: jupyterhub.proxy.ConfigurableHTTPProxy-5.2.0
[I 2024-11-13 15:18:26.907 JupyterHub app:1837] Loading cookie_secret from /etc/jupyterhub-test/jupyterhub_cookie_secret
[I 2024-11-13 15:18:26.954 JupyterHub proxy:556] Generating new CONFIGPROXY_AUTH_TOKEN
[W 2024-11-13 15:18:26.990 JupyterHub spawner:179]
The shared database session at Spawner.db is deprecated, and will be removed.
Please manage your own database and connections.
Contact JupyterHub at https://github.com/jupyterhub/jupyterhub/issues/3700
if you have questions or ideas about direct database needs for your Spawner.
[W 2024-11-13 15:18:27.003 JupyterHub spawner:179]
The shared database session at Spawner.db is deprecated, and will be removed.
Please manage your own database and connections.
Contact JupyterHub at https://github.com/jupyterhub/jupyterhub/issues/3700
if you have questions or ideas about direct database needs for your Spawner.
[I 2024-11-13 15:18:27.026 JupyterHub app:3059] user1 still running
[I 2024-11-13 15:18:27.026 JupyterHub app:3422] Initialized 2 spawners in 0.045 seconds
[I 2024-11-13 15:18:27.029 JupyterHub metrics:373] Found 2 active users in the last ActiveUserPeriods.twenty_four_hours
[I 2024-11-13 15:18:27.030 JupyterHub metrics:373] Found 2 active users in the last ActiveUserPeriods.seven_days
[I 2024-11-13 15:18:27.030 JupyterHub metrics:373] Found 4 active users in the last ActiveUserPeriods.thirty_days
[W 2024-11-13 15:18:27.030 JupyterHub proxy:625] Found proxy pid file: /etc/jupyterhub-test/jupyterhub-proxy.pid
[W 2024-11-13 15:18:27.030 JupyterHub proxy:642] Proxy still running at pid=1274690
[W 2024-11-13 15:18:29.031 JupyterHub proxy:662] Stopped proxy at pid=1274690
[I 2024-11-13 15:18:29.032 JupyterHub proxy:752] Starting proxy @ https://x.x.x.x:443/
[E 2024-11-13 15:18:29.039 JupyterHub proxy:949] api_request to proxy failed: HTTP 403: Forbidden
[E 2024-11-13 15:18:29.039 JupyterHub app:3921]
Traceback (most recent call last):
File "/mnt/nfs/clustersw/Debian/bookworm/jupyterhub/1.1/lib/python3.11/site-packages/jupyterhub/app.py", line 3919, in launch_instance_async
await self.start()
File "/mnt/nfs/clustersw/Debian/bookworm/jupyterhub/1.1/lib/python3.11/site-packages/jupyterhub/app.py", line 3706, in start
await self.proxy.get_all_routes()
File "/mnt/nfs/clustersw/Debian/bookworm/jupyterhub/1.1/lib/python3.11/site-packages/jupyterhub/proxy.py", line 989, in get_all_routes
resp = await self.api_request('', client=client)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/nfs/clustersw/Debian/bookworm/jupyterhub/1.1/lib/python3.11/site-packages/jupyterhub/proxy.py", line 953, in api_request
result = await exponential_backoff(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/nfs/clustersw/Debian/bookworm/jupyterhub/1.1/lib/python3.11/site-packages/jupyterhub/utils.py", line 249, in exponential_backoff
ret = await maybe_future(pass_func(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/nfs/clustersw/Debian/bookworm/jupyterhub/1.1/lib/python3.11/site-packages/jupyterhub/proxy.py", line 938, in _wait_for_api_request
return await client.fetch(req)
^^^^^^^^^^^^^^^^^^^^^^^
tornado.httpclient.HTTPClientError: HTTP 403: Forbidden
15:18:29.181 [ConfigProxy] info: Proxying https://x.x.x.x:443 to (no default)
15:18:29.182 [ConfigProxy] info: Proxy API at http://127.0.0.1:8001/api/routes
15:18:29.188 [ConfigProxy] error: Uncaught Exception: listen EADDRINUSE: address already in use x.x.x.x:443
15:18:29.188 [ConfigProxy] error: Error: listen EADDRINUSE: address already in use x.x.x.x:443
at Server.setupListenHandle [as _listen2] (node:net:1897:16)
at listenInCluster (node:net:1945:12)
at doListen (node:net:2109:7)
at process.processTicksAndRejections (node:internal/process/task_queues:83:21)
15:18:29.188 [ConfigProxy] error: Uncaught Exception: listen EADDRINUSE: address already in use 127.0.0.1:8001
15:18:29.189 [ConfigProxy] error: Error: listen EADDRINUSE: address already in use 127.0.0.1:8001
at Server.setupListenHandle [as _listen2] (node:net:1897:16)
at listenInCluster (node:net:1945:12)
at doListen (node:net:2109:7)
at process.processTicksAndRejections (node:internal/process/task_queues:83:21)