Some popular machine learning packages, like Ray (go bears!), make heavy use of /dev/shm space and strongly recommend that the /dev/shm allocation be at least 30% of the container's RAM allocation. Otherwise Ray throws a warning like this:
2024-01-23 03:24:21,039 WARNING services.py:1996 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 67108864 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=2.00gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2024-01-23 03:24:21,181 INFO worker.py:1724 -- Started a local Ray instance.
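To see the number Ray is complaining about (67108864 bytes is exactly 64 MiB), you can check the free space on the tmpfs yourself from inside the container. A minimal sketch, assuming Python is available in the image; the function name is mine:

```python
import shutil

def shm_free_bytes(path="/dev/shm"):
    """Return free bytes on the tmpfs at `path`.

    Falls back to /tmp if the path does not exist, mirroring
    Ray's own fallback behaviour shown in the warning above.
    """
    try:
        return shutil.disk_usage(path).free
    except FileNotFoundError:
        return shutil.disk_usage("/tmp").free

print(shm_free_bytes())
```

On a default Docker container this prints something close to 67108864; after raising the shm size it should report the larger allocation.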
When running JupyterLab in a standalone Docker container, the --shm-size argument shown above works perfectly. But how do we accomplish this in a JupyterHub Helm chart config.yaml when running on Kubernetes?
But this configuration confuses me a bit: I don't see any line in that config that sets the size. Surely a 30 GB /dev/shm and a 10 GB /dev/shm would be configured differently somewhere? Or is the size automatically set as a percentage of the memory allocated to the container?
The volume shm-volume will be created when the user’s pod is created, and destroyed after the pod is destroyed. SHM usage by the pod will count towards its memory limit. When the memory limit is exceeded, the pod will be evicted.
Thanks @stebo85. Yes, sorry I wasn't clearer with my question, but I'd read those docs before asking here. They state very clearly that the default shm size is a mere 64 MB, and then say:
The following configuration will increase the SHM allocation by mounting a tmpfs (ramdisk) at /dev/shm, replacing the default 64 MB allocation.
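For reference, the configuration those docs refer to looks roughly like this (the volume name shm-volume matches the one discussed in this thread; adapt it to your chart):

```yaml
singleuser:
  storage:
    extraVolumes:
      - name: shm-volume
        emptyDir:
          medium: Memory
    extraVolumeMounts:
      - name: shm-volume
        mountPath: /dev/shm
```

The `medium: Memory` setting is what makes the emptyDir a tmpfs instead of node-local disk, which is why its usage counts against the pod's memory limit.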
Replaced with what size? I understand that shm use will count against the pod’s total RAM use, but I don’t see any size argument here. Is this saying that the user can now write up to the full RAM allocated to the pod to /dev/shm? Sorry if I’m being dense.
Agree - that’s not 100% clear in the documentation. On my installation, after following this instruction, /dev/shm in the pod shows as the size of the RAM of the underlying compute node, so in my case 250GB (I think it’s subtracting some RAM of the 256GB for Kubernetes overhead?). My understanding is (and please anyone correct me on this), that this shm size can be filled by the pod up to the RAM limit of the pod in your config. So, let’s say you have
singleuser:
  memory:
    limit: 7G
    guarantee: 4G
then the pod will be killed when its RAM usage plus its shm usage exceeds 7 GB combined.
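If you would rather cap /dev/shm explicitly instead of relying on the pod's memory limit, Kubernetes emptyDir volumes also accept a sizeLimit field. A sketch, with the 2Gi cap purely illustrative:

```yaml
singleuser:
  storage:
    extraVolumes:
      - name: shm-volume
        emptyDir:
          medium: Memory
          sizeLimit: 2Gi   # illustrative cap, not a recommendation
    extraVolumeMounts:
      - name: shm-volume
        mountPath: /dev/shm
```

Depending on your Kubernetes version (the SizeMemoryBackedVolumes feature), this either mounts the tmpfs at that size directly or enforces the cap by evicting the pod when it is exceeded, so check the behaviour on your cluster.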