This is based on a question asked in the chat by Brooks Ambrose (no forum account?):
Is there a way for a program running inside a BinderHub pod to detect directly what limits are imposed on it? I basically want to write the program to automatically adapt its core usage depending on whether it’s being run under resource limits or not.
For those only here for the answer: for the specific case of mybinder.org, checking the value of the `CPU_LIMIT` environment variable will tell you how many cores you can use, and `MEM_LIMIT` is the corresponding variable for the memory limit (in bytes).
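A minimal sketch of reading those variables in Python (the fallback behaviour when they are unset is my own choice; deployments other than mybinder.org may not set them at all):

```python
import os

# On mybinder.org CPU_LIMIT is a (possibly fractional) number of cores and
# MEM_LIMIT is a byte count; treat both as optional.
cpu_limit = os.environ.get("CPU_LIMIT")
mem_limit = os.environ.get("MEM_LIMIT")

cores = float(cpu_limit) if cpu_limit else os.cpu_count()
memory_bytes = int(mem_limit) if mem_limit else None

print("Usable cores:", cores)
print("Memory limit (bytes):", memory_bytes)
```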
For some context, this is how we arrived at wanting to be able to do this:
Say a program detects cores to try to take full advantage of parallel processing, which makes sense locally when the whole system is available, but perhaps not when limits are enforced on the remote end. There’s a big gap between a 1 core limit and 16 detected cores. So say my program detects the 16 cores on the node and, within the pod, starts a process on each of them.
When the combined load hits the limit, what happens? Do the processes wait in line, or are they all throttled to fit collectively under the limit? Something else entirely? Same question regarding memory allocation.
There does not seem to be a generic way to detect the number of actual “cores” when running inside a docker container or kubernetes pod. You will always be told the number of cores on the host machine.
There is no (additional) penalty for using more CPU than you have been allocated: your processes get throttled and that is that. However, you might use up part of your allocated resources (namely RAM and CPU) with the overhead of starting each new process or thread. So overall I think it makes sense not to start 16 processes when you only have 1 core available. Your process(es) won’t get killed or evicted for using more than their share of CPU; they will get throttled instead.
A BinderHub tells you the CPU limit enforced on you via the `CPU_LIMIT` environment variable.
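As a sketch of how a program could adapt to that, here is one way to cap a worker pool using `CPU_LIMIT` when it is set and to fall back to the detected (host) core count otherwise. The helper names are mine, not part of any BinderHub API:

```python
import os
from multiprocessing import Pool

def pick_worker_count():
    """Use CPU_LIMIT when set (as on a BinderHub), else the host's core count."""
    detected = os.cpu_count() or 1           # inside a pod this is the *host* core count
    cpu_limit = os.environ.get("CPU_LIMIT")  # e.g. "1.0" on mybinder.org
    if cpu_limit:
        # At least one worker, but never more than the limit allows.
        return max(1, min(detected, int(float(cpu_limit))))
    return detected

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(processes=pick_worker_count()) as pool:
        print(pool.map(square, range(8)))
```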
There does not seem to be a generic way to detect the memory limit enforced on you inside a docker container or kubernetes pod. Once the total memory used by all processes in your pod exceeds the memory limit, they all become eligible for being “OOM killed” (OOM = out of memory).
I am unsure whether the result of being OOM killed is that your whole pod gets removed (this is what the docs make me think) or whether processes get killed individually, which might lead to your pod being killed as a side effect (I think this is the case, given how using too much memory manifests for BinderHub users). Often what happens is that you allocate too much memory in your kernel, which then gets killed. The notebook server itself (and the pod) continues to run, though. Users experience this as “kernel died” when they run the cell in the notebook that takes them past the limit.
There are a lot more details in the kubernetes documentation on memory limits and in this tech deep dive.
A BinderHub tells you the memory limit enforced on you via the `MEM_LIMIT` environment variable.
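As a rough sketch only (it assumes `MEM_LIMIT` is a plain byte count, which is what mybinder.org exports, and it uses this process’s peak RSS as a proxy for usage, ignoring other processes in the pod), a program could check how much headroom it has before a large allocation:

```python
import os
import resource

def memory_headroom_bytes():
    """Estimate how far this process is from the pod's memory limit."""
    mem_limit = os.environ.get("MEM_LIMIT")
    if not mem_limit:
        return None  # no limit advertised; behave as if unconstrained
    # On Linux, ru_maxrss is reported in kilobytes.
    used = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss * 1024
    return int(mem_limit) - used

headroom = memory_headroom_bytes()
if headroom is not None and headroom < 500 * 1024**2:
    print("Less than 500 MB of headroom, consider a smaller working set")
```

Keep in mind that the limit applies to everything running in the pod, so the notebook server and any other kernels count against it too.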
There are notebook extensions which will show you how much RAM and CPU you are using as well as how close you are to the limit:
- GitHub - jupyter-server/jupyter-resource-usage: Jupyter Notebook Extension for monitoring your own Resource Usage
- GitHub - jtpio/jupyterlab-system-monitor: JupyterLab extension to display system metrics
- GitHub - NERSC/jupyterlab-cpustatus: Show CPU usage in JupyterLab statusbar
There are other forum threads which touch on this topic:
If you know more about this topic or spot any errors please let me know or add a message to this thread.