MyBinder and multiprocessing

We have lots of tutorials about our glacier model on MyBinder. By default, the model checks the number of CPUs available with multiprocessing.cpu_count() and then uses all of them. On a standard MyBinder env that’d be 16.

But those aren’t the true resources given to the user, right? What would be the right number of processes I can start as a user in a MyBinder env? Or should we switch off multiprocessing altogether?

Context:
I’m asking because we went from a standard pool to one created with a context object (for obscure reasons you don’t want to know about), and creating the new pool (which stores plenty of variables to be shared across processes) now takes ages (or freezes the notebook, I never had the patience to wait for it to be created).

Here is the config for the resources provided to each user: https://github.com/jupyterhub/mybinder.org-deploy/blob/57eb5777ade9a684a832a33b22e61bbfb5dbc574/config/prod.yaml#L51

To conclude, on (gke.)mybinder.org you are limited to using 1 CPU core on average over every 100 ms.

So while you can run something in parallel, the net CPU use averaged over each 100 ms can be at most 1 CPU, so it probably doesn’t make sense given those restrictions.

Wow, I’m so often amazed at the uses mybinder.org ends up having; the OGGM project seems amazing! I’m sorry to conclude that you are limited like this when doing cool open-source work like that!

All the clusters backing mybinder.org should give you 1 core. I don’t know a general way to ask the question “how many cores does the container I am running in actually have access to?”. Asking for the number of CPUs (while inside a container) mostly just tells you the number of cores the host has :-/

For the specific case of mybinder.org, checking the value of the CPU_LIMIT environment variable will tell you how many cores you can use. MEM_LIMIT is the corresponding variable for memory usage (in bytes).
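As a rough sketch of how a library could use this: the helper below (usable_cpus is a hypothetical name, not part of any existing package) reads CPU_LIMIT if it is set and otherwise falls back to os.cpu_count(). It assumes CPU_LIMIT holds a plain number such as "1" or "0.5"; anything it cannot parse is ignored.

```python
import math
import os


def usable_cpus(environ=None):
    """Return how many worker processes to start (always at least 1).

    Hypothetical helper: assumes CPU_LIMIT, when set, is a plain number.
    """
    env = os.environ if environ is None else environ
    raw = env.get("CPU_LIMIT")
    if raw:
        try:
            limit = float(raw)
        except ValueError:
            limit = 0.0  # unparseable value: treat as "not set"
        if math.isfinite(limit) and limit > 0:
            # Round fractional limits down, but never below one process.
            return max(1, math.floor(limit))
    # No usable limit found: fall back to what the OS reports.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(usable_cpus())
```

With a result of 1 it is probably simplest to skip creating a pool at all and run serially.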

Thank you all for your replies! We can totally live with these limitations; we just need to set some environment variable that will prevent the model from using multiprocessing on MyBinder.

You may get a more accurate “cpu count” using len(os.sched_getaffinity(0)).

See this StackOverflow answer for more info.
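A minimal sketch of that check, for illustration: os.sched_getaffinity is only available on some platforms (Linux, notably), so the hypothetical helper below falls back to os.cpu_count() elsewhere.

```python
import os


def affinity_cpu_count():
    """Number of CPUs this process is allowed to run on (at least 1)."""
    if hasattr(os, "sched_getaffinity"):
        # 0 means "the calling process"; the result is a set of CPU indices.
        return len(os.sched_getaffinity(0))
    # Platforms without sched_getaffinity (e.g. macOS, Windows).
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(affinity_cpu_count())
```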

@ostrokach in a (docker) container, I think this still gives you just the total number of cores, does it not?

@rokroskar Depends how the docker container is executed:

$ docker run -it --rm --cpus="0.5" python:3.8-buster python -c "import os; print(len(os.sched_getaffinity(0)))"
12
$ docker run -it --rm --cpuset-cpus=0 python:3.8-buster python -c "import os; print(len(os.sched_getaffinity(0)))"
1
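The first run still prints 12 because --cpus is enforced as a CFS quota rather than as a CPU affinity mask, so sched_getaffinity does not see it. A rough sketch for reading that quota from inside the container follows. The function names are hypothetical, and the file paths assume the usual cgroup mount points (cgroup v2 at /sys/fs/cgroup/cpu.max, cgroup v1 at /sys/fs/cgroup/cpu/); they may differ on other setups.

```python
from pathlib import Path


def parse_cpu_max(text):
    """Parse cgroup v2 cpu.max content: "<quota> <period>" or "max <period>"."""
    quota, period = text.split()
    if quota == "max":
        return None  # no quota configured
    return int(quota) / int(period)


def cgroup_cpu_quota():
    """Return the CFS quota in CPUs (e.g. 0.5), or None if unlimited or unknown."""
    v2 = Path("/sys/fs/cgroup/cpu.max")
    v1_quota = Path("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
    v1_period = Path("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
    try:
        if v2.exists():
            return parse_cpu_max(v2.read_text())
        if v1_quota.exists() and v1_period.exists():
            quota = int(v1_quota.read_text())
            period = int(v1_period.read_text())
            # cgroup v1 reports -1 when no quota is set.
            return quota / period if quota > 0 else None
    except (OSError, ValueError, ZeroDivisionError):
        pass  # unreadable or unexpected content: report "unknown"
    return None


if __name__ == "__main__":
    print(cgroup_cpu_quota())
```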

Interesting, thanks! I didn’t think about binding cpus - I guess kubernetes doesn’t do this automatically and fractional values are by definition not supported?