We at LBNL’s Materials Project are running a Software Carpentry-style workshop in about two weeks. We’ll have about 50 simultaneous users running the image built from our workshop repo on GitHub.
We tested MyBinder as a platform for our workshop in a dry run yesterday and it worked fine. We’d like to make sure, though, that the system can smoothly handle our scale and lessons over the three days from Jul 31 to Aug 2.
A few potential issues we identified yesterday:
- premature session time-outs:
each session is 1h15 long, and there may be stretches of inactivity longer than 10 minutes.
- long server launch times:
I can improve the Dockerfile to significantly reduce build times, but launch times for the already-built image ranged from a few seconds to several minutes. All our attendees will likely launch their servers at about the same time at the start of a lesson, so we need launches to take no more than 30 s to 1 min.
- outgoing traffic:
There shouldn’t be much outgoing traffic, but our lessons include calls to our API to download material/structure information. Since attendees follow the instructor in live sessions, these ~50 requests will happen almost simultaneously.
Are there any other things (CPU, memory, …) we should be aware of before relying on MyBinder to run our workshop?
Thanks for your help,
Thanks for putting in the effort to figure out if mybinder.org is for you and then posting about it!
I think the most important thing to remember is that mybinder.org is a shared resource operated entirely by volunteers. This means several other people might decide to run large workshops at the same time (unlikely, but you never know when you will get slashdotted) or decide to do evil stuff to it. The operators are distributed across Europe and the US, so we have good timezone coverage, but it might still take 1-2h for someone to have time to react if something goes wrong. While total outages have become rare (touch wood), it is a good idea to have a backup plan!
Launch times (once the image has been built) are determined largely by how big your image is, because we have to transfer it from our registry to the node(s) that launch it. A 700MB image will take a lot less time than a 7GB one. I had a quick look at your repository and you use a custom Dockerfile. This means your image is unlikely to share layers with other images (already present on the node), and the base image will have to be fetched from Docker Hub (slower than our local registry, and it is a large, rarely used base image).
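To put rough numbers on the size argument: transfer time is approximately image size divided by network throughput. The sketch below assumes a hypothetical throughput of 100 MB/s (the real figure on mybinder.org nodes is not known here) and ignores decompression and registry latency, so it is only a lower bound.

```python
def estimate_pull_seconds(image_size_mb: float, bandwidth_mb_per_s: float) -> float:
    """Rough time to copy an image to a node: size divided by throughput.

    Ignores decompression, layer caching and registry latency, so treat
    the result as a lower bound rather than a prediction.
    """
    if bandwidth_mb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return image_size_mb / bandwidth_mb_per_s


# Hypothetical throughput; an assumption for illustration, not a measured value.
ASSUMED_BANDWIDTH_MB_PER_S = 100.0

if __name__ == "__main__":
    for size_mb in (700, 7000):  # the 700MB and 7GB images mentioned above
        seconds = estimate_pull_seconds(size_mb, ASSUMED_BANDWIDTH_MB_PER_S)
        print(f"{size_mb} MB -> ~{seconds:.0f} s")
```

Whatever the real throughput is, the ratio holds: a 10x larger image takes roughly 10x longer to land on a node, before a single container starts.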
You install a lot of stuff; if your sessions are 75 min long, will you need all of it? I’d invest some time in checking whether most of what you do manually in the Dockerfile can be done with the config files that https://repo2docker.readthedocs.io/ recognises. Maybe you can also reduce the dependencies, software and data you include, since your lessons won't cover everything? All of this is an attempt to shrink the image and increase layer sharing.
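A small script can show which repo2docker configuration files a repository currently uses. This is a sketch, not part of repo2docker: `REPO2DOCKER_FILES` is only a subset of the file names in the repo2docker docs, and the helper names are hypothetical. It encodes two rules from those docs as I understand them: configuration can live in a `binder/` (or `.binder/`) directory instead of the repository root, and a `Dockerfile` takes precedence over all other configuration files.

```python
from pathlib import Path

# A subset of the files repo2docker recognises; see its docs for the full list.
REPO2DOCKER_FILES = (
    "environment.yml",
    "requirements.txt",
    "apt.txt",
    "postBuild",
    "start",
    "runtime.txt",
    "install.R",
    "Project.toml",
)


def find_config_dir(repo: Path) -> Path:
    """Return binder/ or .binder/ if present, else the repository root.

    This sketch simply picks the first directory it finds.
    """
    for name in ("binder", ".binder"):
        candidate = repo / name
        if candidate.is_dir():
            return candidate
    return repo


def summarize_config(repo: Path) -> dict:
    config_dir = find_config_dir(repo)
    found = [name for name in REPO2DOCKER_FILES if (config_dir / name).exists()]
    return {
        "config_dir": config_dir,
        "files": found,
        # If a Dockerfile is present, repo2docker uses it and ignores the rest.
        "dockerfile_overrides": (config_dir / "Dockerfile").exists(),
    }


if __name__ == "__main__":
    print(summarize_config(Path(".")))
```

If `dockerfile_overrides` is true, none of the other files matter until the Dockerfile is removed, which is the step that lets the image reuse the common base layers.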
The timeout is 10-20 min of inactivity. Unfortunately there is not much that can be done about that, except to not be idle (that, or find a donor who wants to massively increase our compute resources). I find https://github.com/data-8/nbzip a good extension to install: it lets people easily download the state of their work to their laptops before breaking for coffee. (In the long term there are plans to let people push their work back to GitHub, but no one has started work on that… yet (hinthintwinkwink).)
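If installing nbzip isn't an option, a notebook cell can do something similar by hand. The sketch below is not nbzip's implementation; `archive_workspace` and the `workshop-backup-` file name are hypothetical. It zips a directory with the standard library and drops the archive where the Jupyter file browser can download it. The archive is built in a temporary directory first so it never ends up inside itself.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def archive_workspace(source_dir: str = ".", dest_dir: str = ".") -> Path:
    """Zip source_dir and place the archive in dest_dir for download."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    tmp_dir = tempfile.mkdtemp()
    # make_archive appends ".zip" to base_name and returns the archive path
    built = shutil.make_archive(
        str(Path(tmp_dir) / f"workshop-backup-{stamp}"), "zip", root_dir=source_dir
    )
    final = Path(dest_dir) / Path(built).name
    shutil.move(built, str(final))
    shutil.rmtree(tmp_dir)  # remove the now-empty temporary directory
    return final


if __name__ == "__main__":
    print(archive_workspace())
```

Attendees would run this before a break, then right-click the `.zip` in the file browser and download it.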
Some other tips for image building:
- make a tag for the commit you are going to use for the workshop
- make a mybinder.org link that points to this particular tag and point people to that (not
- make sure to build that image at least once, so that it is already built when your workshop starts
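mybinder.org links for GitHub repositories follow the pattern `https://mybinder.org/v2/gh/<org>/<repo>/<ref>`, where `<ref>` can be a branch, tag or commit. A tiny helper makes the "pin to a tag" advice concrete; the organisation, repository and tag names below are hypothetical placeholders.

```python
def binder_url(org: str, repo: str, ref: str) -> str:
    """Build a mybinder.org launch link pinned to a specific git ref (e.g. a tag)."""
    return f"https://mybinder.org/v2/gh/{org}/{repo}/{ref}"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(binder_url("my-org", "workshop-repo", "workshop-2018"))
```

Because the ref is a tag rather than a branch, later pushes to the repository don't trigger a rebuild during the workshop, and opening the link once ahead of time builds the image in advance.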
For outgoing network traffic: talk to the owner of the API to make sure you won’t get rate limited, as all requests will appear to come from the same IP.
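On the client side, lesson code can also soften a burst of ~50 near-simultaneous requests by retrying with exponential backoff and random jitter, so that failed requests don't all retry in lockstep. This is a generic sketch, not the Materials Project client: `fetch` is a stand-in for whatever function the lesson uses to query the API, and catching every `Exception` is a simplification (real code should retry only on rate-limit or connection errors).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry fetch() with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Cap grows as base_delay * 2**attempt; sleep a random fraction of it.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Full jitter (sleeping a random time between zero and the cap) spreads retries from many users across the whole window; it complements, rather than replaces, agreeing on limits with the API owner.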
Thanks for the excellent feedback, Tim! Very helpful for deciding which parts are best to focus on now. We do have a backup plan: we are also running a separate JupyterHub on our own Kubernetes cluster at NERSC. However, outsourcing some maintenance to MyBinder would, of course, take some workload off our plate.
Would it be possible for me to become an operator, too? It might enable us to deal with possible issues ourselves.
Of course. The best way to get started is to hang out on https://gitter.im/jupyterhub/mybinder.org-deploy and get involved in the day-to-day of https://github.com/jupyterhub/mybinder.org-deploy/ and the upstream repositories. The way we run the team is based on trust between everyone involved, so in some sense the main (defined) task for becoming an operator is to build a relationship with the others and gain their trust. I think by doing that you’d learn everything you need to know and gain experience along the way. This also means that it is a long-term endeavour and not a matter of a two-hour onboarding session.