Jovian.ml increased usage in Binder

Recently we noticed that jovian.ml usage of Binder was spiking (I believe @arnim mentioned they sent like 4000 sessions to Binder in a week). Moreover, many of these sessions are uniquely hashed and so they trigger builds every single time. This was clogging up Binder, and is in general an amount of usage that exceeds our ability to maintain the infrastructure.

In the meantime, we have restricted Binders from being launched for jovian.ml, and I’ve opened up an issue in the jovian.ml forums to let them know. Perhaps we can discuss here if / how to get to a point where we can un-ban jovian.ml.

[Image: usage by jovian.ml for the last week]

@choldgraf Sorry for the trouble. We have disabled traffic to mybinder.org and are working on setting up BinderHub in our own cluster. We would love to explore the possibility of joining the federation in the future.

Hi,
@aakashns from Jovian.ml here. Thanks for dropping a note on our forum.

We’ve seen a surge in traffic to our platform over the past couple of months due to a data analysis course we’ve been running, and that naturally translated to the increased traffic to Binder. Our intention was never to abuse the service, and I believe we could have done a better job of communicating the potential surge in traffic sooner. Sorry about that. We understand you probably had no other option but to cut us off completely.

As @siddhant mentioned, we are working on setting up our own BinderHub instance. This is mainly because we don’t have direct control over the user traffic to our site, so we can’t reliably limit the traffic we send to Binder without compromising on the user experience for our users. We hope that helps reduce the maintenance overhead for the MyBinder team.

I’d like to thank the MyBinder team on behalf of our community for providing & maintaining mybinder.org as a free service. Over 20,000 people worldwide have taken their first steps with Python & data science using Jupyter notebooks hosted on Jovian and executed on Binder. I cannot overstate how powerful it is as a beginner to be able to simply click a “Run on Binder” button and experiment with the code without having to spend hours figuring out the installation for Conda, Python, Jupyter, a bunch of libraries, especially on Windows, which is the OS most people use. The first steps into programming or data science shouldn’t be this hard, and services like Binder offer a way of making them easier.

Based on interactions with our users, we can clearly see that there are millions of people out there - in India, South-East Asia, Africa, South America, the Middle East (and of course Europe and North America) who are looking to take these first steps, and they could really use a free service like MyBinder to help them in the process. We’d love to see MyBinder grow to support 10x, 100x or 1000x the amount of traffic it is serving today - and it is truly something the worldwide data science community needs.

Here are some concrete ideas for how this can be made possible, based on our experience of using Binder over the past year:

  • Support for running a single Jupyter notebook file, apart from full Git repositories. Most people who are new to the domain of data science aren’t familiar with Git and are just looking to run a notebook.
  • Decouple the environment Docker image from the source code files, so that a new build does not have to be triggered if there’s already a built image for an environment.yml file.
  • Provide a way to mix & match environment Docker images with source files. While what Binder currently offers is quite flexible & powerful, most people are really just looking to run a Jupyter notebook with a bunch of data science libraries pre-installed. At Jovian, we’d love to use the same Docker image for all our notebooks - not only will it avoid rebuilding but it will also save a lot of waiting time for our users.
  • Provide an easy way for companies & institutions to add their spare compute capacity to MyBinder. Consider listing Binder as an application on the AWS, Google & Azure marketplaces, so that companies can set up a Binder cluster for internal use with a few clicks, and opt-in to provide their spare capacity to MyBinder.
  • Provide an API for programmatically launching, monitoring and shutting down instances on Binder. We are currently creating Git repositories and redirecting users to MyBinder URLs constructed on our backend to support the “Run on Binder” functionality.
  • Support some way of user authentication - not only for privacy reasons but also for rate-limiting. Right now there doesn’t seem to be an easy way of preventing one user launching 100 instances on Binder.
  • Add some documentation regarding the capacity MyBinder supports and best practices for using the service for online courses etc. so companies/institutions can make a more informed choice while picking an execution platform and avoid causing disruptions to MyBinder.
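
For readers unfamiliar with the redirect approach mentioned in the list above, here is a minimal Python sketch of building a launch URL on a backend. It follows the public `mybinder.org/v2/gh/<owner>/<repo>/<ref>` link format; the function name and the example repository are hypothetical and this is not Jovian's actual code. Since the repository and ref are part of the link, each newly created repository looks like a new image to Binder, which is presumably how uniquely hashed sessions end up triggering a build every time.

```python
from urllib.parse import quote, urlencode

MYBINDER = "https://mybinder.org"


def binder_launch_url(owner, repo, ref="HEAD", notebook=None):
    """Build a mybinder.org launch URL for a GitHub repository.

    `notebook` is an optional path to open in JupyterLab after launch.
    """
    # The ref is a single path segment, so slashes in branch names are escaped.
    url = f"{MYBINDER}/v2/gh/{quote(owner)}/{quote(repo)}/{quote(ref, safe='')}"
    if notebook:
        url += "?" + urlencode({"labpath": notebook})
    return url


# Hypothetical repository, for illustration only.
print(binder_launch_url("example-org", "example-course", "main", "intro.ipynb"))
```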

I apologize if any of these have already been implemented; we haven’t been tracking new releases very closely. I’m sure I’m not the first to think of these, and it’s possible that most if not all of these are already on your roadmap.

We’re a small team ourselves, but we’d love to help in any way we can. Thanks again!

Peace! :v:t2:

Hey @aakashns and @siddhant - thanks for reaching out and I hope we can work out something that works for both of our communities :sparkles:

A few responses below:

<3

This is great - one of the goals of the JupyterHub/Binder projects is to make computation more accessible to people outside of North America and Europe!

Us too! Our challenge is that there’s no “scaling model” for mybinder.org as it’s “just” a technical demo. So the issue here is not one of different visions / hopes; the reality is that Binder has no dedicated resources right now, so we have to be careful about too much usage.

Thanks for coming up with some tangible ideas - we appreciate you putting in the effort to think through this and reach out.

Check out some of the ideas in this post: “Tip: speed up Binder launches by pulling github content in a Binder link with nbgitpuller”. Perhaps it would help you reduce the number of builds for repositories :+1:
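
To make the nbgitpuller idea a bit more concrete, here is a rough Python sketch of how such a link can be put together: Binder builds one shared environment repository, and nbgitpuller pulls the notebook content into the running session, so new content does not need a new image. The function name and repository names are made up for illustration, and this assumes nbgitpuller is installed in the environment repository.

```python
from urllib.parse import urlencode


def nbgitpuller_binder_url(env_repo, env_ref, content_repo_url,
                           content_branch, open_path=None):
    """Build a Binder link that launches `env_repo` and pulls content into it.

    env_repo is "<owner>/<repo>" on GitHub and must have nbgitpuller installed.
    """
    # Inner URL: handled by nbgitpuller inside the running session.
    inner = {"repo": content_repo_url, "branch": content_branch}
    if open_path:
        # Where to send the user once the content has been pulled.
        inner["urlpath"] = f"lab/tree/{open_path}"
    inner_url = "git-pull?" + urlencode(inner)

    # Outer URL: Binder launches the environment image, then opens inner_url.
    # The inner URL is encoded a second time because it is itself a parameter.
    outer_query = urlencode({"urlpath": inner_url})
    return f"https://mybinder.org/v2/gh/{env_repo}/{env_ref}?{outer_query}"


# Hypothetical repositories, for illustration only.
print(nbgitpuller_binder_url(
    "example-org/shared-environment", "HEAD",
    "https://github.com/example-org/lesson-notebooks", "main"))
```

With links like this, only changes to the environment repository should trigger a rebuild; changes to the notebooks are picked up at launch time.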

We’ve discussed this before - feel free to add your thoughts etc!

We’ve discussed this as well and came up with (for now) a recommended way to do this with pre-existing tools rather than changing BinderHub, see:

I believe you can authenticate a BinderHub if you roll your own. The mybinder.org service is meant as a public demo and service, and for this reason we don’t do user auth.

There is some ability to do this already. For example I believe the library that the spaCy docs use has the ability to cache a binder session for use on another page: https://github.com/ines/juniper

I’d love to see this support added to Thebe as well

This is a good idea, perhaps you’d be willing to open up an issue in the documentation repository to suggest the information you’d like to see so we can track the issue?

Also just a final note - I appreciate all of these suggestions about new development, but again please keep in mind that nobody is paid to work on Binder dev, we are just a community of volunteers, and I welcome your contributions to discuss and tackle some of these issues as well!

I like this idea. Both the “making it easier to set up” part and the “then opt-in” part.

I don’t think anyone has tried/investigated the “listing on $foobar Marketplace” approach. However there is a Deploy to Azure button and I think 80% of a “deploy to GCP” button also exists, but I don’t recall where/what state it is in. The big limiter here is that we already “struggle” to keep the documentation guide current for various different platforms. So I think making and keeping the buttons working is a cool thing for “someone else” to do as it is fairly self-contained and requires a lot of knowledge of the particular cloud platform.


On the topic of “opt-in”/spare resources: the way the federation works right now is that we run a redirector which has a list of potential BinderHubs. It polls each every few seconds to ask “what is your capacity for new launches?” and “what version of Binder are you running?”. If the BinderHub announces it has capacity and is running the same version of BinderHub + repo2docker as our “prime” instance it will be considered as a place to send launch requests. Otherwise it is skipped.

The cool thing is that the “do you have capacity?” question can be answered with “nope we are full” towards the federation but users who talk directly to that particular BinderHub instance will not be blocked/stopped from launching. The declared capacity is only used by the redirector. This feature is (I think) already in use by https://notebooks.gesis.org/binder/ who sometimes scale down their declared capacity because they are hosting events or courses (at least we planned for exactly this use-case).
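
As a rough illustration of the filtering described above (not the actual redirector code), the rule could be sketched in Python like this. The `HubStatus` type, its field names and the version strings are assumptions made up for the example; how the real redirector chooses among the eligible hubs is not covered here.

```python
from dataclasses import dataclass


@dataclass
class HubStatus:
    """What a federation member reports when polled (hypothetical shape)."""
    name: str
    declared_capacity: int      # launches the hub offers to the federation
    binderhub_version: str
    repo2docker_version: str


def eligible_hubs(statuses, prime):
    """Keep hubs that declare capacity and match the prime instance's versions."""
    return [
        s for s in statuses
        if s.declared_capacity > 0
        and s.binderhub_version == prime.binderhub_version
        and s.repo2docker_version == prime.repo2docker_version
    ]


# Made-up poll results, for illustration only.
prime = HubStatus("prime", 50, "1.0", "2.0")
hubs = [
    prime,
    HubStatus("full-hub", 0, "1.0", "2.0"),    # declares no capacity: skipped
    HubStatus("stale-hub", 20, "0.9", "2.0"),  # version mismatch: skipped
]
print([h.name for h in eligible_hubs(hubs, prime)])  # → ['prime']
```

Note that `full-hub` is only skipped by the redirector in this sketch; nothing here stops users from launching on it directly, which matches the “declared capacity” behaviour described above.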


Growing mybinder.org is something I am excited about as well. Recently we crossed the “11M binders launched” threshold. Crazy by itself but even more crazy because it feels like yesterday when we celebrated the 2M threshold :smiley: I see three challenges: finding organisations who want to host a BinderHub for the federation (either by paying cloud credits or self hosting), human brain power to diagnose and fix things that come with running at large scale and human brain power to maintain the infrastructure. But I think together we can do it, especially if we each persuade our friends to spend time on this as well :wink:

It is also under the Turing org and it’s in a poor state :sweat_smile: https://github.com/alan-turing-institute/binderhub-deploy-gke

Maybe we should consider adding these repos to the BinderHub docs so they’re more findable?

Hi Chris @choldgraf, Emilio here, a Data Science enthusiast that is part of the Jovian course.

I’m from Latin America, specifically from Ecuador, and I want to express my deep thanks to all the members of Binder dev, who make such a great tool available for free. This plug-and-play tool is perfect for newbies in the field. In addition, I want to apologize for overusing the platform.

I also want to make my thanks public to @aakashns for the effort to democratize knowledge by giving the course “From Zero to Pandas” for free in partnership with freeCodeCamp. Please @aakashns extend my appreciation to all the Jovian team members.

Thanks to both of you for innovating on the tools in the field and for making them free.
