I updated the `binder-data` repository with a more usable launches dataset

Hey all - I’ve updated the binder data repository with a few enhancements:

  • I’ve removed old/outdated data sources (like Google Analytics)
  • I’ve added a new folder that scrapes and aggregates Binder data from archive.analytics.mybinder.org
  • I’ve added a GitHub workflow that runs each week and publishes the aggregated data as a GitHub release
  • I’ve added another folder that has a notebook to provide some simple visualizations of this data to show it off
  • I’ve also set this up as a MyST site so that people can view the Binder launch data quickly.
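
For anyone curious what the aggregation step might look like, here is a minimal sketch in Python. It is not the repository's actual code: the helper name `weekly_launch_counts` is hypothetical, and it assumes each event is a JSON Lines record with an ISO 8601 `timestamp` field and a `provider` field. It counts launches per week (starting Monday) and provider from in-memory text, so it runs without any network access.

```python
import json
from collections import Counter
from datetime import datetime, timedelta


def weekly_launch_counts(jsonl_text):
    """Count events per (week start, provider) from JSON Lines text.

    Assumes (hypothetically) that every record has an ISO 8601
    `timestamp` and a `provider` field.
    """
    counts = Counter()
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        ts = datetime.fromisoformat(event["timestamp"])
        # Snap the timestamp back to the Monday of its week.
        week_start = (ts - timedelta(days=ts.weekday())).date()
        counts[(week_start.isoformat(), event["provider"])] += 1
    return counts


# Made-up sample records, for illustration only.
sample = "\n".join([
    '{"timestamp": "2025-01-06T10:00:00+00:00", "provider": "GitHub"}',
    '{"timestamp": "2025-01-08T12:30:00+00:00", "provider": "GitHub"}',
    '{"timestamp": "2025-01-13T09:15:00+00:00", "provider": "GitLab"}',
])
counts = weekly_launch_counts(sample)
for (week, provider), n in sorted(counts.items()):
    print(week, provider, n)
```

To run this against real data, you would pass in the text of a downloaded events file instead of `sample`, after checking that the field names match what the archive actually publishes.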

I updated the READMEs and other docs in the repository so that it’s easy to follow along, but please let me know if you have questions. Hopefully this makes the repository a little bit more useful for Binder!

Repository: GitHub - jupyterhub/binder-data: A place to store data for Binder
JB2/MyST site: MyBinder Analytics Data Analysis - MyBinder Analytics Report

In case anybody is curious why there’s a drop in traffic, here’s a very rough timeline of the launches:

Essentially, we’ve seen big drops in traffic because Google Cloud cut our credit allocation, and as a result we had to significantly scale back the capacity of mybinder.org. We moved many repositories over to JupyterLite and also capped the number of sessions in general. This significantly shrank the overall capacity of Binder and reduced its perceived reliability, which lowered demand. In 2025 we spun up two Hetzner hubs, and this has stabilized Binder quite a lot.

Also, SHOUT OUT TO GESIS who have been persistently supporting Binder with capacity for many years now. They are unsung heroes of this project.

I did this work because I wanted to write a little 2i2c blog post to highlight the launches on the 2i2c binder federation hub. It was hard to scrape this data with our current tooling, so I decided to improve this in the binder-data repository so others can benefit from it too!


Thank you for putting in the effort to make the binder-data repo more useful for everyone! The weekly workflow, simplified data sources, and easy-to-browse MyST site really lower the barrier for exploring launch trends. I also appreciate the clear context on why traffic dipped, with the Google Cloud credit changes and the shift toward JupyterLite and Hetzner hubs. It helps the community understand the bigger picture. And yes, big kudos to GESIS for their steady support. This work not only makes the data easier to access but also helps all of us tell Binder’s story better.
