Graph of launches by provider per day (warning: takes a while for the graph to load). Not surprisingly it’s dominated by GitHub; I couldn’t find a way to display log10(count(provider)). You can easily distinguish weekdays from weekends though.
mybinder.org is ticking along at roughly 140,000 launches a week. So you still have some time (14-15 weeks) to get your favourite drink chilled for the 10M celebration.
Heroku is saying that “the app crashed”, which is not super insightful. The last deploy was in January, so it seems weird that it would start crashing now.
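One possible way around the missing log10, sketched with Python's standard `sqlite3` module rather than datasette itself: register a `log10` function on the connection and use it in the query. The `events` table with `timestamp` and `provider` columns is an assumption about the schema, and the function name `launches_per_provider_log10` is made up for this example. Inside datasette the same registration would presumably have to happen in a plugin, which is not shown here.

```python
import math
import sqlite3


def launches_per_provider_log10(conn):
    """Return (day, provider, launches, log10(launches)) rows.

    Assumes a table `events` with ISO 8601 `timestamp` strings and a
    `provider` column (an assumed schema, not checked against the real DB).
    """
    # Register a Python-backed log10 so the query also works when SQLite
    # was built without its optional math functions.
    conn.create_function("log10", 1, math.log10)
    return conn.execute(
        """
        SELECT substr(timestamp, 1, 10) AS day,  -- YYYY-MM-DD prefix
               provider,
               count(*) AS launches,
               log10(count(*)) AS log_launches
        FROM events
        GROUP BY day, provider
        ORDER BY day, provider
        """
    ).fetchall()
```

Plotting `log_launches` instead of `launches` would keep the smaller providers visible next to GitHub.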
I’ll look into it.
Longer term we need some smart ideas because the size of the database file is pushing various limits for free Heroku instances. Maybe we can host it on mybinder.org infrastructure?
Right now I run datasette deploy heroku ... and it magically figures out all the things. Building the sqlite database from the analytics archive takes quite a while on my laptop (I start it and then leave it until a few hours later I notice it is done). All this made me think about what the easiest way to build and deploy it would be. Maybe with a bit of tweaking we can make the notebook in betatim/binder-datasette (tools to create a datasette for mybinder.org) faster at creating the DB. Then we could build an image for this service like we do for the analytics archive and federation redirector image (via GH action on mybinder.org-deploy)?
Is this because it creates the whole db from scratch every time? I imagine it would be pretty quick to only do inserts on new data, since I would guess the bulk of the time is making an http request for every day since we started collecting data.
If you could:
fetch/open an existing db
retrieve the date of the last item
collect and insert only new events since the last item
then it seems like it wouldn’t be such a big job to run every day or so, since it would only be one or two HTTP requests and a few thousand inserts.
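The steps above could look roughly like this. It is a minimal sketch, not what the binder-datasette notebook does: the `events` table and its `timestamp`, `provider` and `spec` columns are an assumed schema, and `fetch_day` is a hypothetical callable that returns one day's events as a list of dicts (in practice it would download that day's file from the analytics archive). The last stored day is deleted and fetched again, on the assumption that it may have been incomplete when it was first loaded.

```python
import sqlite3
from datetime import date, timedelta


def last_event_day(conn):
    """Return the date of the newest stored event, or None if the table is empty."""
    row = conn.execute("SELECT max(timestamp) FROM events").fetchone()
    return date.fromisoformat(row[0][:10]) if row[0] else None


def append_new_events(conn, fetch_day, start, today):
    """Insert only events newer than what is already in the DB.

    fetch_day(day) is a hypothetical helper returning a list of dicts with
    "timestamp", "provider" and "spec" keys for that day.
    Returns the number of rows inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (timestamp TEXT, provider TEXT, spec TEXT)"
    )
    last = last_event_day(conn)
    day = last if last else start  # resume point, or full build on first run
    inserted = 0
    with conn:  # one transaction: commit on success, roll back on error
        if last:
            # The last stored day may be partial, so drop it and refetch it.
            conn.execute(
                "DELETE FROM events WHERE substr(timestamp, 1, 10) = ?",
                (last.isoformat(),),
            )
        while day <= today:
            rows = [
                (e["timestamp"], e["provider"], e["spec"]) for e in fetch_day(day)
            ]
            conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
            inserted += len(rows)
            day += timedelta(days=1)
    return inserted
```

Run daily against an existing events.db, this makes one `fetch_day` call for the last stored day plus one per new day, instead of one per day since data collection started.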
every day or so runs an update to fetch only new events and add them to events.db (I’m not sure if datasette serve would need to restart after these updates. I wouldn’t think so.)
I think so. I don’t remember exactly why it is set up like this. My vague memory is that the file size got bigger if I appended instead of recreated? Or maybe I was too lazy to write the “figure out where to resume from” code.
TL;DR: append-instead-of-recreate is the thing to do
I just updated the DB and now the resulting Heroku image is not only over their soft limit (300MB) but also above their hard limit (500MB). So moving to mybinder.org hosting is required if we want to add new events. (We are at about 16M rows now.)
Now that we’ve got chartpress all set up on mybinder.org-deploy, building an image with datasette and the scraper and adding it to the deployment like our analytics archive shouldn’t be a huge undertaking.