I just realized that nbestimate hasn’t been updated since 2020. Any chance that @parente knows what’s going on? I see it’s still running on Travis, maybe this is related to their new rate limits?
@ivanov reached out about this via email. To my shame, I still haven’t added my response (copied below) to the timeline in the project README, disabled the predictive model, or started collecting the new counts.
The cronjob I run has an assert to ensure that the daily count is not drastically different from the prior day, to avoid collecting bad data when GitHub’s search index is having a bad day. In December 2020, the number of notebooks reported by GitHub search dropped from nearly 10 million back to 4.5 million, stayed there for a day or so, and then began climbing again from that new origin.
I don’t have an explanation for what happened (GitHub updated how they count ipynb files? They did a massive cleanup of repositories? They were accidentally counting private repos before and aren’t now?).
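For illustration, the day-over-day guard described above can be sketched as a simple relative-change check; the 10% tolerance and function name here are assumptions, not the actual values from the cronjob.

```python
def sane(today: int, prior: int, tolerance: float = 0.10) -> bool:
    """Return True if today's count is within `tolerance` (relative)
    of the prior day's count. Tolerance value is an assumption."""
    return abs(today - prior) / prior <= tolerance

# Normal day-over-day drift passes the check:
assert sane(9_950_000, 9_900_000)

# The December 2020 cliff (~10M down to 4.5M) trips the assert,
# so the cronjob stops recording rather than log bad data:
assert not sane(4_500_000, 9_950_000)
```

A check like this halts collection on any large swing, which is exactly why the CSV stopped updating when GitHub's reported count legitimately dropped.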
Ahh gotcha - is there anything that others could help you with there?
The repo should be collecting daily counts again in the CSV file. I’ve disabled the notebook execution until I have time to remove the predictive portions and have it render the historical count alone.
I should have googled sooner (ISHGS?): “Changes to Code Search Indexing” on the GitHub Changelog, posted December 17th, 2020:
> Starting today, GitHub Code Search will only index repositories that have had recent activity within the last year. Recent activity for a repository means that it has had a commit or has shown up in a search result [page, not count]. If the repository does not have any activity for an entire year, the repository will be purged from the Code Search index.
I’ve updated the timeline in the nbestimate README to include this info and have the notebook rendering the historical plot again.