Analysis on thousands of notebooks for JupyterCon 2020: what would you like to see?

I’ve collected a few tens of thousands of Jupyter notebooks upon which I’ve been doing analysis. I’ll be turning what I’ve learned into a short presentation for the upcoming JupyterCon.

Along with the presentation, I'll be publishing the corpus of notebooks and the code for running the map-reduce analysis jobs.

Ahead of that, I thought I’d write and ask for suggestions from this community. Are there things you’d like to learn, questions you’d like to answer, visualizations you’d like to see, or anything related to using Jupyter notebooks as a data source in and of themselves?

I’ve got enough tooling that I should be able to incorporate your suggestions into the presentation before the deadline.

Thank you,

Dan Grigsby

Maybe more on integrations with file systems like S3 and the like; JupyterDash usage would also be a useful thing to analyze.

I’d love to see some stats on which libraries people are importing, and which functions within those modules people are calling! I saw a study done where folks tried to figure out what kinds of charts people were making in their notebooks by figuring out which matplotlib functions are being called, and this is really interesting to me, although I’d want to know for Plotly :slight_smile:
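For what it's worth, one way this kind of import-counting could be sketched: walk the notebook JSON, parse each code cell with `ast`, and tally top-level module names. The directory layout and glob pattern here are just illustrative assumptions, not the presenter's actual pipeline.

```python
# Sketch: count imported modules across a directory of .ipynb files.
# Assumes standard nbformat JSON ("cells" list, "cell_type", "source").
import ast
import json
from collections import Counter
from pathlib import Path

def imports_in_source(source: str) -> set:
    """Return top-level module names imported by one cell's source."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return set()  # skip cells with magics or invalid Python
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return names

def count_imports(notebook_dir: str) -> Counter:
    """Tally module imports over every notebook under notebook_dir."""
    counts = Counter()
    for path in Path(notebook_dir).glob("**/*.ipynb"):
        nb = json.loads(path.read_text(encoding="utf-8"))
        for cell in nb.get("cells", []):
            if cell.get("cell_type") == "code":
                src = "".join(cell.get("source", []))
                counts.update(imports_in_source(src))
    return counts
```

Counting which functions are called within a module (e.g. `plotly.express.line`) would take a further pass resolving attribute accesses against the import aliases, but the cell-parsing skeleton would be the same.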
