Hi, I’m teaching a programming course for students getting into data science. I’m trying to find a list of Jupyter notebooks that are clean, follow best practices, and are generally good enough to share with my students.
There are plenty of “tutorial” notebooks that teach pandas or sklearn. I’m looking for notebooks where the author has done an actual analysis: either explored a data set or built a model. The notebooks should be nicely formatted (headings, comments, visualizations). The data should be cleaned properly, perhaps using pipelines. I don’t actually care whether the model (if there is one) produces great results.
Ideally I would have at least a dozen or so such notebooks so they can see actual, industrial use of notebooks and learn from them.
I guess this is a real challenge because research domains differ so much. We quickly reach a level of domain knowledge that is no longer common, so the preprocessing may not make sense to an outsider. Furthermore, Jupyter notebooks that examine data in an industrial setting often use data that the companies don’t want to share with third parties. This is why we supervise theses at companies, so that we get more insight, but we are not allowed to share them. If somebody knows of a step-by-step analysis, I am always interested!
One Jupyter notebook that goes beyond the introductory level is https://github.com/1kastner/machine-learning-hype-or-hybris/blob/bc57d12f6a95ed06d9aeefbfa7960a18bd738a17/02%20einsatzszenarien/maschinelles-sehen--selbststudienzeit/03%20Klassifiziere%20Verkehrsschilder.ipynb. I am sorry that it is in German.
I remember there were some scientific platforms where researchers could submit the Jupyter notebooks and data that belonged to a journal publication. I have forgotten their names (I think there were maybe two or three larger ones). I guess those authors put a lot of effort into style and comprehensibility.