Jupyter Community Statistics

Hi

A few times in the Governance meetings we have mentioned wanting some statistics that could help us better understand the different communities. I know @choldgraf has done some work and produced some summary reports based on the data found in the GitHub repositories.

https://nbviewer.jupyter.org/github/choldgraf/jupyter-activity-snapshot/blob/master/reports/summaries/time.ipynb

https://nbviewer.jupyter.org/github/choldgraf/jupyter-activity-snapshot/blob/master/reports/summaries/pony_factor.ipynb

https://nbviewer.jupyter.org/github/choldgraf/jupyter-activity-snapshot/blob/master/reports/summaries/retention.ipynb

I also work on some open source projects at Apache, and I think one of them could be very useful for gathering community statistics. The project is called Apache Kibble and, even though it is in incubation and still being developed, we can still use it to gather some potentially interesting information.

What I’d like to do as a Proof of Concept (PoC) is to load some of the Jupyter repos into Kibble and see what we get. I selected Notebook and have loaded the data from its GitHub repository and also from the Discourse forum.

Here are a couple of examples of what it can give us. This is the Notebook GitHub repo; I ran this yesterday and the time period is the last 6 months.

And this is an analysis of the Notebook Discourse forum, which I also ran yesterday; once again the time period is the last 6 months.

I haven’t checked anything, so I would really like to get some feedback on the information to see whether it could be useful. I’d also like to know whether Governance is the best place to have this discussion, or whether I should take it to the Notebook community as it might be more relevant to them.

If anyone wants access to the PoC Kibble instance, let me know. Also, if anyone has suggestions for other areas we could import data for as part of the PoC, please let me know.

This is really cool! I’d be interested in accessing the dashboard as well :slight_smile:

On the question of dashboards etc. for the community, my big question is always: what are the unintended consequences of publishing statistics like these? We should make sure that whatever we publish doesn’t create incentives we don’t wish to create, or emphasize certain kinds of contributions over others. Does that make sense?

Is it possible to have multiple repositories represented in a dashboard? I say this because I think most of the “major” Jupyter projects are more like collections of repositories rather than a single repo.

Thanks for the feedback. I will send you a link so you can sign up.

To answer your last question first: yes, you can link multiple sources or collections of repositories together into a ‘View’ and then display the dashboard using that view.

A minor correction on my side: Kibble is a Top Level Project. As part of its live demo it is currently used to provide these types of statistics for every single Apache project, so anyone can log in as a guest and see the stats for any project they want to look at. The main objective is to get a view of what is happening and to see whether there is anywhere that needs to be focussed on.

I understand your concern about unintended consequences so let’s continue to discuss things as we run through the PoC and see what makes sense and what doesn’t.

We have the potential to collect different kinds of information. Kibble can look at code or GitHub contributions and at how long people have been involved. It can also look at mailing lists or, in the case of Jupyter, the Discourse forum to see what people are talking about. Looking into the forum is about trying to capture the non-coding contributions: what people are talking about, such as initiatives, announcements, events, marketing etc. Some people’s contribution is responding to or helping out others.
It also tries to pull in the issue tracker and looks at the people who create and resolve issues.
There is also a section that identifies the new contributors and when they made their first commit or post.

So we have several potential places where we are trying to identify contributors.
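
To make the "new contributors" idea concrete, here is a rough sketch of how a first-contribution date could be worked out from a list of activity events. This is only an illustration and not how Kibble does it internally; the function names, the `(author, timestamp)` event shape, and the sample data are all made up for the example.

```python
from datetime import datetime


def first_contributions(events):
    """Map each author to the timestamp of their earliest event.

    `events` is an iterable of (author, timestamp) pairs. An event could be
    a commit, an issue, or a forum post (hypothetical shape for this sketch).
    """
    first_seen = {}
    for author, when in events:
        if author not in first_seen or when < first_seen[author]:
            first_seen[author] = when
    return first_seen


def new_contributors(events, since):
    """Return (author, first_date) pairs for authors first seen on or after `since`."""
    first_seen = first_contributions(events)
    newcomers = [(a, w) for a, w in first_seen.items() if w >= since]
    return sorted(newcomers, key=lambda pair: pair[1])


# Made-up sample data, for illustration only.
events = [
    ("alice", datetime(2019, 1, 10)),
    ("bob", datetime(2020, 3, 2)),
    ("alice", datetime(2020, 4, 1)),
    ("carol", datetime(2020, 5, 20)),
]
recent = new_contributors(events, since=datetime(2020, 1, 1))
```

Because the events can come from commits, issues, or forum posts alike, the same approach counts someone whose first contribution is a forum reply as a new contributor, not only people who commit code.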

As well as Notebook, I’ve been asked to add JupyterLab to the Kibble demo so will do that and see what the statistics look like.

Both JupyterLab and JupyterHub have been added to the demo. I hope to do a brief demo of what the JupyterHub statistics look like at their meeting this week.

As well as having the statistics per project, we can also have the information at an aggregated level, giving a high-level view of Project Jupyter as a whole. To do that I’d need to load all the Jupyter-related repositories, which is completely possible; I just need to know whether to do it as part of this demo!

If anyone else wants to take a look at the demo and the statistics being produced then please let me know. Also if you’d like me to add any other Jupyter repositories to the demo, just let me know. :slightly_smiling_face: