I opened a [Request for Implementation] a while ago about a privacy-preserving way to collect data on which installed libraries are actually being used by users on a JupyterHub. The primary goal is to help find and remove unused libraries, but I'm sure that aggregated usage stats for libraries in a given community have other uses too.
python-popularity-contest does just that! It collects pre-aggregated, anonymized data on which installed libraries are being actively used by your users. Privacy is very important here, so we collect only the minimum amount of data needed.
We want to collect just enough data to help with the following tasks:
Remove unused libraries that have never been imported. These can probably be removed without much breakage for individual users
Provide aggregate statistics about the ‘popularity’ of a library to add a data point for understanding how important a particular library is to a group of users. This can help with funding requests, better training recommendations, etc.
To collect the smallest amount of data possible, we aggregate at the source. Only overall global counts are stored, without any individual per-import records. This is much better than storing per-user or per-process records. The collector filters out standard library modules and any local modules the user has written. It also ignores all modules imported by IPython when starting a kernel, so the result is as close to ‘just the libraries used by the user in a notebook’ as possible.
It also reports libraries (something you can pip or conda install) rather than modules (what you import). This offers a bit more privacy, and is also more useful for answering our questions.
I deployed this on a Berkeley hub yesterday, and it produces data of this form:
I'm excited to use this to slim down my images!
I’ve tried to provide technical and deployment information in the project README. Please check it out and let me know what you think!