Discovering what the majority of users of a project are actually doing

The universe is full of dark matter. It is “dark matter” because you can only tell that it is there by its gravitational effects. A bit like something that deflects your straight path across a large empty hall. However, if you look around, you don’t see anything out of the ordinary.

Lots of users, installations and use-cases for projects like JupyterHub and BinderHub are like dark matter. They only weakly interact with the project at large. They do not interact via the strong force of open-source: opening issues, contributing to the project, writing blogs and giving talks about what they are doing. This is a shame.

Dark matter vastly outnumbers ordinary matter (maybe 85:15?). I guess the ratio of users/installations out there that only interact weakly with the project to the number of “strong forces” is similar. This is a shame because we miss out on a lot of ideas and wisdom from this group. It is also a shame because you don’t know what the majority of your users are actually doing with what you’ve built.

Those who come complaining about an (obscure?) feature being removed (or added) are the vocal ones. Are they the vocal minority or the voice of the masses? We have no idea really.

The question at the end of this post is: what can and should we be doing to bring some light into this? Lots of software these days phones home. Some of it does so continuously, some only when you install it. Some of it sends private details, some only well-anonymised summaries.

What are tools you respect and how do they report usage stats? Have you seen someone do something really clever to surface statistics about their install/use base?

What are ways of learning about your users that don’t involve spying on them via software?

Maybe the best way to observe these users is like observing dark matter. Indirectly. Though I am not sure what the equivalent of the bullet cluster would be…

I’d love to know your thoughts.


Hi @betatim, looking at analytics of a project’s documentation can give a lot of insights (where people come from, which OS they are using, how much time they spend on each page, etc.). I think it’s a good trade-off in terms of respecting users’ privacy.

I like the idea. We already have some Google Analytics on the documentation pages, so we could look at that and see what we find.

Do you think that would work for finding out which features of BinderHub or JupyterHub people use when they install it “at home”? The “install it for many users as a server” case seems different from a library people use every day, mostly because there are fewer setups. We could definitely learn how people set things up.

Does anyone have opinions on “phone home” things like (now retired)?