So there’s some real grokking here and also maybe some misconceptions…
What Gigantum currently does / can do
Currently, we DON’T collect any telemetry. @betatim nails it when he says “the only easy thing is to screw it up.”
We do store a rich history of user behavior. I linked above about this, but since I can’t expect everyone to read everything I link to: this is similar to bash or R or IPython history, or a Jupyter notebook, but more “complete”. We store inputs, outputs, and environment in a git-savvy metadata record that lets you know exactly what happened, and in what order, with near-complete reproducibility.
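To give a rough sense of what one such record might contain, here is a sketch in Python. The field names and values are hypothetical illustrations of the idea (input, output, environment, git anchoring), not Gigantum’s actual schema.

```python
# Hypothetical shape of a single activity record (illustrative field names
# only, not Gigantum's actual schema): what went in, what came out, and
# enough environment detail to replay it, anchored to a git commit.
activity_record = {
    "commit": "ab12cd3",                       # git commit the record is attached to
    "timestamp": "2018-09-14T16:02:11Z",
    "input": "df.groupby('user').size()",      # code sent to the kernel
    "output": {"mimetype": "text/plain", "data": "user\n..."},
    "environment": {
        "base_image": "python3-minimal",       # Docker image backing the project
        "packages": ["pandas==0.23.4"],
    },
}
```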
But again, we DON’T collect this as telemetry because that would be invasive. Unfortunately, we don’t know how to prove this, but I’d love to hear ideas - differential privacy doesn’t work here, because the granular data is part of the service we provide to users. Obviously, anyone can look at this history in public projects, but we’re not even scraping that before figuring out the ethics, etc.
We do provide a comment-based approach: if you include `# gtm:ignore` in a cell, it won’t be recorded (though an empty record will be created, so you know something happened). Initially, we were working on a cell-metadata-based approach, but unfortunately, you can’t currently send (arbitrary) cell metadata to the kernel.
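To make the comment-based opt-out concrete, here is a minimal kernel-side sketch of how such a marker could be checked, using IPython’s `pre_run_cell` event. This is just an approximation of the idea (and assumes a recent IPython that passes an `ExecutionInfo` to the callback), not our actual implementation.

```python
# Minimal sketch of a comment-based opt-out using IPython's pre_run_cell
# event. Illustrative only, not Gigantum's implementation.
from IPython import get_ipython

IGNORE_MARKER = "# gtm:ignore"
history = []  # stands in for the real activity store

def _maybe_record(info):
    # `info` is the ExecutionInfo recent IPython versions pass to
    # pre_run_cell callbacks; `raw_cell` is the cell's source text.
    source = info.raw_cell or ""
    if IGNORE_MARKER in source:
        history.append({"redacted": True})  # something ran, content withheld
    else:
        history.append({"redacted": False, "source": source})

get_ipython().events.register("pre_run_cell", _maybe_record)
```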
To put the above in another way: we keep a fairly complete record from the perspective of the kernel, the filesystem, and the computational environment (via Docker). We provide a control surface based on cell comments, but would like something more fluent. As it is, this could be a great tool for research on what people are typing into cells.
The design for this was in the interests of reproducibility (cf. critiques of how notebooks obliterate history) and “building a tool for ourselves.” But in discussion we’ve realized that such a record might be useful for lots of things.
Regarding Front-end UI
Here, we’re only in the conceptual and in-person stage. We present interactive mock-ups to users, and our UX lead takes notes for summary and review. So this isn’t something that’s turned on or off; it’s a completely distinct context. Currently, we use InVision, which is a really nice tool for this kind of mock-up, including the ability for a team to comment on different parts of the UI. We just hired our UX lead, and it’s still a somewhat social process of reviewing her findings and looking at direct user feedback (which is linked from within the app and goes right into our roadmapping software, ProdPad, where we also gather results of discussions, etc.).

I’ve spent a LOT of time teaching folks how to use GitHub and trying to get them to use the issue tracker, but I find that for a full-time team, having these other tools usefully separates different levels of discussion and makes the broader conversation more accessible within the team… though probably less accessible from a public perspective. I’m trying to answer @yuvipanda’s questions here… please let me know if I’m falling short.
I sat down at JupyterCon with a PhosphorJS UI/UX researcher, and that process seemed to be run quite well. Unfortunately, I don’t recall who ran it, but it was clear that various folks were engaged in various ways.
I’m sure some level of cross-pollination would be useful here. If there are more specific questions for us, I’ll try to get them answered!
Moving forwards
I want to be clear that we’re a small start-up and we need to develop a path to sustainability (sound familiar?). But given that, I discussed the above with our leadership, and we’re very positive on the idea that @betatim suggests - that we “join forces with/lead an effort to build a Jupyter wide telemetry system.” There are clear benefits to us to work in good faith with the community to develop norms and approaches around telemetry. We also want to contribute to the Jupyter community since we’re obviously highly dependent on it!
This thread is developing into a decent collection of prior work that can help inform what we do next. The only design + engineering task so far is the issue I highlighted above about improving communication from the front-end to the kernel. This could obviously be extended to include arbitrary front-end telemetry. Or, that could be a separate engineering effort.
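As one possible shape for that front-end-to-kernel channel, here is a rough kernel-side sketch using the existing comm machinery. The target name `frontend_telemetry` and the event format are made up for illustration, and this only works when the shell is running under ipykernel; it’s a sketch of the direction, not a proposal for the actual API.

```python
# Kernel-side sketch: register a comm target the front-end could open to
# push UI telemetry events into the kernel session. Target name and event
# shape are hypothetical.
from IPython import get_ipython

events = []  # stands in for a real telemetry sink

def _on_comm_open(comm, open_msg):
    # Called when the front-end opens a comm to this target.
    def _on_msg(msg):
        events.append(msg["content"]["data"])  # e.g. {"action": "toolbar_click"}
    comm.on_msg(_on_msg)

# shell.kernel is only set when running under ipykernel.
get_ipython().kernel.comm_manager.register_target("frontend_telemetry", _on_comm_open)
```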
Probably first, though, we should figure out what we want to do (and subsequently, what data we want), with a particular eye towards respecting users’ needs and desires. I very much appreciate the suggestion that we develop an approach that Netflix/Bloomberg/etc. would not only permit, but would encourage their employees to enable.
Right now, I’d classify our interests into three buckets:
- Are users able to understand and use the tool? This is currently done in (semi-)scripted contexts in which users have specific tasks to perform.
- Are users using certain features? For this, it’d be great to have broader telemetry for the front-end.
- Where are users apparently getting stuck? This is harder to do, but one motivating example would be to identify a cell execution that produces output and is run repeatedly with small changes. This would imply that a user is trying multiple times to “get something right”. This could inform the designers of plotting libraries, as well as the way outputs and messages get produced directly by Jupyter and/or Gigantum. To some extent, we can do this with the Gigantum activity record as it is, modulo putting some “research process” into place (a rough sketch of such a heuristic follows this list).
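Here is a rough sketch of that “repeated small tweaks” heuristic, operating over a hypothetical list of execution records (dicts with `source` and `had_output`, assumed to be in execution order). The record format is made up for illustration and is not Gigantum’s actual schema.

```python
# Sketch of a "retry detection" heuristic: consecutive executions whose
# source differs only slightly and which both produced output.
import difflib

def find_retries(records, min_similarity=0.9):
    """Yield consecutive execution records that look like small retries."""
    for prev, curr in zip(records, records[1:]):
        if not (prev["had_output"] and curr["had_output"]):
            continue
        if prev["source"] == curr["source"]:
            continue  # identical re-runs aren't "small changes"
        ratio = difflib.SequenceMatcher(None, prev["source"], curr["source"]).ratio()
        if ratio >= min_similarity:
            yield prev, curr
```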
Having said all that, please let us know what would be useful from perspectives in the core Jupyter community.