OK - thank you, this is actually the kind of help I was asking for and I am less confused. Moreover, I’m pretty sure I laid the groundwork for that confusion… anyway:
- Instrumenting front-end telemetry already appears to be a movement inside the Jupyter community
- I’m honestly interested in figuring out how to be a good contributor
- We don’t have a good way at Gigantum to instrument front-end telemetry either
And so I’m happy to focus on front-end telemetry. In this case, I think my concerns about a new schema aren’t valid - there is no existing “message format” for client application events, and we don’t have a schema for them in Gigantum either.
And then regarding a few things from @betatim:
While IANAL, I’ve worked in corporate research consulting, and it’s fascinating that unless you are a FERPA- or HIPAA-covered entity, or you are receiving federal funds, you have no additional ethical constraints apart from the legal system proper.
That said, norms have been discovered the hard way by organizations like Facebook! Again, IANAL, but it’s my understanding that Facebook’s social-manipulation experiments were completely legal (and internally considered benign), yet were deemed unethical by at least some voices in the community.
While I agree with the general spirit of this norm — “get ethical review before covertly manipulating attitudes / beliefs” — it doesn’t seem to extend to product development, user interface testing, etc., and I’m not worried about it personally / from a business perspective.
BUT, to keep things complicated, as U.S.-based universities have become ever more risk-averse, the offices and boards involved are expanding what they want review for (at least based on my knowledge of computer scientists designing interfaces at UC Berkeley and Stanford Med).
I know some folks have done more traditional A/B testing on things like PhosphorJS, and I assume if there was someone who needed ethical review, they would have already at least emailed their local IRB. Can we figure out if anyone at a federally funded university or research org has been engaged like this? My guess is that the IRB won’t have anything to say until a specific experiment is posed, and we just have to be as conscientious as we can for now.
I’m more concerned about discovering community norms for what’s reasonable or even pleasant. I think @yuvipanda has done a great job collecting various organizations’ thinking on data collection and privacy, and it’d be great if we could figure out approaches usable broadly across different tools. For now, there are easy questions, like: if we install a Jupyter plugin, are folks using it?
I’m most interested in the meta-questions at this point - like designing a good user experience for opt-in to telemetry, etc. From my perspective at Gigantum (held over from D-Lab) - trying to support a long tail of researchers of varying degrees of sophistication - I’m interested in some basic questions like whether users use the following at all:
- Tab docking
- Panel hiding
- The text editor
- The terminal
Is there a way to present questions / messages to users within Jupyter? Ideally in an extensible way, so that a calling process could configure it?
That sounds right. Once you’re in Docker, it’s pretty easy to subscribe to an IP endpoint, or mount a file across containers. Writing results to a JSON file seems like a good way to make results accessible… and such records could easily be scooped up and embedded elsewhere.
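To make the JSON-file idea concrete, here is a minimal sketch of an append-only JSON-lines event log — the event fields and schema are hypothetical, just to illustrate records that another process (or a container mounting the file) could tail and scoop up:

```python
import json
import time


def record_event(log_path, event_type, **details):
    """Append one telemetry event as a JSON line (hypothetical schema)."""
    event = {
        "timestamp": time.time(),
        "event_type": event_type,
        **details,
    }
    # Append-only JSON lines: each record is one line, so another
    # process can consume the file incrementally without coordination.
    with open(log_path, "a") as f:
        f.write(json.dumps(event) + "\n")


def read_events(log_path):
    """Read all recorded events back as a list of dicts."""
    with open(log_path) as f:
        return [json.loads(line) for line in f]
```

One nice property of this format is that records are trivially embeddable elsewhere — each line is already a self-contained JSON object.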
Regarding the work Gigantum has done, the raw data are already available from Jupyter’s IOPub API. I’m happy to walk folks through how we do that, or any of our code - and see if it would be useful more broadly.
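For readers who haven’t looked at the message stream before, here is a minimal, hypothetical sketch of what consuming IOPub-style messages could look like. The dicts mimic the shape of Jupyter messaging-protocol messages (a `msg_type` in the header); the tally function and sample data are my own illustration, not Gigantum’s actual code:

```python
from collections import Counter


def tally_iopub_messages(messages):
    """Count messages by msg_type (e.g. execute_input, stream, status)."""
    return Counter(msg["header"]["msg_type"] for msg in messages)


# Sample messages shaped like the Jupyter protocol's IOPub channel.
sample = [
    {"header": {"msg_type": "status"}, "content": {"execution_state": "busy"}},
    {"header": {"msg_type": "execute_input"}, "content": {"code": "1 + 1"}},
    {"header": {"msg_type": "execute_result"},
     "content": {"data": {"text/plain": "2"}}},
    {"header": {"msg_type": "status"}, "content": {"execution_state": "idle"}},
]
```

Even a simple tally like this answers basic usage questions — e.g., how many cells a user actually executed in a session.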
In my experience, it’s hard to get folks to work through demos to understand something like automated data collection (though we have a demo up at https://try.gigantum.com). A video illustrating interaction with the tool and how data is captured is probably a good way to do it? I’ll do this for Gigantum anyway — and thanks for helping me think through this! I think the Binder demo @yuvipanda put up is nice in the way the output file is right there in the Jupyter “home”. Perhaps easy demo-ability should be a design goal? In other words: how quickly / easily can I teach a researcher how to use jupyter-telemetry?
And if you’ve made it this far, I now see how it makes sense to move (some of) this conversation over to the jupyter-telemetry issue / PR. I’m of two minds here, as I think Discourse is more accessible (and I value this highly) — but that said, the next steps here may be focused on a more technical crowd.
So, I’ll leave things there and in the meantime work internally with my team to figure out what’s on our wish-list.