Hello! My name is Tyler and I’m interested in doing a broad survey to answer some basic questions about how academic data science is done and what is needed to optimize the end user experience. My hope is to organize a periodic survey that delivers sufficient information to properly segment user needs and experiences.
Please be forewarned, I’m not an expert on doing this sort of thing and I’m aware that I’m stepping out here without fully knowing the landscape of such efforts.
The overall intent is to stay somewhat ahead of the curve wrt what people are using and what they want. Part of it would be about current use and pain points from a “workflow” perspective. Part would be on basic UI/UX questions. Another part would be on understanding what sort of hardware resources people use, e.g., laptop, workstation, on-prem cluster, or cloud.
Both the formulation of the survey and the results of the survey would be open. Due to the potential “herding cats” aspect, I’m thinking that the formulation of the survey would be less open to input, but the results would be a community resource for anybody developing tools for the community.
Anyhow. @davclark let me know that this might be the appropriate place to start a discussion on this.
If so, let me know!
Erm… actually I started a topic here already:
First, I want to acknowledge that there are multiple studies in various stages of completion:
My colleague Tyler and I started discussing collaborations with folks at the recent SSI Collaborations workshop, and learned about this effort:
https://www.software.ac.uk/what-do-we-know-about-rses-results-our-international-surveys
Also spoke with Dan Katz and, IIRC, there’s a survey ongoing with his group, though that won’t be released until the initial paper comes out.
And then there’s the work tha…
Well, this fits so well with my point that I don’t fully know what I’m stepping into…Derp!
Also relevant for UX research with JupyterLab:
Hi everyone. We have recently created a team-compass repo for JupyterLab that will be used for JupyterLab org-wide process, decisions, meeting notes, coordination, etc. (similar to the team-compass of JupyterHub).
As a first issue I have created a proposal for the creation of a new JupyterLab repo that will enable us to begin (actually continue) a JupyterLab telemetry extension.
The issue is here:
We would love anyone with experience in telemetry to join the effort!
opened 05:04AM - 30 Apr 19 UTC
## Background
For the past ~2 years we have had a number of conversations about building a telemetry system for JupyterLab. By a telemetry system, we mean a system for collecting data about what actions users are performing in JupyterLab, how they are taking those actions (mouse, keyboard shortcut, button), and when. Telemetry data serves a number of purposes for organizations deploying Jupyter: i) operational and security monitoring, ii) intrusion detection, iii) compliance auditing, and iv) a-posteriori analysis of the platform’s usage and design, i.e., as a UX research tool.
## Tenets
There are certainly ethical and legal questions around telemetry systems. To address these, I propose the following tenets of the telemetry system:
* **Data protection by design and default.** JupyterLab and its telemetry system should come with builtin tools that enable safe and secure deployments with all types of data. See [Art. 25 of GDPR](https://gdpr-info.eu/art-25-gdpr/) for details about this tenet.
* **Make it easy to do the right things by default.** There are many ways to collect and use telemetry data that are illegal and/or irresponsible. JupyterLab's telemetry system should encode best practices and make it easy for operators to be responsible and comply with relevant laws.
* **Choice in balancing tradeoffs.** There are two types of data in JupyterLab: 1) the actual datasets users are working with in notebooks, and 2) telemetry data about the Jupyter users. At times, protecting these two types of data at the same time will require tradeoffs. For example, if a deployment is doing research with sensitive HIPAA or FERPA data, the operators need to closely monitor every action taken by its researchers using JupyterLab to ensure the sensitive data is used appropriately. At the same time, in some jurisdictions (EU) Jupyter users may be protected by local laws (GDPR) about what telemetry data can be recorded, how it can be used, and the terms of that usage.
* **Don't ignore the need for telemetry**. Organizations deploying Jupyter need to collect telemetry for a range of purposes. If we ignore this need, they will route around the project, with potential legal and ethical complications. By being proactive, we can establish best practices and guardrails.
## Current status
@ian-r-rose has created an exploratory JupyterLab telemetry extension here:
https://github.com/ian-r-rose/jupyterlab-telemetry
This extension shows some of the major pieces required:
* Hooks into JupyterLab's command system to record telemetry data when commands are run.
* Metadata added to the points in the code base where commands are triggered to indicate how they are triggered (mouse, keyboard, click, menu, etc.).
* Some sort of API/Interface for sending telemetry to an appropriate data store.
* Interfaces and UI components pertaining to the user experience of telemetry. This includes enabling users to approve or revoke permission to collect telemetry data, displaying telemetry-related statuses, and letting users inspect and download the collected data.
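To make the pieces listed above a bit more concrete, here is a rough sketch of how they could fit together. It is written in plain Python for brevity and is not the API of JupyterLab or of the exploratory extension; `TelemetryEvent`, `InMemorySink`, `CommandRegistry`, and `consent_given` are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass
class TelemetryEvent:
    command_id: str  # which command ran
    trigger: str     # how it was triggered: "mouse", "keyboard", "menu", ...
    timestamp: str   # when it ran (ISO 8601, UTC)


class InMemorySink:
    """Hypothetical data store; a real deployment would forward events elsewhere."""

    def __init__(self) -> None:
        self.events: List[TelemetryEvent] = []

    def send(self, event: TelemetryEvent) -> None:
        self.events.append(event)


class CommandRegistry:
    """Toy stand-in for a command system with a telemetry hook (not JupyterLab's)."""

    def __init__(self, sink: InMemorySink) -> None:
        self._commands: Dict[str, Callable[[], None]] = {}
        self._sink = sink
        self.consent_given = False  # nothing is recorded until the user opts in

    def add_command(self, command_id: str, fn: Callable[[], None]) -> None:
        self._commands[command_id] = fn

    def execute(self, command_id: str, trigger: str) -> None:
        # Run the command first, then record the event only if consent was given.
        self._commands[command_id]()
        if self.consent_given:
            now = datetime.now(timezone.utc).isoformat()
            self._sink.send(TelemetryEvent(command_id, trigger, now))


sink = InMemorySink()
registry = CommandRegistry(sink)
registry.add_command("notebook:run-cell", lambda: None)

registry.execute("notebook:run-cell", trigger="keyboard")  # no consent: not recorded
registry.consent_given = True
registry.execute("notebook:run-cell", trigger="menu")      # recorded
```

Defaulting `consent_given` to `False` is one way to reflect the "data protection by design and default" tenet; a deployment monitoring sensitive data might need a different default, which is exactly the tradeoff described in the tenets.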
It may also make sense to build differential privacy into our telemetry collection code to protect users. This isn't appropriate for all deployments, but in many cases it would be extremely helpful.
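As one illustration of what that could look like, here is a minimal sketch of randomized response, a simple local differential privacy technique. This is an assumption about one possible approach, not something the proposal or the exploratory extension specifies; the function names and the "did this user ever use feature X" framing are hypothetical. Each client reports its true yes/no value with probability e^ε / (1 + e^ε) and the opposite value otherwise, and the operator can still estimate the overall rate from the noisy reports.

```python
import math
import random
from typing import List


def truth_probability(epsilon: float) -> float:
    """Probability of reporting the true bit under epsilon-local differential privacy."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize(bit: bool, epsilon: float, rng: random.Random) -> bool:
    """Report the true bit with probability p, the flipped bit otherwise."""
    return bit if rng.random() < truth_probability(epsilon) else not bit


def estimate_rate(reports: List[bool], epsilon: float) -> float:
    """Unbiased estimate of the true rate from noisy reports (requires epsilon > 0)."""
    p = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    # E[observed] = p * rate + (1 - p) * (1 - rate); solve for rate.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


rng = random.Random(0)
epsilon = 1.0
# Hypothetical population: 30% of 20,000 users used some feature at least once.
true_bits = [i < 6000 for i in range(20000)]
reports = [randomize(b, epsilon, rng) for b in true_bits]
print(f"estimated usage rate: {estimate_rate(reports, epsilon):.3f}")
```

Smaller values of ε give each user stronger deniability but make the aggregate estimate noisier, so the choice of ε would be a per-deployment decision.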
## Next steps
@Zsailer and @jaipreet-s have cycles in the coming months to work on the telemetry system. I propose that we create a new repo in the JupyterLab org for this work, possibly seeding it with Ian's previous work. We would love to see others contributing to this work as well, especially folks from organizations that 1) need to collect telemetry and 2) are experienced with doing so in a responsible and legal manner. I will also post on the discourse channel to let more people know about this.
Full disclosure - I work for AWS now, so it is useful to sync on how AWS thinks about user data. Customer data privacy and protection is a top priority; more details can be found here:
https://aws.amazon.com/compliance/data-privacy-faq/