Potential collaboration on user research?

First, I want to acknowledge that there are multiple studies in various stages of completion:

My colleague Tyler and I started discussing collaborations with folks at the recent SSI Collaborations workshop, and learned about this effort:

https://www.software.ac.uk/what-do-we-know-about-rses-results-our-international-surveys

I also spoke with Dan Katz, and IIRC there’s an ongoing survey with his group, though that won’t be released until the initial paper comes out.

And then there’s the work that’s informed the nice behavior of tabs in PhosphorJS and so forth.

But none of those get to the core of what we want to figure out - which is generally how people are currently doing data science, with what tools, and what’s taking up time and cognitive effort. At Gigantum we’re particularly focused on collaboration and reproducibility (or “SciOps”) for the largest set of people possible. I got into a Twitter-y conversation with @choldgraf, and he suggested I start a topic here.

I’m asking our lead researcher (Jen) to introduce herself here, and make any corrections to what I’m suggesting!

I also want to be clear that Gigantum offers a potentially useful way to inspect the behavior of users via our activity record, which captures the state of the filesystem, environment, and execution history at the moment of each Jupyter cell execution.

The main question at this point is: are you interested in a larger survey that would inform the whole community about what people are currently doing and how it could be improved? Is there already an ongoing effort we should join? Are there others we should seek out?

And if anyone is interested in using our activity records to look at Jupyter usage (perhaps more focused on the coding aspect), that’d be neat too.

Many thanks!

Thanks for reaching out! Could you go into detail on what kind of user research you’d like to do? I think I may have misread your original goal (I thought you wanted to do UI/UX research, but it sounds like this is more general than that).

I know that JupyterHub folks have spoken a bunch about collecting information about user behavior etc. (we also considered doing this a while back on mybinder.org, but IIRC decided not to, in order to respect user privacy), so I think there’s interest in this. I believe that @Zsailer may be working on these kinds of things w/ JupyterHub; maybe he is interested?

We would like to create two different things that would be public resources for everybody to use. The first project is a robust taxonomy of “users, activities and pain points” that people could use to do some initial validation of an idea they have.

The second project is more on the UI/UX side, and I can’t speak to that, but I’m sure Jen will when she gets on here.

The thing that I would like to see is a form-based survey coupled with some actual interviews that collects information on the current state of practices, preferences, and problems in academic data science. This would address what tools and frameworks people are using for what purposes, what the pain points are around those activities and tools, as well as what works well for people. It would also address things like what sorts of resources people are using, e.g. cloud vs. laptop vs. some sort of on-prem infrastructure.

It would be something that gets updated every so often in order to stay relevant and give information about shifts over time.

One purpose would be using this information to create a data set that could be used to develop high-level persona and story frameworks in support of developing new “products” aimed at the academic community. There are other purposes too, but I think this one is the primary one.

It may not be realistic, but it seems we should develop a taxonomy of users that is based on actual data (not just anecdotal evidence) that would help people validate the need for a particular solution or idea they have. It would save time and shape things to better fit what people need. At least it seems like it would.

Anyways, if this is of interest, then let’s discuss. We are willing to devote resources towards this and to make sure that the type of information gathered would be useful to other people, not just ourselves.

Very cool! I think the best way to go about this is to find others who are interested in this process, and build a coalition of folks who’d like to make a push on this (e.g. there’s an accessibility thread to mobilize folks around a11y). Jupyter isn’t really a single actor; it’s more like a coalition of organizations that together make up the community.

Perhaps @saulshanabrook and the Quansight folks have thought about this? At Berkeley I think this is something we’d love to see, though we don’t have many resources at the moment (hopefully we’ll soon have more capacity for this kind of thing).

I’m happy it seems cool to you! I think it will be a lot of work but also a lot of fun, and hopefully of a lot of value to people.

It will be 100% better the more people get involved, and I think a coalition is the way to do this.

In terms of resources, one thing I’ve got is time to work on getting the form survey into proper shape based on input and ideas.

If we can come up with a proper interview that captures the qualitative information to help create the stories around the collected data, then I can also allocate a budget towards “rewards in appreciation of” participants. I’ve been giving folks who participate in our interviews an Amazon gift certificate or a gift card that lets them donate to the charity of their choice.

I was able to do about 25 surveys just on my own, with a lot of legwork put into recruiting participants. So if there were another 5 to 10 people who would also do interviews, we could get a pretty decent set together.

I also think that if the interviews were done as part of a coalition, rather than just by a private company, it would boost the number of positive responses and incline people to agree to having the results be openly available (anonymized, of course).

I want to add that our biggest need is help with even making our own data collection efforts public.

Is there a likely collaborator who’d be willing to work with us on IRB and stewardship of the dataset? Like Tyler said, we’re collecting data - we just don’t have the resources to share it.

http://www.opendatarepository.org/about-us/ might fit the bill. There are other similar services, but I have no reliable working knowledge of them.

On the collecting side, https://github.com/opendatakit might be helpful (especially regarding mobile support).

@jhermann, I had a quick look at the Open Data Repository, and it doesn’t seem to have any sensitivity to social science issues - so while it may be an OK place to deposit, I don’t know if they’re an appropriate “partner.” Given economic constraints, the default will be that we will do our research and keep it to ourselves - we are trying to find partners to work with us on a broader initiative that will benefit the whole community, even if all the partner does is work on the ethics and consent side. I’ve done this kind of thing in the past, and it’s a non-trivial amount of work. Am I missing something?

Organizations that I know are a bit more tuned to data use restrictions are Databrary and the OSF.

Regarding the appropriate use of telemetry, I’m copying a link from @ellisonbg on another thread to keep ideas in one place:

I want to call out a link from @yuvipanda in the above-linked Jupyter telemetry issue: https://m.mediawiki.org/wiki/Extension:EventLogging/Guide

This seems like a great, well-thought-out approach to ethical and helpful telemetry, and I’m curious to know more about where Jupyter is going from there. I’m trying to keep conversation consolidated here, so it will be more accessible to folks not initiated into GitHub… but I’ll drop a line there too.

Another person who might be interested is @alexmorley (that’s a GitHub handle; I don’t think he’s on the Discourse), who recently showed off some cool Jupyter UI design he’s been playing around with.

Thank you for opening this, @davclark! I’m very interested in the following:

  1. How do you collect user behavior telemetry data (links clicked, buttons used, time on site, etc.)? I guess this is probably most used for UI / UX experiments, but maybe also operational events. Would love to know what has worked for you and what your general methodology has been.
  2. What tools do you use for this?
  3. Who drives this forward? Is it driven by UI / UX? How do they work to get it implemented?
  4. Is it always collecting, or is there usually a specific time period after which the data collection is turned off? For example, you might collect data to A/B test a specific user interface element, and turn it off once that decision has been made.

In general, I’m interested in how product decisions are made in a service like Gigantum around Jupyter / Data science related workflows. Any information you can provide would be useful <3

How can we organise for Gigantum to join forces with/lead an effort to build a Jupyter wide telemetry system? The benefit for Gigantum is that data is collected from the larger Jupyter user base, and the advantage for Jupyter is that we get the tooling needed to collect telemetry. Seems like a win-win.

To increase the chances that users opt in to this data collection, the effort would have to go above and beyond in terms of privacy, anonymisation, and giving users reason to trust those who are collecting the data. IMHO a goal to strive for is to set things up so that notebook users at Netflix/Bloomberg/etc. could be allowed to turn this on.

Adding some links on how others have approached the privacy challenges:

I need to do a bit more digging in my history/bookmarks for more resources from the big tech organisations who have a solution (even if it is a crappy one) to most of the privacy and technical issues involved.

As with all things crypto: the only easy thing is to screw it up :wink:

So there’s some real grokking here and also maybe some misconceptions…

What Gigantum currently does / can do

Currently, we DON’T collect any telemetry. @betatim nails it when he says “the only easy thing is to screw it up.”

We do store a rich history of user behavior. I linked above about this, but since I can’t expect everyone to read everything I link to: this is similar to bash or R or IPython history, or a Jupyter notebook, but more “complete”. We store inputs, outputs, and environment in a git-savvy metadata record that lets you know exactly what happened, and in what order, with approximately 100% reproducibility.
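
To make that a bit more concrete, here is a rough sketch of the shape of such a record. The class and field names below are illustrative only, not our actual Activity Record classes:

```python
# Illustrative only -- not Gigantum's actual classes or field names.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class ActivityDetail:
    """One piece of what happened: the code run, text printed, or an image produced."""
    mime_type: str          # e.g. "text/plain", "image/png"
    data: bytes             # the input or output payload

@dataclass
class ActivityRecord:
    """A git-linked snapshot taken at the moment a cell finishes executing."""
    commit: str                              # git commit capturing the filesystem state
    environment_id: str                      # reference to the Docker image / environment
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    details: List[ActivityDetail] = field(default_factory=list)

record = ActivityRecord(commit="abc123", environment_id="python3-datascience:2019-05")
record.details.append(ActivityDetail("text/plain", b"print('hello')"))
```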

But again, we DON’T collect this as telemetry because that would be invasive. Unfortunately, we don’t know how to prove this, but I’d love to hear ideas - differential privacy doesn’t work here, because the granular data is part of the service we provide to users. Obviously, anyone can look at this history in public projects, but we’re not even scraping that before figuring out the ethics, etc.

We do provide a comment-based approach where if you include # gtm:ignore in a cell, it won’t be recorded (though an empty record will be created so you know something happened). Initially, we were working on a cell metadata based approach, but unfortunately, you can’t currently send (arbitrary) cell metadata to the kernel.
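
As a toy sketch of that comment-based control (simplified, not our production code path), the check is essentially:

```python
# Simplified illustration of the comment-based opt-out -- not Gigantum's actual code.
IGNORE_TOKEN = "# gtm:ignore"

def should_record(cell_source: str) -> bool:
    """Return False when the user marked the cell to be skipped.

    The caller still writes an empty placeholder record, so the history
    shows that *something* ran without storing its contents.
    """
    return IGNORE_TOKEN not in cell_source

assert should_record("x = 1")
assert not should_record("# gtm:ignore\nsecret = load_credentials()")
```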

To put the above in another way: we keep a fairly complete record from the perspective of the kernel, the filesystem, and the computational environment (via Docker). We provide a control surface based on cell comments, but would like something more fluent. As it is, this could be a great tool for research on what people are typing into cells.

The design for this was in the interests of reproducibility (cf. critiques of how notebooks obliterate history) and “building a tool for ourselves.” But in discussion we’ve realized that such a record might be useful for lots of things.

Regarding Front-end UI

Here, we’re only in the conceptual and in-person stage. We are presenting interactive mock-ups to users and our UX lead takes notes for summary and review. So, this isn’t turned on or off; it’s a completely distinct context. Currently, we use InVision, which is a really nice tool for this kind of mock-up, including the ability for a team to comment on different parts of the UI.

We just hired our UX lead, and it’s still a somewhat social process of reviewing her findings and also looking at direct user feedback (which is linked from within the app and goes right into our roadmapping software, ProdPad, where we also gather results of discussions, etc.).

I’ve spent a LOT of time teaching folks how to use GitHub and trying to get folks to use the issue tracker - but I find that for a full-time team, having these other tools usefully separates out different levels of discussion and makes the broader conversation more accessible within the team… but probably less accessible from the public perspective. I’m trying to answer @yuvipanda’s questions here… please let me know if I’m falling short.

I sat down at JupyterCon with a PhosphorJS UI/UX researcher, and it seemed that process was run quite well. Unfortunately, I don’t recall who ran that, and it was clear that various folks were engaged in various ways.

I’m sure some level of cross-pollinating would be useful here. If there are more specific questions for us, I’ll try to get them answered!

Moving forwards

I want to be clear that we’re a small start-up and we need to develop a path to sustainability (sound familiar?). But given that, I discussed the above with our leadership, and we’re very positive on the idea that @betatim suggests - that we “join forces with/lead an effort to build a Jupyter wide telemetry system.” There are clear benefits to us to work in good faith with the community to develop norms and approaches around telemetry. We also want to contribute to the Jupyter community since we’re obviously highly dependent on it!

This thread is developing into a decent collection of prior work that can help inform what we do next. The only design + engineering task so far is the issue I highlighted above about improving communication from the front-end to the kernel. This could obviously be extended to include arbitrary front-end telemetry. Or, that could be a separate engineering effort.

Probably first, though, we should figure out what we want to do (and subsequently, what data we want), with a particular eye towards respecting users’ needs and desires. I very much appreciate the suggestion that we develop an approach that Netflix/Bloomberg/etc. would not only permit, but would encourage their employees to enable.

Right now, I’d classify our interests into three buckets:

  1. Are users able to understand and use the tool? This is currently done in (semi-)scripted contexts in which users have specific tasks to perform.
  2. Are users using certain features? For this, it’d be great to have broader telemetry for the front-end.
  3. Where are users apparently getting stuck? This is harder to do, but one motivating example would be to identify a cell execution that produces output and is run repeatedly with small changes. This would imply that a user is trying multiple times to “get something right”. This could inform designers of plotting libraries, and also the way outputs and messages get produced directly by Jupyter and/or Gigantum. To some extent, we can do this with the Gigantum activity record as it is, modulo putting some “research process” into place (a rough sketch of one such heuristic follows this list).
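
As a toy illustration of the heuristic in (3) - purely hypothetical, not something we have implemented - grouping consecutive executions whose source differs only slightly might look like this:

```python
# Hypothetical heuristic, not something Gigantum ships: flag runs of executions
# where consecutive cell sources are nearly identical, suggesting trial-and-error.
from difflib import SequenceMatcher
from typing import List

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()

def stuck_streaks(history: List[str], threshold: float = 0.8, min_len: int = 3) -> List[List[str]]:
    """Group consecutive executions whose source differs only slightly."""
    streaks, current = [], []
    for src in history:
        if current and similarity(current[-1], src) >= threshold:
            current.append(src)
        else:
            if len(current) >= min_len:
                streaks.append(current)
            current = [src]
    if len(current) >= min_len:
        streaks.append(current)
    return streaks

history = ["plt.plot(x, y, color='red')", "plt.plot(x, y, color='blue')",
           "plt.plot(x, y, color='green')", "df.describe()"]
print(stuck_streaks(history))  # one streak containing the three plotting attempts
```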

Having said all that, please let us know what would be useful from perspectives in the core Jupyter community.

Clarification: I only linked to our activity record article on this Jupyter GitHub issue

I spent a bunch of time focusing on eventlogging/telemetry on this today, and here are some results.

Here is a prototype demo (with a Binder link!) of eventlogging where we capture all commands executed in lab in a schema-conformant, type-safe(-ish) way, and configurably log them (in this case) to a file. Alongside it is a 2000-word strawman design document that I hope will help discussion.
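
To give a rough sense of the shape of the idea without reading the PR, here is a toy sketch - the schema and field names below are made up for illustration, not the ones in the prototype:

```python
# Toy sketch of schema-validated event logging: validate each event against a
# versioned schema, then append it to a configurable sink (here, a file).
import json
import time
from jsonschema import validate  # pip install jsonschema

COMMAND_EXECUTED_SCHEMA = {
    "type": "object",
    "properties": {
        "schema_version": {"type": "integer"},
        "command": {"type": "string"},
        "timestamp": {"type": "number"},
    },
    "required": ["schema_version", "command", "timestamp"],
    "additionalProperties": False,
}

def log_event(command: str, sink_path: str = "events.jsonl") -> None:
    event = {"schema_version": 1, "command": command, "timestamp": time.time()}
    validate(instance=event, schema=COMMAND_EXECUTED_SCHEMA)  # refuse malformed events
    with open(sink_path, "a") as sink:
        sink.write(json.dumps(event) + "\n")

log_event("notebook:run-cell")
```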

I think conversations around these PRs would be great! I’m also happy to scribe comments made here to PRs as appropriate if folks prefer chatting here - although perhaps a new topic might be appropriate.

I think any instrumentation should be done by adding mere hooks into the main code, which then can be used by a `*-telemetry` package to do the actual thing.

Why? So you can ‘prove’ to your security people that telemetry is disabled in a reliable way, by not installing that package.
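
A minimal sketch of what I mean (the package name `myapp_telemetry` is made up for illustration):

```python
# Toy sketch of the "hooks in core, sinks in an optional package" split.
from typing import Callable, Dict, List

_hooks: Dict[str, List[Callable]] = {}

def register_hook(event: str, handler: Callable) -> None:
    _hooks.setdefault(event, []).append(handler)

def emit(event: str, **payload) -> None:
    """Core code calls this; it is a no-op unless something registered a handler."""
    for handler in _hooks.get(event, []):
        handler(**payload)

# Only the separately installed telemetry package registers handlers:
try:
    import myapp_telemetry  # noqa: F401  -- hypothetical; its import would call register_hook()
except ImportError:
    pass  # package not installed -> emit() does nothing, which is easy to audit

emit("cell_executed", source="print('hi')")  # silently ignored without the package
```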

Thanks for the feedback, @jhermann. I’ve added the problem of ‘how can we signal that there is no data being sent anywhere’ to the list of open questions in the design document. I’d love to keep conversations focused there rather than spread out, so would appreciate if you can read it and leave comments there. Thank you.

In short - I’m a bit confused at this point about how to proceed.

I’ll point out again that Gigantum has code to parse Jupyter messages and turn them into a view that is backed by a git-aware database approach. Here, for example, is our processor that handles specific Jupyter events that include text, code, or images.

This includes a commenting system that allows users to explicitly mark cells as ignored.

We subscribe to messages from the kernel, and process them here.
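
For anyone who hasn’t followed the links, the general pattern (illustrated here with plain jupyter_client rather than our actual code) is roughly:

```python
# Rough sketch of dispatching on kernel iopub messages -- not Gigantum's code;
# it just illustrates the general subscribe-and-dispatch pattern.
from queue import Empty
from jupyter_client.manager import start_new_kernel

km, kc = start_new_kernel(kernel_name="python3")
kc.execute("print('hello'); 1 + 1")

while True:
    try:
        msg = kc.get_iopub_msg(timeout=10)
    except Empty:
        break
    msg_type, content = msg["header"]["msg_type"], msg["content"]
    if msg_type == "stream":                               # text printed to stdout/stderr
        print("text:", content["text"].strip())
    elif msg_type in ("execute_result", "display_data"):   # rich outputs, keyed by MIME type
        print("rich output mime types:", list(content["data"]))
    elif msg_type == "status" and content["execution_state"] == "idle":
        break                                              # execution finished

kc.stop_channels()
km.shutdown_kernel()
```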

We actually also wrote a Jupyter plug-in here, before finding out that you can’t send metadata back to the kernel from the client. But the approach taken in jupyterlab-telemetry seems like one we could use as well… jupyterlab-telemetry was initiated after an initial version of Gigantum was completed, and I wish we’d realized some of the redundancy earlier.

I also want to point out that our approach needs to be agnostic to Jupyter per se, as we are also wrapping things like RStudio, for example (we’ll be announcing this next week). I know JupyterHub and Binder can also wrap other tools. This makes me want to minimize intermediate abstractions.

Having a schema for records would be great - but I’m also wondering: why not just describe and use the kernel messages themselves? We already have tooling around kernel messages, and shifting that would generate more work (and seems like extra redundancy). We are also already targeting a schema (defined by our Activity Record / Activity Detail Record classes). Ultimately, what’s useful for us is anything that’ll help us simplify and maintain this code (what I linked is a good starting point).

I think a central challenge here is that Gigantum has a technical solution subject to a set of constraints, and you’re working on another technical solution and sourcing a community set of constraints. I think that somehow we need to get to a shared technical solution, and I’m not sure how to get there! Perhaps this is what @choldgraf was talking about in terms of difficulties collaborating when you start late in the process.

It may surprise you, but I’ve actually deleted a good chunk of stuff I wrote already! I’m trying to work through some confusion, and I’m not there yet - so I’m writing to expose my thoughts and hopefully get some guidance and feedback.

For us to derive value from the process, I think it’d be useful to communicate the problem Gigantum is trying to solve and how we’re doing it. Of course, we’re already deriving value from the Jupyter project, and so we want to support it anyway.

Can anyone help get me unstuck and figure out how to (at a minimum) find a way to support telemetry in Jupyter? I don’t want to add our needs to the design constraints if we’re not going to use it!

Sorry for the confusion, @davclark. I think there are at least two separate conversations here - one about telemetry on user actions in the interface, and one about recording users’ work. My post was about the former, and I think yours is generally about the latter. I think there’s probably an intersection in terms of people interested, but I agree that the solutions are probably technically very different. The word ‘telemetry’ is so broad that I think it invites exactly this kind of mix-up.

It sounds like, at least among the folks on this Discourse, Gigantum is sort of at the cutting edge of the kind of work you are doing. I know CoCalc is doing something similar as well, and I’m sure there are other folks too.

Either way, I think there are two conversations intermingled here, and am sorry for the confusion.

I’m not sure how to get unstuck or reduce the overlap/confusion. Here is my attempt:

  • I think there are several tools that solve (different) parts of the technical hurdles involved: recording kernel messages, and recording where/what/if/when users click on UI elements.

  • What do all the involved parties need/want to get out of this exercise? For example, IRB review/approval: is this something we need, and if yes, “who to talk to about getting it” seems like another next action (I personally have no opinion nor experience :-/).

  • Are there already questions that we’d like to answer by using the system? If not, maybe that is the first thing to do. I think if we can pick a set of (simple) questions, that would be good because a) user data collection is better if you define your question first, b) we have a concrete problem to solve, and c) we can get started with a full cycle sooner. Even if we just use the people on this thread as “users”, I think being able to do a full cycle would be useful.

  • I think we want a solution that works with “stock” Jupyter so that we can reach as many users as possible (getting user opt-in is hard; getting user opt-in by asking them to replace their existing tool is even harder). Is this assumption correct and technically possible with the work Gigantum has done?

  • Can we create a demo of each of the currently existing tools so that people can try them out? I think this will help make it concrete what each tool does and doesn’t do.