Two Binder metrics have come up recently that we don’t have data on, because our analytics don’t capture them:
- launches via the API vs. pageviews
- usage by specific clients (thebelab or other clients)
Until the events publication, we weren’t tracking API launches at all (outside Prometheus).
The Referer and Origin headers can be used to identify where API requests come from. Logging the Referer whole would be a big ol’ privacy violation, but I see two options:
- log only the hostname in the Referer/Origin, and only if it’s not an IP address
- log a `kind='api'` flag (or similar) if the Referer doesn’t match the Host
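To make the two options concrete, here is a rough sketch of how a launch request could be classified. This is not what BinderHub does today: `classify_launch`, the `origin_host` field name, and the `'page'` value are made up for illustration, and treating a missing Referer as an API launch is my assumption. It only relies on the standard library.

```python
import ipaddress
from urllib.parse import urlsplit


def _hostname(value):
    """Return only the hostname of a URL-ish header value, or None."""
    if not value:
        return None
    try:
        return urlsplit(value).hostname  # drops scheme, port, path, query
    except ValueError:
        return None  # malformed value, e.g. a broken IPv6 literal


def _is_ip(host):
    """True if host is a literal IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def classify_launch(headers):
    """Hypothetical helper: derive event fields from request headers.

    `headers` is any mapping with a .get() method. A plain dict is
    case-sensitive, so the keys below must match exactly in that case.
    """
    own_host = _hostname("//" + headers.get("Host", ""))
    source = _hostname(headers.get("Referer")) or _hostname(headers.get("Origin"))

    # Option 2: flag the launch as 'api' when the Referer doesn't match the Host.
    # Assumption: no Referer/Origin at all (e.g. curl) also counts as 'api'.
    kind = "page" if source is not None and source == own_host else "api"

    # Option 1: keep only the hostname, and only if it's not an IP.
    origin_host = None
    if kind == "api" and source and not _is_ip(source):
        origin_host = source

    return {"kind": kind, "origin_host": origin_host}
```

If we decide the hostname is too personal to keep, dropping `origin_host` from the result leaves just the second option.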
I don’t know if the hostname of the Referer is too personal to collect long-term (short-term logging is a different story). On one hand, I could imagine it identifying users for low-volume or unique hostnames. On the other hand, I feel like it’s appropriate to see what sites are using mybinder.org to provide free, anonymous compute. I think we should at least do the second option, so we can see how much Binder is being used to launch kernels via the API from other sites, even if we decide not to track the site of origin.
Note that I am specifically not talking about the Referer of the page that builds, but only the Referer of the API requests. Binder links would show the Referer as mybinder.org, because the API request originates on our page.
For the second question, it might be useful to define an opt-in header that clients can use to identify themselves (e.g. `X-Binder-Client: thebelab 3.1.5`), so that we can see what clients are being used if they are interested in being tracked.
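A minimal sketch of what reading such a header could look like, assuming the value is a client name optionally followed by a space and a version. The `X-Binder-Client` header is only the proposal above, and `parse_client`, the field names, and the length and character limits are all hypothetical. Anything that doesn't fit the pattern is dropped rather than logged, so the header can't be used to smuggle arbitrary strings into the events.

```python
import re

# Assumed format: "<name>" or "<name> <version>", e.g. "thebelab 3.1.5".
_CLIENT_RE = re.compile(r"([A-Za-z0-9._-]{1,64})(?: ([A-Za-z0-9._+-]{1,32}))?")


def parse_client(headers):
    """Hypothetical helper: return client fields, or None if absent/invalid.

    `headers` is any mapping with a .get() method.
    """
    raw = headers.get("X-Binder-Client")
    if raw is None:
        return None  # opt-in: clients that don't send it are not tracked
    match = _CLIENT_RE.fullmatch(raw.strip())
    if match is None:
        return None  # ignore values that don't fit the assumed format
    return {"client": match.group(1), "client_version": match.group(2)}
```

For example, `parse_client({"X-Binder-Client": "thebelab 3.1.5"})` gives `{"client": "thebelab", "client_version": "3.1.5"}`.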