(edit: initially posted in Notebook, might be better in JupyterLab)
We’re looking into ways to enhance telemetry around user interface activity on the NotebookPanel, such as scrolling and rendering of cells, creating new cells, changing content, etc. I’m hunting around in the source for a nice way to refer to a given notebook state. With the increased interest in collaborative notebooks, the notebook content is likely being replicated already, so I thought a distributed referencing scheme might not be difficult to add.
The current approach I have in mind is to connect to the NotebookModel.contentChanged signal and, each time it fires, create a new UUID for that notebook state and send it along with the notebook content to my telemetry service. This should ensure that on the back end I can always map notebook UUIDs to notebook contents. For actions which do not change the content of the notebook model but are otherwise interesting (e.g. rendering of cells on the screen, selection of text in a cell), I can then just log the event-specific information (e.g. which cells by identifier, or the text highlight offsets and cell identifier) together with the current notebook UUID.
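To make the bookkeeping concrete, here is a rough sketch of what the telemetry side would record. It is written in Python purely for illustration; the real listener would live in a JupyterLab extension. TelemetryLog, its method names, and the record fields are all hypothetical, and the in-memory dicts stand in for whatever the telemetry service actually stores.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TelemetryLog:
    """In-memory stand-in for the telemetry service (hypothetical)."""

    snapshots: dict = field(default_factory=dict)  # state UUID -> notebook JSON
    events: list = field(default_factory=list)     # UI events tagged with a state UUID
    current_id: Optional[str] = None

    def on_content_changed(self, notebook: dict) -> str:
        """Called whenever the content changes: mint a UUID and store the content."""
        self.current_id = str(uuid.uuid4())
        self.snapshots[self.current_id] = json.dumps(notebook, sort_keys=True)
        return self.current_id

    def on_ui_event(self, kind: str, **details) -> None:
        """Log an event that leaves the content unchanged, tagged with the current UUID."""
        if self.current_id is None:
            raise RuntimeError("no notebook snapshot recorded yet")
        self.events.append({"state": self.current_id, "kind": kind, **details})


# Usage: one content change, followed by a text selection in that same state.
log = TelemetryLog()
nb = {"cells": [{"id": "a1", "source": "x = 1"}]}
state_id = log.on_content_changed(nb)
log.on_ui_event("selection", cell_id="a1", start=0, end=1)
```

Every UI event then carries only a small payload plus the UUID, and the full content is sent once per change.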
This breaks down when notebooks get big, and since I care about output cells too this can happen in some unfortunate cases (Altair, I’m looking at you) where the rendering of the output is verbose. Down the line I think what would be best is a server-side service which always holds a consistent clone of the notebook model (something I expect needs to happen for the collaboration features anyway), with that clone being seamlessly checkpointed in the back end. Telemetry then becomes a link to a hash in a git-like DVCS.
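As a rough illustration of the git-like idea, the sketch below stores each cell as a content-addressed blob and each checkpoint as a small "tree" of cell hashes, so a large unchanged output is stored only once. The function names, the dict used as a store, and the choice of SHA-256 over canonical JSON are my assumptions, not anything that exists in Jupyter today.

```python
import hashlib
import json


def put(store: dict, obj) -> str:
    """Store obj under the SHA-256 of its canonical JSON encoding and return the key."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    key = hashlib.sha256(data).hexdigest()
    store.setdefault(key, data)  # identical content is stored only once
    return key


def checkpoint(store: dict, notebook: dict) -> str:
    """Store every cell as its own blob, then a tree object listing the cell hashes."""
    cell_keys = [put(store, cell) for cell in notebook["cells"]]
    tree = {"metadata": notebook.get("metadata", {}), "cells": cell_keys}
    return put(store, tree)


# Usage: two checkpoints that share one large, unchanged cell.
store = {}
big_cell = {"id": "a1", "source": "chart", "outputs": ["..." * 1000]}
v1 = checkpoint(store, {"cells": [big_cell, {"id": "b2", "source": "x = 1"}]})
v2 = checkpoint(store, {"cells": [big_cell, {"id": "b2", "source": "x = 2"}]})
```

A telemetry event would then carry only the checkpoint hash (v1 or v2 here), and the second checkpoint adds just the edited cell and a new tree to the store.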
Thoughts on both our current approach and the potential future direction? Are there other places we should consider connecting to for signals?