Annotating Jupyter notebooks

This is picking up a discussion started several years ago at a workshop about how to enable annotation for notebooks.

What is meant with annotation? Tools like https://web.hypothes.is/ that follow the W3C standard (or coming standard?). More here.

To see it in action on an example head over to http://ivory.idyll.org/blog/2019-communities-of-effort.html which has a few extra buttons in the top right:

This works for websites and also for documents like PDFs. In order for everyone to see the same annotations for the same PDF (even when you get your copy by email) hypothesis uses a unique identifier for the file. For PDF files this identifier is part of the format (I think). For more details on how hypothesis uses it checkout https://web.hypothes.is/help/how-hypothesis-interacts-with-document-metadata/#what-happens-when-urls-change

One idea on how to get similar functionality for notebooks would be to add a unique identifier to the metadata of the notebook and for renderers (classic notebook, jupyter lab, nteract, nbpreview) that want to allow annotations to then render something like a <link rel="canonical" href="http://notebooks.jupyter.org/<identifier-for-the-metadata>" in the HTML they generate. Or maybe a meta tag is better than using a canonical as the canonical link is also looked at by search engines.

How would you generate this unique identifier? Maybe it is enough to generate a random 32byte value when the notebook is created. For those who are interested the code that PDF.js uses to generate the fingerprint for PDFs is here.

What do you think of adding a extra field in the metadata and setting it to a random value on document creation? Then rendering that in the HTML version of a notebook so annotation tools can use it as identifier?

2 Likes

There are two levels at which identifiers in a Jupyter notebook might interact usefully with annotation software.

Element level. Fernando Perez suggested, some years ago, that per-node identifiers could be important. A t the time, as I recall, they didn’t exist. But anyway the idea is that while anchoring annotations to selections of text in a rendered notebook, by default Hypothesis will do it based only on the position of the target selection in the stream of rendered text, and on the target text itself, surrounded by a prefix/suffix context window. Maybe that’s fine, but it could be interesting to anchor annotations relative to nodes in the notebook, if they are identified, and depending on how that identification surfaces in the rendered page. If none of this is readily available, then the notebook is just a web page from Hypothesis’ point of view, and it deals with it in the way it normally does.

Document level. Here we want URL-independent identifiers. There’s no need to follow the PDF fingerprint model, something human-readable would be better. I would not recommend rel=“canonical” but rather the dc.identifier/dc.relation.ispartof metadata pair described here: https://web.hypothes.is/help/how-hypothesis-interacts-with-document-metadata/#what-happens-when-urls-change. The two parts combine, and you can use the identifier/relation pair however makes sense.

1 Like

Making the annotations independent of matching the HTML would be nice so that they end up in the same place across UIs (they presumably generate different enough HTML).

Each cell has a metadata field as well so we could have a unique ID there as well. Would the dc.relation.ispartof tags work for cells to indicate they are part of the whole document or how would it work?

On twitter Tony pointed to https://github.com/jupyterlab/jupyterlab-commenting as something people are working on to create a new commenting system that works in jupyter lab.

Hey Tim, I lost track of this, sorry.

If the cell’s ID surfaced as a fragment ID, then the https://www.w3.org/TR/annotation-model/#h-fragment-selector could be appropriate.

As it happens, Hypothesis used to record a FragmentSelector with annotations but there wasn’t a compelling use for it.

Here’s an example of an annotation that did record a FragmentSelector:

https://jsoneditoronline.org/?url=https://hypothes.is/api/annotations/9mT3gsbQEeag7MOeBozdXQ

Drill down into target -> selector and you’ll find 4 selectors.

The value of the FragmentSelector (the one we don’t use any more) is main, because that’s the governing id (<div id=“main” role=“main”>).

If cells had ids, and if we reinstituted the use of FragmentSelectors, that could be a nice combination.

1 Like