Annotating Jupyter notebooks

betatim · September 8, 2019, 7:45am

This is picking up a discussion started several years ago at a workshop about how to enable annotation for notebooks.

What is meant by annotation? Tools like https://web.hypothes.is/ that follow the W3C standard (or upcoming standard?). More here.

To see it in action on an example, head over to http://ivory.idyll.org/blog/2019-communities-of-effort.html which has a few extra buttons in the top right.

This works for websites and also for documents like PDFs. In order for everyone to see the same annotations for the same PDF (even when you get your copy by email), Hypothesis uses a unique identifier for the file. For PDF files this identifier is part of the format (I think). For more details on how Hypothesis uses it, check out https://web.hypothes.is/help/how-hypothesis-interacts-with-document-metadata/#what-happens-when-urls-change

One idea on how to get similar functionality for notebooks would be to add a unique identifier to the metadata of the notebook. Renderers (classic notebook, JupyterLab, nteract, nbpreview) that want to allow annotations would then render something like <link rel="canonical" href="http://notebooks.jupyter.org/<identifier-for-the-metadata>"> in the HTML they generate. Or maybe a meta tag is better than a canonical link, as the canonical link is also looked at by search engines.

How would you generate this unique identifier? Maybe it is enough to generate a random 32-byte value when the notebook is created. For those who are interested, the code that PDF.js uses to generate the fingerprint for PDFs is here.

What do you think of adding an extra field in the metadata and setting it to a random value on document creation? Then rendering that in the HTML version of a notebook so annotation tools can use it as an identifier?
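As a rough illustration of the proposal above, here is a minimal sketch that stores a random 32-byte value in the notebook-level metadata when a notebook is created. The key name `annotation_id` and the helper names are assumptions made up for this example; nbformat does not define such a field, and the notebook is handled as a plain dict rather than through any Jupyter library.

```python
import json
import secrets

# Hypothetical metadata key, chosen for illustration only.
ID_KEY = "annotation_id"


def new_notebook(id_key: str = ID_KEY) -> dict:
    """Return a minimal nbformat-4-style notebook dict carrying a random ID."""
    return {
        "cells": [],
        # 32 random bytes, hex-encoded into a 64-character string.
        "metadata": {id_key: secrets.token_hex(32)},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def ensure_id(nb: dict, id_key: str = ID_KEY) -> str:
    """Return the notebook's ID, generating one only if it is missing."""
    metadata = nb.setdefault("metadata", {})
    if id_key not in metadata:
        metadata[id_key] = secrets.token_hex(32)
    return metadata[id_key]


if __name__ == "__main__":
    nb = new_notebook()
    print(json.dumps(nb["metadata"], indent=2))
```

Because `ensure_id` never overwrites an existing value, the identifier stays stable across saves, which is what an annotation tool would need.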

judell · September 9, 2019, 5:03am

There are two levels at which identifiers in a Jupyter notebook might interact usefully with annotation software.

Element level. Fernando Perez suggested, some years ago, that per-node identifiers could be important. At the time, as I recall, they didn’t exist. But anyway the idea is that while anchoring annotations to selections of text in a rendered notebook, by default Hypothesis will do it based only on the position of the target selection in the stream of rendered text, and on the target text itself, surrounded by a prefix/suffix context window. Maybe that’s fine, but it could be interesting to anchor annotations relative to nodes in the notebook, if they are identified, and depending on how that identification surfaces in the rendered page. If none of this is readily available, then the notebook is just a web page from Hypothesis’ point of view, and it deals with it in the way it normally does.

Document level. Here we want URL-independent identifiers. There’s no need to follow the PDF fingerprint model; something human-readable would be better. I would not recommend rel=“canonical” but rather the dc.identifier/dc.relation.ispartof metadata pair described here: https://web.hypothes.is/help/how-hypothesis-interacts-with-document-metadata/#what-happens-when-urls-change. The two parts combine, and you can use the identifier/relation pair however makes sense.
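To make the document-level suggestion concrete, a renderer could emit the metadata pair as two meta tags in the page head. The sketch below is only an assumption about how that might look: the function name and the example values are hypothetical, and the exact way Hypothesis combines the two values is described in the linked help page, not here.

```python
from html import escape


def identifier_meta_tags(identifier: str, part_of: str) -> str:
    """Build the dc.identifier / dc.relation.ispartof meta tag pair."""
    tags = [
        # Attribute values are escaped so quotes cannot break the markup.
        f'<meta name="dc.identifier" content="{escape(identifier, quote=True)}">',
        f'<meta name="dc.relation.ispartof" content="{escape(part_of, quote=True)}">',
    ]
    return "\n".join(tags)


if __name__ == "__main__":
    # Both values are placeholders, not real identifiers.
    print(identifier_meta_tags("notebook-1234", "example-collection"))
```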

betatim · September 9, 2019, 9:00pm

Making the annotations independent of matching the HTML would be nice so that they end up in the same place across UIs (they presumably generate different enough HTML).

Each cell has a metadata field as well so we could have a unique ID there as well. Would the dc.relation.ispartof tags work for cells to indicate they are part of the whole document or how would it work?
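A minimal sketch of the per-cell idea, assuming the ID lives in each cell's existing `metadata` dict under a hypothetical key (`annotation_id` here is invented for the example and is not part of nbformat):

```python
import uuid

# Hypothetical per-cell metadata key, for illustration only.
CELL_ID_KEY = "annotation_id"


def add_cell_ids(nb: dict, key: str = CELL_ID_KEY) -> dict:
    """Give every cell an ID in its metadata, keeping IDs that already exist."""
    for cell in nb.get("cells", []):
        metadata = cell.setdefault("metadata", {})
        if key not in metadata:
            metadata[key] = uuid.uuid4().hex
    return nb


if __name__ == "__main__":
    notebook = {
        "cells": [
            {"cell_type": "markdown", "source": "# Title", "metadata": {}},
            {"cell_type": "code", "source": "print(1)"},
        ]
    }
    add_cell_ids(notebook)
    for cell in notebook["cells"]:
        print(cell["cell_type"], cell["metadata"][CELL_ID_KEY])
```

Since the ID travels with the cell rather than with the generated HTML, different front ends could in principle anchor the same annotation to the same cell.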

betatim · September 10, 2019, 5:11am

On Twitter Tony pointed to https://github.com/jupyterlab/jupyterlab-commenting as something people are working on to create a new commenting system that works in JupyterLab.

judell · September 28, 2019, 4:57pm

Hey Tim, I lost track of this, sorry.

If the cell’s ID surfaced as a fragment ID, then the https://www.w3.org/TR/annotation-model/#h-fragment-selector could be appropriate.

As it happens, Hypothesis used to record a FragmentSelector with annotations but there wasn’t a compelling use for it.

Here’s an example of an annotation that did record a FragmentSelector:

https://jsoneditoronline.org/?url=https://hypothes.is/api/annotations/9mT3gsbQEeag7MOeBozdXQ

Drill down into target -> selector and you’ll find 4 selectors.

The value of the FragmentSelector (the one we don’t use any more) is main, because that’s the governing id (<div id=“main” role=“main”>).

If cells had ids, and if we reinstituted the use of FragmentSelectors, that could be a nice combination.
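Here is a hedged sketch of that combination: a renderer writes each cell as an element whose `id` attribute is the cell ID, and an annotation target points at it with a FragmentSelector. The function names, the `cell` class, and the example URL are assumptions for illustration. The target shape loosely follows the W3C Web Annotation model and is not a record of what the Hypothesis API stores.

```python
from html import escape


def render_cell(cell_id: str, source: str) -> str:
    """Render a cell as a div whose id attribute is the cell ID."""
    return (
        f'<div id="{escape(cell_id, quote=True)}" class="cell">'
        f"<pre>{escape(source)}</pre></div>"
    )


def fragment_target(page_url: str, cell_id: str) -> dict:
    """Build an annotation target that selects one cell by fragment ID."""
    return {
        "source": page_url,
        "selector": [
            {
                "type": "FragmentSelector",
                # URI the W3C model lists for HTML fragment identifiers.
                "conformsTo": "http://tools.ietf.org/rfc/rfc3236",
                "value": cell_id,
            }
        ],
    }


if __name__ == "__main__":
    print(render_cell("cell-42", "x = 1 < 2"))
    print(fragment_target("https://example.org/notebook.html", "cell-42"))
```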

westurner · March 20, 2020, 9:52am

“Add unique ID to the notebook metadata”
https://github.com/jupyter/nbformat/issues/148
jupyterlab-commenting > “Roadmap of features (not yet prioritized)”
https://github.com/jupyterlab/jupyterlab-commenting#roadmap-of-features-not-yet-prioritized

westurner · March 20, 2020, 10:08am

If cells had ids,

This would be very helpful. If the next major revision of nbformat is JSON-LD, these ids could be the @id for the e.g. nbformat:InputCell < schema:CreativeWork.

This says comments are stored in a comments.db which presumably needs to be merged separately?

github.com

jupyterlab/jupyterlab-commenting/blob/0fe8bf12350aeca0b1199e2d0efbed5ef180b6d1/docs/usage.md#where-do-comments-save

# Usage

## Overview

The commenting panel is located on the right side panel on Jupyter's main area.

![](./img/usage-1.gif)

When opened for the first time it will ask for your GitHub username to know who is commenting. This uses the [public GitHub API](https://developer.github.com/v3/) to retrieve your name and profile image.

Once logged in, you are able to do a variety of things.

---

-   **[General Usage](general-usage)**
    -   [Creating a comment thread](#creating-a-comment-thread)
    -   [Resolving a thread](#resolving-a-thread)
    -   [Edit a comment](#edit-a-comment)
    -   [Deleting comments](#deleting-comments)
    -   [Filtering and sorting threads](#filtering-and-sorting-threads)


It’s likely possible to run a private instance of hypothesis/h with ideonate/jhsingle-native-proxy or ihenry42/jupyter_wsgi, but IDK how to handle spam or moderation; integration with JupyterHub authenticators would be cool.

IIUC, with the durable ID @judell describes in “Add unique ID to the notebook metadata” (jupyter/nbformat issue #148), any central Hypothesis WebAnnotation server could host comments / annotations / highlights on HTML renders of Jupyter notebooks.

When would the UUID need to be changed?

When copying a notebook

When creating a notebook from a template (~copying)

When nbgrader copies from a template

What sort of UI does this need?

“Generate new UUID” > “Confirm?” (maybe in the metadata editor?)
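The copy cases listed above could be handled by a helper along these lines. This is a sketch under assumptions: the metadata key `annotation_id` and the function name are hypothetical, and the notebook is treated as a plain dict rather than going through nbformat or nbgrader.

```python
import copy
import uuid

# Hypothetical notebook-level metadata key, for illustration only.
ID_KEY = "annotation_id"


def copy_notebook(nb: dict, id_key: str = ID_KEY) -> dict:
    """Deep-copy a notebook and give the copy a fresh UUID.

    The original keeps its ID, so annotations made on it do not
    show up on the copy (or on notebooks created from a template).
    """
    duplicate = copy.deepcopy(nb)
    duplicate.setdefault("metadata", {})[id_key] = str(uuid.uuid4())
    return duplicate


if __name__ == "__main__":
    template = {"cells": [], "metadata": {ID_KEY: str(uuid.uuid4())}}
    student_copy = copy_notebook(template)
    print(template["metadata"][ID_KEY])
    print(student_copy["metadata"][ID_KEY])
```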

MSeal · March 20, 2020, 7:28pm

At this point I think we should make a JEP proposal for the change. The problem is well outlined and the solution seems defined enough to get potential consensus from the larger community, I think. If you wanted to do an initial draft for that it would help; I’m a little swamped in other threads around async and nbconvert 6 (if we can ever get it fully released). But I’d be glad to chime in or help review proposals with the time I do have currently.

choldgraf · March 22, 2020, 5:06pm

A quick note re: cell ids: a unique cell “name” is already in the spec

https://nbformat.readthedocs.io/en/latest/format_description.html

It just doesn’t have any tooling built around it as far as I know. This is an issue we’re running into in another project as well, where we’d like to be able to refer to specific cells.

Re: a Jep, I would love to see this happen @MSeal

MSeal · March 23, 2020, 4:43pm

The issue is that name is not required to be unique (it only should be) and is not required to be present, which makes it less than ideal for consistent identification. In almost any system, a name field is human-friendly but not machine-friendly to use.

choldgraf · March 23, 2020, 5:22pm

I agree - just wanted to note where there were some steps in this direction already. IMO having an “ID” that is more restricted (e.g. no spaces, etc) along with a “name” field would be great. Creating a new notebook could auto-generate IDs for each cell (e.g. generate a hash each time a cell is created) and then the UI could make it easy for people to overwrite cell IDs if they wish for something more human-referenceable.

Is it worth opening an issue specifically about that, and taking conversation specific to that point over to nbformat?
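The ID scheme described in this post could look roughly like the sketch below: an auto-generated ID per cell, plus a rename helper that enforces a restricted character set and uniqueness. The allowed pattern, the top-level `id` key, and the function names are all assumptions chosen for illustration, not anything the spec defines in this thread.

```python
import re
import secrets

# Assumed rule: letters, digits, hyphen and underscore only (no spaces).
VALID_ID = re.compile(r"[A-Za-z0-9_-]+")


def generate_cell_id() -> str:
    """Return a short random ID for a newly created cell."""
    return secrets.token_hex(4)


def rename_cell(cells: list, index: int, new_id: str) -> None:
    """Overwrite one cell's ID with a human-chosen value, after validation."""
    if not VALID_ID.fullmatch(new_id):
        raise ValueError(f"invalid cell ID: {new_id!r}")
    for position, cell in enumerate(cells):
        if position != index and cell.get("id") == new_id:
            raise ValueError(f"cell ID already in use: {new_id!r}")
    cells[index]["id"] = new_id


if __name__ == "__main__":
    cells = [{"id": generate_cell_id()}, {"id": generate_cell_id()}]
    rename_cell(cells, 0, "load-data")
    print([cell["id"] for cell in cells])
```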

betatim · March 23, 2020, 5:43pm

Is there a template for JEPs? edit: Yes there is! https://github.com/jupyter/enhancement-proposals/blob/9a608e88be32af66757785b8e0f48541e71388a8/jupyter-enhancement-proposal-guidelines/jupyter-enhancement-proposal-guidelines.md

Otherwise I think I’d combine the first post of this thread and https://github.com/jupyter/nbformat/issues/148# to make a start.

MSeal · April 23, 2020, 4:07pm

I’ll take a stab in a week or two here and make a JEP for this.

choldgraf · April 23, 2020, 5:18pm

Ping @Zsailer as I believe he is also starting to coordinate efforts on this
