Notebook Cell-Type Generalisation

agoose77 · September 9, 2021, 12:50pm

Objective

In my view, at the most “general” of representations, the Jupyter Notebook is a rich, structured document.

Although frontends like JupyterLab impose even more constraints on this definition, such as “columnar document”, other frontends (e.g. voila + voila-reveal/jupyter-flex) do not.

Although Notebooks currently comprise three cell types (Markdown, Code, and Raw), these are “implementation details”. I believe that we could generalise the Notebook even further, with the following goals:

Support multiple interoperable Markdown renderers
More-easily facilitate polyglot kernels (e.g. SoS)
Extend kinds of rich output supported at the document level

Motivations

For some time I’ve felt that the existing 3-cell type notebook schema is both a blessing and a curse.
Whilst we support a huge range of code-cell output MIME types in the various Jupyter Notebook frontends (e.g. JupyterLab), the notebook itself can only represent a small subset of these at the document level in the form of cells. Besides Markdown, there are other markup languages that may be useful in the notebook context:

Diagrams e.g. MermaidJS, DrawIO
GeoJSON
Vega (etc)

We support these in cell outputs, why not support them directly as cells?

Furthermore, the Markdown cell is currently defined as a GFM syntax. Whilst this has served us very well over the years, there are an increasing number of projects that want to extend this in various ways:

To support these kinds of Markdown flavours at present, we have to re-purpose the existing Markdown cell with different renderers, and there is no standardised way to communicate this to the frontend-in-question; users need to know which extensions / packages to install.

One of the huge strengths of JupyterLab has been the rendermime interfaces that drive the rich-representation paradigm. I believe we should extend this to the Notebook itself; cells should be able to describe their contents sufficiently that the frontend can provide the appropriate view(s).

Details

Note: the following attempt at a “solution” isn’t actually a good fit for what would need to be done, but fun to at least consider

Notebook Schema

In another thread, @fcollonval touched upon the idea that we might generalise the notebook to support more cell types, with a stronger model-view concept. I quite like this idea, and I wonder if we ought to go as far as to remove the cell “type” from the schema altogether, in favour of a single “MIME” cell. In this design, the three cell types are just views:

[
  // Raw
  {
    "mimetype": "text/plain",
    "data": "I am a raw cell"
  },
  // Markdown (GFM)
  {
    "mimetype": "text/markdown;flavor=GFM",
    "data": "This is `some inline code` inside Markdown"
  },
  // Code (Python)
  {
    "mimetype": "text/x-python",
    "data": "import numpy as np\nx = np.arange(10)"
  }
]
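As a rough illustration of the “three cell types are just views” idea, the sketch below maps an existing nbformat v4 cell onto the single MIME cell shown above. The function name `to_mime_cell`, the mapping table, and the `kernel_mimetype` default are assumptions made for this example; they are not part of any Jupyter schema or library.

```python
# Hypothetical mapping from nbformat v4 cell types to the MIME types used above.
CELL_TYPE_TO_MIMETYPE = {
    "raw": "text/plain",
    "markdown": "text/markdown;flavor=GFM",
}


def to_mime_cell(cell, kernel_mimetype="text/x-python"):
    """Convert one nbformat v4 cell (a dict) into the proposed MIME cell."""
    source = cell["source"]
    # nbformat allows the source to be a string or a list of lines.
    if isinstance(source, list):
        source = "".join(source)
    if cell["cell_type"] == "code":
        # Assumption: code cells take the MIME type of the notebook's kernel language.
        mimetype = kernel_mimetype
    else:
        mimetype = CELL_TYPE_TO_MIMETYPE[cell["cell_type"]]
    return {"mimetype": mimetype, "data": source}


if __name__ == "__main__":
    print(to_mime_cell({"cell_type": "raw", "source": "I am a raw cell"}))
```

Attachments and outputs are deliberately ignored here; the sketch only covers the `mimetype` and `data` fields.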

Both Markdown cells and Code cells currently have the ability to carry extra data:

Markdown cells have attachments that contain MIME-bundles
Code cells have outputs that the frontend displays (usually below the cell editor)

These could be views of the same in-document data.

The point here is that the existing notebook schema enshrines these behaviours in the schema itself. By lifting this out of the notebook schema and into the frontend, we can extend things more easily, and (ironically) keep the notebook future-proof (e.g. with the flavor=GFM parameter).
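To make the `flavor=GFM` parameter concrete, here is a minimal sketch of how a frontend might split such a MIME string into its base type and parameters. It relies on Python's standard `email.message.Message` header parsing; the helper name `parse_mimetype` is an assumption for this example.

```python
from email.message import Message


def parse_mimetype(value):
    """Split a MIME string into (base type, parameter dict)."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns the base type first, followed by (name, value) pairs.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params


if __name__ == "__main__":
    print(parse_mimetype("text/markdown;flavor=GFM"))
```

A frontend that does not recognise the flavour could still fall back on the base type `text/markdown`, which is one way the parameter keeps older readers working.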

Given that IRenderers are allowed to modify their models, we can have multiple views for the same cell (just as we have rendered/source mode for Markdown cells), e.g. [image] vs [image]
Cell Execution

By simplifying the notebook schema, the frontends now have to do a bit more work. How do we implement kernel execution of code cells (in JupyterLab)? A “kernel” extension could be made aware of the MIME type for the current kernel (e.g. via the metadata.mimetype field of the notebook). For those cells in the current document with the correct mimetype, this extension is responsible for taking the code, executing it in the kernel, and storing the results in the notebook. This partially relates to @jasongrout’s comment here
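The selection step described here could look roughly like the following sketch. It assumes the notebook is a plain dict with a `metadata.mimetype` field and a list of MIME cells as in the schema above; `run_matching_cells` and the `execute` callable are hypothetical stand-ins for a real kernel extension and kernel connection.

```python
def run_matching_cells(notebook, execute):
    """Execute only the cells whose MIME type matches the notebook's kernel.

    `execute` is a hypothetical callable that takes source text and returns
    a list of outputs; a real extension would talk to a kernel instead.
    """
    kernel_mimetype = notebook["metadata"]["mimetype"]
    for cell in notebook["cells"]:
        if cell["mimetype"] == kernel_mimetype:
            # Store the results back in the notebook, next to the source.
            cell["outputs"] = execute(cell["data"])
    return notebook


if __name__ == "__main__":
    nb = {
        "metadata": {"mimetype": "text/x-python"},
        "cells": [
            {"mimetype": "text/markdown;flavor=GFM", "data": "# Title"},
            {"mimetype": "text/x-python", "data": "1 + 1"},
        ],
    }
    # A fake executor that just echoes the code it was given.
    run_matching_cells(nb, lambda code: [{"text/plain": "ran: " + code}])
    print(nb["cells"][1]["outputs"])
```

Cells with any other MIME type are left untouched, which is what would let several such extensions coexist in a polyglot notebook.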

agoose77 · September 15, 2021, 2:07pm

Hmm, this needs a lot more thought. An approach like the one outlined here really hurts the ability to validate a notebook - most of the contents depend now upon the frontend.

The problems and benefits still hold, in my opinion, but the solution needs to be something that looks a bit different to what is outlined here.

chrisjsewell · September 15, 2021, 8:30pm

Yeh hmm, certainly having a standard convention for making at least the intended (best case) rendering of “text” cells explicit would be welcome, e.g. “if possible this should be rendered with a MyST renderer with this configuration…”.

This I guess would be similar to specifying the execution kernel in the metadata:

{
  "metadata": {
    "text": {
      "flavour": "myst-markdown",
      "config": {
        "extensions": [
          "deflist",
          "amsmath"
        ]
      }
    }
  }
}
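A consumer of this metadata might read it along the following lines. This is only a sketch: the `text` key and its layout come from the example above, while the helper name `text_settings` and the fallback to a `"gfm"` flavour with no extensions are assumptions.

```python
def text_settings(notebook):
    """Return (flavour, extensions) from notebook metadata, with a GFM fallback."""
    text = notebook.get("metadata", {}).get("text", {})
    # Assumption: notebooks without the key keep today's GFM behaviour.
    flavour = text.get("flavour", "gfm")
    extensions = list(text.get("config", {}).get("extensions", []))
    return flavour, extensions


if __name__ == "__main__":
    nb = {
        "metadata": {
            "text": {
                "flavour": "myst-markdown",
                "config": {"extensions": ["deflist", "amsmath"]},
            }
        }
    }
    print(text_settings(nb))
```

Because the metadata is only a hint (“if possible”), a frontend that lacks the requested renderer can ignore it and render with whatever it has.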

Having polyglot kernels/renderers seems a fair bit more complex, and difficult to get working across multiple interfaces, like JupyterLab, VS Code, Google Colab, etc.

fcollonval · September 16, 2021, 6:24am

Thanks a lot for starting this discussion @agoose77

Yeah, this also pushes the need to improve the handling of mimetype extensions in the ecosystem. We could then imagine that part of the interface they need to provide is a JSON schema that could be used for validation.

agoose77 · September 16, 2021, 9:05am

This is definitely one good way of solving the “Markdown crisis” with what we currently have. My thought is maybe to define all Markdown variants in the form of “extensions to CommonMark”. Even if under the hood they’re not implemented as such, I think the abstraction of “syntax” is something we can agree on at the Jupyter level, and even define. I don’t think it has to be perfect - I suspect that different Markdown renderers probably don’t even render the exact same markup identically due to ambiguity in the grammar (which I think you also referenced elsewhere). I’m not a standards person, so I don’t know how robust this is, but it seems like something that doesn’t enshrine one technology e.g. markdown-it into the Jupyter specification.

In terms of what JupyterLab & nbconvert would do here, they would (abstractly) build a Markdown renderer from a known registry of flavours. JupyterLab could extend that registry via plugins similar to jupyterlab-markup with additional metadata, and nbconvert could use something like the entry_points mechanism to provide new implementations for mistune or markdown-it-py.
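The “registry of flavours” idea can be sketched in a few lines. Everything named here (`FLAVOURS`, `register_flavour`, `build_renderer`) is hypothetical, and the registered renderers are placeholders that only label the text; a real implementation would wrap mistune, markdown-it-py, or similar, and could be populated from plugins or entry points.

```python
# Hypothetical registry mapping a flavour name to a renderer factory.
FLAVOURS = {}


def register_flavour(name):
    """Decorator that adds a renderer factory to the registry."""
    def decorator(factory):
        FLAVOURS[name] = factory
        return factory
    return decorator


@register_flavour("gfm")
def make_gfm(extensions):
    # Placeholder: tags the text instead of really rendering Markdown.
    return lambda text: "[gfm] " + text


@register_flavour("myst-markdown")
def make_myst(extensions):
    # Placeholder: records which extensions were requested.
    label = "[myst-markdown +" + ",".join(extensions) + "] "
    return lambda text: label + text


def build_renderer(flavour, extensions=()):
    """Look up a flavour and build a renderer, failing loudly if unknown."""
    try:
        factory = FLAVOURS[flavour]
    except KeyError:
        raise LookupError("no renderer registered for flavour: " + flavour)
    return factory(list(extensions))


if __name__ == "__main__":
    render = build_renderer("myst-markdown", ["deflist", "amsmath"])
    print(render("Term\n: Definition"))
```

The notebook only names the flavour and its extensions; which library sits behind a factory stays a frontend decision.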

I do want to keep the scope open in this thread to think more radically about what Jupyter Notebook could look like, but I also want to keep an eye on less invasive proposals at the same time.

chrisjsewell · September 16, 2021, 9:36am

Specifications are certainly something I’ve been thinking about more, in relation to MyST: Create a myst-markdown repository as a ref implementation for myst · Issue #305 · executablebooks/meta · GitHub

yeh would like to stop using mistune; it’s not even CommonMark compliant. That’s why markdown-it is so great; it’s so easy to create and “document” plugins: https://github.com/executablebooks/mdit-py-plugins/tree/master/tests/fixtures

yep agree; I guess it’s just finding ways to make things as “extensible” as possible, without making it horribly impossible to make any tooling around Jupyter, because you have to account for all this extensibility

agoose77 · September 16, 2021, 10:02am

To be fair to us all talking about this, I am under the impression that the problem of “standardising” Markdown rendering is a big issue, and I recall it being worse in the past when Markdown was first spreading into different web applications. I’m a relatively new stakeholder, so I won’t pretend to have a full knowledge of the history here.

I’d advocate strongly for using markdown-it for both JupyterLab and nbconvert. As you say, it’s quite easy to create plugins, and it would make it a lot easier to write Python packages that deliver both nbconvert and JupyterLab extensions.

I’m sure that we’re on the same page at this point, but I would not want to enshrine markdown-it into the Notebook specification — just establish a semantic interpretation of what the extensions mean, and let the frontends Do the Right Thing™

agoose77 · May 5, 2022, 1:17pm

I haven’t much more to add here - I’m mainly using this thread as a journal of ideas.

Cells as Transforms

In generalising the notebook schema, I think a basic cell definition would have

data - source of cell
outputs - existing code-cell outputs
mimetype - mimetype of cell

I was thinking about the multiple-view feature, and I actually think that’s an orthogonal feature of selecting which mimetype should be rendered in an output.

Maybe what we want is to consider a cell undergoing a transformation from data to outputs. We then have a distinction between a notebook consumer (a read-only viewer of the notebook, e.g. nbconvert without the execution), and a notebook transformer (a read-write viewer of the notebook, e.g. JupyterLab).

I imagine a Markdown cell is then just a transform on the data. But, unlike the present situation where this transformation is special-cased in JupyterLab’s cell handling, we have a text/markdown;flavor=GFM transformer that runs markdown-it. It’s up to the frontend to do this, whether that’s nbconvert, JupyterLab, etc. The frontend can make some per-mimetype decisions to improve the UX, e.g. a Ctrl+B shortcut to insert emphasis while editing Markdown, or collapsing the source of Markdown cells by default when opening a notebook, like ObservableJS, which separates input and output.

The significant benefit of this in the Markdown case is that your transformer can choose to embed both the original text/markdown data and rendered HTML text/html into the outputs bundle, so that HTML frontends are left with no ambiguity as to how to render the markup. In fact, the best approach here would be to always consider the cell itself alongside its outputs, e.g.
mimebundles = cell.outputs or {cell.mimetype: cell.data}
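Spelled out a little further, the fallback above might behave like the sketch below. The `Cell` dataclass follows the three fields listed earlier (`data`, `outputs`, `mimetype`), with `outputs` assumed to be a single MIME bundle. The transformer is a placeholder that only escapes the text and wraps it in `<p>`; it is not a Markdown renderer, and all names are illustrative.

```python
import html
from dataclasses import dataclass, field


@dataclass
class Cell:
    mimetype: str
    data: str
    # Assumption: outputs is one MIME bundle (MIME type -> content).
    outputs: dict = field(default_factory=dict)


def placeholder_markdown_transform(cell):
    """Stand-in transformer: stores the source and a naive HTML form."""
    cell.outputs = {
        cell.mimetype: cell.data,
        "text/html": "<p>" + html.escape(cell.data) + "</p>",
    }


def mimebundle(cell):
    """Use the outputs if a transformer produced any, else the raw source."""
    return cell.outputs or {cell.mimetype: cell.data}


if __name__ == "__main__":
    cell = Cell("text/markdown;flavor=GFM", "a < b")
    print(mimebundle(cell))   # untransformed: only the source is available
    placeholder_markdown_transform(cell)
    print(mimebundle(cell))   # transformed: source plus HTML
```

A read-only consumer only ever calls `mimebundle`, while a transformer is the part that is allowed to write `outputs`.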

The really interesting side effect of this is that the problems discussed in Inline variable insertion in markdown become less important - we don’t have to enshrine a “perfect solution” because anyone can write a markdown transformer and associate it with the Markdown cell type.

Writing jupyterlab-imarkdown was fiddly, because (as I recall) JupyterLab’s notebook rendering is slightly “all or nothing”. This new system would make it easy to modify a cell’s rendering using this transform system.

Drawbacks

Right now, we can get fairly far using jupytext to convert between notebook representations. This works because ultimately notebooks are code-focused, and we can settle on a code representation by dropping any Markdown / Raw cells. However, by extending to arbitrary MIME types, jupytext et al. would need to become smarter to support this system. This could actually be a good thing - it would be a new avenue for developers to be able to extend jupytext with per-cell transformations etc.

agoose77 · May 17, 2022, 6:19pm

Surprisingly, I only just stumbled across Display data as a cell type · Issue #1123 · jupyter/notebook · GitHub

This thread aligned quite well with the feelings I’ve shared here.
