Inline variable insertion in markdown

Description

For many years people have requested inline variable insertion in Jupyter Notebook markdown. This is a huge feature in the RMarkdown ecosystem, where you can do things like `r somevar` and have that variable be inserted into the text at rendering time.

There have been some attempts at doing this in the classic notebook interface, most notably the python-markdown extension, which uses moustache insertion so that {{ myvar }} will render whatever myvar is in the Python kernel.

I am wondering if there is a way that this pattern of moustache insertion could be generalized as a part of the Jupyter Notebook specification.

Benefit?

There are more and more libraries that deal with notebooks programmatically, so codifying a pattern like this could be useful even if it weren’t supported in the interactive user interfaces. I could imagine tools like papermill and jupyter book making heavy use of this. I believe that it would be a really useful feature that would bring the RMarkdown and Jupyter ecosystems more aligned with one another. If this could be done in a language-agnostic way, then it would also be really impactful for many other communities.

Potential implementations

I don’t think that this needs to be supported in the live user interfaces in order to be useful. An initial implementation could be done in one of the notebook execution libraries like nbclient. For example, you could imagine a pattern using nbclient like:

  1. nbclient is run on a notebook
  2. It executes the notebook code cells
  3. It then inspects the markdown cells of the notebook, and searches for anything within curly brackets

Two options from there:

  1. For all items that it finds, it calls display on a variable of the same name, and replaces the curly brackets with the resulting string.
  2. For all items that it finds, it calls display on a variable of the same name, and inserts the result in notebook-level metadata. Then downstream packages can do what they want with this metadata.
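
A minimal sketch of the second option, under loudly stated assumptions: the notebook is a plain dict in nbformat-like shape, a Python namespace driven by `exec` stands in for the live kernel (nbclient itself is not used here), `repr` stands in for `display`, and the `inline_variables` metadata key is hypothetical.

```python
import re

# Matches {{ name }} where name is a plain identifier (no expressions).
MOUSTACHE = re.compile(r"\{\{\s*([A-Za-z_]\w*)\s*\}\}")


def collect_inline_variables(notebook):
    """Run all code cells, then record the variables named in markdown cells."""
    namespace = {}  # stand-in for the kernel's state (assumption)
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            exec(cell["source"], namespace)

    found = {}
    for cell in notebook["cells"]:
        if cell["cell_type"] == "markdown":
            for name in MOUSTACHE.findall(cell["source"]):
                if name in namespace:
                    found[name] = repr(namespace[name])

    # "inline_variables" is a hypothetical key, not part of nbformat.
    notebook.setdefault("metadata", {})["inline_variables"] = found
    return notebook


notebook = {
    "metadata": {},
    "cells": [
        {"cell_type": "code", "source": "rows = 3\ncols = 4"},
        {
            "cell_type": "markdown",
            "source": "The table has {{ rows }} rows and {{ cols }} columns.",
        },
    ],
}
collect_inline_variables(notebook)
print(notebook["metadata"]["inline_variables"])
```

The markdown source is left untouched, so a downstream tool would decide how (or whether) to substitute the stored values.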

I don’t think this would be possible currently with nbclient because it doesn’t give you the state of the live kernel after executing the notebook, but maybe it wouldn’t be too difficult to extend somehow?

Thoughts?

I am curious what others think about this - would it be useful? Does it seem sensible?

Related topics

There is also a really long issue that has been open for more than 5 years :upside_down_face:: Allow references to Python variables in Markdown cells (Issue #2958, ipython/ipython, GitHub)


Thank you for raising this issue, Chris. I think this is an essential feature for writing computational narratives and for publishing books.

While the post-processing quick fix could make sense for a book, you really need real-time notebook rendering support for computational narratives, so that values referred to in Markdown cells reflect the state of the kernel.

Being able to write something like the following is tremendously useful:

In the above example, the table has {{ x.shape[0] }} rows.
The first row has {{ x.shape[1] }} values, which sum to {{ sum(x[0]) }}.

Otherwise, you are stuck in the pattern of “Let’s see how many rows the table has:” [code cell] “And how many values?” [code cell] “Let’s calculate its sum:” [code cell] Either that, or you hard-code the values (bad!).

You all recognize this style of writing, because it’s found everywhere with Jupyter notebooks. Authors are forced to sacrifice clarity because of a structural impediment. So, I think this would be a much valued addition!


Just agreeing with Stefan - partly because we share experience of writing a book together. It really wasn’t obvious to me how much I needed `r somevar` until I could use it (because we were using RMarkdown and Knitr). Before, it was just a desirable feature. Now I’m used to using it - it seems to me to be a requirement for a fluent computational document. Stefan’s sketch of “Let’s see how many rows the table has” is very familiar, and that pattern is a real barrier to good writing in notebooks.


I think this would be an awesome thing to have in nbconvert, for example. In R, are the variables replaced with the current value of the variable at that point in the narrative, or the final value at the end? I think the former makes more sense, which would be a difference in your pattern steps, to something like:

For each cell:
    If a code cell, execute it
    If a markdown cell: replace expression templates with the current expression value
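
That loop could be sketched roughly as follows. This is only an illustration, not how nbconvert works: cells are plain `(type, source)` tuples, a Python namespace with `exec`/`eval` stands in for a real kernel, and `str` stands in for whatever display logic would actually be used.

```python
import re

TEMPLATE = re.compile(r"\{\{\s*(.+?)\s*\}\}")


def run_in_order(cells):
    """Execute code cells and fill markdown templates in document order."""
    namespace = {}  # stand-in for kernel state (assumption)
    rendered = []
    for cell_type, source in cells:
        if cell_type == "code":
            exec(source, namespace)
        elif cell_type == "markdown":
            # Evaluate each expression against the state at this point.
            filled = TEMPLATE.sub(
                lambda match: str(eval(match.group(1), namespace)), source
            )
            rendered.append(filled)
    return rendered


cells = [
    ("code", "total = 10"),
    ("markdown", "The total starts at {{ total }}."),
    ("code", "total += 5"),
    ("markdown", "After the update it is {{ total }}."),
]
for text in run_in_order(cells):
    print(text)
```

Because each markdown cell is filled in as soon as it is reached, the two cells report different values for the same variable.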

Ideally, the expressions would be immutable.

@stefanv - in an interactive context, do you imagine the expressions auto-updating (possibly very hard to keep track of), or just updating when the markdown cell is rendered?

We’ve been working in this vein in pidgy for a couple years. It leaves the client-owned markdown cells as-is, but its code cells are also markdown-first. Code blocks (and fenced code) generate displays that are updateable, and literate in the sense that they needn’t be fulfilled until the end of the tangle/weave per cell, so you can write your story up front. It also works with tests, which get continuously updated.

We’ve never quite gotten typographically-sound jupyter widgets working, a la tangle, but yeah. I have this feeling that’s what’s really needed.

I think we’d need a far better contract for naming things between the client and kernel in order to do anything useful… making it work “just for python” or “just for pidgy” or whatever would be cute but not really move the state-of-the-art forward.


Countless times I have needed this feature, often resorting to a re-implementation of markdown cell magics to substitute variables. It would be really nice to have it at runtime, even if it were only updated on Markdown cell render rather than automatically (to lessen the performance implications).

Another idea for making this work would be to re-use the debugger protocol request which can retrieve the variable value (and, since recently, its representation) from the kernel.

Rendering on execution of Markdown would be in line with the rest of the notebook, so that should be sufficient.

@jasongrout - the `r somevar` construct in R works just like the output from a code cell - inline values are those current when the cell gets rendered. Here’s an example R notebook:

```{r}
a <- 1
```

`a` now has value `r a`!

```{r}
a <- 99
```

`a` has a new value: `r a`!

The rendered version gets 1 and 99 in the first and second `r a` inline code chunks, respectively.

This seems like a reasonable contract - that the inline code output behaves the same way as the output from the code chunks.

This feature is a must have! I always find myself in need of this. As a workaround, I use f-strings. The R implementation seems to be the most flexible one. It is also used in MyST Markdown for Jupyter Books! How can we push for this feature?
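
For anyone unfamiliar with the f-string workaround, it looks roughly like this. The nested list `x` is made-up illustration data; in a notebook the string would typically be passed to `IPython.display.Markdown` in a code cell, which is left out here so the sketch runs anywhere.

```python
# Made-up data standing in for a table (assumption for illustration).
x = [[1, 2, 3], [4, 5, 6]]

# Build the narrative text in a code cell instead of a markdown cell.
text = (
    f"In the above example, the table has {len(x)} rows.\n"
    f"The first row has {len(x[0])} values, which sum to {sum(x[0])}."
)
print(text)
```

The catch is that the prose now lives inside a code cell, which is exactly the clutter an inline syntax would avoid.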

There is a version of this feature in MyST Markdown, using the inline glue construct - for reference, the docs are here.

One serious problem is that the glue construct only renders from the book build, and not on standard rendering of the notebook. This leaves some rather inconvenient cruft in the notebook. For example, the R notebook markup, that we’ve already seen, is:

```{r}
a <- 1
```
`a` is `r a`!
```{r}
a <- 99
```
`a` is now `r a`.

The JupyterBook equivalent (in RMarkdown to make it comparable):

```{python, include=FALSE}
from myst_nb import glue
```
```{python}
a = 1
```
```{python, include=FALSE}
glue("first_a", a)
```
`a` is {glue:}`first_a`
```{python}
a = 99
```
```{python, include=FALSE}
glue("second_a", a)
```
`a` is now {glue:}`second_a`

This gives a notebook that is rather hard to write, but also hard to read, unless rendered by the book engine.

Re: rendering: while marked has served us fairly well for a long time, having a look at alternate renderers is probably more fruitful, and helpfully, this is hot-swappable in JupyterLab.

jupyterlab-markup, based on markdown-it (which is also what myst emulates, for good or ill) is already capable of quite a bit, and could be extended to support any of the above syntaxes. Therein lies the problem, of course: there is no “Jupyter Markdown” spec (handshakily, it’s GFM+MATH$… but with some mangled links), myst is extensible, pidgy does different things per release :woman_shrugging: and the commonmark extension spec hasn’t budged in years.

The danger of this: whatever clever thing is done, at whatever level, if the renderer isn’t portable/formally defined (e.g. an ANTLR/lark grammar), a given notebook would need a static representation of the rendered HTML of itself to even be confidently shareable, much less reproducible, even if it was pleasantly authorable. In pidgy, we achieve this by taking “fancy” markdown, and emitting “boring” HTML during the REPL loop in-kernel, but one could also imagine implementing this inside a client or postprocessor… albeit with a much higher bar of requirements vs “boring” markdown.


To try and disentangle the things that would need to be done here, and to summarize some of the conversation around implementation, it feels like these are a few major questions:

Syntax: Define an inline execution or variable interpolation syntax for Jupyter Notebook markdown

It seems that our most likely candidate would be {{ somevariable }} under the assumption that the notebook-level kernel is the language that will be used. Though another option would be to define a language-agnostic version of the RMarkdown pattern (`r somecode`). E.g., `j somecode`, or maybe piggy-backing on myst markdown like {exec}`somecode` .

Doing this would require a few new things:

  1. having a formal definition of Notebook markdown. Currently it is just “whatever marked.js does” and we’d need to have a formal way of saying “this new syntax is now supported”.
  2. Agreeing on a syntax and adding it to that definition.

As @bollwyvl suggests I would be a big fan of moving to markdown-it because then we could formally define Jupyter Notebook markdown as “Markdown-it + this collection of plugins”. That is a natural extension point in the future: want more functionality? Define a syntax and make a markdown-it plugin for it to prototype.

Execution vs. variable

Would {{ }} syntax represent a variable that is somehow stored in the notebook’s metadata and doesn’t depend on a live kernel to use? For example, {{ myvar }} exists in a markdown cell, and it’s assumed that something has stored myvar: value in notebook-level metadata.

Or would it represent “executable code”, so that you could also do {{ myvar.shape[0] }} or {{ some_R_func(myvar) }} and this would be executed somehow?

Execution logistics: how would execution be different from code cells?

If we assume that {{ }} means “execution”, we’d need to decide whether / how {{ }} would be treated differently from code cells. I think the big question there might be whether inline execution follows the same order of execution as code cells, or if it is more like a post-processing step. AKA, if there were a notebook that had:

(in a code cell)
a = 2

(in a markdown cell)
The value of a is {{ a }}

(in a code cell)
a = 5

(in a markdown cell)
The value of a is now {{ a }}

then would the two values of a be the same or different in the markdown cells? I think it would be important to define this outside the context of an interactive session first. Then the interactive interfaces could decide how they wanted things to behave.

Interactive rendering logic

Finally there’s the question of how this should behave in an interactive environment. Does inline markdown execution happen whenever a markdown cell is rendered? Does it occur across all markdown cells every time a code cell is updated? Does the interactive renderer just display a placeholder that tells the author “this will be executed if you ever run the code top to bottom”?

I feel like this one could be a UI-specific choice, so long as the expected behavior of inline execution syntax was explicitly defined in the syntax.


Chris - thanks for the summary.

For syntax - I suspect it would be frustrating to disallow more than one language for the inline values. Although I guess that could be achieved by some bridge between the languages - e.g. making the R workspace accessible within Python via Rpy2 or similar.

For execution order, it seems to me that leaving the inline rendering to the post-processing step would condemn us to the kind of hoops that the Glue construct has to jump through. It would also make it harder, and perhaps invalid, to view the inline values in the notebook while in the interactive interfaces, and I think that would make it less useful.

To clarify - that is the case for an R notebook. Standard RStudio, at least, does not render the markdown interactively, so you see all the Markdown markup while editing the notebook interactively. In the case of inline values, this is not too bad, because the syntax is so simple and easy to read. More complex syntax would be more distracting.

For the interactive rendering logic - it would make the most sense to me, to insert the values in the shift-enter or equivalent rendering step, when finishing an edit of a Markdown cell. Sure, the values could get out of order, but that’s just as true of code cells, and I think the user would learn to expect that.


define Jupyter Notebook markdown as “Markdown-it + this collection of plugins”.

… which changes the state of things not much, trading marked for markdown-it. Hence the desire for an actual grammar, and a conformance suite, implementable in more than two language runtimes, and an ability to document additional plugins used in a notebook.

This is part of the challenge with many things we call “specs” which are in fact, “choices we’ve made that you, too, are free to re-implement after reading our code,” such as LSP (vscode) or Jupyter Kernel Messaging (ipython)…

Execution vs. variable

…which provides the execute_request and execute_reply messages, which offer user_expressions, which probably fits the bill… though not new, it isn’t widely implemented. Each expression may return a single display mimebundle, and each can fail independently.

As this would be keyed by the text of the expression itself, it might be more reasonable and portable than inventing some new id system next to display_ids, comm_ids, cell_ids, etc. while a client/processor would be able to handle re-constituting them into the correct place in the UI. The cadence of when/which expressions are executed would indeed need to be a choice, though presumably simple displays (e.g. non-comm/widget) would be expected to reach a steady state after restart-and-run all.
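
To make the user_expressions idea concrete, here is a rough sketch that only builds and consumes message content as plain dicts. The `{{ }}` syntax, the helper names, and the hand-written reply are assumptions for illustration; nothing here talks to a real kernel, and the flag values are one plausible choice rather than a recommendation.

```python
import re

TEMPLATE = re.compile(r"\{\{\s*(.+?)\s*\}\}")


def build_execute_request_content(markdown):
    """Build execute_request content that only evaluates inline expressions."""
    expressions = {expr: expr for expr in TEMPLATE.findall(markdown)}
    return {
        "code": "",  # nothing to run, only expressions to evaluate
        "silent": False,
        "store_history": False,
        "user_expressions": expressions,  # keyed by the expression text
        "allow_stdin": False,
        "stop_on_error": True,
    }


def fill_from_reply(markdown, reply_content):
    """Substitute text/plain results; leave failed expressions untouched."""
    results = reply_content["user_expressions"]

    def substitute(match):
        result = results.get(match.group(1), {})
        if result.get("status") == "ok":
            return result["data"]["text/plain"]
        return match.group(0)  # each expression can fail independently

    return TEMPLATE.sub(substitute, markdown)


markdown = "The table has {{ x.shape[0] }} rows and {{ oops }} columns."
request = build_execute_request_content(markdown)

# Hand-written stand-in for a kernel's execute_reply content (assumption).
reply = {
    "status": "ok",
    "user_expressions": {
        "x.shape[0]": {"status": "ok", "data": {"text/plain": "3"}, "metadata": {}},
        "oops": {
            "status": "error",
            "ename": "NameError",
            "evalue": "name 'oops' is not defined",
            "traceback": [],
        },
    },
}
print(fill_from_reply(markdown, reply))
```

Keying by expression text means the client needs no extra id bookkeeping: it can match results back to their place in the markdown by re-running the same regex.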

  1. Any syntax which can (optionally) incorporate language specifiers would have the additional benefit of allowing the use of polyglot kernels, but it would be much nicer if such a language specifier were optional. While `r variable` allows for polyglot support, dropping the language prefix gives just `variable` which is ambiguous, so it might not be that great. I would love to have a simple syntax that still allows us to add parametrization in the future if we so choose (that parametrization could be the language which the polyglot kernel should use, or maybe a mime type the kernel should return).

  2. As this would be keyed by the text of the expression itself

    What if we have a markdown like this:

    The value was {{ next(iterator) }} but it now is {{ next(iterator) }}.
    
  3. While I like the simplicity of re-using the execute request, maybe the initial implementation could restrict this to variables and use the richInspectVariables request of the DAP (see https://github.com/jupyterlab/jupyterlab/pull/10299) which would be safer in a way as it would prevent markdown from executing arbitrary code. I am not saying that the ultimate solution should be restricted to variables, but that it might be better to introduce it step by step so that users do not get surprised. I imagine that any syntax we choose can overlap with existing text in Markdown cells, and if this runs arbitrary code when executed in a future version of Jupyter, it might have surprising consequences. Edit: but maybe we could just analyse a sample of public notebooks to get an estimate of how many notebooks could be affected for any given syntax, and worry about things like that only if this is a substantial number. And the use of DAP has many downsides as well.

It seems like a polyglot kernel would be responsible for this disambiguation with a magic or whatever. Also see discussions on jupyterlab-lsp about kernels declaratively documenting these syntax variations from their host language’s official syntax (if there is such a thing) as it’s going to get hairy trying to give good syntax highlighting, completion, etc for this kind of stuff. Already, it’s rather hard to know what, exactly, plain-old-jupyter Markdown provides, much less jupyterlab-markup, much less jupyterlab-markup and some rando extensions.

What if we have a markdown like this:

Then that’s… a non-reproducible expression. Most kernels’ rich display systems can have side-effects (some ship JS to the browser, etc) so “showing a variable is free and safe,” is probably somewhat unsound to assume in the general case. So even then: sure, let’s say a configurable mode can generate “at that time” expressions that aren’t updated when the underlying expressions change. My point is, in the spec, these things are just a mime bundle, and don’t have a display_id, as they are “meant for” things like prompt bling, etc.
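
A small sketch of why that example is awkward, assuming the `{{ }}` syntax and using `eval` on a plain namespace as a stand-in for a kernel. The two helper functions are hypothetical and only contrast "one result per distinct expression text" with "one evaluation per occurrence".

```python
import re

TEMPLATE = re.compile(r"\{\{\s*(.+?)\s*\}\}")
markdown = "The value was {{ next(iterator) }} but it now is {{ next(iterator) }}."


def render_keyed_by_text(markdown, namespace):
    """Evaluate each distinct expression once, then reuse the result."""
    values = {
        expr: str(eval(expr, namespace))
        for expr in set(TEMPLATE.findall(markdown))
    }
    return TEMPLATE.sub(lambda match: values[match.group(1)], markdown)


def render_per_occurrence(markdown, namespace):
    """Evaluate every occurrence separately, in reading order."""
    return TEMPLATE.sub(
        lambda match: str(eval(match.group(1), namespace)), markdown
    )


keyed = render_keyed_by_text(markdown, {"iterator": iter([10, 20])})
per_occurrence = render_per_occurrence(markdown, {"iterator": iter([10, 20])})
print(keyed)
print(per_occurrence)
```

With text keys, both placeholders share one result; evaluating per occurrence gives the author's intended reading but makes every re-render advance the iterator again.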

Limiting these (even initially) to only variables will probably lead to folk creating… lots more named variables, just for the sake of display, e.g. fig.as_svg vs fig_but_as_an_svg, which may not help write/readability.

prevent markdown from executing arbitrary code

I think the whole point of a literate document is arbitrary code execution.

do not get surprised

Yeah, pretty sure this would have to be opt-in, from both the arbitrary code execution perspective, as well as the “don’t break my frigging markdown” perspective. Having it be an installable extension first would be suitably opt-in, of course… and getting something like this through a JEP would be… tedious.

Meanwhile, re-using the (many) markdown hypertext/hypermedia notations, would make it far easier to reason about, add fallbacks, and accessibility features.

Defining a new URI scheme (vs a new markdown syntax in an already hostile environment) seems entirely reasonable, given what we’re trying to do. Particularly, the semantics of image URLs actually jibe quite nicely with the concept of values that update dynamically over the course of interactive execution. As no client would actually know what to do with them, the fallback would look broken, but still convey meaning, vs a bunch of braces.

By working within the bounds of existing parsers, we’d be able to fiat a bunch of stuff, for example saying that the expression is the local anchor (e.g. #) and later specify the semantics of the rest of it.

The value of [the variable x](jp:#x "a title for x"), limited to text MIMEs.

A value of [a reference to y][y]. Or just [y].

[y]: jp:#y "a title for y"

A picture of ![a variable z](jp:#z.plot() "a plot of z"), with a regular MIME priority.

Which would fall back to:

The value of the variable x, limited to text MIMEs.

A value of a reference to y. Or just y.

A picture of a variable z, with a regular MIME priority.
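
A rough sketch of how a processor might pick these out, assuming the hypothetical `jp:` scheme above. It handles only inline links and images (not the reference-style `[y]` form), assumes expressions contain no spaces, and the text-only fallback is a simplification of what a real renderer would show.

```python
import re

# Inline links or images whose destination uses the hypothetical jp: scheme.
# Groups: "!" for images, link text, expression (no spaces), optional title.
JP_LINK = re.compile(r'(!?)\[([^\]]*)\]\(jp:#([^\s"]+)(?:\s+"([^"]*)")?\)')


def find_expressions(markdown):
    """Return (expression, is_image) pairs in reading order."""
    return [
        (match.group(3), match.group(1) == "!")
        for match in JP_LINK.finditer(markdown)
    ]


def fallback(markdown):
    """Roughly what is left when the links are dropped: just the link text."""
    return JP_LINK.sub(lambda match: match.group(2), markdown)


source = (
    'The value of [the variable x](jp:#x "a title for x"), limited to text MIMEs.\n'
    'A picture of ![a variable z](jp:#z.plot() "a plot of z"), '
    "with a regular MIME priority."
)
print(find_expressions(source))
print(fallback(source))
```

A real implementation would more likely hook into the markdown parser's link tokens than use a regex, which is the point of staying within existing syntax.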

Certainly worth a consideration!

use of DAP has many downsides as well.

Indeed, while user_expressions is not widely implemented, execute is usually the second message a kernel must implement after kernel_info, whereas adding DAP is… a lot to even get started, and many kernels have not taken the plunge.

So anyhow

On the main, I think getting someone to actually try some of these out as an installable extension for a Jupyter client which also implements a server-side transformer, both aware of some notebook/cell metadata for some of the knobs discussed above, makes the most sense… not signing up to do so, mind you!

If the client’s JupyterLab, jupyterlab-markup is probably the right substrate. No idea which tool will be the correct one on the backend, though hooking it in nbconvert and nbexecute is probably more useful as a demonstration vs some higher-order publishing system.


For syntax how about moustache with dot? As in:

Default form is {{. some_var }} .

This displays the value of the some_var variable in the default kernel namespace. Note the period after {{.

You can choose the kernel with a specified language after the dot, as in:

Use {{.r another_variable }} to display a variable from the R kernel namespace, or {{.julia a_julia_variable }} for a Julia variable.

I suspect no-one is using {{. at the moment, so we would not have to worry about accidentally triggering code execution, but happy to be corrected.
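
For what it's worth, parsing this proposed syntax could be as simple as the sketch below. The regex, the `parse_inline` helper, and the `"python"` default are illustrative assumptions, and it requires whitespace between the optional language tag and the expression.

```python
import re

# {{. expr }} targets the default kernel; {{.r expr }} names a language.
DOT_MOUSTACHE = re.compile(r"\{\{\.(\w*)\s+(.+?)\s*\}\}")


def parse_inline(markdown, default_language="python"):
    """Return (language, expression) pairs; plain {{ ... }} is ignored."""
    return [
        (match.group(1) or default_language, match.group(2))
        for match in DOT_MOUSTACHE.finditer(markdown)
    ]


text = (
    "Default form is {{. some_var }}. Use {{.r another_variable }} or "
    "{{.julia a_julia_variable }}. A plain {{ template }} is left alone."
)
print(parse_inline(text))
```

Ordinary moustaches without the dot never match, which is what keeps existing templated text from being executed by accident.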


I don’t think the code expressions in md cells need to be rendered interactively, just when the notebook is exported/converted into pdf/html/presentation.

How does R render all of this? What Markdown flavour do they use and which library is used for rendering?

Is this the kind of thing that could be prototyped via an extension in nbclient? To my knowledge, that library doesn’t have an extension point that would let you execute inline markdown in the same order that it appears in a notebook (ie, I feel like there is a pretty strict separation between MD and code cells, so prototyping this would require modifying library code…I would love to be proven wrong there though!)

I think that adding in this to nbclient would mean modifying this line here and adding some logic to check for “inline execution w/ markdown syntax”:

Though, I think there are some questions to understand there (e.g., is the original executable code now replaced with the output? Is the output stored in the notebook, with how it’s used left up to the renderer?)


(quick thoughts on syntax since I think that’s an interesting question)

I think @matthew.brett 's moustache+dot is a nice idea! The moustaches generally evoke the assumption that “this thing will be replaced by something else” and the . is a nice way to differentiate “executable” moustaches from regular interpolation moustaches.

Two other ideas for syntax since we are brainstorming:

Take inspiration from IPython

Piggy-back on Jupyter’s roots and use a similar syntax to the IPython kernel, the [ ]: syntax. So you could imagine:

  • The shape is []:`a.shape[0]` for default kernel
  • The shape is [python3]:`a.shape[0]` to specify a specific kernel
  • The shape is [py]:`a.shape[0]` to specify a kernel w/ shorthand

Or you could remove the : and this would then have a very similar structure to MyST Markdown directives:

  • The shape is []`a.shape[0]` for default kernel
  • The shape is [python3]`a.shape[0]` to specify a specific kernel
  • The shape is [py]`a.shape[0]` to specify a kernel w/ shorthand

Take inspiration from RMarkdown

You could use the same pattern that RMarkdown uses, but keep j for a generic execution that would use the default kernel. For a more specific kernel you could specify that instead of the j. So:

  • The shape is `j a.shape[0]` would use the kernel of the notebook in a language-agnostic way
  • The shape is `py a.shape[0]` would be a way to specify a Python kernel explicitly

This is just to unpack the argument I have made before that the inline expressions should render inside the interactive notebook.

Consider the typical JupyterBook case, where a page is a notebook, and there is an Interact button that opens the notebook in a Jupyter UI on the web somewhere.

Say you have a notebook like this:

```{python}
import numpy as np
x = np.array([4, -1, 3, 7, 9])
n = len(x)
s = np.sum(x)
m = np.mean(x)
```

The formula for the mean is $\frac{1}{n} \sum{x_i}$.

Here $n$ = {{. n}} and $\sum{x_i}$ is {{. s}}.  Therefore, the mean in our case
is {{. m }}.

When we open such a notebook in the interactive UI, we expect to see the math rendered correctly, and if we edit the math and re-render the cell, we expect to see the updated math rendering. What should we expect to see for the inline values? The post-processing model says that, in the interactive UI, we’d only ever see something like {{. m }} as a placeholder for values to be filled in later by some other rendering engine. What I think most users would expect, is that we’d see the rendered inline value, just as we see the rendered math.