Inline variable insertion in markdown

psychemedia · September 8, 2021, 8:12am

For static Jupyter Book outputs, I started to explore a simple magic that attempts to use the Markdown contents of a code cell as a Python f-string and then render it as HTML cell output.
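The core of such a magic might look like the sketch below. It assumes the cell body is evaluated as an f-string against the user namespace. The function name `render_fstring` is hypothetical. In IPython it could be registered with `register_cell_magic`, and its result could be wrapped in `IPython.display.HTML` or `Markdown`. That wiring is left out here so the function runs on its own.

```python
def render_fstring(template: str, namespace: dict) -> str:
    """Evaluate `template` as if it were the body of a Python f-string.

    repr() turns the text into a valid string literal (escaping quotes,
    backslashes and newlines); prefixing it with `f` makes it an f-string
    literal, which eval() then formats using names from `namespace`.
    """
    literal = "f" + repr(template)
    # Copy the namespace: eval() inserts __builtins__ into the globals dict.
    return eval(literal, dict(namespace))


ns = {"what": "prosper", "n": 3}
print(render_fstring("Live long and {what} ({n * 2} times)", ns))
```

Like any f-string, this runs arbitrary expressions from the cell. That is acceptable for a code cell the author wrote, but not for untrusted input.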

The original code cell then needs to be hidden in the Jupyter Book output. This could be automated with a simple script that makes sure any code cell using the f-string magic is tagged as hide-input.
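Such a tagging script could be as small as the sketch below, which works directly on the notebook JSON (nbformat 4). The magic name `%%fstring` is a hypothetical placeholder. `hide-input` is the cell tag Jupyter Book uses to hide a cell's source.

```python
import json

MAGIC = "%%fstring"  # hypothetical name of the f-string cell magic


def tag_magic_cells(nb: dict, magic: str = MAGIC, tag: str = "hide-input") -> int:
    """Add `tag` to every code cell whose source starts with `magic`.

    In nbformat 4 JSON, `source` may be a string or a list of strings.
    Returns the number of cells that were newly tagged.
    """
    changed = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        if not source.lstrip().startswith(magic):
            continue
        tags = cell.setdefault("metadata", {}).setdefault("tags", [])
        if tag not in tags:
            tags.append(tag)
            changed += 1
    return changed


def tag_notebook_file(path: str) -> int:
    """Read a .ipynb file, tag matching cells, and write it back if changed."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    changed = tag_magic_cells(nb)
    if changed:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(nb, f, indent=1, ensure_ascii=False)
            f.write("\n")
    return changed
```

Running `tag_notebook_file` over each notebook before `jupyter-book build` would keep the tags in sync without manual editing.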

stefanv · September 8, 2021, 11:51pm

Although the detailed technical comments are very interesting, we need high level agreement that (a) this feature is wanted and (b) it is feasible. I think the answer to both is “yes”.

The latter is demonstrated by the existing notebook infrastructure, which already supports code cells. If you think of Markdown cells as simply a combination of text cells (already implemented) & code cells (already implemented), little new is needed other than specialized rendering & control for these composite cells.

To those who know the Jupyter decision making process: what would be the way to get this approved for development?

agoose77 · September 9, 2021, 12:35pm

TL;DR

I like the aesthetics of this idea, but I’m not sure that we can / want to do this in a completely kernel-agnostic manner.

Context

This topic is one that I’m deeply interested in. I remember discussing the concept of “rich markdown” (concerning widgets) with @jasongrout a while back. We both (IIRC) expressed an interest in the ability to have a WYSIWYG kind of editor for widgets-in-text. Other products like Observable have this feature, which gives us something to experiment with. Fundamentally, inline widgets are a similar problem to inline variables: both involve rendering kernel variables inline in markup.

I think others have raised the question of

What is a notebook for?

and this captures my philosophical concerns for this idea. I’m not going to state what I think a notebook is, because I think I need to give that more thought myself.

Additionally, there are already other conversations happening at the moment that ask:

How can we make notebooks more reproducible w.r.t. embedded widgets / interactive outputs?

and I think this thread is related.

Using JupyterLab-Markup

Before I go any further: we could bolt an interactive Markdown renderer on top of the existing Jupyter notebook, and it would work. A brief discussion of this follows. However, I think now is a good time to discuss wider changes that make the Notebook more forward-looking, and these are discussed in the next section.

As @bollwyvl mentioned, we could already make a good start on this with jupyterlab-markup. It wouldn’t (famous last words) be that hard to write a plugin that embeds variables in markup (using the Debug Adapter Protocol, or DAP, I think), and for non-reactive notebooks this would work reasonably well. It would also support widgets (I believe), which would be interesting. As Nick states, there is a bigger Jupyter-scale problem. If we start investing heavily in the jupyterlab-markup route (for this plugin or others) without some way to demand that these extensions be supported (be it in JupyterLab or in any frontend), we start to move towards the old-style Notebook free-for-all. Moreover, we lose the robustness of Markdown rendering as a universal feature.

Add Interactive Markdown Cells?

To my mind, this should not be something that is performed by the frontend, at least not directly. Requiring a kernel to render Markdown would preclude running such a notebook in a non-kernel backed viewer (something that several commenters have pointed to).
What we could do is establish a special MIME bundle that represents a Markdown object that contains rich display objects, i.e.

{
   "content": "Live long and {{ what }}",
   "data": {
      "what": {
         "text/plain": "prosper"
      }
   }
}

Here, the kernel is responsible for sending over these mimebundles, but it doesn’t need to know how they are assembled, and is no longer required after the cell has been executed (i.e. the notebook can be viewed in nbviewer).
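The frontend side of this is a pure function from that bundle to Markdown text. The sketch below assumes the bundle shape shown above: `content` holds `{{ name }}` placeholders, and `data` maps each name to a mimebundle. It is written in Python for illustration; a real frontend would be TypeScript. The name `render_imarkdown` and the MIME preference order are assumptions.

```python
import html
import re
from typing import Optional

# Placeholder syntax from the example above: {{ name }}, spaces optional.
PLACEHOLDER = re.compile(r"\{\{\s*(.+?)\s*\}\}")

# Preference order when a bundle offers several representations (an assumption).
PREFERRED = ("text/markdown", "text/html", "text/plain")


def pick_representation(bundle: dict) -> Optional[str]:
    """Return the first supported representation from a mimebundle."""
    for mime in PREFERRED:
        if mime in bundle:
            # Escape plain text so a value like "<b>" is not read as HTML.
            return html.escape(bundle[mime]) if mime == "text/plain" else bundle[mime]
    return None


def render_imarkdown(obj: dict) -> str:
    """Substitute each {{ name }} in obj["content"] with obj["data"][name]."""
    data = obj.get("data", {})

    def substitute(match):
        bundle = data.get(match.group(1))
        rendered = pick_representation(bundle) if bundle is not None else None
        # Unknown names (or unsupported MIME types) keep their placeholder.
        return match.group(0) if rendered is None else rendered

    return PLACEHOLDER.sub(substitute, obj["content"])


print(render_imarkdown({
    "content": "Live long and {{ what }}",
    "data": {"what": {"text/plain": "prosper"}},
}))
```

The composed text would then go through the frontend's normal Markdown renderer. Because no kernel is involved at this stage, nbviewer-style tools could do the same.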

Here’s a mockup based upon IPython’s Markdown object:

from IPython.display import IMarkdown  # hypothetical: IPython has no IMarkdown today
what = "prosper"

IMarkdown(
   "Live long and {{ what }}",
   data={
      "what": what
   }
)

or

%%imarkdown what=what
Live long and {{ what }}
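IMarkdown does not exist in IPython. One way to prototype it without touching IPython is a plain class that implements IPython's `_repr_mimebundle_` display hook. The class would emit a custom MIME type carrying the template plus one bundle per name. The MIME type string `application/vnd.jupyter.imarkdown+json` is a made-up placeholder. Using `repr()` for `text/plain` is also a simplification; a fuller version might ask IPython's display formatter for richer bundles.

```python
IMARKDOWN_MIME = "application/vnd.jupyter.imarkdown+json"  # hypothetical MIME type


class IMarkdown:
    """Prototype of the mocked-up IMarkdown display object.

    IPython calls _repr_mimebundle_ when an object is displayed, so this
    class can emit the composite bundle without any changes to IPython.
    """

    def __init__(self, content: str, data: dict):
        self.content = content
        self.data = data

    def _bundle_for(self, value) -> dict:
        # Only text/plain here; a real version could also produce HTML,
        # widget views, etc. for each value.
        text = value if isinstance(value, str) else repr(value)
        return {"text/plain": text}

    def _repr_mimebundle_(self, include=None, exclude=None):
        payload = {
            "content": self.content,
            "data": {name: self._bundle_for(v) for name, v in self.data.items()},
        }
        return {
            IMARKDOWN_MIME: payload,
            # Fallback for frontends that don't know the custom MIME type:
            # they show the raw template with unexpanded placeholders.
            "text/markdown": self.content,
        }
```

Frontends that understand the custom MIME type would render the composite. All others fall back to plain `text/markdown`.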

We could parse the Markdown in the kernel (e.g. via magics) to look for variable definitions. However, as soon as the kernel has to infer the variables, the kernel library must be aware of the templating syntax, which (as discussed) can vary between plugins.

For posterity, we can do a hack job by just using f-strings:

import ast
from IPython import get_ipython
from IPython.core.magic import register_cell_magic


@register_cell_magic
def imarkdown(line, cell):
    shell = get_ipython()
    # Build `"<cell text>".format(**globals())`; arg=None means `**` unpacking
    formatter = ast.Call(
        func=ast.Attribute(value=ast.Constant(cell), attr="format", ctx=ast.Load()),
        args=[],
        keywords=[
            ast.keyword(
                arg=None,
                value=ast.Call(
                    func=ast.Name(id="globals", ctx=ast.Load()), args=[], keywords=[]
                ),
            )
        ],
    )
    # Wrap it: `__import__("IPython.display", fromlist=["Markdown"]).Markdown(...)`
    node = ast.Call(
        func=ast.Attribute(
            value=ast.Call(
                func=ast.Name(id="__import__", ctx=ast.Load()),
                args=[ast.Constant(value="IPython.display")],
                keywords=[
                    ast.keyword(
                        arg="fromlist",
                        value=ast.List(elts=[ast.Constant(value="Markdown")], ctx=ast.Load()),
                    )
                ],
            ),
            attr="Markdown",
            ctx=ast.Load(),
        ),
        args=[formatter],
        keywords=[],
    )
    # Run the generated source as a cell so the Markdown object becomes the
    # cell's output (ast.unparse needs Python 3.9+)
    shell.run_cell(ast.unparse(node))

%%imarkdown
Live long and {what}

But this is just to make a point.

choldgraf · September 9, 2021, 6:21pm

Thanks @agoose77 for that write-up, I appreciate all of the thoughtfulness, and the inspiration in there!

A few thoughts from me:

To answer @stefanv, I don’t think that the right next step is “get this approved for development in all of Jupyter”. This feels to me like it should be prototyped and experimented with, either:
- As a JupyterLab or kernel-level extension (as @agoose77 describes)
- As an extension of programmatic notebook execution via nbclient or papermill (I opened jupyter/nbclient issue #158, “Create some extension points for notebook / cell execution”, to discuss that possibility)

I think we should have a working prototype that people can play around with and that can iterate more quickly than having a vote across the whole Jupyter ecosystem. That has been a successful model for other extension points in the JupyterLab world (e.g. jupyterlab-lsp, debugger, etc.), and I think it could work here as well.

I also think we don’t want to make a perfect solution a blocker on iterative progress. In my opinion it is OK if we introduce some noisiness into markdown syntax etc, so long as it’s clear that this is at the prototype/experiment level, not “altering core Jupyter”.

I guess the question then is: who is interested in experimenting on this? I don’t mean “who wants to modify core JupyterLab”, I mean “who wants to play around with some combination of extensions, packages, etc. that explore how this use case could be made possible?”

I care about this a good amount, it sounds like @rowanc1 is interested as well in building some prototypes. @agoose77 it would be awesome if you could provide some collaboration on the jupyterlab-markup side if this is of interest to you! (though, I don’t think that making progress here necessarily means building JupyterLab extensions)

stefanv · September 9, 2021, 9:41pm

It would be disappointing if we could not land this feature in Jupyter itself, because that would let us display properly rendered rich Markdown cells to the reader. Perhaps you are saying that the core team would first need a proof of concept in the form of an extension. But is it even possible for an extension to add a new cell type, or to change the way cells are rendered?

If we are only talking about rendering inline expressions while executing a notebook (i.e. in a publishing step), that can already be achieved by pre and post processing a notebook: split up markdown cells into text & inline code, attach relevant metadata to each part, run it through an execution engine, then stitch everything together again.
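The split / execute / stitch pipeline could look roughly like the sketch below. It assumes inline expressions are written as `{{ expr }}`. It uses Python's `eval` on a plain dict as a stand-in for a real execution engine, such as sending each expression to a kernel. The names `split_markdown`, `stitch` and `evaluate` are hypothetical.

```python
import re
from typing import Callable, List, Tuple

INLINE = re.compile(r"\{\{\s*(.+?)\s*\}\}")

Segment = Tuple[str, str]  # ("text", markdown) or ("expr", source code)


def split_markdown(source: str) -> List[Segment]:
    """Split a Markdown cell into alternating text and inline-code segments."""
    segments: List[Segment] = []
    pos = 0
    for m in INLINE.finditer(source):
        if m.start() > pos:
            segments.append(("text", source[pos:m.start()]))
        segments.append(("expr", m.group(1)))
        pos = m.end()
    if pos < len(source):
        segments.append(("text", source[pos:]))
    return segments


def stitch(segments: List[Segment], evaluate: Callable[[str], str]) -> str:
    """Run each expression through `evaluate` and join everything back up."""
    return "".join(
        evaluate(value) if kind == "expr" else value for kind, value in segments
    )


# Stand-in execution engine: evaluate expressions in a plain namespace.
namespace = {"n_samples": 1200, "accuracy": 0.9314}


def evaluate(expr: str) -> str:
    return str(eval(expr, dict(namespace)))


text = "We used {{ n_samples }} samples; accuracy was {{ round(accuracy * 100, 1) }}%."
print(stitch(split_markdown(text), evaluate))
```

In a publishing step, the `("expr", ...)` segments would be recorded in cell metadata and executed by the engine, and `stitch` would run afterwards.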

fperez · September 9, 2021, 10:20pm

Stefan, I have wanted something like this for a long time too, but I also think it makes complete sense to prototype it as an extension first… One of the downsides of Jupyter today is that it has a huge user base. That is a downside from the development/maintenance perspective: any changes in the core have a huge impact and are very hard to revert later on if they prove sub-optimal in some unforeseen way.

Thus our emphasis on less “committing” prototyping of new ideas around the edges and on having extension points everywhere…

The static post-processing part of this can certainly be implemented in Jupyter Book/nbconvert even today with custom tooling (though there may be API improvements to be made to facilitate it). The UI/UX for interactive use should be doable in JupyterLab as an extension (that potentially takes advantage of metadata).

That will help us iron out all the details and explore graceful fallbacks (what would these look like when rendered on GitHub, how would they interact with tools like jupytext or other frontends like PyCharm or VS Code, etc.). And with a working implementation, it becomes much easier to see whether core changes to APIs or formats are needed, or whether it’s a simple matter of, say, “blessing” an extension as a core JupyterLab one that ships by default. But making that decision now would be premature, for both technical and social reasons, so I don’t see a problem with starting now to build working prototypes for both the static and the interactive cases.

stefanv · September 9, 2021, 10:28pm

I probably didn’t express myself clearly, but what I was asking is exactly this: whether what we need (adding a new cell type and rendering it in a special way with kernel interaction) is doable as an extension.

I had a gut feel that it might require some surgery in the core of JupyterLab itself, but I would be glad if that weren’t the case (it would make our job here much easier!).

Further I was pointing out that proving the viability of a pre/postprocessing or cell magic approach isn’t all that interesting (to me), because we already know that will work—it’s just very clunky.

chrisjsewell · September 9, 2021, 10:33pm

Just to mention here, as a principal maintainer on myst-parser, jupyter-cache, myst-nb and jupyter-book, I have some fairly concrete ideas about how I would go about achieving this (which I was already intending to do).
I just don’t have the time to flesh them out here right this minute, lol.

I would quickly echo, though, some of the concerns above: I would be wary of the kernel having to get involved in Markdown parsing.

fperez · September 9, 2021, 10:37pm

The post-processing version may still have different needs from the live/interactive one, as there may be different output constraints on say a static PDF output compared to a JS-rendered one in a browser (just like the nice HTML render of say pandas dataframes doesn’t quite make it to LaTeX output).

But as to your first question: without having tried to build it, I’m not 100% sure that Lab right now has all the right APIs for this (I don’t know them well enough). However, I can imagine a number of ways to try and build this as a Lab extension that should work. If they hit a wall, the first answer might be to request a given API improvement in Lab (which isn’t an immutable entity) to facilitate things…

But for example, here @jasongrout pointed out some ideas on how to write a custom cell provider in Lab… It’s under-documented ATM but seems viable, so I do think we have the starting points we need…

agoose77 · September 9, 2021, 11:12pm

To clarify my position (which is just that, only my opinion!), I’m considering the following axioms:

- Notebooks should be viewable both on-line (with a kernel) and off-line (using nbviewer, etc)
- Notebook frontends should avoid favouring a particular kernel (e.g. the original IPython kernel)

I don’t actually think this is impossible to do by any means, but I do think it will require some big changes (which can be done outside of the core). I stand by my earlier conclusions that we want to decouple the variable-loading from the Markdown rendering (to support the above points).

@stefanv et al. have convinced me to rethink the problem, and I think a Jupyter-first solution is actually possible, it just requires a little more work.

Here’s what I think we might be able to try:

1. Add a new cell type (e.g. IMarkdown).
2. Create a new “execution” mode that treats IMarkdown execution as an instruction to ask the kernel for variable mimebundles (e.g. via the DAP, if possible).
3. Create an output mimebundle that effectively contains the variable mimebundles + the original markdown.
4. Create a Markdown renderer that can compose these mimebundles into a rendered result.

This approach puts the burden of tokenizing the markdown into a single place — the frontend.
It would probably be straightforward to create a plugin that finds the variable templates {{ x }}, and stores the information. This information can then be retrieved to query the kernel. Once the results are available, the final composite mimebundle can be stored in the cell outputs, and the renderer invoked.
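The “find the templates and store the information” step might be sketched as follows. Template offsets are recorded so the results can be spliced back in later. Each distinct expression is queried only once. The regex approach is a simplification: it would also match `{{ }}` inside code spans and fences, so a real plugin would more likely hook into the Markdown tokenizer (for example, a markdown-it plugin). All names here are hypothetical.

```python
import re
from typing import List, Tuple

TEMPLATE = re.compile(r"\{\{\s*(.+?)\s*\}\}")

Template = Tuple[int, int, str]  # (start offset, end offset, expression)


def find_templates(markdown: str) -> List[Template]:
    """Locate every {{ ... }} template and record where it sits in the source."""
    return [(m.start(), m.end(), m.group(1)) for m in TEMPLATE.finditer(markdown)]


def unique_expressions(templates: List[Template]) -> List[str]:
    """Distinct expressions in first-appearance order, so each is queried once."""
    return list(dict.fromkeys(expr for _, _, expr in templates))


def substitute(markdown: str, templates: List[Template], results: dict) -> str:
    """Splice results back in, working right-to-left so offsets stay valid."""
    for start, end, expr in reversed(templates):
        if expr in results:
            markdown = markdown[:start] + results[expr] + markdown[end:]
    return markdown
```

Expressions that have no result yet keep their template text, so a partially answered query still renders something sensible.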

It is tempting to just do this in a near-single pass — parse the markdown, and store the rendered HTML output (with expanded template strings), but I don’t know if I like the idea of storing text/html in the output of the cell. Instead, I’d prefer to store the constituent parts, so that we can support widgets, and also avoid filling the notebook with generated output!

With this approach, the notebook document retains the reproducible rich output while remaining easy to author.

I don’t have time to work on this right now, but I am interested.

Related but off-topic

In this thread I was thinking about custom cell types, and suggested replacing the non–code-cell types with a single MIME cell that specifies its MIME type. I now wonder whether a bolder proposal would be workable: make all cells MIME cells, and use a kind of execution registry that invokes an executor for a given MIME type. In the context of inline variable insertion, we’d just register an executor for text/markdown that queries the backend as outlined above and returns the processed results.
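Such an execution registry could be as simple as a dict from MIME type to callable. The sketch below assumes a hypothetical MIME-cell shape, `{"mime_type": ..., "source": ...}`. It also uses a stand-in `text/markdown` executor that reads values from a local dict instead of querying a kernel. Cells with no registered executor pass through unchanged.

```python
from typing import Callable, Dict

# Hypothetical executor signature: cell source in, output mimebundle out.
Executor = Callable[[str], dict]

EXECUTORS: Dict[str, Executor] = {}


def register_executor(mime_type: str):
    """Decorator that registers an executor for cells of the given MIME type."""
    def decorator(func: Executor) -> Executor:
        EXECUTORS[mime_type] = func
        return func
    return decorator


def execute_cell(cell: dict) -> dict:
    """Dispatch a MIME cell to its executor; unknown types pass through as-is."""
    executor = EXECUTORS.get(cell["mime_type"])
    if executor is None:
        return {cell["mime_type"]: cell["source"]}
    return executor(cell["source"])


@register_executor("text/markdown")
def execute_markdown(source: str) -> dict:
    # Stand-in for "query the backend": values come from a local dict here.
    values = {"what": "prosper"}
    for name, value in values.items():
        source = source.replace("{{ " + name + " }}", value)
    return {"text/markdown": source}
```

Under this model, code cells would simply be one more registered MIME type whose executor sends the source to the kernel.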

Hmm, I really want to work on this now!

stefanv · September 9, 2021, 11:18pm

Your four-point plan seems spot-on to me, @agoose77! The only change I would make is that “variable” should become “expression”, but I suppose you could also interpret variable here as “inline snippet identifier”.

Regarding the point 4 concerns, one approach may be to first compose Markdown from cell text + inline outputs, and to then render that using Markdown-it.

agoose77 · September 9, 2021, 11:24pm

I’ve modified my post since your reply, so I’ll reflect the changes here:

If we use a three-phase approach (parsing the markdown for expression templates, generating a composite mimebundle, and rendering this result), then we can still support things like widgets that don’t save as text/html but rather as application/vnd.jupyter.widget-view+json.

I like the idea of just calling the kernel with an expression. That seems pretty simple + robust across different kernels.

fperez · September 9, 2021, 11:24pm

I mostly agree with the above, but @agoose77 - I’m curious why the DAP road rather than the user_expressions one (that can return a mime-bundle)? It seems the latter might be simpler to get going, if less sophisticated than DAP? It’s much older and primitive, for sure (and as noted by @bollwyvl rarely used), but in fact we put that in the spec ages ago pretty much with this use case in mind… If it’s insufficient for some reason then the DAP approach may be needed, but I don’t see that quite yet, perhaps I’m missing something?
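For reference, the user_expressions route sketched in terms of the Jupyter messaging protocol looks like this. An `execute_request` with empty `code` carries the expressions in `user_expressions`. The `execute_reply` then returns, per key, either `{"status": "ok", "data": ..., "metadata": ...}` or an error with `ename`/`evalue`. Keying by the expression text and formatting errors as plain text are assumptions of this sketch. With jupyter_client, the same fields should map onto `KernelClient.execute("", silent=True, user_expressions=...)`.

```python
from typing import Dict, List


def build_execute_request(expressions: List[str]) -> dict:
    """Content of an execute_request that only evaluates user_expressions.

    Empty code with silent=True means nothing runs as a cell, so the
    execution count and history are left alone. Each expression is keyed
    by its own source text.
    """
    return {
        "code": "",
        "silent": True,
        "store_history": False,
        "user_expressions": {expr: expr for expr in expressions},
        "allow_stdin": False,
        "stop_on_error": False,
    }


def bundles_from_reply(reply_content: dict) -> Dict[str, dict]:
    """Map each expression to its mimebundle, or a plain-text error message."""
    results = {}
    for expr, result in reply_content.get("user_expressions", {}).items():
        if result.get("status") == "ok":
            results[expr] = result.get("data", {})
        else:
            message = f"{result.get('ename', 'Error')}: {result.get('evalue', '')}"
            results[expr] = {"text/plain": message}
    return results
```

Because the kernel only evaluates expressions and returns mimebundles, it needs no knowledge of the Markdown templating syntax.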

Also - for now the IMarkdown “cell type” could be as simple as a metadata tag, so that these notebooks render and work normally with other tools, no? Or do you see a need for a custom cell type in the document format itself?

agoose77 · September 9, 2021, 11:30pm

Ha! I mentioned in reply to @stefanv that I think the DAP might be overkill / unnecessary indirection here, with just invoking the kernel directly in mind. However, as you point out, user_expressions would seemingly be even better. How prescient of you and the IPython team!

Or do you see a need for a custom cell type in the document format itself?

I think we do need a new cell type, because Markdown cells don’t have an outputs field, so we can’t store any mimebundles for a Markdown cell.

Long term, I think a proposal a la Notebook Cell-Type Generalisation would be best. Maybe if I work on this at some point, I’ll do both at the same time!

fperez · September 9, 2021, 11:31pm

Sure? I’d imagined all this - cell tag and a dict of expressions with output bundles would all live in the metadata for the cell, overwritten by the extension. Doesn’t that work, you think?
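The metadata-only variant could be sketched as below. An ordinary Markdown cell gets a marker tag plus a dict of expressions and their output bundles, and the extension overwrites these on each run. The metadata key `inline_expressions` and the tag `imarkdown` are hypothetical names.

```python
from typing import Dict

META_KEY = "inline_expressions"  # hypothetical metadata namespace
TAG = "imarkdown"  # hypothetical tag marking cells the extension manages


def store_in_metadata(cell: dict, bundles: Dict[str, dict]) -> dict:
    """Overwrite the cell's inline-expression results in its metadata.

    The Markdown `source` is left untouched, so tools that ignore this
    metadata still see (and render) an ordinary Markdown cell.
    """
    metadata = cell.setdefault("metadata", {})
    metadata[META_KEY] = {
        "expressions": {expr: {"data": bundle} for expr, bundle in bundles.items()}
    }
    tags = metadata.setdefault("tags", [])
    if TAG not in tags:
        tags.append(TAG)
    return cell
```

Since nbformat allows arbitrary cell metadata, this keeps notebooks valid and lets other frontends degrade to showing the raw template.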

agoose77 · September 9, 2021, 11:36pm

Ah I see, I’m crossing wires slightly here.

The bigger picture in my mind is that we might benefit from de-regulating the kind of cells we have in notebooks, in favour of a model whereby the notebook contains contents (with mime types), and the frontends interpret the contents according to these MIME types. We already do this in one sense, but for a tiny subset of possible MIME types (text/markdown, text/plain, and text/plain+code e.g.).

In this proposal for inline-markdown rendering, “conventional” viewers would not support the rendered output anyway; they’d only have a mangled version of the GFM interpretation of the markdown. Given this, I am tempted to just move to a new cell type because it’s future-proof (in my thoughts of where the notebook needs to go).

However, a PoC doesn’t need to go that far, and my ambitions for notebook schema evolution can take a back seat! If we want to keep things simple, we can definitely do this all in metadata. It just complicates the implementation a little more (and in other ways, it’s simpler).

fperez · September 9, 2021, 11:42pm

Yup, I suggest that, partly for historical reasons: we used to have a bunch of cell types at the start (headings, etc) and backed away into today’s simpler model. I’m not saying that with ~10y of experience we can’t revisit that decision, but that is a much, much more complicated discussion with a massive impact surface.

The “shove it all into metadata and interpret it creatively” approach may be a bit hackish, but it does offer a ton of leg room for experimentation, degrades reasonably gracefully in other clients/frontends, and can offer a fairly seamless update path in terms of UX if later on a core format change is deemed worthwhile.

But given how important I think this idea is, and that we now have all the pieces to do it in place, I would hate to see the experiments bog down in a discussion about the core format. I suggest running with the custom syntax + user_expressions + metadata + jupyterlab/nbclient extensions for now, which can teach us a LOT and give people something to use in short order…

If from those lessons deeper changes (to Lab, format, protocol) come back, fine. But they’ll then be backed by usage and experience, and will thus be much easier to discuss/integrate…

Thanks a lot for all the work you’ve been doing on the project, BTW!! It’s awesome to get to meet new faces contributing so collaboratively, much appreciated.

jasongrout · September 10, 2021, 12:26am

I went into more detail in a conversation a few years ago with Spencer Lyon on Gitter: https://gitter.im/jupyterlab/jupyterlab?at=5a877e177685a046389baa3e

More specifically: https://gitter.im/jupyterlab/jupyterlab?at=5a87899ac3c5f8b90dc6131b

From the top down:
- you want to create an extension like https://github.com/jupyterlab/jupyterlab/blob/987702e76cb17265fa249ba67775e35804a69077/packages/notebook-extension/src/index.ts#L303
- you’ll also want to subclass https://github.com/jupyterlab/jupyterlab/blob/4bfbf2def3a5778b0d57e9aaca08487e8a0d1b62/packages/notebook/src/panel.ts#L444
- your extension returns an instance of that subclass
- the class inheritance goes: the NotebookPanel content factory is a subclass of the Notebook content factory, which is a subclass of the cell content factory
- so you subclass the NotebookPanel content factory, which means you inherit the createMarkdownCell method, and you override that.

In pseudo-pseudo code:

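import { JupyterFrontEnd, JupyterFrontEndPlugin } from '@jupyterlab/application';
import { IEditorServices } from '@jupyterlab/codeeditor';
import { NotebookPanel } from '@jupyterlab/notebook';

// Override createMarkdownCell here to return a MarkdownCell subclass
// that uses the custom renderer.
class MyNotebookPanelContentFactory extends NotebookPanel.ContentFactory {}

const factory: JupyterFrontEndPlugin<NotebookPanel.IContentFactory> = {
  id: '@jupyterlab/my-notebook-extension:factory',
  provides: NotebookPanel.IContentFactory,
  requires: [IEditorServices],
  autoStart: true,
  activate: (app: JupyterFrontEnd, editorServices: IEditorServices) => {
    const editorFactory = editorServices.factoryService.newInlineEditor;
    return new MyNotebookPanelContentFactory({ editorFactory });
  }
};

export default factory;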

This may be a bit dated at this point, but I think the principles still hold, that it should be possible to create a new notebook viewer with a custom markdown cell renderer.

jasongrout · September 10, 2021, 12:33am

I wholeheartedly agree. That seems to be the best path forward to me too.

jasongrout · September 10, 2021, 2:28am

I’m happy to pair with someone for an afternoon (probably not in the next week or so, but before the end of the month) to get them started on writing a JupyterLab extension. It would introduce a new notebook type with a custom markdown renderer that could communicate with the kernel to get user expressions to render.
