Inline variable insertion in markdown

Thanks @agoose77 for that write-up, I appreciate all of the thoughtfulness, and the inspiration in there!

A few thoughts from me:

I think we should have a working prototype that people can play around with and that we can iterate on more quickly than if every change needed a vote across the whole Jupyter ecosystem. That has been a successful model for other extension points in the JupyterLab world (e.g. jupyterlab-lsp, the debugger, etc.) and I think it could work here as well.

I also think we don’t want to make a perfect solution a blocker on iterative progress. In my opinion it is OK if we introduce some noisiness into markdown syntax etc, so long as it’s clear that this is at the prototype/experiment level, not “altering core Jupyter”.

I guess the question then is: who is interested in experimenting on this? I don’t mean “who wants to modify core JupyterLab”, I mean “who wants to play around with some combination of extensions, packages, etc. that explore how this use-case could be made possible?”

I care about this a good amount, and it sounds like @rowanc1 is interested in building some prototypes as well. @agoose77, it would be awesome if you could collaborate on the jupyterlab-markup side if this is of interest to you! (Though I don’t think that making progress here necessarily means building JupyterLab extensions.)

It will be disappointing if we cannot land this feature in Jupyter itself. That would let us display properly rendered rich Markdown cells to the reader. Perhaps you are saying that the core team would need a proof of concept first in the form of an extension, but is it even possible to add a new cell type or to change the way cells are rendered this way?

If we are only talking about rendering inline expressions while executing a notebook (i.e. in a publishing step), that can already be achieved by pre and post processing a notebook: split up markdown cells into text & inline code, attach relevant metadata to each part, run it through an execution engine, then stitch everything together again.
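For illustration, the split-and-stitch step of such a pre/post-processing pipeline could look roughly like the sketch below. Everything named here is an assumption rather than an existing API: the `{{ expr }}` template syntax, the `Segment` type, and the `splitMarkdown` and `stitch` helpers are hypothetical, and `evaluate` stands in for whatever execution engine runs the inline code.

```typescript
// One piece of a Markdown cell: literal text or an inline expression.
type Segment =
  | { kind: "text"; value: string }
  | { kind: "expr"; value: string };

// Split Markdown source on {{ ... }} templates (assumed syntax).
function splitMarkdown(source: string): Segment[] {
  const segments: Segment[] = [];
  const pattern = /\{\{\s*(.+?)\s*\}\}/g;
  let last = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    // Keep any literal text that precedes this template.
    if (match.index > last) {
      segments.push({ kind: "text", value: source.slice(last, match.index) });
    }
    segments.push({ kind: "expr", value: match[1] });
    last = match.index + match[0].length;
  }
  // Keep any literal text after the final template.
  if (last < source.length) {
    segments.push({ kind: "text", value: source.slice(last) });
  }
  return segments;
}

// Stitch the segments back together, replacing each expression
// with the string produced by the (hypothetical) execution engine.
function stitch(
  segments: Segment[],
  evaluate: (expr: string) => string
): string {
  return segments
    .map(s => (s.kind === "text" ? s.value : evaluate(s.value)))
    .join("");
}
```

In a real pipeline the metadata attached to each segment and the execution step would sit between these two calls; here they are collapsed into the `evaluate` callback.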

Stefan, I have wanted something like this for a long time too, but I also think it makes complete sense to prototype it as an extension first… One of the downsides of Jupyter today is that it has a huge user base :slight_smile: That is a downside from the development/maintenance perspective: any changes in the core have a huge impact and are very hard to revert later on if they prove sub-optimal in some unforeseen way.

Thus our emphasis on less “committing” prototyping of new ideas around the edges and on having extension points everywhere…

The static post-processing part of this can certainly be implemented in Jupyter Book/nbconvert even today with custom tooling (though there may be API improvements to be made to facilitate it), while the UI/UX for interactive use should be doable in JupyterLab as an extension (that potentially takes advantage of metadata).

That will help us iron out all the details and explore graceful fallbacks (what would these look like when rendered on GitHub, how would they interact with tools like jupytext or other frontends like PyCharm or VS Code, etc.). And with a working implementation in hand, it becomes much easier to see whether core changes to APIs or formats are needed, or whether it’s a simple matter of, say, “blessing” an extension as a core JupyterLab one that ships by default. But making that decision now would be premature, for both technical and social reasons, so I don’t see a problem with starting now to build working prototypes for both the static and the interactive cases.

I probably didn’t express myself clearly, but what I was asking is exactly this: whether what we need (adding a new cell type and rendering it in a special way with kernel interaction) is doable as an extension.

I had a gut feel that it might require some surgery in the core of JupyterLab itself, but I would be glad if that weren’t the case (it would make our job here much easier!).

Further I was pointing out that proving the viability of a pre/postprocessing or cell magic approach isn’t all that interesting (to me), because we already know that will work—it’s just very clunky.

Just to mention here, as a principal maintainer on myst-parser, jupyter-cache, myst-nb and jupyter-book, I have some fairly concrete ideas about how I would go about achieving this (which I was already intending to do).
Just not the time to flesh them out here right this minute lol.

I would quickly echo some of the concerns above, though: I would be wary of the kernel having to get involved in Markdown parsing.

The post-processing version may still have different needs from the live/interactive one, as there may be different output constraints on say a static PDF output compared to a JS-rendered one in a browser (just like the nice HTML render of say pandas dataframes doesn’t quite make it to LaTeX output).

But as to your first question: without having tried to build it, I’m not 100% sure that Lab right now has all the right APIs for this (I don’t know them well enough). However, I can imagine a number of ways to try and build this as a Lab extension that should work, and where if they hit a wall, the answer at first might be to request a given API improvement in Lab (which isn’t an immutable entity :slight_smile:) to facilitate things…

But for example, here @jasongrout pointed out some ideas on how to write a custom cell provider in Lab… It’s under-documented ATM but seems viable, so I do think we have the starting points we need…

To clarify my position (which is just that, only my opinion!), I’m considering the following axioms:

  • Notebooks should be viewable both on-line (with a kernel) and off-line (using nbviewer, etc)
  • Notebook frontends should avoid favouring a particular kernel (e.g. the original IPython kernel)

I don’t actually think this is impossible to do by any means, but I do think it will require some big changes (which can be done outside of the core). I stand by my earlier conclusions that we want to decouple the variable-loading from the Markdown rendering (to support the above points).

@stefanv et al. have convinced me to rethink the problem, and I think a Jupyter-first solution is actually possible, it just requires a little more work.

Here’s what I think we might be able to try:

  1. Add a new cell type (e.g. IMarkdown).
  2. Create a new “execution” mode that treats IMarkdown execution as an instruction to ask the kernel for variable mimebundles (e.g. via the DAP, if possible)
  3. Create an output mimebundle that effectively contains the variable mimebundles + the original markdown
  4. Create a Markdown renderer that can compose these mime-bundles into a rendered result.

This approach puts the burden of tokenizing the markdown into a single place — the frontend.
It would probably be straightforward to create a plugin that finds the variable templates {{ x }}, and stores the information. This information can then be retrieved to query the kernel. Once the results are available, the final composite mimebundle can be stored in the cell outputs, and the renderer invoked.

It is tempting to just do this in a near-single pass — parse the markdown, and store the rendered HTML output (with expanded template strings), but I don’t know if I like the idea of storing text/html in the output of the cell. Instead, I’d prefer to store the constituent parts, so that we can support widgets, and also avoid filling the notebook with generated output!

With this approach, the notebook document keeps the reproducible rich output while remaining easy to author.
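To make the "store the constituent parts" idea concrete, here is a rough sketch of what a composite output might contain. The `CompositeOutput` shape, the `pickMimeType` helper, and the preference order are all assumptions made for illustration; none of this is part of the notebook format.

```typescript
// A mimebundle maps MIME types to data, as in Jupyter cell outputs.
type MimeBundle = { [mimeType: string]: unknown };

// Hypothetical composite output: the original Markdown source plus
// one mimebundle per inline expression, stored unrendered.
interface CompositeOutput {
  markdown: string;
  expressions: { [expr: string]: MimeBundle };
}

// Assumed preference order: widgets first, then progressively plainer text.
const PREFERRED: string[] = [
  "application/vnd.jupyter.widget-view+json",
  "text/html",
  "text/markdown",
  "text/plain"
];

// Pick the richest representation a given renderer is willing to handle.
function pickMimeType(
  bundle: MimeBundle,
  preferred: string[] = PREFERRED
): string | undefined {
  return preferred.find(mime => mime in bundle);
}
```

Because the bundles are kept separate from the Markdown, a static exporter could pass a shorter `preferred` list (for example only `text/plain`) while a live frontend keeps the widget view.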

I don’t have time to work on this right now, but I am interested.

Related but off-topic

In this thread I was thinking about custom cell types, and suggested replacing the non–code-cell types with a single MIME cell that specifies its MIME type. I wonder now whether the more bold proposal to make all cells MIME cells and use a kind of execution registry that invokes an executor for a given MIME type would be workable. In the context of inline variable insertion, we’d just register an executor for text/markdown that queries the backend as outlined above, and returns the processed results.
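A minimal sketch of such an execution registry follows. The `ExecutorRegistry` class and its methods are hypothetical, and the registered `text/markdown` executor just substitutes a fixed value instead of querying a backend. It is also synchronous for brevity, whereas a real executor would need to be asynchronous to wait on the kernel.

```typescript
// Output of an executor: a mapping from MIME type to data.
type ExecutionResult = { [mimeType: string]: unknown };

// Synchronous for brevity; a real executor would involve a kernel round trip.
type Executor = (source: string) => ExecutionResult;

// Hypothetical registry that dispatches cell sources by MIME type.
class ExecutorRegistry {
  private readonly executors = new Map<string, Executor>();

  register(mimeType: string, executor: Executor): void {
    this.executors.set(mimeType, executor);
  }

  // Cells whose MIME type has no executor are passed through unchanged.
  execute(mimeType: string, source: string): ExecutionResult {
    const executor = this.executors.get(mimeType);
    return executor ? executor(source) : { [mimeType]: source };
  }
}

const registry = new ExecutorRegistry();
registry.register("text/markdown", source => ({
  // Stand-in for querying the backend: replace {{ x }} with a fixed value.
  "text/markdown": source.replace(/\{\{\s*x\s*\}\}/g, "42")
}));
```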

Hmm, I really want to work on this now!

Your four-point plan seems spot-on to me, @agoose77! The only change I would make is that “variable” should become “expression”, but I suppose you could also interpret variable here as “inline snippet identifier”.

Regarding the point 4 concerns, one approach may be to first compose Markdown from cell text + inline outputs, and to then render that using Markdown-it.

I’ve modified my post since your reply, so I’ll reflect the changes here:

If we use a three-phase approach of parsing the markdown for expression templates, generating a composite mimebundle, and rendering this result, then we can still support things like widgets that don’t save as text/html but rather application/vnd.jupyter.widget-view+json

I like the idea of just calling the kernel with an expression. That seems pretty simple + robust across different kernels.

I mostly agree with the above, but @agoose77 - I’m curious why the DAP road rather than the user_expressions one (that can return a mime-bundle)? It seems the latter might be simpler to get going, if less sophisticated than DAP? It’s much older and more primitive, for sure (and as noted by @bollwyvl rarely used), but in fact we put that in the spec ages ago pretty much with this use case in mind… If it’s insufficient for some reason then the DAP approach may be needed, but I don’t see that quite yet; perhaps I’m missing something?
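For reference, a user_expressions round trip might be sketched as below. The field names of the `execute_request` content and of each per-expression result follow the Jupyter messaging spec. The `buildExpressionRequest` and `successfulBundles` helpers, the `expr-N` key scheme, and the choice of empty `code` with `silent: true` are assumptions for illustration, and kernels may differ in how they treat silent requests.

```typescript
// Content of a Jupyter `execute_request` message.
interface ExecuteRequestContent {
  code: string;
  silent: boolean;
  store_history: boolean;
  user_expressions: { [key: string]: string };
  allow_stdin: boolean;
  stop_on_error: boolean;
}

// One entry of `user_expressions` in the corresponding `execute_reply`.
type ExpressionResult =
  | { status: "ok"; data: { [mimeType: string]: unknown }; metadata: { [key: string]: unknown } }
  | { status: "error"; ename: string; evalue: string; traceback: string[] };

// Build a request that runs no code and only evaluates the expressions.
// Assumption: the kernel still evaluates user_expressions when `silent` is true.
function buildExpressionRequest(expressions: string[]): ExecuteRequestContent {
  const user_expressions: { [key: string]: string } = {};
  expressions.forEach((expr, i) => {
    user_expressions[`expr-${i}`] = expr; // hypothetical key scheme
  });
  return {
    code: "",
    silent: true,
    store_history: false,
    user_expressions,
    allow_stdin: false,
    stop_on_error: false
  };
}

// Keep only the mimebundles of expressions that evaluated without error.
function successfulBundles(
  results: { [key: string]: ExpressionResult }
): { [key: string]: { [mimeType: string]: unknown } } {
  const bundles: { [key: string]: { [mimeType: string]: unknown } } = {};
  for (const key of Object.keys(results)) {
    const result = results[key];
    if (result.status === "ok") {
      bundles[key] = result.data;
    }
  }
  return bundles;
}
```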

Also - for now the IMarkdown “cell type” could be as simple as a metadata tag, so that these notebooks render and work normally with other tools, no? Or do you see a need for a custom cell type in the document format itself?

Ha! I mentioned in reply to @stefanv that I think the DAP might be overkill / unnecessary indirection here, with just invoking the kernel directly in mind. However, as you point out, user_expressions would seemingly be even better. How prescient of you and the IPython team!

“Or do you see a need for a custom cell type in the document format itself?”

I think we do need a new cell type, because Markdown cells don’t have an outputs field, so we can’t store any mimebundles for a Markdown cell.

Long term, I think a proposal a la Notebook Cell-Type Generalisation would be best. Maybe if I work on this at some point, I’ll do both at the same time!

Sure? I’d imagined all this - cell tag and a dict of expressions with output bundles would all live in the metadata for the cell, overwritten by the extension. Doesn’t that work, you think?
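As a sketch of the metadata-only variant: the cell stays an ordinary Markdown cell, and an extension writes a tag plus the expression results into its metadata. The `cell_type`, `source`, and `metadata.tags` fields are standard notebook-format fields; the tag name, the metadata key, and the `storeResults` helper are hypothetical.

```typescript
// Simplified Markdown cell (nbformat also allows `source` as a list of lines).
interface MarkdownCellJSON {
  cell_type: "markdown";
  source: string;
  metadata: { tags?: string[]; [key: string]: unknown };
}

const INLINE_TAG = "inline-expressions"; // hypothetical tag name
const RESULTS_KEY = "inline_expression_results"; // hypothetical metadata key

// Expression text mapped to the mimebundle the kernel returned for it.
type Bundles = { [expr: string]: { [mimeType: string]: unknown } };

// Return a copy of the cell with the tag set and the results overwritten.
// The source is left untouched, so other tools still see plain Markdown.
function storeResults(cell: MarkdownCellJSON, results: Bundles): MarkdownCellJSON {
  const tags = cell.metadata.tags ?? [];
  return {
    ...cell,
    metadata: {
      ...cell.metadata,
      tags: tags.includes(INLINE_TAG) ? tags : [...tags, INLINE_TAG],
      [RESULTS_KEY]: results
    }
  };
}
```

Frontends that do not know about the tag simply render the raw `{{ ... }}` text, which is the graceful-degradation trade-off of this approach.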

Ah I see, I’m crossing wires slightly here.

The bigger picture in my mind is that we might benefit from de-regulating the kind of cells we have in notebooks, in favour of a model whereby the notebook contains contents (with mime types), and the frontends interpret the contents according to these MIME types. We already do this in one sense, but for a tiny subset of possible MIME types (text/markdown, text/plain, and text/plain+code e.g.).

In this proposal for inline-markdown rendering, “conventional” viewers would not support the rendered output anyway, they’d only have a mangled version of the GFM interpretation of the markdown. Given this, I am tempted to just move to a new cell type because it’s future proof (in my thoughts of where the notebook needs to go).

However, a PoC doesn’t need to go that far, and my ambitions for notebook schema evolution can take a back seat! If we want to keep things simple, we can definitely do this all in metadata; it just complicates the implementation a little more (and in other ways, it’s simpler :man_shrugging:).

Yup, I suggest that, partly for historical reasons: we used to have a bunch of cell types at the start (headings, etc) and backed away into today’s simpler model. I’m not saying that with ~10y of experience we can’t revisit that decision, but that is a much, much more complicated discussion with a massive impact surface.

The “shove it all into metadata and interpret it creatively” approach may be a bit hackish, but it does offer a ton of leg room for experimentation, degrades reasonably gracefully in other clients/frontends, and can offer a fairly seamless update path in terms of UX if later on a core format change is deemed worthwhile.

But given how important I think this idea is, and that we now have all the pieces to do it in place, I would hate to see the experiments bog down in a discussion about the core format. I suggest running with the custom syntax + user_expressions + metadata + jupyterlab/nbclient extensions for now, which can teach us a LOT and give people something to use in short order…

If from those lessons deeper changes (to Lab, format, protocol) come back, fine. But they’ll then be backed by usage and experience, and will thus be much easier to discuss/integrate…

Thanks a lot for all the work you’ve been doing on the project, BTW!! It’s awesome to get to meet new faces contributing so collaboratively, much appreciated :slight_smile:

I went into more detail in a conversation a few years ago with Spencer Lyon on Gitter: https://gitter.im/jupyterlab/jupyterlab?at=5a877e177685a046389baa3e

More specifically: https://gitter.im/jupyterlab/jupyterlab?at=5a87899ac3c5f8b90dc6131b

From the top-down: you want to create an extension like https://github.com/jupyterlab/jupyterlab/blob/987702e76cb17265fa249ba67775e35804a69077/packages/notebook-extension/src/index.ts#L303
you’ll also want to subclass https://github.com/jupyterlab/jupyterlab/blob/4bfbf2def3a5778b0d57e9aaca08487e8a0d1b62/packages/notebook/src/panel.ts#L444
and your extension returns your subclass instance
the class inheritance diagram goes like NotebookPanel content factory is a subclass of Notebook content factory is a subclass of cell content factory
so you subclass the notebook panel content factory, which means you inherit the createMarkdownCell method and override that.
in pseudo-pseudo code:

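import { JupyterFrontEnd, JupyterFrontEndPlugin } from '@jupyterlab/application';
import { IEditorServices } from '@jupyterlab/codeeditor';
import { NotebookPanel } from '@jupyterlab/notebook';

// Names follow the JupyterLab 1.0+ API (JupyterFrontEndPlugin replaced JupyterLabPlugin).
// MyNotebookPanelContentFactory is your own subclass of NotebookPanel.ContentFactory
// that overrides createMarkdownCell; it is not defined here.
const factory: JupyterFrontEndPlugin<NotebookPanel.IContentFactory> = {
  id: '@jupyterlab/my-notebook-extension:factory',
  provides: NotebookPanel.IContentFactory,
  requires: [IEditorServices],
  autoStart: true,
  activate: (app: JupyterFrontEnd, editorServices: IEditorServices) => {
    const editorFactory = editorServices.factoryService.newInlineEditor;
    return new MyNotebookPanelContentFactory({ editorFactory });
  }
};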

This may be a bit dated at this point, but I think the principles still hold, that it should be possible to create a new notebook viewer with a custom markdown cell renderer.

I wholeheartedly agree. That seems to be the best path forward to me too.

I’m happy to pair with someone for an afternoon (probably not in the next week or so, but before the end of the month) to get them started on writing a JupyterLab extension that would introduce a new notebook type with a custom markdown renderer, which could communicate to the kernel to get user expressions to render.

Here is a tech demo I made years ago exploring some possibilities around rich javascript input in a code cell: jsfiddle of codemirror live widgets · GitHub

That’s a wonderful offer @jasongrout, thanks so much! I’d certainly join in just to learn a bit more about the APIs, and if you don’t mind, I think having a recording of this (totally informal, live, as-it-goes) could be very useful for others as well!

Hey, I would like to help with this. I set up a Doodle to start this experiment, based on @jasongrout’s suggested time slots.

https://doodle.com/poll/b9gga38v5q6q2pig?utm_source=poll&utm_medium=link