Inline variable insertion in markdown

Thanks for that historical perspective. It is helpful to see that this has been thought through before. Reading through that PR, I’m reassured that there is actually a decent amount of agreement on the patterns that are there. Interesting that this brainstorm also converged on a similar syntax ({{. some_expression }}). I agree with your assessment that the biggest challenge is just agreeing on something and going with it (this is the same problem that the CommonMark community has).

I think the two important questions to ask are:

  • should this feature exist?
  • what should it look like from a user’s perspective?

To me, the answer to “should it exist?” is obviously yes. This is a massively popular part of the RMarkdown ecosystem, it has a lot of people requesting it in the ipython issue, and I think there’d be a huge benefit to Jupyter users to allow for similar behavior.

For the second question, I think that we have a relatively clear target to shoot for: it should be as easy as or easier than using RMarkdown. This is the comparison that people will make when they think about a feature like this, and this pattern clearly already works well enough.

So in my opinion, anything that adds much extra user-facing complexity on top of

The value of myvariable plus 2 is `r myvariable + 2`!

is too much complexity. This is why I don’t think we can use a magic command, or require people to change cell types, etc. All of that will feel cumbersome to users, and the benefit of quickly writing something will be lost.

I think that this second question is still where we are right now. What should the syntax be? What is the expected behavior in an interactive setting, and in a non-interactive setting?

Is it realistic that we can answer some of these questions without knowing all of the details of what an implementation would look like? Obviously implementation is important, and it is clear there could be a lot of complexity there. But the important thing is that people agree on the “what” and the “why”, so that we know that this is worth doing in the first place, and so that we have some constraints about what it must look like from a user’s perspective.

And secondarily - I still wonder if there’s some way that we can prototype something in the form of an extension, rather than require it to be part of core from day 1. Perhaps this would make it easier to experiment and iterate. Maybe @krassowski’s idea of a different “type” of markdown cell could be a way to experiment?

I’d like to highlight the IPython issue 2958 that Chris mentions (sorry, I would link, but new users on this forum can only include 2 links in a post), and that follows up on Matthias’s original implementation. I see developers giving the idea the thumbs up and users enthusiastically supporting it. Even Matthias’s original PR seemed like it could potentially have made it through with some more work.

We should not over-complicate things. Users may not care about the 1% of use cases that we cannot support (very complex expressions that need special escaping, e.g.), or expressions that raise exceptions (the result can simply be rendered as some red text). The feature is so useful that it is worth finding a way to do it, even if it isn’t universally perfect.
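To illustrate how little is needed for the common case, here is a minimal sketch that evaluates `{{ expression }}` placeholders against a namespace and renders failures as red text. The name `render_inline`, the `{{ }}` delimiters, and the HTML used for errors are assumptions made for illustration, not an agreed design.

```python
import html
import re

# Assumed placeholder syntax: {{ expression }} on a single line
PLACEHOLDER = re.compile(r"\{\{(.+?)\}\}")


def render_inline(text, namespace):
    """Replace each {{ expression }} in `text` with its evaluated value."""

    def substitute(match):
        expression = match.group(1).strip()
        try:
            # eval() runs arbitrary code: acceptable in a sketch, not for untrusted input
            return str(eval(expression, {}, namespace))
        except Exception as error:
            # A failing expression becomes red text instead of aborting the render
            message = html.escape(f"{type(error).__name__}: {error}")
            return f'<span style="color: red">{message}</span>'

    return PLACEHOLDER.sub(substitute, text)


print(render_inline("The value plus 2 is {{ myvariable + 2 }}!", {"myvariable": 40}))
```

Expressions that contain `}}` themselves would need escaping, which is exactly the kind of rare case that a first version could leave unsupported.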

I don’t know why the concept of static Markdown cells has become important along the way. Is there a reason we care about our text cells having predictable output? If this is a rendering speed issue, we can fill in the display async as @minrk suggested.

With the “send-string-to-kernel” approach, there is no need for any kernel to be modified; we can re-use the existing cell execution pathways. I think Brian also supports this route.

[Edit] Matthias added the link.


I think a lot of the reasons are historical ones. Keep in mind that 8 years ago, the main way of sharing notebooks was still emailing back and forth, so the balance of why or why not might have changed.

I think we can get ~90% of the way there with an extension. There will be some UI/UX in the live notebook to think a lot about (how to re-execute a markdown cell without showing its source, for example).

I’m happy to pitch ideas/review PRs or prototypes, but not sure I’ll have any bandwidth to do any actual implementation.

If this is something critical enough we could make that part of a grant.


Do you know of anyone who would be willing to do this work? Perhaps this can be funded with one or two rounds of a NumFOCUS SDG, for example.


I am definitely interested in making this happen, though it’s still a bit unclear to me what it would take to make it happen. I think there are a few moving pieces here:

  1. Deciding on a path forward
  2. Understanding who is needed to agree to that path forward
  3. Getting buy-in from that group of people
  4. Doing the work

I think that we can realistically get funding for maybe 1 and 4, though it is unclear to me who 2 is, which makes it difficult to know how to do 3.

If we can scope this as an extension of some kind then maybe 2/3 is just a matter of “whoever is interested in working on the extension” and then this can be much simpler. The potential downside there is that there’s no guarantee that it would be broadly adopted, but I think that’ll be more likely if there is a working prototype that people like. That’s how I’ve seen many other things ultimately make their way into core.

Either way I am happy to write proposals and provide collaboration at the design level. I don’t have time to do the actual development, though potentially we could try to find somebody in the Executable Books community, or in 2i2c, to work on it.


Not sure it needs more +1s, but to add to the “should it exist” my most enthusiastic YES. We’ve talked about this for ages and it’s something we discussed well before even RMarkdown existed. For various reasons in our ecosystem we never fully implemented it (sometimes I wonder if the existence of reST with some of its worst design and implementation decisions has hurt us more than helped in the long run, but I digress). But I think it’s unquestionably a needed idea, and now between MyST (that gives us some design freedom away from pure reST) and JupyterLab (that gives us better extensibility to explore UI ideas), we should go ahead.

I’ll dig deeper into the rest of the thread later, but I wanted to chime in about how happy it made me to stumble on this tonight 🙂


Hey,

I wanted to bring a different perspective on this.

Looking at document-oriented software like good old Office or Confluence, they have the concept of an embedded object (OLE technology for Microsoft, macro components in Confluence). One of the reasons for those external components is to provide a tool better suited to writing a particular piece of documentation. The notebook could allow such behavior by supporting more types of cells (called viewers below), such as a plotly chart editor cell or a drawio cell. The source of the cell would be the required model (like JSON for plotly). But to make those types of cells really powerful (unlike in the aforementioned software), it should be possible for some viewers to link variable views from the kernel into their model. The user can then process data efficiently in code cells, while more visual tasks like annotating a graph are more easily done with, for example, the plotly chart editor.
A markdown cell would then be one variant of those non-code cells.

The requested variable view type would be tied to the viewer’s needs through a MIME type (for example, plotly would ask for an application/json view). And we could imagine that the markdown viewer is constrained to text/plain; other views would require either a code cell or another viewer. Injecting a graph or a table view into markdown is more easily solved by adding an intermediate code cell in the short run anyway.


Wanted to leave a quick note here saying I am also very enthusiastic about this concept/direction! I am working on curvenote.com which is a downstream user of jupyter outputs, myst - and helps with creating and collaborating on scientific documents and weaving in Jupyter results. Our editor embeds variables/outputs (some examples here), although up to now we have been mostly focused on JS not Jupyter. We have been working with @choldgraf and others on some of the MyST in javascript implementation, which may be able to help on some of the frontend user interfaces down the road (certainly through an extension at least).

I think there are some amazing opportunities for educational content and scientific writing for embedded interactivity through some combination of thebe/jupyterlite/myst and this new syntax. And to some of @choldgraf’s points, even very small steps towards this style of rendering (e.g. static, post processing in JupyterBook) can have large impact (and be done mostly through extensions to start with?).

Keeping an eye on this, and hopefully may be able to contribute time to some aspects through working with @choldgraf/Executable Books!


I was wondering how hard it would be to implement this feature alongside storing notebooks in plain markdown/MyST and having them converted to JSON in the background with Jupytext (perhaps in .jupyter_cache/). This way, quick modifications could be made on the fly with any editor, without needing to start a server each time. (I know this seems pretty unrelated, or might ignore that the community has already discussed it and does not like it.)

For static Jupyter Book outputs, I started to explore some simple magic that attempts to use the markdown contents of a code cell as a Python f-string then render it as HTML cell output.

The original code cell then needs hiding in the output Jupyter Book which could be automated with a simple script that makes sure a code cell magicked as f-string is tagged as hide-input.
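The tagging script mentioned above could look roughly like this. It is a sketch that works on the notebook's JSON structure directly; the magic name `%%fstring` is a placeholder (the actual name is not given here), and only the `hide-input` tag comes from the description above.

```python
import json

# Hypothetical magic name, used only for illustration
FSTRING_MAGIC = "%%fstring"


def tag_fstring_cells(notebook):
    """Add the `hide-input` tag to code cells that start with the magic."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format allows `source` to be a string or a list of lines
        if isinstance(source, list):
            source = "".join(source)
        if source.lstrip().startswith(FSTRING_MAGIC):
            tags = cell.setdefault("metadata", {}).setdefault("tags", [])
            if "hide-input" not in tags:
                tags.append("hide-input")
    return notebook


notebook = {
    "cells": [
        {"cell_type": "code", "metadata": {}, "source": "%%fstring\nx is {x}"},
        {"cell_type": "code", "metadata": {}, "source": "x = 1"},
    ]
}
print(json.dumps(tag_fstring_cells(notebook)["cells"], indent=1))
```

For a real notebook, the dictionary would come from `json.load` on the `.ipynb` file and be written back with `json.dump` before the book is built.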


Although the detailed technical comments are very interesting, we need high level agreement that (a) this feature is wanted and (b) it is feasible. I think the answer to both is “yes”.

The latter is proven by the existing notebook infrastructure, which already supports code cells. If you then start thinking of Markdown cells simply as a combination of text cells (already implemented) & code cells (already implemented), little new is needed other than specialized rendering & control for these composite cells.

To those who know the Jupyter decision making process: what would be the way to get this approved for development?

TL;DR

I like the aesthetics of this idea, but I’m not sure that we can / want to do this in a completely kernel agnostic manner.

Context

This topic is one that I’m deeply interested in. I remember discussing the concept of “rich markdown” (concerning widgets) with @jasongrout a while back. We both (IIRC) expressed an interest in the ability to have a WYSIWYG kind of editor for widgets-in-text. Other products like Observable have this feature, which gives something to experiment with. Fundamentally, in-line widgets are a similar problem to in-line variables: rendering kernel variables in-line in markup.

I think others have raised the question of

What is a notebook for?

and this captures my philosophical concerns for this idea. I’m not going to state what I think a notebook is, because I think I need to give that more thought myself.

Additionally, there are already other conversations happening at the moment that ask:

How can we make notebooks more reproducible w.r.t embedded widgets / interactive outputs?

and I think this thread is related.

Using JupyterLab-Markup

Before I go any further, we could bolt on an interactive Markdown renderer on top of the existing Jupyter notebook, and it would work. A brief discussion of this follows. However, I think now is a good time to discuss wider changes that make the Notebook more forward-looking, and this is discussed in the next section.

As @bollwyvl mentioned, we could already make a good start on this with jupyterlab-markup. It wouldn’t (famous last words) be that hard to write a plugin that embeds variables in markup (using the DAP, I think), and for non-reactive notebooks this would work reasonably well. It would also support widgets (I believe), which would be interesting. As Nick states, the bigger Jupyter-scale problem is that if we start investing heavily in the jupyterlab-markup route (for this plugin or others) without some way to demand that these extensions (be it in JupyterLab or in any frontend) be supported, we start to move towards the old-style Notebook free-for-all. Moreover, we lose the robustness of Markdown rendering as a universal feature.

Add Interactive Markdown Cells?

To my mind, this should not be something that is performed by the frontend, at least not directly. Requiring a kernel to render Markdown would preclude running such a notebook in a non-kernel backed viewer (something that several commenters have pointed to).
What we could do is establish a special MIME-bundle that represents a Markdown object that contains rich display objects, i.e.

{
   "content": "Live long and {{ what }}",
   "data": {
      "what": {
          "text/plain": "prosper"
      }
   }
}

Here, the kernel is responsible for sending over these mimebundles, but it doesn’t need to know how they are assembled, and is no longer required after the cell has been executed (i.e. the notebook can be viewed in nbviewer).
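To show why no kernel is needed at view time, here is a minimal sketch of how a viewer might fill the template from such a bundle. The function name `render_bundle` and the MIME-type preference order are assumptions; only the `content`/`data` layout comes from the example above.

```python
import re

# Matches {{ name }} where name is a simple identifier
TEMPLATE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_bundle(bundle, preferred=("text/markdown", "text/plain")):
    """Fill each {{ name }} in bundle["content"] from bundle["data"]."""

    def fill(match):
        mimebundle = bundle["data"].get(match.group(1))
        if mimebundle is None:
            return match.group(0)  # unknown placeholder: leave it untouched
        for mime_type in preferred:
            if mime_type in mimebundle:
                return mimebundle[mime_type]
        return match.group(0)  # no representation we can inline

    return TEMPLATE.sub(fill, bundle["content"])


bundle = {
    "content": "Live long and {{ what }}",
    "data": {"what": {"text/plain": "prosper"}},
}
print(render_bundle(bundle))  # Live long and prosper
```

A richer renderer could pick `text/html` or widget views where the output format allows it, which is the point of storing whole mimebundles rather than strings.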

Here’s a mockup based upon IPython’s Markdown object:

from IPython.display import IMarkdown
what = "prosper" 
IMarkdown(
   "Live long and {{ what }}",
   data={
      "what": what
   }
)

or

%%imarkdown what=what
Live long and {{ what }}

We could parse the Markdown in the kernel (e.g. via magics) to look for variable definitions, but as soon as we start needing to infer the variables in the kernel, we require the kernel-library to maintain awareness over the syntax, which (as discussed) can vary between plugins.
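For what it's worth, the explicit-data variant of the mockup needs no parsing in the kernel at all. Below is a rough sketch of what such an object could look like using IPython's `_repr_mimebundle_` display hook. The class `IMarkdown`, the MIME type `application/vnd.imarkdown+json`, and the simplified `to_mimebundle` helper are all hypothetical; none of them exists in IPython today.

```python
# Hypothetical MIME type; nothing like it is registered in Jupyter today
IMARKDOWN_MIME = "application/vnd.imarkdown+json"


def to_mimebundle(value):
    """Build a small mimebundle for `value` (IPython's real formatters do far more)."""
    bundle = {"text/plain": str(value)}
    html_repr = getattr(value, "_repr_html_", None)
    if callable(html_repr):
        bundle["text/html"] = html_repr()
    return bundle


class IMarkdown:
    """Markdown text plus explicitly passed values; the template is never parsed here."""

    def __init__(self, content, data=None):
        self.content = content
        self.data = dict(data or {})

    def _repr_mimebundle_(self, include=None, exclude=None):
        payload = {
            "content": self.content,
            "data": {name: to_mimebundle(value) for name, value in self.data.items()},
        }
        # The raw template doubles as a text/markdown fallback for frontends
        # that do not understand the custom type
        return {IMARKDOWN_MIME: payload, "text/markdown": self.content}


what = "prosper"
bundle = IMarkdown("Live long and {{ what }}", data={"what": what})._repr_mimebundle_()
print(bundle[IMARKDOWN_MIME])
```

Because the caller names the values explicitly, the kernel library stays unaware of the template syntax, which is the property argued for above.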

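For posterity, we can do a hack job with f-string-style formatting (strictly speaking str.format with the kernel’s globals, so the placeholder must not contain spaces):

import ast
from IPython.core.magic import register_cell_magic


@register_cell_magic
def imarkdown(line, cell):
    shell = get_ipython()
    # Build the expression `<cell text>.format(**globals())` as an AST
    expr = ast.Constant(value=cell)
    formatter = ast.Call(
        func=ast.Attribute(value=expr, attr="format", ctx=ast.Load()),
        args=[],
        keywords=[
            # A keyword without `arg` unparses as `**globals()`
            ast.keyword(
                value=ast.Call(
                    func=ast.Name(id="globals", ctx=ast.Load()), args=[], keywords=[]
                )
            )
        ],
    )
    # Wrap it in `__import__("IPython.display", fromlist="display").Markdown(...)`.
    # The callee must be an expression node, so no ast.Expr wrapper here.
    node = ast.Call(
        func=ast.Attribute(
            value=ast.Call(
                func=ast.Name(id="__import__", ctx=ast.Load()),
                args=[ast.Constant(value="IPython.display")],
                keywords=[
                    ast.keyword(arg="fromlist", value=ast.Constant(value="display"))
                ],
            ),
            attr="Markdown",
            ctx=ast.Load(),
        ),
        args=[formatter],
        keywords=[],
    )
    # Run the generated source in the user's namespace (ast.unparse needs Python 3.9+)
    shell.run_cell(ast.unparse(node))

%%imarkdown
Live long and {what}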

But this is just to make a point.


Thanks @agoose77 for that write-up, I appreciate all of the thoughtfulness, and the inspiration in there!

A few thoughts from me:

I think we should have a working prototype that people can play around with and that can iterate more quickly than having a vote across the whole Jupyter ecosystem. That has been a successful model for other extension points in the JupyterLab world (e.g. jupyterlab-lsp, debugger, etc) and I think it could work here as well.

I also think we don’t want to make a perfect solution a blocker on iterative progress. In my opinion it is OK if we introduce some noisiness into markdown syntax etc, so long as it’s clear that this is at the prototype/experiment level, not “altering core Jupyter”.

I guess the question then is: who is interested in experimenting on this? I don’t mean “who wants to modify core JupyterLab”, I mean “who wants to play around with some combination of extensions, packages, etc. that explore how this use-case could be made possible?”

I care about this a good amount, and it sounds like @rowanc1 is interested in building some prototypes as well. @agoose77, it would be awesome if you could provide some collaboration on the jupyterlab-markup side if this is of interest to you! (Though I don’t think that making progress here necessarily means building JupyterLab extensions.)

It will be disappointing if we cannot land this feature in Jupyter itself. That would let us display properly rendered rich Markdown cells to the reader. Perhaps you are saying that the core team would need a proof of concept first in the form of an extension, but is it even possible to add a new cell type or to change the way cells are rendered this way?

If we are only talking about rendering inline expressions while executing a notebook (i.e. in a publishing step), that can already be achieved by pre and post processing a notebook: split up markdown cells into text & inline code, attach relevant metadata to each part, run it through an execution engine, then stitch everything together again.
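That split, execute, and stitch pipeline can be sketched in a few lines. The `{{ }}` delimiters, the part dictionaries, and the `execute` callable (a stand-in for a real execution engine) are assumptions for illustration only.

```python
import re

# Assumed inline-code syntax: {{ code }}
INLINE = re.compile(r"\{\{(.+?)\}\}")


def split_cell(source):
    """Split Markdown source into text parts and inline-code parts."""
    parts = []
    # re.split with one capture group alternates: text, code, text, ...
    for index, chunk in enumerate(INLINE.split(source)):
        if index % 2 == 1:
            parts.append({"kind": "code", "source": chunk.strip(), "result": None})
        elif chunk:
            parts.append({"kind": "text", "source": chunk})
    return parts


def execute_parts(parts, execute):
    """Run every inline-code part through `execute` and store the result."""
    for part in parts:
        if part["kind"] == "code":
            part["result"] = str(execute(part["source"]))
    return parts


def stitch(parts):
    """Reassemble the Markdown, replacing inline code with its result."""
    return "".join(p["result"] if p["kind"] == "code" else p["source"] for p in parts)


namespace = {"myvariable": 40}
parts = split_cell("The value of myvariable plus 2 is {{ myvariable + 2 }}!")
print(stitch(execute_parts(parts, lambda code: eval(code, {}, namespace))))
```

In a real publishing step, `execute` would send each snippet to the notebook's kernel in document order so that inline code sees the state left by preceding code cells.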

Stefan, I have wanted something like this for a long time too, but I also think it makes complete sense to prototype it as an extension first… One of the downsides of Jupyter today is that it has a huge user base 🙂 That is a downside from the development/maintenance perspective: any changes in the core have a huge impact and are very hard to revert later on if they prove sub-optimal in some unforeseen way.

Thus our emphasis on less “committing” prototyping of new ideas around the edges and on having extension points everywhere…

The static post-processing part of this can certainly be implemented in Jupyter Book/nbconvert even today with custom tooling (though there may be API improvements to be made to facilitate it), while the UI/UX for interactive use should be doable in JupyterLab as an extension (that potentially takes advantage of metadata).

That will help us iron out all the details and explore graceful fallbacks: what would these look like when rendered on GitHub, how would they interact with tools like jupytext or other frontends like PyCharm or VS Code, etc. And off a working implementation, it becomes much easier to see whether core changes to APIs or formats are needed, or whether it’s a simple matter of, say, “blessing” an extension as a core JupyterLab one that ships by default. But making that decision now would be premature, for both technical and social reasons, so I don’t see a problem with starting now to build working prototypes for both the static and the interactive cases.


I probably didn’t express myself clearly, but what I was asking is exactly this: whether what we need (adding a new cell type and rendering it in a special way with kernel interaction) is doable as an extension.

I had a gut feel that it might require some surgery in the core of JupyterLab itself, but I would be glad if that weren’t the case (it would make our job here much easier!).

Further I was pointing out that proving the viability of a pre/postprocessing or cell magic approach isn’t all that interesting (to me), because we already know that will work—it’s just very clunky.

Just to mention here, as a principal maintainer on myst-parser, jupyter-cache, myst-nb and jupyter-book, I have some fairly concrete ideas about how I would go about achieving this (which I was already intending to do).
Just not the time to flesh them out here right this minute lol.

I would quickly, though, echo some concerns above: I would be wary of the kernel having to start getting involved in Markdown parsing.


The post-processing version may still have different needs from the live/interactive one, as there may be different output constraints on say a static PDF output compared to a JS-rendered one in a browser (just like the nice HTML render of say pandas dataframes doesn’t quite make it to LaTeX output).

But as to your first question: without having tried to build it, I’m not 100% sure that Lab right now has all the right APIs for this (I don’t know them well enough). However, I can imagine a number of ways to try and build this as a Lab extension that should work, and where if they hit a wall, the answer at first might be to request a given API improvement in Lab (which isn’t an immutable entity 🙂) to facilitate things…

But for example, here @jasongrout pointed out some ideas on how to write a custom cell provider in Lab… It’s under-documented ATM but seems viable, so I do think we have the starting points we need…


To clarify my position (which is just that, only my opinion!), I’m considering the following axioms:

  • Notebooks should be viewable both on-line (with a kernel) and off-line (using nbviewer, etc)
  • Notebook frontends should avoid favouring a particular kernel (e.g. the original IPython kernel)

I don’t actually think this is impossible to do by any means, but I do think it will require some big changes (which can be done outside of the core). I stand by my earlier conclusions that we want to decouple the variable-loading from the Markdown rendering (to support the above points).

@stefanv et al. have convinced me to rethink the problem, and I think a Jupyter-first solution is actually possible, it just requires a little more work.

Here’s what I think we might be able to try:

  1. Add a new cell type (e.g. IMarkdown).
  2. Create a new “execution” mode that treats IMarkdown execution as an instruction to ask the kernel for variable mimebundles (e.g. via the DAP, if possible)
  3. Create an output mimebundle that effectively contains the variable mimebundles + the original markdown
  4. Create a Markdown renderer that can compose these mime-bundles into a rendered result.

This approach puts the burden of tokenizing the markdown into a single place — the frontend.
It would probably be straightforward to create a plugin that finds the variable templates {{ x }}, and stores the information. This information can then be retrieved to query the kernel. Once the results are available, the final composite mimebundle can be stored in the cell outputs, and the renderer invoked.
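A rough sketch of that flow follows, written in Python for consistency with the rest of the thread even though a real frontend plugin would live in the frontend. The function names and the `query_kernel` callable are hypothetical; in practice the query might go through the DAP or an execute request.

```python
import re

# Assumed template syntax: {{ expression }}
TEMPLATE = re.compile(r"\{\{\s*(.+?)\s*\}\}")


def find_templates(markdown):
    """Return the distinct template expressions in order of first appearance."""
    seen = []
    for match in TEMPLATE.finditer(markdown):
        expression = match.group(1)
        if expression not in seen:
            seen.append(expression)
    return seen


def build_composite(markdown, query_kernel):
    """Pair the untouched Markdown with one mimebundle per template."""
    return {
        "content": markdown,
        "data": {name: query_kernel(name) for name in find_templates(markdown)},
    }


def fake_kernel(expression):
    # Stand-in for a real kernel request; returns a mimebundle for the expression
    values = {"x": 3, "what": "prosper"}
    return {"text/plain": str(values[expression])}


print(build_composite("{{ x }} times: live long and {{ what }}", fake_kernel))
```

The composite dictionary is what would be stored in the cell outputs, leaving the original Markdown untouched for the renderer.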

It is tempting to just do this in a near-single pass — parse the markdown, and store the rendered HTML output (with expanded template strings), but I don’t know if I like the idea of storing text/html in the output of the cell. Instead, I’d prefer to store the constituent parts, so that we can support widgets, and also avoid filling the notebook with generated output!

With this approach, the notebook document maintains the reproducible rich output while remaining easy to author.

I don’t have time to work on this right now, but I am interested.

Related but off-topic

In this thread I was thinking about custom cell types, and suggested replacing the non–code-cell types with a single MIME cell that specifies its MIME type. I wonder now whether the more bold proposal to make all cells MIME cells and use a kind of execution registry that invokes an executor for a given MIME type would be workable. In the context of inline variable insertion, we’d just register an executor for text/markdown that queries the backend as outlined above, and returns the processed results.
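As a toy illustration of the execution-registry idea: the sketch below maps MIME types to executor functions and dispatches cells through them. Everything in it (the `EXECUTORS` table, the cell layout with a `mime_type` key, the use of `exec` and `str.format_map` as stand-ins for a kernel) is an assumption, not an existing Jupyter API.

```python
# Hypothetical registry mapping a cell's MIME type to its executor
EXECUTORS = {}


def register_executor(mime_type):
    """Decorator that registers an executor for cells of `mime_type`."""

    def decorator(function):
        EXECUTORS[mime_type] = function
        return function

    return decorator


@register_executor("text/x-python")
def run_python(source, namespace):
    # Stand-in for sending code to a kernel
    exec(source, namespace)
    return {}


@register_executor("text/markdown")
def run_markdown(source, namespace):
    # Stand-in for the query-and-compose steps; plain str.format_map here
    return {"text/markdown": source.format_map(namespace)}


def execute_cell(cell, namespace):
    """Dispatch a MIME cell to the executor registered for its type."""
    executor = EXECUTORS.get(cell["mime_type"])
    if executor is None:
        return {}  # no executor registered: the cell is static
    return executor(cell["source"], namespace)


namespace = {}
execute_cell({"mime_type": "text/x-python", "source": "what = 'prosper'"}, namespace)
print(execute_cell({"mime_type": "text/markdown", "source": "Live long and {what}"}, namespace))
```

Cell types without a registered executor simply stay static, which matches how Markdown and raw cells behave today.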

Hmm, I really want to work on this now!


Your four point plan seems spot-on to me, @agoose77! The only change I would make is that “variable” should become “expression”, but I suppose you could also interpret variable here as “inline snippet identifier”.

Regarding the point 4 concerns, one approach may be to first compose Markdown from cell text + inline outputs, and to then render that using Markdown-it.