Inline variable insertion in markdown

Errrm, I’m not quite sure of the relevance you are implying here :grimacing:?
I mean, I know the functionality because I wrote it lol, but this deals with “static” substitutions, where we know the variable values ahead of time (and it is not strictly related to Jupyter notebooks). In this discussion we are talking about dynamic substitutions that need to go via the kernel, which, to some extent, is a different kettle of fish.

@westurner - thanks for these references. Just a quick note on Discourse: long comments (or many strings of comments) will make it harder for others to follow. For especially long comments with references in them, try either:

  1. Using the Discourse “hide details” syntax so that others can expand/collapse:
    [details="Summary"]
    This text will be hidden
    [/details]
    
  2. Or, linking out to pre-existing documentation

Thanks!

(also agree with @chrisjsewell’s point that the substitution functionality in MyST-NB is not quite what we’re talking about here, though the syntax is quite similar)

Yep but certainly standardised notebook-level metadata for markdown renderers might be nice.
I guess maybe this is the better place for that discussion? Notebook Cell-Type Generaliastion - #3 by chrisjsewell (although certainly a lot of overlap)

It’s really interesting to see how others view the same problem!

I’m looking at this from the perspective that kernel outputs are a special case of embedded data, whereas your perspective seems to be that kernel outputs are the primary use case. Is that right?

In the former view, it makes sense to me that we re-use as much of the existing machinery as possible. But from your angle, we’d be trying to fit a square peg in a round hole. Maybe these need to be two different features; despite the overlap they’re not similar enough to share the same approach.

I think part of my uncertainty is that I’m reconsidering the nb as a much richer, extensible document, and this sits at odds with the more rigid structure we have at present. So, forgetting all that…

Let’s say that we add a new emarkdown cell type with additional property ‘outputs’, that contains a list of display_data or error types. I don’t think we should store the expression that generated them; this should be taken from the source in the same way that the code cell just has an ordered list of outputs (I’m not strong on this, but it seems like we want to keep things simple & stay closer to the kernel outputs). Perhaps the ability to embed MIME bundles is orthogonal to the ability to embed kernel expressions, and we can support both mechanisms, with kernel output only implemented for the emarkdown cell. It’s not necessarily a bad thing to specialise and separate general rich output from kernel-derived output.

As before, the notebook makes no guarantee about the syntax required to leverage this; the syntax needs to be enforced by some other (schema-backed) field (e.g. your post in the linked thread).

I don’t know whether the UX of having two very similar cell types is nice. I’d almost rather just have one cell type; maybe only emarkdown. If the execution count is only visible for cells with any expression outputs, that might keep visual noise to a minimum.

My intuition: from the perspective of an engineer, it makes sense to have those as two distinct cell types, since the machinery would be considerably more complex for executable markdown. From the perspective of a user, it will seem clunky to change a markdown cell to “another type of markdown cell” just to turn on one piece of syntax, unless the extra syntax were significantly different.

For example, if the syntax were only adding one extra token for {{ }}, that might seem clunky… but if it were adding a whole new variety of markdown tokens (e.g., MyST or RMarkdown or something like that) then I could see that being acceptable to users, and maybe even preferable, to be more explicit about “which flavor of markdown this cell contains” (though we’d want a way for the notebook to easily set a default, because manually changing that every time would be a pain and I suspect most users would use one flavor per notebook).

(also to be clear, I think the idea of being able to define new kinds of cells in general is really cool and I love that idea as an extension point, I am just talking specifically about markdown right now)

How will these be parsed as distinct?

display(_kernel.video)  # : dict(mimetype=data)
{{video}} {# jupyter_book variable #}
{{#video}} {# kernel variable #}

Jinja has a:

jinja in yaml and safety

FWIW, here’s how jinja in yaml works in ansible; there’s a reason for safe_eval.py:
https://github.com/ansible/ansible/blob/c3fc8fb99a409d7555c1587697a3cfd78b7f3eb9/lib/ansible/template/__init__.py

Filters, loops, Lookup Plugins:
https://docs.ansible.com/ansible/latest/user_guide/playbooks_templating.html

[details="ipython-beautifulsoup: [SEC: XSS: JS/CSS Injection]"]
https://github.com/Psycojoker/ipython-beautifulsoup/issues/2
[/details]
  • [x] Trust notebook content within the
  • [ ] Trust kernel output in a variable echoed within [YAML front-matter of] [MyST]+[e]markdown

I can’t remember exactly how Markdown cells (ReStructuredText cells that precede nbsphinx ReStructuredText cells) and display({'text/html': render_md(cell_input)}) differ in terms of escaping etc. Looks like I’m out of date: my notebook has different outputs for kernel outputs that have changed since the previous execution.

Ah I see what you were getting at @westurner - it’s a good point that the final product will need to think carefully about syntax so that we don’t step on important toes. It is probably worth opening up a specific discussion around syntax to make sure it doesn’t accidentally step on another language’s toes. Also just a note that the {{ }} syntax is not part of the MyST Markdown “core syntax” - it’s an optional extension, so it is not “set in stone” as strongly.

@agoose77 would it be productive to open up an issue in your prototype repo to discuss potential syntax anti-patterns and converge on something reasonable as a first step?

Always happy to consider other perspectives (maybe sometime begrudgingly, lol)

Well, I’d say maybe not so much the kernel outputs, but the actual interactions with the kernel; without a kernel it’s less an executable document, more just a document.
I feel it should be clear where there will be / has been interactions with the kernel

So yeh, as I’ve said before; I’m not against re-using the markdown cell per se. But then I think we have to agree that it will now become slightly more complex:

  • There needs to be an optional execution_count key (added if the kernel is called), and the renderer needs to show this when present
  • I feel the attachments key (which essentially stores “static” mimebundles) should be separate from the key used to store the outputs from the user_expressions call to the kernel (which can dynamically change every time you execute that cell)
  • We don’t necessarily have to store the expression that generated them, I just felt from a “static analysis” perspective it would make it easier to guess whether re-execution is necessary (from the perspective of e.g. jupyter-cache)
  • Whatever syntax we decide, I imagine it will “mess up” at least one person’s existing notebook. Perhaps there should be something like an agreed cell tag to “turn-off” execution of a particular markdown cell?
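
To make the last two bullets concrete, here is a minimal sketch of the kind of static check a caching tool could run. It assumes the {{ }} syntax from this thread, that the previously evaluated expressions are available as a list, and a hypothetical cell tag name (`no-md-exec`); none of these names come from nbformat or jupyter-cache.

```python
import re

# Assumed inline-expression syntax: {{ expr }}
EXPR = re.compile(r"\{\{(.+?)\}\}")

# Hypothetical tag that opts a markdown cell out of kernel evaluation.
SKIP_TAG = "no-md-exec"


def source_expressions(cell):
    """Return the expressions found in the cell source, in document order."""
    source = "".join(cell["source"])
    return [expr.strip() for expr in EXPR.findall(source)]


def needs_execution(cell, stored_expressions):
    """Guess whether the cell must be re-evaluated.

    Only compares the expressions in the source against the ones stored
    from the previous run; it cannot tell whether kernel state changed.
    """
    if SKIP_TAG in cell.get("metadata", {}).get("tags", []):
        return False
    return source_expressions(cell) != stored_expressions


cell = {
    "cell_type": "markdown",
    "metadata": {},
    "source": ["Value: {{ a }} and {{ b + 1 }}"],
}
print(needs_execution(cell, ["a"]))           # expressions changed since last run
print(needs_execution(cell, ["a", "b + 1"]))  # expressions unchanged
```

This is only a guess at staleness: an unchanged expression can still produce a different value if an earlier code cell changed.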

IIRC the reason that ReStructuredText cell support was removed was due to the difficulty/probable_impossibility of sandboxing what’s effectively another - possibly even complete - language; .. include:: etc were considered too dangerous by @takluyver IIRC.

Another use case for in and around markdown cells could just be display()'d like code cells (with an execution count) except probably with Jupyter’s markdown rendering library instead of the kernel’s and syntax highlighting:

  • I want to include Linked Data within Markdown, and then parse that out from (hopefully not in a second pass over the) outputs to collect all of the application/ld+json outputs and merge their @context with the notebook-level @context and metadata
    • Would returning {'text/html': ..., 'text/markdown_': ...., 'application/ld+json': ...} be the only way to include that Linked Data in an .ipynb - that’s potentially already JSON-LD from a top-level @context - without additional aggregation of application/ld+json outputs from parsed Markdown?

Actually this is not what the kernel protocol specifies; user_expressions is a dictionary. So indeed, if we want to preserve evaluation ordering, it will require changing the kernel request, adding a new one, or clarifying that key order matters.


For now, here are examples of kernel replies:

Nominal case

{
	"user_expressions": {
		"jupyterlab-imarkdown-0": {
			"status": "ok",
			"data": {
				"text/plain": "IntSlider(value=0, layout=Layout(display='inline-flex'))",
				"application/vnd.jupyter.widget-view+json": {
					"version_major": 2,
					"version_minor": 0,
					"model_id": "018244e845c941f886bb20d4c0975320"
				}
			},
			"metadata": {}
		}
	}
}

Error case

{
	"user_expressions": {
		"jupyterlab-imarkdown-0": {
			"status": "error",
			"traceback": [
				"\u001b[0;31mNameError\u001b[0m\u001b[0;31m:\u001b[0m name 'a' is not defined\n"
			],
			"ename": "NameError",
			"evalue": "name 'a' is not defined"
		}
	}
}

So it looks like all the info is there (except the order, although that could be enforced by ordering the dictionary keys).
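
A rough sketch of what such a key convention could look like, assuming keys of the form `<prefix><index>` similar to the `jupyterlab-imarkdown-0` key in the replies above. The prefix and helper names are made up for illustration, and the reply is simulated rather than coming from a real kernel.

```python
# Hypothetical key prefix; the index after the last "-" encodes the position.
KEY_PREFIX = "md-expr-"


def build_user_expressions(expressions):
    """Map each expression to a key that encodes its position in the cell."""
    return {f"{KEY_PREFIX}{i}": expr for i, expr in enumerate(expressions)}


def ordered_results(reply_user_expressions):
    """Return the reply entries sorted by the numeric suffix of their key."""

    def index(key):
        return int(key.rsplit("-", 1)[1])

    return [
        reply_user_expressions[key]
        for key in sorted(reply_user_expressions, key=index)
    ]


request = build_user_expressions(["a", "b + 1"])

# Simulated reply whose keys arrive in a different order than they were sent.
reply = {
    "md-expr-1": {"status": "ok", "data": {"text/plain": "3"}, "metadata": {}},
    "md-expr-0": {
        "status": "error",
        "ename": "NameError",
        "evalue": "name 'a' is not defined",
        "traceback": [],
    },
}

statuses = [result["status"] for result in ordered_results(reply)]
print(statuses)
```

Sorting on the numeric suffix (rather than the raw string) keeps `md-expr-10` after `md-expr-2`. Note this only recovers the substitution order on the client side; it says nothing about the order in which the kernel evaluated the expressions.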

Yeh I guess that depends on the semantics you apply to dictionaries; it may or may not be relevant, but certainly in Python 3.7+ and JS ES2015+ key ordering is part of the language specification.

I would give a toy example here from my nbclient draft PR:

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "class A:\n",
    "    def __init__(self):\n",
    "        self.a = 1\n",
    "    def get1(self):\n",
    "        self.a += 1\n",
    "        return self.a\n",
    "    def get2(self):\n",
    "        self.a += 2\n",
    "        return self.a\n",
    "c = A()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "attachments": {
    "md-expr-0": {"text/plain": "2"},
    "md-expr-1": {"text/plain": "4"},
    "md-expr-2": {"text/plain": "5"}
   },
   "source": [
    "# Variables\n",
    "\n",
    "{{ c.get1() }} {{ c.get2() }} {{ c.get1() }}"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 0
}

So I think here order certainly matters (both the order you execute expressions, and the order you would substitute them back in to the Markdown)
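
To illustrate with the toy notebook above, here is a small sketch that evaluates the expressions left to right and substitutes the results back in. It uses a plain `eval` in a local namespace as a stand-in for the kernel's user_expressions call, and the `render` helper is hypothetical, not part of nbclient.

```python
import re

# Assumed inline-expression syntax: {{ expr }}
EXPR = re.compile(r"\{\{(.+?)\}\}")


class A:
    def __init__(self):
        self.a = 1

    def get1(self):
        self.a += 1
        return self.a

    def get2(self):
        self.a += 2
        return self.a


def render(source, namespace):
    """Evaluate each {{ expr }} in document order and substitute the result."""

    def substitute(match):
        # Local stand-in for asking the kernel to evaluate the expression.
        return str(eval(match.group(1).strip(), namespace))

    # re.sub calls `substitute` for each match from left to right.
    return EXPR.sub(substitute, source)


rendered = render("{{ c.get1() }} {{ c.get2() }} {{ c.get1() }}", {"c": A()})
print(rendered)  # 2 4 5, matching the attachments in the notebook above

# The same three calls in a different order give different values.
swapped = render("{{ c.get2() }} {{ c.get1() }} {{ c.get1() }}", {"c": A()})
print(swapped)  # 3 4 5
```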

I think @fcollonval is (correct me if I’m wrong) proposing a convention to order the keys, rather than requiring that the language itself support key ordering. I actually missed the point that user_expressions is not ordered at the kernel level (probably writing too much in one go!), which is why we use a key convention in the extension, and the expressions getter on XMarkdown cell is ordered.

@fcollonval & @chrisjsewell I’ve had a re-think about my position here, and (aggressively ignoring Notebook 2.0) I’m now leaning towards creating a new user_expressions property that contains the kernel outputs (effectively what’s proposed here). We already enshrine the kernel in other places in the schema, so doing it here also makes sense. There’s nothing to say we can’t later settle on a syntax to embed attachments, e.g.

This is an attachment {{ key-1 }} and this is the result of an expression ${{ 1 + 2 }}

In other words, it’s not the fact that we can “inject” rich outputs that warrants a new cell type; it’s the fact that we want to be able to use kernel outputs to do this. We could look at the attachment-injection as a separate feature that might apply to both markdown and emarkdown cells.
I am slightly more in favour of enforcing the ordering in the schema rather than via keys i.e. not exactly storing the result of user_expressions. Given that the results need to be ordered, let’s properly enforce this, and break from the kernel output in the process.

Both jupyterlab-imarkdown and @chrisjsewell’s proposed extension to nbclient I think are sufficient to demonstrate the concept well, so at this point I feel we would benefit from some wider input from the community (in particular long-standing contributors to Jupyter) to get a sense of where things should go.

Hmm, good question. I think this thread might be the best place for that, so that we maintain visibility and don’t segregate the conversation into one implementation?

@westurner makes a good point concerning shadowing existing syntax, even if it’s not a core feature. Given my thoughts about attachments & kernel outputs, I am strongly in favour of getting the syntax “right” for both of them at an early stage, even if we only deliver the kernel-variant to start with.

A small disclaimer: the following suggestions probably won’t quite work unmodified for polyglot kernels. Whilst I don’t use these myself, I don’t want to deny their ability to use this feature before it’s even ready for use! However, I don’t think we need to endorse any syntax in Jupyter Notebook itself - this can be done indirectly by defining Markdown extensions e.g. Notebook Cell-Type Generaliastion - #3 by chrisjsewell
Instead we’re settling upon reasonable single-kernel syntax that might become markdown-it-expression and markdown-it-attachment

I’ll offer up some suggestions:

Variant A

  • Kernel Injection: {{python `x` }}
  • Attachment Injection: {{ x }}

Variant B

  • Kernel Injection: ${{ x }}
  • Attachment Injection: {{ x }}

I prefer variant A, because embedding code inside backticks is a natural idea, and it reinforces the fact that this is an inline expression (i.e. not a block token). In both variants, we re-purpose the {{ }} syntax, though.
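
For comparison, a quick regex sketch of how the two variants could be told apart when tokenizing. These patterns are assumptions for illustration only (a real markdown-it plugin would work on the token stream and handle escaping), and the function names are made up.

```python
import re

# Variant A: {{python `x` }} is a kernel expression, {{ x }} is an attachment.
VARIANT_A = re.compile(r"\{\{\s*(?:(\w+)\s*`(.+?)`|(.+?))\s*\}\}")

# Variant B: ${{ x }} is a kernel expression, {{ x }} is an attachment.
VARIANT_B = re.compile(r"(\$?)\{\{\s*(.+?)\s*\}\}")


def classify_a(source):
    """Return (kind, language, content) tuples for variant A."""
    tokens = []
    for match in VARIANT_A.finditer(source):
        if match.group(1):
            tokens.append(("kernel", match.group(1), match.group(2)))
        else:
            tokens.append(("attachment", None, match.group(3)))
    return tokens


def classify_b(source):
    """Return (kind, content) tuples for variant B."""
    return [
        ("kernel" if match.group(1) else "attachment", match.group(2))
        for match in VARIANT_B.finditer(source)
    ]


tokens_a = classify_a("{{python `x` }} and {{ x }}")
tokens_b = classify_b(
    "This is an attachment {{ key-1 }} and this is the result "
    "of an expression ${{ 1 + 2 }}"
)
print(tokens_a)
print(tokens_b)
```

Variant A carries the language name in the token itself, which is one way it could extend to polyglot kernels; variant B needs only a single optional prefix character.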

Indeed.
Hopefully to focus feedback, I have also created Proposed schema additions for inline variables by chrisjsewell · Pull Request #230 · jupyter/nbformat · GitHub with the proposed additions of the (optional) execution_count and expressions keys to the markdown cell (if that is the route we want to go)

A dollar prefix may be difficult to make compatible with dollar-math

Having “unequal” opening/closing syntax may make it easier to parse, e.g. something like <\ name \>
although maybe backslashes aren’t a good idea, because they are related to escaping, so something like <| name |>, or just << name >>
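
A toy illustration of the dollar-math concern, using a deliberately naive inline-math pattern (anything between two `$` characters). Real dollar-math plugins apply stricter rules about whitespace and escaping, so this only shows the kind of ambiguity a parser would have to rule out, not how any particular plugin behaves.

```python
import re

# Deliberately naive inline math: anything between two "$" characters.
NAIVE_MATH = re.compile(r"\$(.+?)\$")

# Unequal delimiters, following the "<| name |>" suggestion.
ANGLE = re.compile(r"<\|\s*(.+?)\s*\|>")

dollar_text = "Total: ${{ a }} of ${{ b }}"
angle_text = "Total: <| a |> of <| b |>"

# The two "$" prefixes are read as the start and end of one math span.
math_spans = NAIVE_MATH.findall(dollar_text)
print(math_spans)

# With unequal delimiters there is no "$" for the math rule to latch onto.
angle_names = ANGLE.findall(angle_text)
angle_math = NAIVE_MATH.findall(angle_text)
print(angle_names, angle_math)
```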

Perhaps. I’m not aware of any legitimate use for ${{ ... $ so we could take precedence here for ${{ character spans, but I’m all for something that doesn’t require much :slight_smile:

I’m just a bit wary of using braces or dollars (or combinations therein), which are already quite widely used

Clearly the world needs another templating syntax :wink:

I thought about something like this actually — I think the first syntax is slightly elegant (or <{ ... }> / <[ ... ]>). My only worry here is that {{ is easier to type and read than <|.

As long as we don’t settle on autotools @var@ I’ll be happy!

Just to round out the combinations lol: [[ name ]]
