# Aim
Within jupyter-book, and EBP in general, users want to be able to embed the outputs of Jupyter Notebook code cell execution within the body of their documents.
For example, referencing the value of a calculation:
```python
a = 1 + 2
```
with some form of "placeholder" syntax:
```markdown
The result is {{ a }}
```
As well as simple variable referencing, one would also like to embed "richer" outputs, such as images and HTML.
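As a minimal sketch of the placeholder idea, assuming a plain `{{ name }}` syntax and using `exec` as a stand-in for a real kernel (the `substitute` helper and `PLACEHOLDER` pattern are hypothetical names, not part of any existing tool):

```python
import re

# Matches placeholders such as "{{ a }}"; the exact syntax is illustrative only.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def substitute(text, variables):
    """Replace each {{ name }} with str(variables[name]); leave unknown names as-is."""

    def _replace(match):
        name = match.group(1)
        if name not in variables:
            return match.group(0)
        return str(variables[name])

    return PLACEHOLDER.sub(_replace, text)


# Stand-in for executing the code cell in a kernel.
namespace = {}
exec("a = 1 + 2", namespace)

rendered = substitute("The result is {{ a }}", namespace)
print(rendered)
```

This only covers plain-text values; richer outputs need a representation that carries more than a string.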
1. The abstraction should aim for potential implementations across different editing/rendering platforms (Jupyter Lab, Sphinx, VS Code, Curvenote, etc)
2. The abstraction should aim to be (Jupyter) kernel agnostic, i.e. work across the range of possible kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
3. Embedding should allow for both inline and block level components
4. It is desirable for the process to be as simple as possible for the user
5. It may be desirable to embed code outputs in a cross-document manner, i.e. one can embed a code output from one document in another
6. Caching of the
# Sphinx recap
Before discussing potential abstractions, and their pros/cons, it will be helpful to recap the basic sphinx build process phases:
1. For each document:
   - (myst-nb + notebook only) the notebook is executed, if necessary, populating all code cell outputs
   - The source text is parsed to an Abstract Syntax Tree (AST), which is agnostic of the eventual output format (HTML, LaTeX, ...)
   - Transforms are applied to the AST, to apply changes that require knowledge of the full AST
   - Certain variables are extracted to a global "database", known as the environment, such as reference targets
   - The AST is cached to disk, so that re-builds only have to re-parse modified documents
2. The global environment is cached to disk
3. For each output format:
   - Post-transforms are applied to each cached AST, to apply changes that require knowledge of the full project, such as inter-document referencing (using the global environment)
   - All ASTs are converted into the output format
One difficulty with the outputs of Jupyter notebook code cells is that they can provide multiple output formats (a.k.a. mime types), and the appropriate one can only be "selected" in phase (3), once the output format is known.
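The selection step can be sketched as follows. This is only an illustration: the `MIME_PRIORITY` table and the `select_mime_type` helper are hypothetical, and the preference orders are assumptions rather than the ones any particular builder uses.

```python
# Hypothetical per-format preferences, most preferred first.
MIME_PRIORITY = {
    "html": ["text/html", "image/svg+xml", "image/png", "text/plain"],
    "latex": ["application/pdf", "image/png", "text/latex", "text/plain"],
}


def select_mime_type(mime_bundle, output_format):
    """Return the first preferred mime type present in the bundle, or None."""
    for mime_type in MIME_PRIORITY.get(output_format, ["text/plain"]):
        if mime_type in mime_bundle:
            return mime_type
    return None


# One code cell output offering several representations of the same object
# (the values are shortened placeholders, not real content).
bundle = {
    "text/plain": "<Figure>",
    "text/html": "<img src='fig.png'>",
    "image/png": "<base64 data>",
}

html_choice = select_mime_type(bundle, "html")
latex_choice = select_mime_type(bundle, "latex")
```

The same bundle yields a different representation per output format, which is why the choice cannot be made while the format-agnostic AST is first built.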
# Potential abstractions
A number of potential abstractions are discussed below, with their pros and cons
## Current myst-nb glue abstraction
In myst-nb v0.13, there is the `glue` function & roles/directives.
This is implemented for IPython kernels only, whereby one "binds" a variable name to a variable by the `glue` function:
```python
from myst_nb import glue
a = "content"
glue("variable_name", a)
```
and the placeholder syntax looks like:
````markdown
Inline: {glue:}`variable_name`
Block:
```{glue:} variable_name
```
````
All mime types for such outputs (such as `text/plain`, `text/html`, `image/png`, ...) are saved to a cache during phase (1) of the sphinx build.
Then, during phase (3), placeholder roles/directives are replaced with a suitable mime type for that output format, taking the mime type's content and converting it to AST, before injecting it into the document's AST.
Pros:
- ✅ It is relatively simple for users to use
- ✅ It provides a one-to-one mapping between variable name and variable output
- ✅ All required outputs are saved in the Jupyter notebook (i.e. can be parsed without a live kernel)
- ✅ It works cross-document
Cons:
- ❌ It is not kernel agnostic
- ❌ It requires that variable names are unique across the whole project
- ❌ It would be very difficult to implement outside of sphinx
- ❌ In sphinx, because the outputs are only converted to AST in phase (3), it means that important AST transformations from phase (1) can be missed, and need to be retroactively applied, making the conversion "brittle"
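The project-wide uniqueness requirement follows from keying a single cache by variable name. A minimal sketch, assuming each document contributes a mapping of glued names to mime bundles (`build_glue_cache` is a hypothetical helper; how myst-nb actually stores outputs or reports a clash is not covered here):

```python
def build_glue_cache(documents):
    """Collect glued outputs from all documents into one project-wide cache.

    `documents` maps a document name to {variable_name: mime_bundle}.
    """
    cache = {}
    origin = {}  # remembers which document first defined each name
    for docname, glued in documents.items():
        for name, mime_bundle in glued.items():
            if name in cache:
                raise ValueError(
                    f"glue name {name!r} used in both {origin[name]!r} and {docname!r}"
                )
            cache[name] = mime_bundle
            origin[name] = docname
    return cache


docs = {
    "intro": {"variable_name": {"text/plain": "'content'"}},
    "results": {"other_name": {"text/plain": "3", "text/html": "<b>3</b>"}},
}
cache = build_glue_cache(docs)
```

Because the cache key carries no document name, two documents cannot both glue `variable_name` without one shadowing (or, in this sketch, rejecting) the other.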
## Refactored myst-nb glue abstraction
The refactor in https://github.com/executablebooks/MyST-NB/pull/380 is not primarily aimed at `glue`, but it does intrinsically change how it works. It primarily addresses the issue of AST creation in phase (3), moving it to phase (1).
In its current form, the implementation precludes cross-document use, although one proposal is to use the form:
````
```{glue:} variable_name
:doc: docname
```
````
This would fix the issue of requiring variable names to be unique across the project.
It would require a "bi-modal" approach though: `glue` without the `doc` option would directly convert outputs to AST in phase (1), whereas with the `doc` option the AST would still need to be generated in phase (3).
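The two modes can be sketched as below. All names here (`GlueReference`, `PendingGlue`, and the two resolver functions) are hypothetical and only model the control flow; plain strings stand in for AST nodes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GlueReference:
    name: str
    doc: Optional[str] = None  # the proposed `:doc:` option


@dataclass
class PendingGlue:
    """Placeholder left in the AST, to be resolved in phase (3)."""

    doc: str
    name: str


def resolve_in_phase_1(ref, local_outputs):
    """Resolve same-document references immediately; defer cross-document ones."""
    if ref.doc is None:
        return local_outputs[ref.name]
    return PendingGlue(doc=ref.doc, name=ref.name)


def resolve_in_phase_3(pending, project_outputs):
    """Resolve a deferred reference once every document has been read."""
    return project_outputs[pending.doc][pending.name]


local = {"variable_name": "content"}
project = {"docname": {"variable_name": "content from docname"}}

same_doc = resolve_in_phase_1(GlueReference("variable_name"), local)
deferred = resolve_in_phase_1(GlueReference("variable_name", doc="docname"), local)
cross_doc = resolve_in_phase_3(deferred, project)
```

Note that the two documents can reuse the same variable name, since the cross-document lookup is keyed by document as well as by name.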
## Using code cell IDs (or metadata)
As discussed above, a big issue with the `glue` abstraction is that it is currently only implemented for Python, and would require different implementations for different kernels.
One way round this is to assign an ID to each code cell, then use this as the reference for embedding code outputs.
This ID could be assigned either within the cell's metadata or via the recently added cell ID field: https://nbformat.readthedocs.io/en/latest/format_description.html#cell-ids
- ✅ It is kernel agnostic
- ❌ The metadata/ID fields of a code cell are less accessible to users, for example:
  - VS Code has no front end access to metadata or ID (see https://github.com/microsoft/vscode-jupyter/issues/1182)
  - Jupyter Lab has access to metadata but, as of v3.2.9, no access to the ID field
- ❌ A "cell wide" ID does not bind a variable name to a specific output, i.e. it is a one-to-many mapping
For example, consider a code cell like:
```yaml
id: cell-id
source:
```
```python
import sys
import IPython.display
print("stdout")
print("stderr", file=sys.stderr)
IPython.display.display(1)
2
```
This cell actually has four outputs (a stdout stream, a stderr stream, a display output and the final execution result), so additional logic may be required to specify which output is being referred to (or to limit references to only the final output).
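To make the one-to-many mapping concrete, the sketch below models the executed cell as a plain dictionary following the nbformat output types (`stream`, `display_data`, `execute_result`). The `outputs_for_cell` and `select_output` helpers are hypothetical, and defaulting to the final output is just one possible policy.

```python
# The executed cell, as it might appear in the notebook JSON.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "id": "cell-id",
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": "stdout\n"},
                {"output_type": "stream", "name": "stderr", "text": "stderr\n"},
                {
                    "output_type": "display_data",
                    "data": {"text/plain": "1"},
                    "metadata": {},
                },
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "data": {"text/plain": "2"},
                    "metadata": {},
                },
            ],
        }
    ]
}


def outputs_for_cell(nb, cell_id):
    """Return every output of the code cell with the given ID."""
    for cell in nb["cells"]:
        if cell.get("cell_type") == "code" and cell.get("id") == cell_id:
            return cell.get("outputs", [])
    raise KeyError(cell_id)


def select_output(nb, cell_id, index=-1):
    """Pick one output by position; defaults to the final output."""
    return outputs_for_cell(nb, cell_id)[index]


final = select_output(notebook, "cell-id")
stderr = select_output(notebook, "cell-id", index=1)
```

A positional index is fragile, since adding a `print` call to the cell shifts every later output, which is part of why a cell-wide ID alone is an awkward reference.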
## Using the `user_expressions` kernel feature
`user_expressions` are a feature of the Jupyter client/kernel, which allow expressions to be evaluated after execution of the code cell's main content, and bound to variable names, see: https://jupyter-client.readthedocs.io/en/stable/messaging.html#execute
It could be implemented, for example, like:
```yaml
user_expressions:
  variable_name1: a
  variable_name2: b
source:
```
```python
a = 1
b = 2
```
This overcomes an issue with the above cell ID:
- ✅ It provides a one-to-one mapping between variable name and variable output
However, similar to IDs:
- ❌ `user_expressions` are not currently implemented for any Notebook editors/renderers
In addition to this limitation, it should be noted that this feature of the client is quite under-documented and appears to be unimplemented in some kernels.
The IPython kernel's implementation is to call `eval` (https://docs.python.org/3/library/functions.html#eval) on each expression: https://github.com/ipython/ipython/blob/d9b5e550b673db900a08d03740ec0ce94e1b8feb/IPython/core/interactiveshell.py#L2606-L2631
This is somewhat problematic, since it means that it is technically possible for an expression to change the "state" of the Python interpreter. This makes the order of evaluation important, and one feels it would have been a better design choice to make the `user_expressions` format a list rather than a dict.
For nbclient, a proof-of-principle implementation can be found at https://github.com/jupyter/nbclient/pull/160
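The ordering concern can be illustrated without a kernel. The sketch below emulates `eval`-based evaluation over a shared namespace; `evaluate_user_expressions` and `run` are hypothetical helpers, and `repr` stands in for the mime bundles a real kernel would return.

```python
def evaluate_user_expressions(expressions, namespace):
    """Evaluate each expression with eval(), in the order the mapping yields them."""
    return {name: repr(eval(expr, namespace)) for name, expr in expressions.items()}


def run(expressions):
    """Execute a stand-in code cell, then evaluate the expressions afterwards."""
    namespace = {}
    exec("items = [1, 2, 3]", namespace)
    return evaluate_user_expressions(expressions, namespace)


# The same two expressions, supplied in a different order.
first = run({"popped": "items.pop()", "remaining": "len(items)"})
second = run({"remaining": "len(items)", "popped": "items.pop()"})

print(first)
print(second)
```

Because `items.pop()` mutates the list, `remaining` differs between the two runs even though the set of expressions is identical, so a format without a guaranteed order leaves the result ambiguous.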
## Using dynamic kernel injection
A radically different approach would be to allow the Jupyter client to evaluate variables within the Markdown cells, during execution.
For example, as demonstrated in https://github.com/executablebooks/MyST-NB/pull/382:
````
```{code-cell}
a=1
```
First call to {eval}`a` gives us: 1
```{code-cell}
a=2
```
Second call to {eval}`a` gives us: 2
````
Here, the user does not need to provide any "additional" binding of variables to variable names; the approach simply utilises the binding already present in the target kernel language.
As shown, the variable's output is also specific to where in the documentation it is evaluated, dependent on the state of the kernel at that point in the execution flow.
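This position-dependent behaviour can be sketched by interleaving execution and evaluation. The sketch is a toy model only: `exec`/`eval` over a shared namespace stand in for a live kernel, and the `render` helper and `EVAL_ROLE` pattern are hypothetical, not the implementation in the pull request above.

```python
import re

# Matches the inline role syntax used in the example above: {eval}`expression`
EVAL_ROLE = re.compile(r"\{eval\}`([^`]+)`")


def render(blocks):
    """Walk the document in order, executing code and evaluating inline roles."""
    namespace = {}  # stand-in for the kernel's state
    rendered = []
    for kind, content in blocks:
        if kind == "code":
            exec(content, namespace)
        else:
            # Each role sees the kernel state at this point in the document.
            rendered.append(
                EVAL_ROLE.sub(lambda m: str(eval(m.group(1), namespace)), content)
            )
    return rendered


document = [
    ("code", "a=1"),
    ("markdown", "First call gives us: {eval}`a`"),
    ("code", "a=2"),
    ("markdown", "Second call gives us: {eval}`a`"),
]
result = render(document)
```

The same expression yields different text at different points, because it is evaluated against the state of the kernel at that position rather than against a stored output.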
Pros:
- ✅ Requires no extra input from the user
- ✅ It provides a one-to-one mapping between variable name and variable output
- ✅ It is kernel agnostic
Cons:
- ❌ It would not work cross-document
This is also somewhat similar to https://github.com/agoose77/jupyterlab-imarkdown, which arose from the discussion in https://discourse.jupyter.org/t/inline-variable-insertion-in-markdown/10525/126.
Here, the outputs of such evaluations are stored as attachments on the markdown cell.