How to refer to a specific variable or output of a cell in a notebook?

choldgraf · March 5, 2022, 3:09pm

Context

In Jupyter Book, we’d like a language-agnostic way to grab the output of a cell in a notebook and inject it into a page in a book. However, there isn’t an obvious way to do this, and we’re considering a few options. I’m curious whether anybody in the JupyterLab world has an intuition for what makes sense.

Two things we’re considering

Use the Cell ID. The notebook spec now has the notion of Cell IDs to uniquely refer to a cell. However, there is almost no UI around this, and most Cell IDs are programmatically generated, non-memorable names. The other problem is that you’d still need a way to refer to a specific output of a multi-output cell.
Use user_expressions. The notebook has a concept of user expressions, which is kind of like a dictionary that stores the results of evaluating expressions after a cell runs. This could be used to define a unique variable name that holds the result of some expression. However, the UI and documentation around this are also very minimal, and many (most?) kernels don’t implement functionality for it.
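
To make the first option a bit more concrete, here is a rough sketch of looking up an output by Cell ID plus an output index. The notebook dict is a hand-written, minimal stand-in for an nbformat 4.5 file, and `get_output` and the ID `a1b2c3d4` are hypothetical names used only for illustration, not anything Jupyter Book provides.

```python
# Minimal stand-in for a parsed .ipynb file (nbformat 4.5); real notebooks carry more fields.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "id": "a1b2c3d4",  # hypothetical, auto-generated-looking Cell ID
            "cell_type": "code",
            "source": "print('hello')\n1 + 2",
            "metadata": {},
            "execution_count": 1,
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": "hello\n"},
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "data": {"text/plain": "3"},
                    "metadata": {},
                },
            ],
        }
    ],
}


def get_output(nb, cell_id, index=-1):
    """Return one output of the code cell with the given ID (the last output by default)."""
    for cell in nb["cells"]:
        if cell.get("id") == cell_id and cell["cell_type"] == "code":
            return cell["outputs"][index]
    raise KeyError(f"no code cell with id {cell_id!r}")


# The ID alone is ambiguous for a multi-output cell, so an index is needed as well.
print(get_output(notebook, "a1b2c3d4")["data"]["text/plain"])
print(repr(get_output(notebook, "a1b2c3d4", 0)["text"]))
```

The extra `index` argument is exactly the "which output?" problem mentioned above: the Cell ID picks the cell, but something else still has to pick the output.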

Any other ideas?

It seems like any of the options we’ve discussed would require some refinement of either the interfaces, documentation, or metadata around notebooks, so there doesn’t seem to be a single obvious solution.

Are we missing anything obvious? And if not, which of the approaches above seems most reasonable to others?

You can find a discussion laying out some of these (and other) ideas here:

github.com/executablebooks/meta

Embedded code outputs abstraction

opened 07:37AM - 02 Mar 22 UTC

chrisjsewell

discussion

# Aim

Within jupyter-book, and EBP in general, it is a desire for users to be able to embed the outputs of Jupyter Notebook code cell execution within the body of their documents. For example, referencing the value of a calculation:

```python
a = 1 + 2
```

with some form of "placeholder" syntax:

```markdown
The result is {{ a }}
```

As well as simple variable referencing, one would also like to embed "richer" outputs, such as images and HTML.

1. The abstraction should aim for potential implementations across different editing/rendering platforms (Jupyter Lab, Sphinx, VS Code, Curvenote, etc)
2. The abstraction should aim to be (Jupyter) kernel agnostic, i.e. work across the range of possible kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
3. Embedding should allow for both inline and block level components
4. It is desirable for the process to be as simple as possible for the user
5. It may be desirable to embed code outputs in a cross-document manner, i.e. one can embed a code output from one document in another
6. Caching of the

# Sphinx recap

Before discussing potential abstractions, and their pros/cons, it will be helpful to recap the basic sphinx build process phases:

1. For each document:
   - (myst-nb + notebook only) the notebook is executed, if necessary, populating all code cell outputs
   - The source text is parsed to an Abstract Syntax Tree (AST), which is agnostic of the eventual output format (HTML, LaTeX, ...)
   - Transforms are applied to the AST, to apply changes that require knowledge of the full AST
   - Certain variables are extracted to a global "database", known as the environment, such as reference targets
   - The AST is cached to disk, so that re-builds only have to re-parse modified documents
2. The global environment is cached to disk
3. For each output format:
   - Post-transforms are applied to each cached AST, to apply changes that require knowledge of the full project, such as inter-document referencing (using the global environment)
   - All ASTs are converted into the output format

One difficulty with the outputs of Jupyter notebook code cells is that they can provide multiple output formats (a.k.a. mime types), which can only be "selected" in phase (3).

# Potential abstractions

A number of potential abstractions are discussed below, with their pros and cons.

## Current myst-nb glue abstraction

In myst-nb v0.13, there is the `glue` function & roles/directives. This is implemented for IPython kernels only, whereby one "binds" a variable name to a variable by the `glue` function:

```python
from myst_nb import glue
a = "content"
glue("variable_name", a)
```

and placeholder syntax looks like:

````markdown
Inline: {glue:}`variable_name`

Block:

```{glue:} variable_name
```
````

All mime types for such outputs (such as `text/plain`, `text/html`, `image/png`, ...) are saved to a cache, during phase (1) of the sphinx build. Then, during phase (3), placeholder roles/directives are replaced with a suitable mime type for that output format, taking the mime type's content and converting it to AST, before injecting it into the document's AST.

Pros:

- ✅ It is relatively simple for users to use
- ✅ It provides a one-to-one mapping between variable name and variable output
- ✅ All required outputs are saved in the Jupyter notebook (i.e. can be parsed without a live kernel)
- ✅ It works cross-document

Cons:

- ❌ It is not kernel agnostic
- ❌ It requires that variable names are unique across the whole project
- ❌ It would be very difficult to implement outside of sphinx
- ❌ In sphinx, because the outputs are only converted to AST in phase (3), it means that important AST transformations from phase (1) can be missed, and need to be retroactively applied, making the conversion "brittle"

## Refactored myst-nb glue abstraction

The refactor in https://github.com/executablebooks/MyST-NB/pull/380 is not primarily aimed at `glue`, but it does intrinsically change how it works. It primarily addresses the issue of AST creation in phase (3), moving it to phase (1).

In its current form, the implementation precludes cross-document use; a proposal, though, is to use the form:

````
```{glue:} variable_name
:doc: docname
```
````

This would fix the issue of requiring variable names to be unique across the project.

It would require a "bi-modal" approach though, whereby `glue` without the `doc` option would proceed by directly converting outputs to AST in phase (1), but with the `doc` option AST would still need to be generated in phase (3).

## Using code cell IDs (or metadata)

As discussed above, a big issue with the `glue` abstraction is that it is only currently implemented for Python, and would require different implementations for different kernels.

One way round this is to assign an ID to each code cell, then use this as the reference for embedding code outputs.
This ID could either be assigned within the cell's metadata or also now via the recent addition of: https://nbformat.readthedocs.io/en/latest/format_description.html#cell-ids

- ✅ It is kernel agnostic
- ❌ The metadata/ID fields of a code cell are less accessible to users, for example
  - VS Code has no front end access to metadata or ID (see https://github.com/microsoft/vscode-jupyter/issues/1182)
  - Jupyter lab has access to metadata but, as of v3.2.9, no access to the ID field
- ❌ A "cell wide" ID does not bind a variable name to a specific output, i.e. is a one-to-many mapping

For example, if one had a code cell like:

```yaml
id: cell-id
source:
```

```python
import sys
import IPython.display

print("stdout")
print("stderr", file=sys.stderr)
IPython.display.display(1)
2
```

This cell actually has four outputs, and so this may require additional logic, to specify which output is being referred to (or limiting to only the final output).

## Using the `user_expressions` kernel feature

`user_expressions` are a feature of the Jupyter client/kernel, which allow expressions to be evaluated after execution of the code cell's main content, and bound to variable names, see: https://jupyter-client.readthedocs.io/en/stable/messaging.html#execute

It would be implemented for example like:

```yaml
user_expressions:
  variable_name1: a
  variable_name2: b
source:
```

```python
a = 1
b = 2
```

This overcomes an issue with the above cell ID:

- ✅ It provides a one-to-one mapping between variable name and variable output

However, similar to IDs:

- ❌ `user_expressions` are not currently implemented for any Notebook editors/renderers

In addition to this limitation, it should be noted that this feature of the client is quite under-documented and appears to be unimplemented in some kernels.
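
As a rough illustration of the idea, the sketch below uses plain Python `exec`/`eval` as a stand-in for a kernel: it runs the cell source, then evaluates each named expression in the resulting namespace. `run_cell` is a hypothetical helper, and the reply shape is only loosely modeled on the messaging spec linked above (an assumption, not a faithful reimplementation of any kernel).

```python
def run_cell(source, user_expressions, namespace):
    """Run `source`, then evaluate each named expression in the same namespace."""
    exec(source, namespace)
    results = {}
    for name, expression in user_expressions.items():
        try:
            value = eval(expression, namespace)
            # Only a text/plain representation is produced in this sketch.
            results[name] = {
                "status": "ok",
                "data": {"text/plain": repr(value)},
                "metadata": {},
            }
        except Exception as err:
            results[name] = {
                "status": "error",
                "ename": type(err).__name__,
                "evalue": str(err),
            }
    return results


namespace = {}
reply = run_cell(
    "a = 1\nb = 2",
    {"variable_name1": "a", "variable_name2": "b", "missing": "c"},
    namespace,
)
for name, result in reply.items():
    print(name, result["status"])
```

Each name maps to exactly one result, which is the one-to-one binding that a cell-wide ID cannot give; an expression that fails is reported per name rather than failing the whole cell.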
The IPython kernel's implementation is to call https://docs.python.org/3/library/functions.html#eval on each expression: https://github.com/ipython/ipython/blob/d9b5e550b673db900a08d03740ec0ce94e1b8feb/IPython/core/interactiveshell.py#L2606-L2631

This is somewhat problematic, since it means that it is technically possible for the expression to change the "state" of the python interpreter. This makes the order of execution important, and one feels it would have been a better design choice to make the `user_expressions` format a list rather than a dict.

For nbclient, a proof-of-principle implementation can be found at https://github.com/jupyter/nbclient/pull/160

## Using dynamic kernel injection

A somewhat radically different approach would be to allow the Jupyter client to evaluate variables within the Markdown cells, during execution. For example, as demonstrated in https://github.com/executablebooks/MyST-NB/pull/382

````
```{code-cell}
a=1
```

First call to {eval}`a` gives us: 1

```{code-cell}
a=2
```

Second call to {eval}`a` gives us: 2
````

Here, the user does not need to provide any "additional" binding of variables to variable names, it simply utilises the binding already present in the target kernel language. As shown, the variable's output is also specific to where in the documentation it is evaluated, dependent on the state of the kernel at that point in the execution flow.

Pros

- ✅ Requires no extra input from the user
- ✅ It provides a one-to-one mapping between variable name and variable output
- ✅ It is kernel agnostic

Cons

- ❌ It would not work cross-document

This is also somewhat similar to https://github.com/agoose77/jupyterlab-imarkdown, which arose from the discussion in https://discourse.jupyter.org/t/inline-variable-insertion-in-markdown/10525/126. Here, the outputs of such evaluations are stored as attachments, on the markdown cell.
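
The position-dependent behaviour of the `{eval}` role can be sketched with a toy renderer. This is not how the linked PR or jupyterlab-imarkdown is implemented; Python's `exec`/`eval` stand in for a live kernel, and `render`, `EVAL_ROLE` and the `(kind, text)` cell tuples are hypothetical names chosen for the example.

```python
import re

# Matches the inline role syntax from the example above, e.g. {eval}`a`
EVAL_ROLE = re.compile(r"\{eval\}`([^`]+)`")


def render(cells):
    """Execute code cells in order; resolve {eval} roles in Markdown cells as they are reached."""
    namespace = {}
    rendered = []
    for kind, text in cells:
        if kind == "code":
            exec(text, namespace)
        else:
            # Each expression sees the kernel state at this point in the document.
            rendered.append(
                EVAL_ROLE.sub(lambda m: str(eval(m.group(1), namespace)), text)
            )
    return rendered


cells = [
    ("code", "a=1"),
    ("markdown", "First call to {eval}`a` gives us:"),
    ("code", "a=2"),
    ("markdown", "Second call to {eval}`a` gives us:"),
]
for line in render(cells):
    print(line)
```

Because the namespace lives only for the duration of one `render` call, the sketch also shows why this approach does not extend across documents without some extra caching layer.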

