Feature Idea: A specification for notebook output dependencies

As someone who has done a lot of work in the Jupyter Book ecosystem, I find that one of the biggest pain points is trying to render a notebook’s outputs outside of the environment where they were generated.

This is especially true for interactive outputs - things like Bokeh, plotly, ipywidgets, etc. While the outputs do make it into the notebook (setting aside widget state etc), one has to manually load the relevant JS libraries in order to visualize those outputs. This means that you have to look through documentation and often codebases in order to remember to manually load the right library.

As an example, see this bqplot example in thebe/. The first cell manually includes require.js as well as the relevant JS library for the visualization. Without this, the page won’t know what to do with the output.

One way that some libraries have gotten past this is by bundling the entire JS blob with the outputs themselves (I believe some of plotly’s renderers do this). However, this is sub-optimal because it makes the notebooks huge and even less diffable than before.

One solution: make a standard

I think it would be really useful if there were a standard around how to store dependencies of cell outputs. For example, the output metadata could have a dependencies structure. This could have references to JS libraries or other programmatic instructions for displaying (or at least saying what’s needed in order to display) the outputs of the cell.

If this existed, a downstream environment (like a static HTML page rendered by Jupyter Book) could simply look at an interactive output’s dependencies field, and then have all of the instructions needed to recreate that cell output.
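To make the idea concrete, here is a minimal sketch of what such a field could look like on a single output, and how a downstream tool might read it. Nothing here exists in nbformat today: the `dependencies` key, its `name`/`version`/`type`/`url` fields, the `application/vnd.example-viz.v1+json` media type and the example.org URL are all made up for illustration.

```python
# Hypothetical output: the "dependencies" key and its fields are NOT part of nbformat.
output = {
    "output_type": "display_data",
    "data": {
        "application/vnd.example-viz.v1+json": {"points": [1, 2, 3]},
        "text/plain": "<ExampleViz>",
    },
    "metadata": {
        "dependencies": [
            {
                "name": "example-viz",
                "version": "1.2.3",
                "type": "js",
                "url": "https://example.org/example-viz@1.2.3/example-viz.min.js",
            }
        ]
    },
}


def output_dependencies(output):
    """Return the dependencies declared by one output, or [] if there are none."""
    # Stream and error outputs carry no metadata, so fall back to empty values.
    return list(output.get("metadata", {}).get("dependencies", []))


for dep in output_dependencies(output):
    print(dep["name"], dep["version"], dep["url"])
```

An output without the field simply yields an empty list, so existing notebooks would keep working unchanged.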

Do others think that this would help as well? Or have ideas for a different way to solve this issue? Would love to know what people think.

(also I know that @saulshanabrook has done some thinking along these lines. I also recently had a chat with @echarles where he noted similar experiences in re-using widgets. so pinging them)

A concrete use case I faced last week was exporting (with papermill or nbconvert) a JupyterLab notebook with outputs created by a JupyterLab extension (like plotly or a JupyterLab rendermime extension). Those limits have driven us to recommend, for now, using an ipywidgets HTML object, which can be exported. We are indeed missing some important features, and that impacts our users. I guess the additional dependencies would help. Do you see it as a change in the nbformat specs?

PS: More examples of issues I faced some time ago are logged/discussed in VegaLite does not render with jupyter_execute_notebooks=force · Issue #266 · executablebooks/MyST-NB · GitHub

An initiative to spec outputs has been created by @saulshanabrook (GitHub - Quansight-Labs/jupyter-output-spec: Rendering Jupyter outputs accross platforms), with an implementation for JupyterLab by @blois (GitHub - blois/js-module-renderer).

The spec does not take any dependencies into account, but that could be added, I guess.

Here are some thoughts that I had from a while back around widgets specifically, but which I think apply to most rich visualizations- nes/portable-widgets at master · nteract/nes · GitHub.

Specifically the goals that Colab has are:

  1. Users can install and use arbitrary widgets, including upgrading or downgrading versions of pre-installed widgets.
  2. Later viewers of notebooks have a high-fidelity viewing experience, using the same version of widgets as the original author of the notebook.
  3. Widget authors can build widgets that work in multiple notebook frontends.

In support of this Colab exposes only a very minimal public API that we aim to keep stable in perpetuity. This API uses browser globals which works well with Colab’s iframed outputs but does not really work for JupyterLab- the above specifications and prototypes are a stab at an API that would work in more environments.

A significant difference here is that instead of having the full Widgets api there is just HTML/JS for rendering UI and Comms for communicating to the kernel. This is a model that I believe Bokeh uses where a plot can be rendered via HTML then it can ‘light up’ via comms when available.

I’m quite bullish on ES6 modules for dependencies and think that these libraries should be either standalone or have the ability to load their dependencies themselves. I worry about versioning issues that will preclude the above 3 goals otherwise.

I’m still very passionate about this and am happy to help out any way I can.

Thx @blois for your input. I still need to think about all that, but am curious to know what you think about the webpack5 federation feature used by jupyterlab3 to solve dynamic deps loading, e.g. the example shown in module-federation-examples/App.js at 976379fb72033d128aa34b3fc13529a3a0cdcfef · module-federation/module-federation-examples · GitHub

Module federation seems like a great solution to a common problem. For this particular problem though I prefer the barest primitives possible to ensure long term stable APIs. I think that module federation should be able to be used on top of an API exposing only ES6 modules.

Module federation is commonly a solution for large dependencies and I’d really like to see many of these outputs be much lighter weight. Uses like JupyterBook should not require 800KB of JS just to display a button widget, especially if the rest of the UI can be mostly static HTML. This is easier said than done when libraries like VegaLite are >1MB.

I want to be clear that I believe output rendering should be independent of editor extensions. Extensions are great for enhancing the experience for the author of notebooks but the notebooks generated should ‘just work’ when viewed with no extensions installed.

Colab has thus far held a fairly hard line here- it doesn’t expose jQuery or requirejs or much else. It also does not include any native support for Vega, Plotly or Bokeh- they all must have rendering modes which emit plain HTML/JS. The hope is that these outputs are ultimately more portable- at the very least they eliminate versioning problems for Colab.

A few thoughts:

I’m not sure the best way to move forward on this- it almost feels like more of a working group between notebook renderers and visualization creators? I am very interested in this and would like to help move this forward but I firmly believe it needs to be a broad community effort.

This is of course a long standing issue of balance between security, efficiency, portability, and extensibility, etc. We considered js-assets-over-websockets in the kernel spec, and more recently pure esmodules, a dedicated Jupyter CDN, etc.

Julia does do the js-over-websockets thing, and may be worth investigating… except it doesn’t really help this case.

Anything that relies on hard-coded links to CDN, etc. is basically a non-starter for many use cases, and is certainly not sufficient for archival grade documents, as services disappear, even ones run by FAANGMAs or whatever. As a community, we’ve been bitten time and again by putting any kind of special handling into Jupyter software for specific proprietary platforms…

But in the end, there’s nothing that has proven to keep working in browsers like dumb js/css files served over http, and/or things in standards (other people’s) and specs (ours).

One existing tool that is currently in the notebook schema is attachments, which are (unfortunately) stored at the cell level and are not available on code cells. For notebook outputs of any particular weight, I would want exactly one copy of Bokeh, etc., but this feature seems like it might not work very well for this case. And dismissing interactive content because it’s too big may be a false economy, when one base64-encoded PNG of a plot can be as heavy as a visualization library.

That being said, I do see the servers and clients working within the existing mimebundle spec as probably the high road, with a goal towards lower-common-denominator outputs. vega, for example, already does this, as its mimerenderer updates the mimebundle with image/svg+xml. SVG is actually a fantastic container format, because of foreignObject, and the browsers finally, mostly support the SVG 1.1 spec.
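As a rough illustration of the fallback idea, a consumer can walk its own list of supported media types, richest first, and render the first one the mimebundle provides. The function name `pick_media_type`, the bundle contents and both preference lists are assumptions made for this sketch, not part of any Jupyter API.

```python
def pick_media_type(bundle, supported):
    """Return the first media type in `supported` (richest first) found in `bundle`."""
    for media_type in supported:
        if media_type in bundle:
            return media_type
    return None


# Illustrative mimebundle: an interactive representation plus static fallbacks.
bundle = {
    "application/vnd.vegalite.v4+json": {"mark": "point"},
    "image/svg+xml": "<svg xmlns='http://www.w3.org/2000/svg'></svg>",
    "text/plain": "<VegaLite chart>",
}

# Hypothetical preference lists for two kinds of consumers.
static_site = ["image/svg+xml", "image/png", "text/plain"]
rich_client = ["application/vnd.vegalite.v4+json"] + static_site

print(pick_media_type(bundle, static_site))  # image/svg+xml
print(pick_media_type(bundle, rich_client))  # application/vnd.vegalite.v4+json
```

The static site never needs the Vega-Lite JS at all here, because the SVG fallback is already stored in the notebook.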

This approach is also being explored for ipywidgets. https://github.com/jupyter-widgets/ipywidgets/pull/3107

One thing that I am trying to suss out is the difference between “notebooks need to be perfectly replicable out-of-the-box anywhere” and “notebooks need to have enough information in them that somebody could replicate them”. Does that make the problem any more tractable?

Just coming from my case of Jupyter Book - right now if I want to add support to a book for some interactive viz library, I have to:

  1. Create the notebook
  2. Go to the library’s documentation, and figure out which JS files it uses
  3. Load those files manually in my book’s configuration
  4. Build the book

I’m trying to figure out if we can make it possible for a tool to automate step 2 so that it can do step 3 programmatically.

AKA I’d like enough breadcrumbs in the notebook metadata that this could be solved semi-automatically, so that a notebook cell could say “my output was created by XYZ library of a particular version, and if you want to render it you’ll need these JS files somehow”. Then Jupyter Book could write infra that scans the outputs for this information, and loads those libraries however it sees fit.
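A minimal sketch of that scanning step, assuming the breadcrumb lives under a hypothetical `rendered_by` key in each output’s metadata with `library`, `version` and `js_files` fields (none of which exist in nbformat today). It only gathers and de-duplicates the information; how the files get loaded is left to the tool.

```python
def scan_breadcrumbs(notebook):
    """Map (library, version) -> list of JS files named by any output in the notebook."""
    found = {}
    for cell in notebook.get("cells", []):
        # Markdown and raw cells have no "outputs" key, so default to an empty list.
        for output in cell.get("outputs", []):
            crumb = output.get("metadata", {}).get("rendered_by")  # hypothetical key
            if not crumb:
                continue
            files = found.setdefault((crumb["library"], crumb["version"]), [])
            for js_file in crumb.get("js_files", []):
                if js_file not in files:
                    files.append(js_file)
    return found


# Two outputs produced by the same (made-up) library should yield a single entry.
crumb = {"library": "xyz", "version": "1.0.0", "js_files": ["xyz.min.js"]}
notebook = {
    "cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {"cell_type": "code", "outputs": [
            {"output_type": "display_data", "data": {}, "metadata": {"rendered_by": crumb}},
        ]},
        {"cell_type": "code", "outputs": [
            {"output_type": "stream", "name": "stdout", "text": "hi"},
            {"output_type": "display_data", "data": {}, "metadata": {"rendered_by": crumb}},
        ]},
    ]
}

print(scan_breadcrumbs(notebook))
```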

I agree this wouldn’t be a failsafe solution, and will succumb to bitrot just like anything else, but it would still beat requiring a book author to look things up in the documentation every time they wanted to use Bokeh :slight_smile:

I think the breadcrumbs are the media types, and as hinted at above, a possible distribution mechanism is the brave new world of federated extensions. What may be missing upstream, then, is some modifications to the (still somewhat inaccurate) metadata on

  • refinement of jupyterlab's package.json schema
  • improvements to jupyter_server's programmatic API to be able to request "gimme the list of all the extensions I need for vnd/whatever-viz-v1"

If a mimeExtension looked more like:

"jupyter": {
  "lab": {
    "mimeExtension": {
      "path": "lib/mimerenderer.js",
      "mimetypes": ["vnd/whatever-viz-v1"]
    }
  }
}

Then, on the consumer side, a package creating static HTML would need to be able to:

  • determine the effective labextension search path
  • request all of the dependencies
  • copy them onto a hostable place in a structure that mimics what static/labextensions provides
  • put links to them on a page, with enough lumino junk to pull them in

Leaving the person who actually wants to build the path just having to add whatever-viz to requirements.txt… which they may well already have done, if they are planning to execute notebooks that contain the library.
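A sketch of the lookup this would enable, assuming the `package.json` contents have already been loaded as dicts and follow the proposed shape above (an object with `path` and `mimetypes`). The helper names `media_types_in` and `extensions_for` are made up, and discovering the labextension search path and copying files are deliberately left out.

```python
def media_types_in(notebook):
    """Collect every media type that appears in the notebook's outputs."""
    types = set()
    for cell in notebook.get("cells", []):
        for output in cell.get("outputs", []):
            types.update(output.get("data", {}))
    return types


def extensions_for(media_types, package_jsons):
    """Map each needed media type to the (package name, renderer path) pairs that declare it."""
    needed = {}
    for pkg in package_jsons:
        mime_ext = pkg.get("jupyter", {}).get("lab", {}).get("mimeExtension")
        if not isinstance(mime_ext, dict):
            continue  # values that are not objects declare no media types in this sketch
        for media_type in mime_ext.get("mimetypes", []):
            if media_type in media_types:
                needed.setdefault(media_type, []).append((pkg["name"], mime_ext["path"]))
    return needed


# Hypothetical package metadata in the proposed shape, plus one without media types.
packages = [
    {"name": "whatever-viz", "jupyter": {"lab": {"mimeExtension": {
        "path": "lib/mimerenderer.js", "mimetypes": ["vnd/whatever-viz-v1"]}}}},
    {"name": "other-ext", "jupyter": {"lab": {"mimeExtension": True}}},
]
notebook = {"cells": [{"cell_type": "code", "outputs": [
    {"output_type": "display_data", "metadata": {},
     "data": {"vnd/whatever-viz-v1": {}, "text/plain": "<WhateverViz>"}},
]}]}

print(extensions_for(media_types_in(notebook), packages))
```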

This isn’t going to work on, say, nbviewer any time soon, but I think expecting extension authors to build standalone static HTML is… even less likely.

Raised an issue for this.

@bollwyvl - thanks for this explanation! Perhaps if any progress is made here, I can be a good test-case for “the least-informed person who might also be interested in this problem”? I know very little about JavaScript and web standards, but have managed to work often on a tool that would benefit from these patterns :slight_smile:

As opposed to JS-over-websockets, a trick that Colab uses is a browser Service Worker that intercepts all network requests from the output to the kernel manager and caches them in the notebook (as base64 strings… ick). When the notebook is viewed later on, the same service workers can be used to serve up the content. It actually mostly works for fairly complex pages.
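The record-and-replay idea can be sketched in a few lines. This is not Colab’s actual storage format, and the real mechanism runs in a browser Service Worker rather than in Python; the function names, the cache layout and the localhost URL are all assumptions for illustration.

```python
import base64
import json


def record_response(cache, url, body, content_type):
    """Store one intercepted response body as a base64 string keyed by URL."""
    cache[url] = {
        "content_type": content_type,
        "body_base64": base64.b64encode(body).decode("ascii"),
    }


def replay_response(cache, url):
    """Return (content_type, body bytes) for a cached URL, or None if it was never recorded."""
    entry = cache.get(url)
    if entry is None:
        return None
    return entry["content_type"], base64.b64decode(entry["body_base64"])


cache = {}  # imagine this dict living somewhere in the notebook's metadata
record_response(cache, "http://localhost:8050/app.js", b"console.log('hi');", "text/javascript")

# The cache is plain JSON, so it survives a save/load round trip of the notebook.
restored = json.loads(json.dumps(cache))
print(replay_response(restored, "http://localhost:8050/app.js"))
```

Base64 grows each payload by roughly a third, which is part of why this gets heavy for larger pages.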

I agree with @bollwyvl that an API to expose the mimeExtensions would help a lot (I believe I was thinking of something similar here, but it’s not well articulated). An issue is that renderers like Bokeh still use Comms and need a stable API to access them. In Lab today this is only possible via extensions.

If I understand https://github.com/jupyter-widgets/ipywidgets/pull/3107 correctly it’s essentially a fallback rendering approach. This is important but I see it as a bandaid- for example Colab’s DataTable has to have a plain HTML fallback so the content would still appear in GitHub’s renderer. But given a choice, we would much prefer to have a renderer which could work everywhere. If I make a notebook using nteract’s Data Explorer and send it to another user they should get the same visualization- not an HTML table fallback.

Plenty of pain in this space. I lost days when plotly charts just stopped working in documentation portals. Had to use screenshots instead. When they did work it was magical. Would be a good one-up on nbsphinx.

If taken to the extreme, it would enable researchers to serve ~backendless realtime apps with Dash (similar to R Shiny). Where the Book becomes a collection of interactive, multi-step tools (UI on top of a protocol) written by their team. I love Django, but it’s a lot.

It would make portals like this a lot easier:
https://usegalaxy.org/

Also JuliaHub

But it would also increase the need for a more robust auth system like keycloak

Comms are indeed a thing, but I think the archival-grade HTML output is a more attainable baseline for something to work. Renderers, unless very carefully architected, won’t work outside of their “home” client, but rando comm things pretty much need to live within their client today.

I think a way forward here would be a JEP encouraging either the comm target_name to be a mimetype, or more backwards-compatible, adding a new field, and running more things through the mimerender detection pipe described above. For example, I’ve got this open PR (and a draft JEP) to enable wrapping the state of Language Server Protocol exchange. If there were an application/lsp+json (and a schema for it) it would be more straightforward, of course, but anything would be better than hand-waving. To that end, the LSIF protocol shows a nice path forward for at-rest interactive content, which we’d also like to support. If a non-browser client (or, at worst a headless browser) was able to excite the comm target during execution, and this could be intercepted and stored, statically-hosted documentation sites would have a path forward to maintaining at least an “on the garden path” set of excursions, at the expense of a heavier deployment payload.

realtime apps with Dash

Much like Bokeh, I do not foresee any plotly/dash stuff becoming a first-class and -party tool supported by the Jupyter protocols, specs, and clients… but maybe that’s just me. However, once voila has adopted federated modules, I very much see the prospect of having fully-statically-hosted, nicely customizable apps, similar to the jyve experiments, built out of Jupyter widgets.

The service workers I mentioned above do happen to somewhat work with Dash as well. I’m not super familiar with Dash but an example is https://colab.research.google.com/gist/blois/558819808ea6b151ce43972ca8daa5aa/dash.ipynb. (This file is pretty old, the boilerplate in that last cell is now in a library).

The same functionality is used by TensorBoard to persist itself in the notebook. The service worker approach is quite finicky- it’s complex and works for 90% of the cases but that 10% is trouble.

I firmly disagree that Comms cannot be portable- the baseline API is exceedingly simple. Renderers like Plotly and Bokeh show that a rich visualization can both be portable and use Comms. Visualizations should be able to render using JS, transmit interactions back to the kernel via Comms and persist any resulting changes in the notebook. Allow a user to zoom a plot, annotate an image or filter a table and persist that view in the notebook- why would this not be a goal?

Note that I don’t think Comms themselves should go directly into the document- instead there should probably be some abstract storage mechanism akin to how the final widget state is persisted in the notebook.