Thanks for starting this!
A related discussion is on the JEP for DAP. In general, we might want to separate the specification from the reference implementation, and embed the human-readable documentation inside the formal specification. DAP itself is a good example of this, with its toolchain. In the DAP case, the ability for a Jupyter spec to reference another spec in a way that is both machine- and human-resolvable would likely be preferable to re-implementing or re-documenting it.
Once (more) formalized, including a concrete reference to these specs in so-constrained objects would go a long way towards self-description, while actually including the schema might be a bit too Gödel, Escher, Bach. Including `$schema` seems like the most straightforward approach. What this does not provide, however, is an easy means for a document to be, for example, both an `nbformat.v4` document and a particular, more-constrained format. I am not sure whether a schema could be crafted in such a way as to make this self-describing.
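One possible way to approach the "both `nbformat.v4` and something stricter" question is to have the stricter schema compose the base one with `allOf`, so that a single `$schema` reference implies both. The sketch below is only an illustration under that assumption: the two URLs are hypothetical placeholders (nothing is published at them), the extra `kernelspec` constraint is made up, and `declared_formats` is a helper invented here that only follows references; it does not validate anything.

```python
# Hypothetical identifiers: neither URL is an official Jupyter schema location.
BASE = "https://example.org/schemas/nbformat.v4.schema.json"
STRICT = "https://example.org/schemas/strict-notebook.schema.json"

# The constrained schema pulls in the base schema via allOf and adds one
# illustrative extra rule (metadata must contain a kernelspec).
strict_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": STRICT,
    "allOf": [
        {"$ref": BASE},
        {"properties": {"metadata": {"required": ["kernelspec"]}}},
    ],
}


def declared_formats(document, schemas):
    """List the document's $schema plus every schema it reaches through allOf/$ref."""
    seen = []
    pending = [document.get("$schema")]
    while pending:
        uri = pending.pop()
        if uri is None or uri in seen:
            continue
        seen.append(uri)
        # Only top-level allOf entries are followed in this sketch.
        for part in schemas.get(uri, {}).get("allOf", []):
            if "$ref" in part:
                pending.append(part["$ref"])
    return seen


notebook = {
    "$schema": STRICT,
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [],
}
formats = declared_formats(notebook, {STRICT: strict_schema})
print(formats)
```

With this layout the document names only the stricter schema, and a consumer that resolves the reference chain can still discover that it is also meant to be an `nbformat.v4` document. Whether that counts as sufficiently self-describing is exactly the open question.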
As this would generally necessitate a major (breaking) change on both ends of the pipe, I would also advocate for the (optional) inclusion of a list of JSON-LD contexts, which would permit much deeper, unambiguous integration with high-value metadata formats like W3C Web Annotation and PROV.
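As a rough sketch of what an optional context list could look like, the snippet below attaches an `@context` array to a notebook's metadata. Placing it under `metadata` is an assumption made for illustration, not something the notebook format defines, and `with_contexts` is a hypothetical helper. The first entry is the W3C Web Annotation context document; the second is an inline prefix mapping to the PROV namespace.

```python
import json


def with_contexts(notebook, contexts):
    """Return a copy of the notebook whose metadata carries an @context list."""
    updated = dict(notebook)
    metadata = dict(updated.get("metadata", {}))
    # Assumption: the context list lives under the notebook-level metadata.
    metadata["@context"] = list(contexts)
    updated["metadata"] = metadata
    return updated


contexts = [
    "http://www.w3.org/ns/anno.jsonld",      # W3C Web Annotation context
    {"prov": "http://www.w3.org/ns/prov#"},  # prefix for PROV terms
]
notebook = {"nbformat": 4, "nbformat_minor": 5, "metadata": {}, "cells": []}
annotated = with_contexts(notebook, contexts)
print(json.dumps(annotated["metadata"], indent=2))
```

Because the list is optional and additive, a consumer that does not understand JSON-LD could ignore the key, while one that does could interpret terms such as `prov:wasGeneratedBy` unambiguously.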
Finally, setting a goal for a computationally-lossless, yet publication-ready format would make sense. I submit that PDF/A-2 is a format really worth considering for this role, as it is already the de facto (or indeed, de jure) format in a number of domains. In addition to the familiar features of PDF, it includes a virtual file system, such that a “Jupyter PDF” could mean:
- a PDF/A-2
- at least one `.ipynb`