Hi @danielskatz (and thanks @labarba for the shoutout) - a bit late to this party but we built a simple and minimal workflow for document submission, review and publication inside Authorea. (Note: an Authorea document can include and execute Jupyter Notebooks). You can see the workflow in action in this video: https://www.youtube.com/watch?v=YQO0FDk4BDE
A couple of things to note: (1) a DOI can optionally be minted upon notebook publication, and (2) peer review reports (signed or anonymous) are published as well (transparent peer review).
Very cool! Is there a place where we can see the backend details of how that is handled, or code that others could build upon?
In addition to krassowski/data-vault, krassowski/nbpipeline, and nteract/scrapbook, pachyderm and quiltdata/quilt do (1) data versioning and (2) data analysis pipelines built from sequences of container image invocations.
What is a Journal, what value do Journals provide, how can Journals and Notebooks merge to become supreme Notebook-hosting Journals?
What is a Journal? What value do Journals provide?
- Document hosting: PostScript, LaTeX, PDF, HTML → HTML+RDF (RDFa), HTML+JSONLD
- Document-level bibliographic metadata: Title, Authors (Organizations, Funding), Abstract
- Comments / Threaded Comments
- Search: Documents, Comments, Datasets, Code
- Premises: Inputs and Outputs
- Citations as (typed) graph edges (already parsed into JSON-LD)
- Code repositories with version control
- Data repositories with version control
- Image hosting: charts and figures (CDN: Content Delivery Network)
- Recommended/similar articles
- #LinkedResearch (linkeddata/dokieli)
- Expert Community
- Audience
- Moderation (this is a real cost)
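To make the metadata and typed-citation items in the list above concrete, here is a minimal sketch of what one article record could look like as JSON-LD. It assumes schema.org terms for the bibliographic fields and CiTO properties (`extends`, `usesDataFrom`) for the typed citation edges; neither vocabulary is prescribed above, and all identifiers are hypothetical placeholders.

```python
import json

# Assumption: schema.org for bibliographic fields, CiTO for typed citations.
CONTEXT = {
    "@vocab": "https://schema.org/",
    "cito": "http://purl.org/spar/cito/",
}


def article_jsonld(identifier, title, authors, abstract, typed_citations):
    """Build a JSON-LD dict for one article.

    typed_citations maps a CiTO property name (e.g. "extends") to a list
    of identifiers of the cited works.
    """
    doc = {
        "@context": CONTEXT,
        "@id": identifier,
        "@type": "ScholarlyArticle",
        "name": title,
        "author": [{"@type": "Person", "name": name} for name in authors],
        "abstract": abstract,
    }
    for relation, targets in typed_citations.items():
        doc["cito:" + relation] = [{"@id": target} for target in targets]
    return doc


def citation_edges(doc):
    """Flatten typed citations into sorted (source, relation, target) triples."""
    edges = []
    for key, targets in doc.items():
        if key.startswith("cito:"):
            for target in targets:
                edges.append((doc["@id"], key, target["@id"]))
    return sorted(edges)


# Hypothetical identifiers, for illustration only.
example = article_jsonld(
    "https://example.org/articles/1",
    "A notebook-based analysis",
    ["A. Author"],
    "Placeholder abstract.",
    {
        "extends": ["https://example.org/articles/0"],
        "usesDataFrom": ["https://example.org/datasets/42"],
    },
)
print(json.dumps(example, indent=2))
```

Calling `citation_edges(example)` gives the citations as graph edges, which is the form a search or recommendation feature would consume.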
How can Journals and Notebooks merge to become supreme Notebook-hosting Journals?
I'm quite late to this thread (thanks @danielskatz for pointing me to it), but I thought I'd share the notebook publishing solution we have developed for Pangeo Gallery. This is far from a complete / finished solution, but there may be some elements in our workflow that can be remixed / reused in other ways.
The main elements of Pangeo Gallery are:
- The gallery is organized into repos. Each repo contains notebooks, a shared environment, and a simple configuration file. The repos can live in any organization.
- Binderbot, a CLI which uses the binder API to execute notebooks from within a running binder. This is a key ingredient that allows us to "build" notebooks in the cloud, in the user-specified environment.
- A GitHub workflow which gets run on each repo of the gallery, which calls binderbot, builds the notebooks, and commits them to a separate branch in the same repo.
- A Sphinx Website which builds http://gallery.pangeo.io/ statically from the built notebooks. Each repo in the gallery is added as a submodule to the pangeo-gallery repo.
This combination of tools provides a fairly simple and lightweight way to continuously integrate notebooks and build them into a nice website, using all open-source tools and platforms. By using binder, we get interactive execution for free.
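As a rough illustration of the "build" step described above, here is a sketch that walks a repo, passes each notebook through an executor, and writes the results to a separate directory (standing in for the separate branch). This is not binderbot's interface: `build_gallery_repo` and `fake_execute` are hypothetical names, and the executor is a placeholder for whatever actually runs the notebook remotely.

```python
import json
from pathlib import Path


def build_gallery_repo(source_dir, build_dir, execute):
    """Pass every notebook under source_dir through `execute`; write to build_dir.

    `execute` is a stand-in for the remote execution step: it takes a
    notebook (as a dict) and returns the executed notebook.
    build_dir should live outside source_dir.
    """
    source_dir, build_dir = Path(source_dir), Path(build_dir)
    built = []
    for nb_path in sorted(source_dir.rglob("*.ipynb")):
        if ".ipynb_checkpoints" in nb_path.parts:
            continue  # skip Jupyter autosave copies
        notebook = json.loads(nb_path.read_text(encoding="utf-8"))
        executed = execute(notebook)
        # Mirror the source layout so the site builder finds the same paths.
        out_path = build_dir / nb_path.relative_to(source_dir)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_text(json.dumps(executed, indent=1), encoding="utf-8")
        built.append(out_path)
    return built


def fake_execute(notebook):
    """Hypothetical executor: numbers the code cells without running them."""
    count = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            count += 1
            cell["execution_count"] = count
    return notebook
```

Keeping the executor as a plain callable means the same loop can be exercised locally with a fake and pointed at a real execution backend in CI.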
Going forward, this could conceivably form the basis of a peer-review / publication pipeline, similar to JOSS, in which the review occurs in the author's repo itself, via comments, PRs, etc.
To achieve archivability, one would want to store the built notebooks in a more permanent repository with DOIs: either something custom-made for this purpose or, the easier path, Zenodo or Figshare.
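For the Zenodo path, a sketch of the first step might look like the following. It only prepares the request that creates a deposition and does not send anything. The endpoint and metadata field names reflect my understanding of Zenodo's deposit REST API and should be checked against the current Zenodo documentation; `deposition_request` is a hypothetical helper, and the title, creator, and token are placeholders.

```python
import json
import urllib.request

# Assumption: Zenodo's deposit endpoint; verify against the current API docs.
ZENODO_DEPOSIT_URL = "https://zenodo.org/api/deposit/depositions"


def deposition_request(title, description, creators, token):
    """Prepare (but do not send) a request that creates a new deposition."""
    metadata = {
        "upload_type": "other",  # assumed category for built notebooks
        "title": title,
        "description": description,
        "creators": [{"name": name} for name in creators],
    }
    body = json.dumps({"metadata": metadata}).encode("utf-8")
    return urllib.request.Request(
        ZENODO_DEPOSIT_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
    )


# Placeholder values; a real token would come from a CI secret.
request = deposition_request(
    "Built notebooks for an example gallery repo",
    "Executed notebooks archived from the build branch.",
    ["Author, A."],
    "PLACEHOLDER_TOKEN",
)
```

Uploading the notebook files and publishing the deposition are separate calls that this sketch leaves out; the DOI becomes final at the publish step.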
I'm very glad I saw this! For Bluesky we use a system modeled on https://github.com/dask/dask-examples:
- There is a single repository of notebooks and supporting files.
- The notebooks are built on Travis using nbsphinx, and the built artifacts are uploaded to GH pages using doctr.
The system you describe would address these pain points we have found:
- The single repo mixes notebooks for different audiences, maintained by different groups. Separate repos would make this easier to manage in some ways.
- The build time is long, and nbsphinx does not make it convenient to target specific notebooks.
- Debugging anything inside of sphinx is painful, not least of all notebook execution.
I like in particular that Pangeo's system only relies on sphinx to stitch together already-executed static notebooks.
The overall complexity is higher (multiple repos, more moving parts), but it seems like it might be easier to maintain over time.
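For anyone who wants to try the "sphinx only stitches together executed notebooks" idea with nbsphinx, a minimal `conf.py` sketch could look like this. It assumes the notebooks were already executed by an earlier CI step; the project name is a placeholder, and this is not Pangeo's or Bluesky's actual configuration.

```python
# conf.py (sketch): render notebooks with Sphinx without executing them.

project = "notebook-gallery"  # hypothetical project name

# nbsphinx converts .ipynb files into Sphinx pages.
extensions = ["nbsphinx"]

# Never execute notebooks during the docs build; use the stored outputs as-is.
nbsphinx_execute = "never"

# Keep build output and Jupyter autosave copies out of the source tree scan.
exclude_patterns = ["_build", "**.ipynb_checkpoints"]
```

With execution switched off, a failing notebook shows up in the execution step rather than inside the Sphinx build, which is where the debugging pain described above came from.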