Guidelines for submitting a notebook for peer review today

If someone wants to submit a notebook to an event so that it can be peer-reviewed, e.g. as planned for http://earthcube.org/EC2020, what best practices should be followed?

We want to ensure that the notebook is reproducible for at least a few months, and ideally longer. We also want to ensure that another person can find it and execute it.

One simple option seems to be to create a repo with a binder badge.

Are there problems with this? What would be better?

GitHub, or the repo itself, might not be around for very long (bitrot), so you could take your repo (with notebook and binder badge) and store it on Zenodo.

You get a DOI, and people can launch the archived version from Zenodo (https://mybinder.org/v2/zenodo/10.5281/zenodo.3523526/). The Zenodo record points back to GitHub, where things may have moved on since.

(The above is just some random example from the first page of results when searching Zenodo for “binder”.)
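
To make the URL pattern explicit, here is a minimal sketch that turns a Zenodo DOI into a Binder launch link. The only thing taken from above is the shape of the example URL; the function name `zenodo_binder_url` and the `hub` parameter are made up for illustration, and the DOI check assumes Zenodo DOIs always start with `10.5281/zenodo.`.

```python
def zenodo_binder_url(doi: str, hub: str = "https://mybinder.org") -> str:
    """Build a Binder launch URL for a Zenodo DOI (helper name is hypothetical)."""
    doi = doi.strip()
    # Accept DOIs pasted with a resolver or "doi:" prefix.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Assumption: Zenodo DOIs look like 10.5281/zenodo.<record id>.
    if not doi.startswith("10.5281/zenodo."):
        raise ValueError(f"not a Zenodo DOI: {doi!r}")
    return f"{hub.rstrip('/')}/v2/zenodo/{doi}/"


print(zenodo_binder_url("10.5281/zenodo.3523526"))
```

Run on the DOI from the example, this reproduces the link above, so reviewers could be handed just the DOI.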

Binder supports several other repositories (in the archiving sense), so you aren’t limited to Zenodo. Adding support for more is a question of someone caring enough to spend a few hours implementing it.


Going slightly off-topic, towards the “future” section: something I think is worth remembering is that notebooks are terrible for authoring longer texts with cross-references, tables, figures, maths, citations and other typesetting needs. I wouldn’t try to make them better at it. I’d keep using something like LaTeX (or whatever your field uses) to typeset the bulk of the text. At the right places in the article I’d insert links to the notebook. The notebook would still contain more than just code, but it wouldn’t be as extensive as a longer paper.

With the caveat that the community is still figuring this out, I’d imagine something like:

  • Notebook should be on GitHub
  • Branch or tag should be associated with the submission time
  • Zenodo DOI should be generated for the page
  • There should be a CITATION file, or something like it in the repository
  • Notebook should be reproducible with a BinderHub (if the computation allows it)
  • For the content itself, I’d suggest either:
    • Do as @betatim suggests, and keep computation-focused content in the notebook and link out to it from another markup language (like LaTeX)
    • Suggest that the author includes a Sphinx site to handle the cross-refs etc, along with something like nbsphinx.
    • Keep an eye on Jupyter Book, and in particular, ebp.jupyterbook.org, as it should get better at things like cross refs / citations / etc soon.
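
As a rough sketch of the "tag + Binder" items in the list above: the badge can point at the submission tag rather than the default branch, so the link keeps launching the reviewed version. The owner, repository and tag names below are placeholders, and `binder_badge` is a hypothetical helper, not part of any Binder tooling.

```python
from urllib.parse import quote

BADGE_IMAGE = "https://mybinder.org/badge_logo.svg"


def binder_badge(owner: str, repo: str, ref: str) -> str:
    """Return Markdown for a Binder badge pinned to a branch, tag or commit."""
    # Percent-encode the ref so names containing "/" stay a single URL segment.
    launch = f"https://mybinder.org/v2/gh/{owner}/{repo}/{quote(ref, safe='')}"
    return f"[![Binder]({BADGE_IMAGE})]({launch})"


# Placeholder names: substitute the real repository and submission tag.
print(binder_badge("example-user", "example-notebook", "ec2020-submission"))
```

The resulting line can be pasted into the README; pinning to a tag or commit is what keeps the launched environment tied to the submitted version.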

Where would metadata about the notebook live? Within the notebook or in an entry attached to the DOI (Zenodo) or ? What basic info should be included? So much about (meta)data is domain specific, but I don’t think this needs to be… at least not a base set.

On the topic of notebook metadata, I guess I would point people to RO-Crate.

I’m not sure WholeTale supports RO, although they use the term “research objects” in a generic sense.
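
For anyone who hasn’t seen it, here is a minimal sketch of what such a metadata file might look like, assuming the RO-Crate 1.1 layout (a `ro-crate-metadata.json` file at the root of the directory). The helper name, the `#author` identifier and all example values are made up, and fields the specification expects on a real crate (such as a license and publication date) are left out for brevity.

```python
import json


def minimal_ro_crate(name, description, author_name, notebook_path):
    """Build a bare-bones RO-Crate style metadata document as a dict."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            # Descriptor: says which entity in the graph is the root dataset.
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            # Root dataset: the directory being shared.
            {
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "author": {"@id": "#author"},
                "hasPart": [{"@id": notebook_path}],
            },
            {"@id": "#author", "@type": "Person", "name": author_name},
            {"@id": notebook_path, "@type": "File", "name": "Main notebook"},
        ],
    }


# Illustrative values only.
crate = minimal_ro_crate(
    "Example submission", "A notebook plus its data", "A. Author", "analysis.ipynb"
)
print(json.dumps(crate, indent=2))
```

Because the file describes a directory rather than a notebook, the same approach would carry over to non-Jupyter submissions.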

That’s a great suggestion! For the uninitiated, do you think we could find an easy path for them to include it? That is: 1. Any advice on making that approach digestible? 2. Any implications for non-Jupyter submissions?

What exactly do you mean by metadata?

The basic information every notebook should carry about itself, e.g. author and what it’s about. The metadata would set us up for creating a directory searchable on those parameters. Most people use Jupyter, but we might get RStudio and Matlab entries too, so we want guidance that works across any platform. Metadata is also important to ensure machines can find and interpret research artifacts.
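
To make "a directory searchable on those parameters" concrete, here is a rough, platform-neutral sketch. The field names (`title`, `authors`, `description`, `platform`) are only a guess at a base set, and `SubmissionRecord`, `search` and the two example records are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubmissionRecord:
    """Hypothetical base metadata, independent of the notebook platform."""
    title: str
    authors: List[str]
    description: str
    platform: str  # e.g. "Jupyter", "RStudio", "MATLAB"


def search(records, **criteria):
    """Return records where every given field contains the wanted text."""
    def matches(record, key, wanted):
        value = getattr(record, key)
        values = value if isinstance(value, list) else [value]
        # Case-insensitive substring match against each value of the field.
        return any(wanted.lower() in v.lower() for v in values)

    return [r for r in records
            if all(matches(r, k, w) for k, w in criteria.items())]


# Made-up example entries.
records = [
    SubmissionRecord("Sea ice trends", ["Jane Doe"], "Time-series analysis", "Jupyter"),
    SubmissionRecord("Stream gauges", ["Sam Roe"], "Hydrology workflow", "RStudio"),
]

print([r.title for r in search(records, platform="jupyter")])
```

Nothing here depends on the notebook format, so the same record could describe an R Markdown or Matlab entry.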

A list of authors is already collected by Zenodo (and probably all the other platforms like it) when you submit something to it.

Metadata, and making it machine-readable, is important; however, we already have solutions to this for PDFs, Word files, images, etc., which people regularly submit. That suggests we can keep using that system.

The exception would be if there are specific new fields or kinds of metadata which are not already collected for the many other types of files people submit. I can’t think of anything unique to notebooks (or to a directory of files, one of which is a notebook, as I think the shareable unit is a directory, not a notebook).

Thank you @danielskatz - having non-standard/executable publication objects that are peer-reviewed is super exciting :slight_smile:

Together with a colleague of mine, we were thinking about doing something similar in a Computational Social Science workshop. So I’m certainly interested in what experience you have with this format.

There has also been some discussion on GH about using Notebooks in journals

Gittaca on twitter suggests that https://www.reviewnb.com can help, in https://twitter.com/gittaca/status/1238199999223209984?s=20 (and in https://twitter.com/gittaca/status/1238485429969653760?s=20, asked for this to be posted here)

Does anyone have any experience with this? I can’t immediately figure out what it offers beyond nbdime, but it seems like it might be helping more with the review part.

I haven’t used it myself - I thought it looked quite interesting, but I agree it seems like a proprietary wrapper around various Jupyter tools so I don’t think it’s very usable for most of our use-cases

Edit: to avoid confusion, I struck out a section above that came across stronger than I meant it; I don’t know enough about it to know whether it is a wrapper or not. We just haven’t used it because it’s not open source, so it wouldn’t work for our workflows.

Does anyone have any experience with this? I can’t immediately figure out what it offers beyond nbdime, but it seems like it might be helping more with the review part.

@danielskatz I built ReviewNB so I can help answer this. It is a peer review tool for Jupyter Notebooks where one can share notebooks and others can comment on the content (at an individual cell level) to share feedback, ask clarifying questions, etc. It also shows a visual diff instead of a JSON diff to display how the notebook content has changed from its previous version.

That being said, ReviewNB is primarily designed for peer review within a data science team, and the requirements for peer review in a publication context might differ. E.g. ReviewNB’s review workflow is tightly coupled with GitHub pull requests (all comments you write are posted on GitHub, the visual diff is generated from the GitHub patch, etc.). So you might want to try it out to see if it fits your requirements.

it seems like a proprietary wrapper around various Jupyter tools

@choldgraf I appreciate all your work in the Jupyter community. But this comment is really insulting to my work. What Jupyter tools is ReviewNB a “wrapper” around? If you are referring to nbdime: no, we are not using nbdime at all. I wouldn’t have been able to build commenting using a third-party diff tool. I literally take the git patch from GitHub and build the diff from it. If anything, I rely on the notebook format and the notebook styling more than any Jupyter tool or library.

I’m a proud engineer & trust me, I’d really have better things to work on than building a proprietary wrapper around open source tools. It’s so disheartening to see such comments.

Hey @amirathi - I just want to clarify my point above. I didn’t mean “proprietary” in a negative way, and certainly did not mean to imply that ReviewNB is not adding significantly above the open source ecosystem. My point was that in my context at the university, it is much harder to build on top of proprietary tools, and so we haven’t been able to use or look into the codebase much as a result, even though it does seem interesting and related to the Jupyter ecosystem. I am assuming in this thread that the infrastructure needed to conduct reviewing needs to be open source, which is why I thought it was a relevant comment. I apologize that I didn’t choose my words more carefully; I didn’t consider that they were so negative :slightly_frowning_face: .

(though actually, you did teach me something new, as I had assumed it was using nbdime under the hood. Now I am even more curious how the diffing is managed!)

Hi all!

We’d love your thoughts on a thread that is tangentially related here - Proposed-JEP: Investigate alternate, optional file formats

Please come join the discussion!