If someone wants to submit a notebook to an event so that it can be peer-reviewed, e.g. as planned for http://earthcube.org/EC2020, what are the best practices that should be followed?
We want to ensure that the notebook is reproducible for at least a few months, and ideally longer. We also want to ensure that another person can find it and execute it.
One simple option seems to be to create a repo with a binder badge.
Are there problems with this? What would be better?
GitHub (or the repo itself) may not be around for very long (bitrot), so you could take your repo (with notebook and Binder badge) and archive it on Zenodo.
You get a DOI, and people can launch the archived version on Binder directly from Zenodo. The Zenodo record also points back to GitHub, where things may have moved on since.
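For concreteness, here is a minimal sketch of the two mybinder.org launch-URL forms involved. The organization, repository, and DOI below are hypothetical placeholders, not real records:

```python
from urllib.parse import quote

def github_binder_badge(org, repo, ref="HEAD", filepath=None):
    """Markdown Binder badge for a GitHub repo (mybinder.org URL scheme)."""
    url = f"https://mybinder.org/v2/gh/{org}/{repo}/{ref}"
    if filepath:
        # open a specific notebook, URL-encoded as a query parameter
        url += f"?labpath={quote(filepath)}"
    return f"[![Binder](https://mybinder.org/badge_logo.svg)]({url})"

def zenodo_binder_url(doi):
    """Launch URL for an archived Zenodo deposit (the DOI here is a placeholder)."""
    return f"https://mybinder.org/v2/zenodo/{doi}/"

print(github_binder_badge("example-org", "example-repo", "main", "analysis.ipynb"))
print(zenodo_binder_url("10.5281/zenodo.0000000"))
```

The first form launches whatever the repo currently contains at the given ref; the second launches the frozen, DOI-stamped archive, which is what matters for reproducibility over months or years.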
(The above is just some random example from the first page of results when searching Zenodo for “binder”.)
Binder supports several other repositories (in the archiving sense), so you aren’t limited to Zenodo; adding support for more is a matter of someone caring enough to spend a few hours implementing it.
Going slightly off-topic, toward the “future” section: something I think is worth remembering is that notebooks are terrible for authoring longer texts with cross-references, tables, figures, maths, citations, and other typesetting features. I wouldn’t try to make them better at it. I’d keep using something like LaTeX (or whatever your field uses) to typeset the bulk of the text, and insert links to the notebook at the right places in the article. The notebook would still contain more than just code, but it wouldn’t be as extensive as a full paper.
Where would metadata about the notebook live: within the notebook itself, in an entry attached to the DOI (e.g. on Zenodo), or somewhere else? What basic information should be included? So much about (meta)data is domain-specific, but I don’t think this needs to be, at least not for a base set.
That’s a great suggestion! For the uninitiated, do you think we could find an easy path for them to follow? That is: 1. any advice on making that approach digestible? 2. any implications for non-Jupyter submissions?
I mean the basic information every notebook should carry about itself, e.g. author and what it’s about. That metadata would set us up to create a directory searchable on those parameters. Most people use Jupyter, but we might get RStudio and MATLAB entries too, so we want guidance that works across platforms. Metadata is also important for ensuring machines can find and interpret research artifacts.
A list of authors is already collected by Zenodo (and probably all the other platforms like it) when you submit something to it.
Metadata, and making it machine readable, is important; however, we already have solutions for this for PDFs, Word files, images, etc., which people regularly submit. That suggests we can keep using the same system.
The exception would be if there are specific new fields or kinds of metadata that are not already collected for the many other types of files people submit. I can’t think of anything unique to notebooks (or to a directory of files one of which is a notebook, as I think the shareable unit is a directory, not a single notebook).
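If a base set is wanted inside the notebook itself, one option is to embed it in the file’s top-level `metadata` dictionary, which any platform can read as plain JSON. A minimal sketch, assuming purely illustrative field names (the `submission` key and every value below are made up, not an established schema):

```python
import json

# Illustrative, domain-agnostic base fields (not an established schema)
base_metadata = {
    "title": "Example analysis notebook",
    "authors": [{"name": "A. Author", "orcid": "0000-0000-0000-0000"}],
    "description": "One or two sentences on what the notebook is about.",
    "license": "MIT",
    "doi": "",  # filled in after archiving, e.g. once Zenodo mints one
}

# An .ipynb file is just JSON, so the block can live in its top-level "metadata"
nb = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
nb["metadata"]["submission"] = base_metadata

with open("example.ipynb", "w") as f:
    json.dump(nb, f, indent=1)
```

Archive-level metadata (e.g. the Zenodo deposit form) and an in-file block like this could then carry the same fields, so a future directory could index either.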
Thank you @danielskatz - having non-standard/executable publication objects that are peer-reviewed is super exciting!
A colleague and I were thinking about doing something similar for a Computational Social Science workshop, so I’m certainly interested in the experiences you have with this format.
There has also been some discussion on GitHub about using notebooks in journals.
Does anyone have any experience with this? I can’t immediately figure out what it offers beyond nbdime, but it seems like it might help more with the review part.
I haven’t used it myself. I thought it looked quite interesting, but I agree it seems like a proprietary wrapper around various Jupyter tools, so I don’t think it’s very usable for most of our use cases.
edit: to avoid confusion, I struck out a section above that came across more strongly than I meant it to. I don’t know enough about it to know whether it is a wrapper or not; we just haven’t used it because it’s not open source, so it wouldn’t fit our workflows.
> Does anyone have any experience with this? I can’t immediately figure out what it offers beyond nbdime, but it seems like it might be helping more with the review part.
@danielskatz I built ReviewNB, so I can help answer this. It is a peer review tool for Jupyter notebooks: one can share notebooks, and others can comment on the content (at the level of individual cells) to give feedback, ask clarifying questions, etc. It also shows a visual diff, instead of a JSON diff, to display how the notebook content has changed from its previous version.
That being said, ReviewNB is primarily designed for peer review within a data science team, and requirements for peer review in a publication context might vary. For example, ReviewNB’s review workflow is tightly coupled with GitHub pull requests (all comments you write are posted on GitHub, the visual diff is generated from the GitHub patch, etc.). So you might want to try it out to see whether it fits your requirements.
> it seems like a proprietary wrapper around various Jupyter tools
@choldgraf I appreciate all your work in the Jupyter community, but this comment is really insulting to my work. What Jupyter tools is ReviewNB a “wrapper” around? If you are referring to nbdime: no, we are not using nbdime at all. I wouldn’t have been able to build cell-level commenting on top of a third-party diff tool. I take the git patch from GitHub and build the diff from it myself. If anything, I rely on the notebook format and the notebook styling more than on any Jupyter tool or library.
I’m a proud engineer, and trust me, I’d have better things to work on than building a proprietary wrapper around open source tools. It’s disheartening to see such comments.
Hey @amirathi - I just want to clarify my point above. I didn’t mean “proprietary” in a negative way, and I certainly didn’t mean to imply that ReviewNB doesn’t add significantly to the open source ecosystem. My point was that in my context at the university it is much harder to build on top of proprietary tools, so we haven’t been able to use it or look into the codebase much, even though it does seem interesting and relevant to the Jupyter ecosystem. I am assuming in this thread that the infrastructure needed to conduct reviewing has to be open source, which is why I thought the comment was relevant. I apologize for not choosing my words more carefully; I didn’t consider how negative they sounded.
(though actually, you did teach me something new, as I had assumed it was using nbdime under the hood. Now I am even more curious how the diffing is managed!)