Creating a future infrastructure for notebooks to be submitted and peer-reviewed

If we wanted to create an infrastructure where notebooks could be submitted and peer-reviewed, and had about a year to do this, what would this look like?

Would we build review functionality in GitHub using a fork of OpenJournals (https://github.com/openjournals/) and whedon?

Or something completely different?

This is something that NeuroLibre (http://neurolibre.conp.ca) is trying to work out now. Currently, the discussions have focused on using a lot of the whedon infrastructure in conjunction with Jupyter Book. An initial draft of our ideas is available here: https://github.com/neurolibre/submit

Pinging @pbellec in case he’d like to add anything else!

I am a big fan of what the neurolibre team is doing :slight_smile:

What about WholeTale? https://wholetale.org/

Over the last year we have been working to implement a notebook review system within our nbgallery project to help with our enterprise use case of running notebooks at scale.

In our organization we have many more users of notebooks (10,000+) than we do authors (1,500+), and we want to give users confidence that the notebooks they find will work properly. We first developed a recommendation system to help users identify notebooks that are of interest to them. We then built a health monitoring system to provide an automatic measurement of each notebook’s current state.

We are now adding in a curation framework that includes support for peer-based notebook review to help provide subjective evaluations of high-value notebooks within our system.

While we have the technical pieces in place, we’re still working within our organization to fully define the process we want to use - i.e. what criteria to use when reviewing a notebook.

Happy to provide more insight from our use-case if it’s of interest, but thought I’d at least share some of these links to give one example of a notebook review framework.

@somedave Is there a recent-ish write up anywhere of how your org makes use of nbgallery?

Is the peer review infrastructure mostly about a traditional workflow, or would you also support citation/source validation (e.g. ensuring adequate citation, and that citations are appropriate to the notebook)? I ask because I’ve been working on data source citation/validation at https://github.com/whythawk/whyqd which provides an audit trail for wrangled source data.

Is the peer review infrastructure mostly about a traditional workflow, or would you also support citation/source validation (e.g. ensuring adequate citation, and that citations are appropriate to the notebook)?

The citation/source validation at this point would be manual - the reviewer could check whatever they want.
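As an illustration only of what one such manual check could look like, a reviewer might compare checksums of the data files a notebook reads against a manifest supplied by the author. This is a minimal sketch, not the nbgallery or whyqd API; the manifest format and the names `sha256_of` and `check_manifest` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest(manifest, base_dir="."):
    """Compare files on disk against their expected digests.

    `manifest` maps relative file paths to expected SHA-256 hex digests
    (a hypothetical format). Returns a dict mapping each path to
    "ok", "changed", or "missing".
    """
    results = {}
    for rel_path, expected in manifest.items():
        target = Path(base_dir) / rel_path
        if not target.is_file():
            results[rel_path] = "missing"
        elif sha256_of(target) == expected:
            results[rel_path] = "ok"
        else:
            results[rel_path] = "changed"
    return results
```

A check like this only shows that the data is the same data the author used; whether the sources are cited adequately would still be a judgment call for the reviewer.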

@psychemedia - I can look to work up something more recent, but this JupyterCon talk of mine from September '18 details our use of nbgallery and highlights the recommendation and health monitoring efforts within nbgallery (the curation/review framework came later).

Also possibly of interest would be this repo with our thoughts on dashboards, and this previous discourse post detailing our experience using Jupyter in a large enterprise setting.

Just to clarify: the notebook is replacing the paper, not supplementing it? Would the reviewer possibly re-execute it and therefore need access to more than just the notebook (e.g., environment, dependencies, ala Binder)? Given the Github/OpenJournal approach, I assume this would be fully open peer review where all are comfortable with Git issues and PRs (ala JOSS)?

With a fully-open review process managed via Git and OpenJournal, couldn’t you just make the “repo” a Binder? Once it’s through the review process, it gets published to Zenodo, same as the JOSS software artifacts, and can easily be re-executed later or possibly integrated into the OpenJournal interface via some sort of widget.

There are certainly examples of journals with alternative approaches to this type of peer review, but these typically involve traditional papers with supplemental computational artifacts, non-open journals, some degree of blindness, and integration with commercial review tools and publishing infrastructure.

Just to clarify: the notebook is replacing the paper, not supplementing it?

There is no requirement for a paper at this point. This is just for review of some computational work.

Would the reviewer possibly re-execute it and therefore need access to more than just the notebook (e.g., environment, dependencies, ala Binder)?

Yes.

Given the Github/OpenJournal approach, I assume this would be fully open peer review where all are comfortable with Git issues and PRs (ala JOSS)?

That would be one option, but maybe there are others too?

With a fully-open review process managed via Git and OpenJournal, couldn’t you just make the “repo” a Binder?

That’s the first comment in Guidelines for submitting a notebook for peer review today

Once it’s through the review process, it gets published to Zenodo, same as the JOSS software artifacts, and can easily be re-executed later or possibly integrated into the OpenJournal interface via some sort of widget.

I think this is possible - if we chose this, the next question would be what tooling is needed to make it work.
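One small piece of such tooling could be a pre-review check that a submitted repository declares its environment at all. The sketch below looks for some of the configuration files that repo2docker (the tool behind Binder) recognizes. The function name `find_environment_files` is hypothetical, the file list is deliberately partial, and a real submission check would likely need more than this.

```python
from pathlib import Path

# A partial list of configuration files that repo2docker recognizes.
CONFIG_FILES = (
    "environment.yml",
    "requirements.txt",
    "Dockerfile",
    "apt.txt",
    "postBuild",
    "runtime.txt",
    "install.R",
)

# repo2docker reads configuration from binder/ or .binder/ when present,
# otherwise from the repository root. This sketch simply reports what it
# finds in all three locations.
CONFIG_DIRS = ("binder", ".binder", ".")


def find_environment_files(repo_dir):
    """Return sorted relative paths of environment files found in repo_dir."""
    repo = Path(repo_dir)
    found = []
    for subdir in CONFIG_DIRS:
        for name in CONFIG_FILES:
            candidate = repo / subdir / name
            if candidate.is_file():
                found.append(candidate.relative_to(repo).as_posix())
    return sorted(found)
```

An empty result would be a reasonable signal to ask the author for an environment specification before a reviewer tries to re-execute anything.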

This was a great read. I now wish I had buckets of spare time to prepare and get community buy-in to run a study on mybinder.org following your ideas :slight_smile:

Lots of great ideas here! I have two side projects which may help in the process:

  • data-vault - by introducing a single ZIP (“vault”) for data, and embedding hashsums and timestamps when reading and saving files, I aim to increase the reproducibility of analyses. Used properly (with git and nbdime), it lets you trace when the data changed, and when submitting for review one could just share the ZIP (in addition to a git repository). It may not work for researchers who work with more complex data types, but I think the idea of keeping a central data store and adding hashsums/timestamps may be a useful one.
    Edit: I am now aware that this is similar to another solution - nteract/scrapbook - which is already there.

  • nbpipeline - a proof of concept for a reproducible pipeline of notebooks.

While this repository is not of the quality I would normally share with anyone, I think that it addresses an important issue - going over code published on GitHub is often like navigating a maze where you cannot know how the different pieces connect to each other or in what order things were executed. Sometimes even finding where the actual results are is challenging!

I have seen repositories using 01_Data_cleaning.ipynb, 02_Analysis_A.ipynb, etc., which might be just fine for smaller repositories - but enabling users to specify how the different notebooks relate to each other would definitely be very helpful!
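To make that last point concrete, here is a minimal sketch of declaring how notebooks depend on each other and deriving a valid execution order from the declaration. It is not how nbpipeline works; it only uses `graphlib` from the Python standard library (3.9+), and the `execution_order` helper and the 03/04 notebook names are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return notebooks in an order where each runs after its inputs.

    `dependencies` maps a notebook to the notebooks it depends on.
    Raises graphlib.CycleError if the declared dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical project layout, for illustration only.
pipeline = {
    "01_Data_cleaning.ipynb": [],
    "02_Analysis_A.ipynb": ["01_Data_cleaning.ipynb"],
    "03_Analysis_B.ipynb": ["01_Data_cleaning.ipynb"],
    "04_Figures.ipynb": ["02_Analysis_A.ipynb", "03_Analysis_B.ipynb"],
}

order = execution_order(pipeline)
```

Unlike numeric filename prefixes, an explicit mapping like this records that the two analysis notebooks are independent of each other, so a reviewer can see which results depend on which steps.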

I appreciate this coming up here, because we’re also working on this at Gigantum. At this point we have a few models for how this can work.

At the core of our approach is a desire to be more accessible than GitHub and more financially sustainable by ensuring broad portability of both data and compute (as opposed to a single cloud or national / institutional infrastructure). I will point out that Binder has a similar decentralized model, but I think the decentralization is more for administrators (or at least developers) than for end-users. That’s not bad - just different (and I’d be happy to discuss finer points - but for now I’m focused on an overview of my perspective from my experience at Gigantum).

There is a clear and vocal contingent that wants to put stuff on GitHub. There are also lots of people who are intimidated by GitHub, or who simply find it and the related requirements burdensome. Support for inter-op with external git repositories is a medium-term goal for us because of this demand, and I guess a GitHub option is important for any review tool.

I still believe that folks underestimate the impact of cognitive burden on “open science best practices” (and there is reasonable empirical evidence for this underestimation effect in general) - so it’s better for the actual science itself if review systems provide a scaffolded or even automated process. For example, the workflow of the PLOS or Frontiers review system is far more universal than anything I can imagine achieving directly with GitHub. (Presumably open journal also - I’ve not used it! And if I’m behind the times on GitHub-based review, I would appreciate pointers!)

Relatedly, if things are easier to set up, the author and/or reviewer can use any extra bandwidth to improve the quality of the work and communication itself.

But I think it’s especially important to make things accessible hands-on (not just to look at, but to use). I think Randal Burns did a pretty good job with this project:

https://gigantum.com/randal/forestpacking-sdm2019

This involves benchmarking first on a local machine, then on a standardized AWS reference. Anyone can poke around with these benchmarking results by clicking the “launch jupyterlab” button, but they can also paste that URL into an application running locally and get a “launch jupyterlab” button there. This makes it far easier to reproduce a benchmark than it would be on Kubernetes (which is what we and at least Binder are using). Or, you could just look through our complete record of every command sent to the Jupyter kernel, see what the person did, and trust their benchmarks. The reviewer can move directly towards subjecting the author to whatever level of scrutiny is desired.

The large data inputs are in an attached forestpacking dataset, so if you just want to pull the project onto your laptop to review results on a plane or in the woods, you can conserve space and leave the datasets behind, or just grab one file, etc.

Anyway, in terms of tools for review per se, I wonder if a review system could be de-coupled from the publishing side of the Open Journal system?

Foundationally, my hope is that we get a variety of projects that have different focus (e.g., empowering end-user, making administration by institutions easier, hard-core developer mode, etc.). This translates into a desire for a review system that’s not tightly coupled to the systems for actually running or inspecting code and data!

I realized there’s a question implicit in the above - is Randal’s project a good example of what a review process might help steer authors towards? What are other examples of “good” and (perhaps only sketched in the abstract) “bad” code projects that could be targets or things to avoid in the review process?

I have asked this question before - “good” examples included the re-analysis in Jupyter of the LIGO data (which I won’t link directly because I’m unsure which is the “right” one - but if anyone has trouble finding it, feel free to ask me).

You should take a look at JOSS if you haven’t - all reviews are open and available to read, so you can see how this works in practice. Also see Journal of Open Source Software (JOSS): design and first-year review and Publish Your Software: Introducing the Journal of Open Source Software (JOSS) for more discussion about it.

I absolutely love JOSS, and I think it nails a number of things. Most importantly, it created a category for what it is - a low-cost, community moderated publication for authors of scientific / research software. Because I’ve trained in certain ways, I’ve enjoyed using the excellent GitHub interface to do reviews for JOSS.

BUT, a hard-learned lesson for me is that the majority of researchers I’ve worked with do not benefit from GitHub - but rather find it confusing.

So that’s why I was more curious about something like the OJS (even though I’ve never used it for real): it’s decluttered and accessible for folks who struggle (or perhaps simply lack the patience) to navigate GitHub. There are also some rather spiffy CMS approaches that use GitHub as a backend - perhaps a system like that would be the best of both worlds? (In case you have no idea what I’m talking about: the first such system I was aware of was prose.io, but more recently, systems like netlifyCMS have been popular.) BUT, perhaps a simpler app that’s closer to a typical review form would be far less of a headache, and be accessible to almost anyone already.

I hope I’m not belaboring the point too much! But after working on accessibility and inclusivity in a variety of situations, I’ve found that it’s a hard point to drive home. And indeed, maybe an important part of the design process will be talking to less technical users (the sort who aren’t terribly inclined to be on the Jupyter Discourse!)

Perhaps it is worth mentioning the assumptions that we have for our users. My assumptions are:

  • Users are familiar enough with Python / R / etc. to write the analysis scripts for their papers
  • Users are already familiar with the Jupyter Notebook, and have used it before
  • Users are motivated enough to want to submit a notebook along with their work

It seems that this is a reasonable kind of user to build infrastructure for at first, because they’re the most likely to actually use and benefit from this infrastructure. In all likelihood, the people who would actually participate in a pilot of this kind are going to be those who are fairly familiar with notebooks (which is fortunately also a pretty large group of people).

That could be a test case to build interest, prototypes, and eventually to make a case that it’s worthwhile to build infrastructure / UI / etc around users who aren’t as familiar with, or motivated by, coding practices.

I think this kind of thing would most naturally go in waves of development. Start off building infrastructure that makes it possible under the hood, perhaps relying on more power-user types to test and use the infrastructure. Try not to make any decisions that would cut us off from extending functionality later. Then if it’s got enough interest, start building out more user-friendly UI for those who don’t want to use GitHub.

Is it worth us forming a working group around this? I’ve been working on similar efforts (Kubeflow for reproducible pipelines/declarative services, MLSpec for schemas, MLBox for execution layer), but I would prefer to do this as a unified effort.

Just let me know!

I like the assumptions in your bullet points for a target user model, but I’m a bit worried about:

In general, I think you’re proposing a pretty good plan, @choldgraf. I would argue that it should be considered in the brainstorming phase until you can get folks using it who are of the not-using-GitHub type. If you have even a “draft” design that’s not inclusive, I don’t think that’s setting us up for the kind of thing I think we all eventually want!