Hi Chris,
Thanks for getting this conversation going.
I think this is a fantastic idea, and I agree the Jupyter ecosystem now has the necessary pieces for an amazing publishing platform.
@ltetrel already mentioned that we have been working on a project in that direction for a couple of years now, called NeuroLibre, which should be open for submissions very soon. In short, NeuroLibre is going to be a preprint service for neuroscience that hosts Jupyter Books and associated data. The point @ltetrel brought up about NeuroLibre is very technical and relates to an issue we are having at the moment, but I would like to share some more general thoughts as well.
- My experience with myBinder is that start-up time (and start-up success) is quite variable. I think this is an amazing service for the community given that anyone can start a pod. The level of reliability is great given the low bar to entry. For a publishing platform, though, reliability would need to be higher. This should be possible because the review process will limit the number of submissions, but it may require some technical developments too. This gets me to my next point.
- At least in my field, papers rely on fairly large datasets, on the order of tens of GB. Downloading these data is a major obstacle to quickly reproducing an analysis. @ltetrel created a mechanism for NeuroLibre to build a local cache of data at build time (SIMEXP/Repo2Data on GitHub: "Automatic data fetcher from the web"). We also have a local Docker registry (integration of this in Jupyter was discussed at some point). This means we can reliably spin up a pod with all necessary data in a matter of seconds. But that also means we need to store a lot of data long-term in the cloud, which again gets me to my next point.
- We decided to rely on a local cloud. First, we tried to use the public Canadian high-performance infrastructure, but we ran into reliability issues. Recently, we moved to hosting at McGill University, which decided to support our IT through the Canadian Open Neuroscience Platform (CONP), which is also funding NeuroLibre. Cancer computing donated a fairly large number of servers to this effort, which is now set up with OpenNebula, Kubernetes and JupyterHub. This solution is not perfect (we haven’t got Terraform to work and the setup requires some manual intervention), but it works. Note that we’ve been writing documentation along the way, even if there is some catching up to do (see the NeuroLibre v0.1 documentation). The rationale for using this type of infrastructure rather than a commercial cloud is data hosting. Even if we don’t have our cloud hosted on Compute Canada, we are connected to them with a high-speed connection (I believe a 100 Gb/s line), and they have the capacity to host tape storage very cheaply, at a price that commercial providers could not match (at least when I did my price analysis a few years back). As this infrastructure is built and maintained as part of a national and university investment anyway, I think it’s an excellent solution in terms of sustainability for data and compute hosting.
- For submission and review (or in our case technical screening), we tried working purely with GitHub Actions and piloted an entire system that way. @emdupre eventually convinced us to build on top of the system used by the Journal of Open Source Software (JOSS) instead. The main rationale is to contribute to an existing successful project rather than start something new. I was skeptical about the ease of adopting their system. Elizabeth responded by basically writing the JOSS installation instructions for them. The system is up and running, and we are now trying to extend their build system to include Jupyter Books, and not just a PDF. Building a Jupyter Book with a lot of data is time-consuming, and we need this service to be hosted by our BinderHub instance. @ltetrel’s comment was related to that (hopefully last) development needed to get everything working.
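To make the data-caching point above a bit more concrete, here is a minimal Python sketch of the general idea of a build-time cache: fetch a dataset once, keyed by its URL, and reuse the local copy on every later build. This is only an illustration under my own assumptions and is not how Repo2Data is actually implemented; the names `cache_key`, `ensure_cached` and `fake_fetch`, the `.complete` marker file, and the example URL are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cache_key(url: str) -> str:
    """Derive a stable directory name from the dataset URL (hypothetical scheme)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]


def ensure_cached(url: str, cache_root: Path,
                  fetch: Callable[[str, Path], None]) -> Path:
    """Return the local directory for `url`, calling `fetch` only on a cache miss."""
    target = cache_root / cache_key(url)
    marker = target / ".complete"
    if marker.exists():
        return target  # cache hit: nothing to download
    target.mkdir(parents=True, exist_ok=True)
    fetch(url, target)
    marker.touch()  # written last, so an interrupted download is retried
    return target


def fake_fetch(url: str, dest: Path) -> None:
    """Stand-in for a real download so the sketch runs offline."""
    (dest / "data.txt").write_text(f"contents of {url}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        url = "https://example.org/dataset.zip"  # placeholder URL
        first = ensure_cached(url, root, fake_fetch)   # miss: fetches
        second = ensure_cached(url, root, fake_fetch)  # hit: reuses the copy
        print(first == second)
```

Writing the marker file only after the fetch returns means a half-downloaded dataset is never mistaken for a complete one, which matters when datasets are tens of GB and builds can be interrupted.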
Those are just thoughts, and I realize you may disagree with some of the design decisions we made, in particular relying on an academic cloud. One last point is that the NeuroLibre manifesto includes contributing upstream as a founding principle (in particular to Jupyter and JOSS). So the NeuroLibre team will be happy to contribute to the Jupyter publishing platform as much as possible.
Regarding funding, NeuroLibre is in the process of renewing CONP funding and just applied to a Wellcome fund. This could have worked for the Jupyter publishing platform, but that deadline has passed. I would be happy to help in any way I can to get the Jupyter publishing infrastructure funded.