JupyterHub for a reproducible research platform

I am part of a team that is currently exploring the use of JupyterHub as a base for the next generation of our platform for reproducible research (currently known as BEAT).

Background: We already make use of JupyterHub over Kubernetes for running academic courses on Machine Learning at our institute. We have, therefore, some experience installing and customising this toolset for this purpose.

Regarding the redesign of BEAT, we feel it makes sense to pursue the JupyterHub over Kubernetes track, but we are currently unsure what it would take to implement some of our specifications.

Roughly speaking, we would be interested in implementing something equivalent to the notebook interface of Kaggle, while storing notebooks on a Git server (e.g. GitHub/GitLab/…). In this environment, we would like users to be subject to processing quotas (CPU/RAM) and have access to shareable storage “buckets”. Aside from JupyterHub over Kubernetes, we have also tried BinderHub (and even contributed back some patches). However, that approach does not cover all of our specs.

Here is a summary of the points that interest us:

  • Shareable data volumes (K8s resources) between users, in a user-controlled manner: User A would like to share one of their data volumes with User B, but not the whole content of their allocated space.
  • Variable quotas: User A has access to X amount of CPU/RAM from the K8s cluster, while User B has Y.
  • Custom front-end to notebooks: as in Kaggle, we would like users to be able to change some aspects of the current processing environment (e.g. which data volumes to attach), and to monitor their resource usage (we tried using an iframe, but it does not really cover our use case).
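
To make the first two points more concrete, here is a minimal sketch of how they might be approached with a KubeSpawner pre-spawn hook. This is only an illustration under assumptions, not something we have deployed: the quota and sharing tables, the user names, the claim names, the `/data/...` mount paths, and the helper functions are all hypothetical. It also assumes the list form of KubeSpawner's `volumes` and `volume_mounts` settings.

```python
from types import SimpleNamespace

# Hypothetical per-user quota table. In a real deployment this would
# probably live in a database or in the authenticator's auth state.
USER_QUOTAS = {
    "user-a": {"cpu": 4.0, "mem": "8G"},
    "user-b": {"cpu": 1.0, "mem": "2G"},
}
DEFAULT_QUOTA = {"cpu": 0.5, "mem": "1G"}

# Hypothetical sharing table: PersistentVolumeClaim name -> users allowed
# to mount it. User A shares one dataset with User B, but not the rest.
SHARED_VOLUMES = {
    "user-a-dataset-1": {"user-a", "user-b"},
    "user-a-private": {"user-a"},
}


def quota_for(username):
    """Return the CPU/RAM quota for a user, falling back to a default."""
    return USER_QUOTAS.get(username, DEFAULT_QUOTA)


def volumes_for(username):
    """Build Kubernetes volume and volumeMount entries the user may mount."""
    volumes, mounts = [], []
    for claim in sorted(SHARED_VOLUMES):
        if username in SHARED_VOLUMES[claim]:
            volumes.append(
                {"name": claim, "persistentVolumeClaim": {"claimName": claim}}
            )
            mounts.append({"name": claim, "mountPath": f"/data/{claim}"})
    return volumes, mounts


def pre_spawn_hook(spawner):
    """Apply per-user limits and volumes just before the pod is created."""
    username = spawner.user.name
    quota = quota_for(username)
    spawner.cpu_limit = quota["cpu"]
    spawner.mem_limit = quota["mem"]
    volumes, mounts = volumes_for(username)
    spawner.volumes = list(spawner.volumes) + volumes
    spawner.volume_mounts = list(spawner.volume_mounts) + mounts


# In jupyterhub_config.py the hook would be registered with:
#   c.Spawner.pre_spawn_hook = pre_spawn_hook

if __name__ == "__main__":
    # Stand-in object for trying the hook outside of JupyterHub.
    fake = SimpleNamespace(
        user=SimpleNamespace(name="user-b"), volumes=[], volume_mounts=[]
    )
    pre_spawn_hook(fake)
    print(fake.cpu_limit, fake.mem_limit, fake.volumes)
```

Note that a claim mounted by several user pods at once generally needs an access mode such as ReadWriteMany, and that this sketch does not address the user-facing part (letting User A edit the sharing table), which is where the custom front-end would come in.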

From the looks of it, some of these points could be covered by the Jupyter Enterprise Gateway.

We would like to avoid redoing this work if somebody is already tackling a similar use case, and would be happy to collaborate if anyone is interested. Naturally, we intend to open-source all of our contributions.

Please let us know what you think of this, any suggestions on how to approach this use case, and whether you would be interested in following this up.