We are working on an open paper (in the spirit of Distill), focused on neuroimaging for now. Any user could access a reproducible paper using Binder, and anyone could also upload a new article there.
Basically, we do not want users to have write access to our server, which is why repo2data would be launched on the server every time a user uploads a new repository.
The process can be summarized as follows:
A user uploads their work (a notebook), which uses some databases (from https://openneuro.org/, for example).
They provide a configuration file, data_requirements.json, in the repo, which specifies where the data lives.
After the Docker image is built, we launch repo2data from our server, which reads the user’s data_requirements.json.
The database is downloaded to a /data folder on our server (if it does not already exist there).
/data is accessible read-only to every user (every notebook running on our Binder).
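The "download only if it is not already there" step can be sketched roughly as below. This is a minimal illustration, not repo2data's actual logic: the `projectName` key and the one-folder-per-project layout under /data are assumptions made for the example, and `dataset_dir` and `needs_download` are hypothetical helper names.

```python
import json
from pathlib import Path


def dataset_dir(requirements_path, data_root="/data"):
    """Return the folder under data_root where this dataset would be stored.

    Assumption: data_requirements.json contains a "projectName" key.
    """
    with open(requirements_path) as f:
        requirements = json.load(f)
    return Path(data_root) / requirements["projectName"]


def needs_download(requirements_path, data_root="/data"):
    """True if the dataset folder is missing or empty, i.e. a download is needed."""
    target = dataset_dir(requirements_path, data_root)
    return not (target.is_dir() and any(target.iterdir()))
```

With a check like this, the server would only trigger a download the first time a given dataset is requested; later uploads that reference the same dataset would reuse the existing folder.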
jupyterhub:
  singleuser:
    storage:  # in the Zero to JupyterHub chart, these keys live under singleuser.storage
      extraVolumes:
        - name: shared-data
          hostPath:
            path: /path/to/shared/data
      extraVolumeMounts:
        - name: shared-data
          mountPath: /data  # where each user can reach the shared data
          readOnly: true
Depending on your setup and where your data lives, you can choose an appropriate volume type; see the Kubernetes documentation on Volumes.
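Once the volume is mounted, a quick check from inside a user notebook can confirm that /data is visible but not writable. This is only a sketch; `is_read_only` is a hypothetical helper name, and the check reflects the permissions of the current process rather than the mount options themselves.

```python
import os


def is_read_only(path="/data"):
    """True if path is an existing, readable directory that this process cannot write to."""
    return (
        os.path.isdir(path)
        and os.access(path, os.R_OK)
        and not os.access(path, os.W_OK)
    )
```

Calling `is_read_only()` in a notebook should return True if the mount was applied with `readOnly: true`.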
jupyterhub:
  hub:
    extraConfig:
      myExtraConfig: |
        async def my_pre_spawn_hook(spawner):
            repo_url = spawner.user_options.get('repo_url')
            ref = spawner.user_options.get('image').split(':')[-1]  # commit hash
            # TODO get data_requirement.json from repo
            # TODO run repo2data
        c.KubeSpawner.pre_spawn_hook = my_pre_spawn_hook
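The first TODO in the hook could be filled in along these lines. This is a sketch under several assumptions: the repository is hosted on github.com, the requirements file sits at the repository root under the name used in the question, and `ref_from_image`, `raw_requirements_url` and `fetch_requirements` are hypothetical helper names. Running repo2data itself is left out, since its exact invocation depends on the installed version.

```python
from urllib.parse import urlparse
from urllib.request import urlretrieve


def ref_from_image(image):
    """Extract the tag after the last ':' (the commit hash, as in the hook)."""
    return image.split(":")[-1]


def raw_requirements_url(repo_url, ref, filename="data_requirements.json"):
    """Build a raw-file URL for a GitHub repo; other providers need their own rule."""
    parsed = urlparse(repo_url)
    if parsed.netloc != "github.com":
        raise ValueError(f"only github.com URLs are handled in this sketch: {repo_url}")
    path = parsed.path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"https://raw.githubusercontent.com/{path}/{ref}/{filename}"


def fetch_requirements(repo_url, ref, dest):
    """Download the requirements file to dest (blocking; needs network access)."""
    urlretrieve(raw_requirements_url(repo_url, ref), dest)
    return dest
```

Because `fetch_requirements` blocks, an async hook would probably want to run it in an executor rather than call it directly.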
This will work under two conditions:
The hub has access to the same data volume that is mounted to the user pods, but with write access (not read-only). Here is an example that complements the singleuser config above:
jupyterhub:
  hub:
    extraVolumes:
      - name: shared-data
        hostPath:
          path: /path/to/shared/data
    extraVolumeMounts:
      - name: shared-data
        mountPath: /data  # where hub can reach the shared data
The hub image must have repo2data and its requirements installed, so you have to extend the hub image and then use your own image in your config: