Follow-up to jupyterhub/binderhub/issues/794
We deployed a test instance on notebooks-test.gesis.org, where you can try the described setup. To save build time, use the pre-built repositories:
- https://github.com/binder-examples/requirements
- https://github.com/binder-examples/conda
- https://github.com/binder-examples/r
Beyond making the code available to everyone interested, we plan to introduce this in our production environment and appreciate any feedback and suggestions!
Our goal is to bring persistence to BinderHub. We want to unite the best of JupyterHub and BinderHub. From a user’s point of view, we think the way forward is to enable a binder form on the home page of every user on the JupyterHub installation. To achieve this, we added two new features to BinderHub: authentication and persistent storage.
Authentication
As a first step, authentication was introduced; BinderHub has supported it since jupyterhub/binderhub/pull/666. You can find more information about enabling authentication in the BinderHub documentation. The config we used on our staging server is as follows:
binderhub:
  config:
    BinderHub:
      auth_enabled: true
  jupyterhub:
    cull:
      # don't cull authenticated users
      users: False
    custom:
      binderauth_enabled: true
    hub:
      redirectToServer: false
      services:
        binder:
          oauth_redirect_uri: "https://notebooks-test.gesis.org/oauth_callback"
          oauth_client_id: "binder-oauth-client-test"
    singleuser:
      # to make notebook servers aware of hub
      cmd: jupyterhub-singleuser
    auth:
      type: github
      github:
        callbackUrl: "https://notebooks-test.gesis.org/hub/oauth_callback"
        clientId: "###secret###"
        clientSecret: "###secret###"
      scopes:
        - "read:user"
      admin:
        users: ['bitnik', 'arnim']
Persistent Storage
The overall desiderata for persistence were to enable multiple projects while keeping the behavior and established directory structure of vanilla binder environments. This led to the following landmarks that guided our development:
- provide each user pod with a PV (Persistent Volume), where multiple projects of a single user can reside, each project in a separate folder
- mount the user’s PV somewhere other than the home folder (e.g. /projects), so that users can access files across multiple projects
- mount a selected project folder (from the user’s PV) into the home folder (/home/jovyan)
- start a notebook server on /home/jovyan, which is the default behavior of BinderHub
- in the project folder have the same content as provided by repo2docker, and not introduce any additional logic. This is particularly important because projects may use further features of repo2docker such as the postBuild script. As a consequence, we don’t want to use git clone or nbgitpuller to fetch content in this step.
- use repo2docker with the default configuration, so we can share output images with other BinderHub deployments, such as at GESIS Notebooks
- support the ability to migrate existing users on a JupyterHub without the loss of information
/home/jovyan is also where repo2docker clones repository content to by default. Mounting a volume at that path hides whatever the image has there, so we had to find a way to copy the repo content into the PV before it is mounted into the user pod. For this, we decided to use initContainers. The init container
- has the same image as the notebook container
- has the PV containing all of a user’s projects mounted into /projects/
- deletes project folders if the user deleted any through the Your Projects table
- copies the content of the home folder into /projects/<project_folder_name> if the <project_folder_name> folder doesn’t exist
# example
initContainers:
  - name: project-manager
    image: <image-name-tag-created-by-repo2docker>
    volumeMounts:
      - mountPath: /projects/
        name: volume-bitnik
    command:
      - /bin/sh
      - -c
      - <first delete projects, then copy content of current repo>
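The placeholder command in the example could be assembled along the following lines. This is only a minimal sketch and not the deployment's actual code: the function name build_init_command, its parameters, and the concrete shell commands are assumptions made for illustration. Only the order of the steps (delete first, then copy when the folder is missing) is taken from the description above.

```python
import shlex


def build_init_command(project_folder, deleted_folders,
                       home="/home/jovyan", projects_root="/projects"):
    """Return a shell command string for a hypothetical init container.

    Assumed behaviour, following the description in the text:
    1. remove the folders of projects the user deleted,
    2. copy the image's home folder into the project folder,
       but only if that folder does not exist yet.
    """
    steps = []
    for name in deleted_folders:
        target = shlex.quote(f"{projects_root}/{name}")
        steps.append(f"rm -rf {target}")

    target = shlex.quote(f"{projects_root}/{project_folder}")
    source = shlex.quote(home)
    # 'cp -a' keeps permissions and also copies hidden files of the home folder
    steps.append(f"if [ ! -d {target} ]; then cp -a {source} {target}; fi")
    return " && ".join(steps)


# The list form matches the 'command' field of the init container
command = ["/bin/sh", "-c", build_init_command("requirements", ["old-project"])]
```

Because the copy is guarded by the directory check, restarting an existing project leaves the files on the PV untouched.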
Once the init container is done, the user’s notebook container is ready to start. We can then mount the same PV into two different locations: /home/jovyan, with the sub-path of the project folder, and /projects/, where the user can reach all projects:
spec:
  containers:
    # excerpt: other fields of the notebook container are omitted
    - volumeMounts:
        - mountPath: /home/jovyan
          name: volume-bitnik
          subPath: <project_folder_name>
        - mountPath: /projects/
          name: volume-bitnik
  volumes:
    - name: volume-bitnik
      persistentVolumeClaim:
        claimName: claim-bitnik
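A spawner can produce the same structure per user as plain Python data. The sketch below assumes the naming pattern visible in the example (volume-<username>, claim-<username>); the function build_storage_config and its parameters are hypothetical and not taken from PersistentBinderSpawner.

```python
def build_storage_config(username, project_folder,
                         home="/home/jovyan", projects_root="/projects/"):
    """Build volume and mount definitions for one user pod (illustrative).

    The same volume is mounted twice: once with a subPath so that the
    selected project becomes the home folder, and once in full so that
    all projects stay reachable.
    """
    volume_name = f"volume-{username}"
    volumes = [{
        "name": volume_name,
        "persistentVolumeClaim": {"claimName": f"claim-{username}"},
    }]
    volume_mounts = [
        # the selected project only, mounted as the home folder
        {"mountPath": home, "name": volume_name, "subPath": project_folder},
        # the whole PV with every project of the user
        {"mountPath": projects_root, "name": volume_name},
    ]
    return volumes, volume_mounts


volumes, volume_mounts = build_storage_config("bitnik", "requirements")
```

KubeSpawner accepts lists of dictionaries like these in its volumes and volume_mounts options, so a custom spawner could assign the two return values before the pod is created.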
initContainers and the PVs are configured for each user pod during spawn, in the start method of PersistentBinderSpawner. The PersistentBinderSpawner customizes KubeSpawner to:
- save all the projects a user has in the Spawner's state (JSONDict) field under the projects key
- cache deleted projects under the deleted_projects key until their actual removal
- get the image name and tag from user_options, which is produced after the build process of binder
- configure initContainers as mentioned above
- configure the PV of the user pod as mentioned above
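The bookkeeping under the two state keys can be pictured with a plain dictionary. This is a sketch under assumptions: only the key names projects and deleted_projects come from the text; the helper functions, the shape of each entry (project name mapped to image), and the image names are made up for illustration.

```python
def add_project(state, name, image):
    """Record a project together with the image it was built into."""
    state.setdefault("projects", {})[name] = image


def mark_deleted(state, name):
    """Move a project into the deletion cache; its folder still exists on the PV."""
    projects = state.setdefault("projects", {})
    if name in projects:
        del projects[name]
        state.setdefault("deleted_projects", []).append(name)


def pop_deleted(state):
    """Return the folders to remove at the next spawn and empty the cache."""
    deleted = state.get("deleted_projects", [])
    state["deleted_projects"] = []
    return deleted


state = {}
add_project(state, "requirements", "example-registry/requirements:abc123")
add_project(state, "conda", "example-registry/conda:def456")
mark_deleted(state, "conda")
```

Caching the deletion is needed because the folder can only be removed while the PV is mounted, which in this setup happens in the init container of the next spawn.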
Notes:
- Users can launch one project at a time on the test instance and have up to 5 projects in total
- When a user launches a repo from the Your Projects table, the user continues on this project where they left off, with the same image and code-base
- The code-base is only copied from the image when the project folder is missing in the PV
- Users can update the repository image by using the binder form
- Users can use git or nbgitpuller to manually update the repository content
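The rules in these notes can be summarized in a small decision function. It is a hedged sketch, not the logic of the test instance: plan_launch, its arguments, and the error handling are hypothetical, and the one-project-at-a-time rule is not modeled.

```python
MAX_PROJECTS = 5  # limit stated for the test instance


def plan_launch(projects, name, built_image=None, via_binder_form=False):
    """Decide which image a launch should use (illustrative only).

    projects maps project names to the image last used for them.
    """
    if name in projects:
        if via_binder_form and built_image is not None:
            # launching through the binder form updates the image;
            # the files in the project folder are left as they are
            projects[name] = built_image
        return projects[name]

    if built_image is None:
        raise ValueError(f"unknown project {name!r} and no image was built")
    if len(projects) >= MAX_PROJECTS:
        raise ValueError(f"limit of {MAX_PROJECTS} projects reached")
    projects[name] = built_image
    return built_image


projects = {}
first = plan_launch(projects, "requirements", "img:v1", via_binder_form=True)
resumed = plan_launch(projects, "requirements")
updated = plan_launch(projects, "requirements", "img:v2", via_binder_form=True)
```

Here the second call stands for a launch from the Your Projects table and keeps the stored image, while the third call stands for a rebuild through the binder form.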
Deployment repository
gesiscss/example-binderhub-deployments is a repository where we hold config files for different kinds of BinderHub deployments. Here we want to point to some important files for our persistent BinderHub deployment:
- BinderHub helm chart version
- helm-chart/values.yaml
- auth.yaml
- persistent_storage.yaml
- Custom KubeSpawner and JupyterHub handler for projects:
  - PersistentBinderSpawner
  - ProjectAPIHandler - to get the list of projects or to delete a project of a user
- Custom JupyterHub home template with Binder form
Last but not least, we (@arnim and @bitnik) want to thank the incredible Binder community for supporting this with awesome contributions and invaluable advice.