JupyterHub Object Storage S3

Hello JupyterHub Community,

I have deployed JupyterHub on a Kubernetes cluster with KubeSpawner, which creates a pod for each user. The spawner works great.

I would like to add persistence for users' notebooks with dynamic provisioning (Kubernetes PersistentVolume / PersistentVolumeClaim) in each pod, backed by an S3 bucket, so that I can use KubeSpawner's quota limits.

So I am looking for a Kubernetes object storage plugin to do this. Has anyone set up something similar?

Best Regards

Chris

You can achieve this with the pod lifecycle hooks.

Here is a quick, untested example, but it should give you the idea:

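# Restore the user's files after the container starts; back them up before it stops.
# Note: s3_script.sh is resolved relative to the container's working directory,
# so use an absolute path if the script lives elsewhere.
c.KubeSpawner.lifecycle_hooks = {
    "postStart": {
        "exec": {
            "command": ["/bin/sh", "s3_script.sh", "restore"]
        }
    },
    "preStop": {
        "exec": {
            "command": ["/bin/sh", "s3_script.sh", "backup"]
        }
    }
}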

s3_script.sh

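#!/bin/bash
# Back up or restore the user's home directory as a tarball in S3.

HOME=${HOME:-/home/jovyan}
TAR=/tmp/${JUPYTERHUB_USER}.tar.gz
S3_URL=s3://my-s3-bucket/${JUPYTERHUB_USER}.tar.gz

case "$1" in
    backup)
        # Archive paths relative to HOME so that restore unpacks in place.
        tar czvf "${TAR}" -C "${HOME}" .
        s3cmd put [options-here] "${TAR}" "${S3_URL}"
        ;;

    restore)
        # On first start there is no archive yet; skip extraction if the download fails.
        if s3cmd get --force [options-here] "${S3_URL}" "${TAR}"; then
            tar xzvf "${TAR}" -C "${HOME}"
        fi
        ;;
esac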



To make this generic, I would store the S3 credentials in Kubernetes Secrets and pass them to the pod as environment variables. Ideally, you would create a separate bucket with its own permissions for each user.
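
As a minimal sketch of that idea, assuming a Secret named `s3-credentials` (a hypothetical name) whose keys are the variables your S3 client reads: KubeSpawner's `extra_container_config` can load the Secret into the container environment, and a `pre_spawn_hook` can set a per-user bucket name. The `S3_BUCKET` variable and the `jupyter-` prefix are illustrative assumptions; the script above would need to read `S3_BUCKET` instead of its hard-coded bucket.

```python
# Hypothetical Secret name; its keys become environment variables in the pod.
S3_SECRET_NAME = "s3-credentials"


def env_from_secret(secret_name):
    """Build a container-spec fragment that loads all keys of a Secret as env vars."""
    return {"envFrom": [{"secretRef": {"name": secret_name}}]}


def set_user_bucket(spawner):
    """pre_spawn_hook: expose a per-user bucket name to the pod.

    S3_BUCKET and the "jupyter-" prefix are assumptions for illustration;
    user names may need sanitizing to satisfy S3 bucket naming rules.
    """
    spawner.environment["S3_BUCKET"] = f"jupyter-{spawner.user.name}"


# In jupyterhub_config.py, where JupyterHub provides `c`:
# c.KubeSpawner.extra_container_config = env_from_secret(S3_SECRET_NAME)
# c.KubeSpawner.pre_spawn_hook = set_user_bucket
```

Keeping the credentials in a Secret means they never appear in the hub configuration itself, only a reference to them does.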

The backup and restore script can easily be made available in the single-user container by mounting it from a ConfigMap into the pod.
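
A rough sketch of such a mount, assuming a ConfigMap called `s3-script` with a key `s3_script.sh` and a target path of `/usr/local/bin/s3_script.sh` (all three names are hypothetical), could look like this with KubeSpawner's `volumes` and `volume_mounts` options:

```python
# Hypothetical names: a ConfigMap "s3-script" holding the key "s3_script.sh".
SCRIPT_CONFIGMAP = "s3-script"
SCRIPT_KEY = "s3_script.sh"
SCRIPT_PATH = "/usr/local/bin/s3_script.sh"

# Pod-level volume backed by the ConfigMap; mode 0o755 makes the file executable.
script_volume = {
    "name": "s3-script",
    "configMap": {"name": SCRIPT_CONFIGMAP, "defaultMode": 0o755},
}

# Mount only the script file (subPath) rather than shadowing the whole directory.
script_mount = {
    "name": "s3-script",
    "mountPath": SCRIPT_PATH,
    "subPath": SCRIPT_KEY,
}

# In jupyterhub_config.py, where JupyterHub provides `c`
# (extend the lists instead if you already define volumes):
# c.KubeSpawner.volumes = [script_volume]
# c.KubeSpawner.volume_mounts = [script_mount]
```

With the script at a fixed absolute path, the lifecycle hooks can call it by that path instead of relying on the container's working directory.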