We have JupyterHub set up on Kubernetes (AWS, kops, self-managed) using Zero to JupyterHub with Kubernetes. The setup uses custom-built images for our environments. These images have some internal packages installed that are essential for our developers' work. Please note that these are legacy systems that cannot be modified now. We have dynamic volume creation enabled as part of our storage class.
Problem
We know that /home/jovyan is the home directory that JupyterHub works in. We have configured our values.yaml to reflect this:
storage:
  capacity: 150Gi
  dynamic:
    storageClass: jupyter
  homeMountPath: /home/jovyan
As per this configuration, data at /home/jovyan is retained when a user starts an environment and stores something at this location.
However, we have custom packages and scripts installed at /opt/data, /opt/apps, and /opt/conda as part of the Dockerfile → image build process. Data and changes made in /opt are not retained the way they are in /home/jovyan.
Things we have tried (and failed)
- If we mount at /opt/ (which is possible), we lose all the packages and scripts installed at /opt/ because the mount hides them. Even if users run a script that re-installs those packages, the mount at /opt will overwrite them.
- Using an init container at environment start to copy everything from /opt/ to /home/ and maintain a symbolic link. This approach failed because our users (developers) may install more packages (predominantly Python) as part of their workflow, which breaks the symbolic links, so it is not feasible.
- We tried using a different volume mount for /opt/ per user, but we already get a PVC for every user for each of their servers (environments), and there is no way to dynamically name and recognize these additional volumes.
- It's important to note that JupyterHub's typical deployment with the Zero to JupyterHub Helm chart doesn't use StatefulSets directly. It uses Deployments and Pods managed by KubeSpawner. Adapting this to use StatefulSets would require significant customization of the JupyterHub setup, which might be complex and could deviate from the standard practices of JupyterHub on Kubernetes.
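For reference, here is a minimal, untested sketch of how the init container attempt could look if it seeded the existing per-user PVC and mounted that copy back at the original path with subPath instead of symlinks. It assumes the chart's singleuser.initContainers and singleuser.storage.extraVolumeMounts options, and that the home volume is named volume-{username}{servername} (this has to match singleuser.storage.dynamic.volumeNameTemplate for the chart version in use). The image name, the persisted-opt directory and the .seeded marker file are hypothetical names chosen for illustration.

```yaml
singleuser:
  storage:
    capacity: 150Gi
    dynamic:
      storageClass: jupyter
    homeMountPath: /home/jovyan
    extraVolumeMounts:
      # Second mount of the same per-user, per-server PVC, limited to one
      # subdirectory. Assumed volume name; check volumeNameTemplate.
      - name: volume-{username}{servername}
        mountPath: /opt/conda
        subPath: persisted-opt/conda   # hypothetical directory on the PVC
  initContainers:
    - name: seed-opt
      # Hypothetical image reference; it must be the same custom image
      # that ships the packages under /opt.
      image: registry.example.com/our-env:latest
      command:
        - sh
        - -c
        - |
          # Copy the image's /opt/conda onto the PVC only on first start,
          # so later user changes are not overwritten.
          if [ ! -e /persist/persisted-opt/.seeded ]; then
            mkdir -p /persist/persisted-opt
            cp -a /opt/conda /persist/persisted-opt/
            touch /persist/persisted-opt/.seeded
          fi
      volumeMounts:
        # The PVC is mounted away from /opt here, so the image's own
        # /opt content stays visible to the copy step.
        - name: volume-{username}{servername}
          mountPath: /persist
```

Because the copy is mounted back at /opt/conda itself, paths baked into the environment stay valid, which is where the symlink approach broke down. /opt/data and /opt/apps would each need their own mount entry and copy step. One side effect under these assumptions: the persisted-opt directory also shows up inside /home/jovyan, since the home mount uses the root of the same PVC.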
We wanted to know if there is a better way of achieving this rather than copying stuff around.
tl;dr: we are mounting at /home/jovyan, and we also want to retain data at /opt/. This needs to happen per user, per server (environment).