Issues with respect to JupyterHub's volume mount and custom-built images

nsudhanva · November 20, 2023, 9:30am

We have JupyterHub set up on Kubernetes (AWS, kops, self-managed) using Zero to JupyterHub with Kubernetes. The setup uses custom-built images for our environments. Some internal packages installed inside those images are essential for our developers' work. Please note that these are legacy systems that cannot be modified now. We have dynamic volume creation enabled as part of our storage class.

Problem

We know that /home/jovyan is the home directory that JupyterHub works on. We have configured our values.yaml to reflect the same:

singleuser:
  storage:
    capacity: 150Gi
    dynamic:
      storageClass: jupyter
    homeMountPath: /home/jovyan

With this configuration, data at /home/jovyan is retained when a user starts an environment and stores something at this location.

However, we have custom packages and scripts installed at /opt/data, /opt/apps, and /opt/conda as part of the Dockerfile → image build process. Data and changes made in /opt are not retained the way they are in /home/jovyan.

Things we have tried (and failed)

Mounting a volume at /opt/ (which is possible). We lose all the packages and scripts installed at /opt/ because of the mount. Even if users run a script that reinstalls those packages, the mount at /opt will overwrite them.
Using an init container at environment start to copy the contents of /opt/ to /home/ and maintain a symbolic link. This failed because our users (developers) install additional packages (predominantly Python) as part of their workflow, which breaks the symbolic links, so it is not feasible.
Using a separate per-user volume mount for /opt/. The problem is that we already get a PVC for every user for each of their servers (environments), and we found no way to dynamically name and recognize these volumes.
It’s important to note that JupyterHub’s typical deployment with the Zero to JupyterHub Helm chart doesn’t use StatefulSets directly. It uses Deployments and Pods managed by KubeSpawner. Adapting this to use StatefulSets would require significant customization of the JupyterHub setup, which would be complex and would deviate from standard JupyterHub practice on Kubernetes.
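
On the volume-naming point: as far as I know, KubeSpawner expands placeholders such as {username} and {servername} in volume and volume mount names, so a second mount can reuse the per-server home PVC through a subPath instead of needing its own claim. A minimal sketch, assuming the chart's default volumeNameTemplate of volume-{username}{servername} (newer chart versions may use a different default, so check yours), a hypothetical image name, and an init container that is allowed to run as root:

```yaml
# Sketch only: the image name and the volume name template are assumptions,
# not values taken from this thread.
singleuser:
  storage:
    capacity: 150Gi
    dynamic:
      storageClass: jupyter
    homeMountPath: /home/jovyan
    extraVolumeMounts:
      # Reuse the per-server home volume; "opt" is a subdirectory on that PVC.
      - name: volume-{username}{servername}   # assumed default volumeNameTemplate
        mountPath: /opt
        subPath: opt
  initContainers:
    - name: seed-opt
      image: registry.example.com/our-env:latest   # hypothetical: same image as the user server
      securityContext:
        runAsUser: 0   # assumption: root is permitted, so cp can preserve ownership
      command:
        - sh
        - -c
        # Copy the image's /opt onto the volume once; later starts keep user changes.
        - "[ -e /persist/.seeded ] || { cp -a /opt/. /persist/ && touch /persist/.seeded; }"
      volumeMounts:
        - name: volume-{username}{servername}
          mountPath: /persist
          subPath: opt
```

Two caveats with this sketch: the opt subdirectory also shows up as /home/jovyan/opt because the home mount uses the volume root, and a rebuilt image will not refresh an already seeded /opt unless the .seeded marker is removed. The first start is also slower, since /opt/conda can be large.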

We wanted to know if there is a better way of achieving this rather than copying stuff around.

TL;DR: we mount at /home/jovyan and also want to retain data at /opt/. This needs to happen per user, per server (environment).

manics · November 23, 2023, 12:00am

One option is to mount an additional volume under /opt as well as /home, and somehow populate the volume. This will effectively behave similarly to your StatefulSet suggestion.
Another option is to copy stuff from /opt to /home as you’ve suggested, but in addition modify your conda configuration so that when users install packages, they’re installed to an environment under their home directory instead of /opt.
A final option is to make /opt read-only for users, e.g. by making it owned by root. Users won’t be able to modify the default conda environment, but they can create new ones, which will automatically be created in their home directory.
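
For the second and third options, conda can be pointed at the home directory through a .condarc file. A minimal sketch, assuming the file is baked into the image at /opt/conda/.condarc (conda reads a .condarc from its install root) and that the directory names under /home/jovyan are free to choose:

```yaml
# /opt/conda/.condarc (sketch; the paths under /home/jovyan are illustrative)
envs_dirs:
  - /home/jovyan/.conda/envs   # first entry: where new environments are created
  - /opt/conda/envs            # image-provided environments stay visible
pkgs_dirs:
  - /home/jovyan/.conda/pkgs   # package cache kept on the persistent volume
```

With something like this in place, an environment created by name (for example with conda create -n myenv) should land under /home/jovyan/.conda/envs and survive restarts. Packages installed into the base environment under /opt/conda would still be lost, so users need to activate their own environment first.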
