Dynamically provision two PVCs for each user?

Is it possible to dynamically provision additional per-user storage in Z2JH?

The default home dir provisioning is fine, but in addition each user should get a larger, slower storage area, for instance mounted at /home/datadir and private to each user.

I’m working my way through the Z2JH docs (very good btw) and have found the extraVolumes config, but it seems to be geared more towards pre-created, shared storage.
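
For context, the extraVolumes examples I’ve come across point all users at a single pre-created PVC, roughly like this (the names here are placeholders, not taken from the docs):

singleuser:
  storage:
    extraVolumes:
      - name: shared-data
        persistentVolumeClaim:
          claimName: shared-data-pvc   # one PVC, created ahead of time, shared by everyone
    extraVolumeMounts:
      - name: shared-data
        mountPath: /home/shared

What I’m after is the same mechanism, but with one claim per user.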

I’m trying to migrate my current setup, based on a cluster of traditional virtual machines, to Azure Kubernetes Service. The plan is to keep the user home dir on Azure Disk, as per the default, and put /home/datadir on Azure Files.

Any guidance is greatly appreciated.

Thanks, but helm balks at the config in the linked post. I feel like I’m missing something obvious here, as it does not seem to like the keyword azureFile.
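
In case it’s a nesting problem: as far as I understand, an inline azureFile volume source is supposed to sit directly under the volume entry, alongside name, along these lines (the secret and share names are placeholders; the secret would hold the storage account name and key):

singleuser:
  storage:
    extraVolumes:
      - name: user-datadir
        azureFile:
          secretName: azure-files-secret   # Secret with azurestorageaccountname / azurestorageaccountkey
          shareName: user-datadir-share    # placeholder file share name
          readOnly: false
    extraVolumeMounts:
      - name: user-datadir
        mountPath: /home/datadir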

I found a way to achieve something that works quite well by using a pre_spawn_hook to strip non-existing PVCs from the pod specification. It’s somewhat the opposite of my initial plan: instead of provisioning the extra PVC at spawn time, you create the per-user PVCs yourself, and they get mounted whenever they match an extraVolumes entry. In this setup the default home dir creation is untouched, and if the PVC private-datadir-{username} exists it gets mounted at /home/datadir.

I feel this is rather hackish and a more structured approach should be taken. Maybe the feature request on kubespawner pans out: Expose PVC Provisioning Method · Issue #747 · jupyterhub/kubespawner (github.com)
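
If that issue lands, the cleaner route would presumably be per-user PVC provisioning in kubespawner itself. In the meantime, a rough sketch of the opposite direction, creating the datadir PVC from a pre_spawn_hook instead of filtering out missing ones, could look like this (the storage class, size and hook name are placeholders, and it assumes the async Kubernetes client that recent kubespawner versions use):

hub:
  extraConfig:
    createdatadir: |
      async def create_datadir_pvc(spawner):
        """Create the per-user datadir PVC if it doesn't exist yet."""
        from kubernetes_asyncio.client.rest import ApiException

        # Must match the claimName template used under singleuser.storage.extraVolumes.
        pvc_name = spawner._expand_user_properties("private-datadir-{username}")
        body = {
          "apiVersion": "v1",
          "kind": "PersistentVolumeClaim",
          "metadata": {"name": pvc_name},
          "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": "azurefile",                # placeholder storage class
            "resources": {"requests": {"storage": "1Ti"}},  # placeholder size
          },
        }
        try:
          await spawner.api.create_namespaced_persistent_volume_claim(spawner.namespace, body)
          spawner.log.info("create_datadir_pvc: created PVC %s" % pvc_name)
        except ApiException as e:
          if e.status != 409:  # 409 = PVC already exists, which is fine
            raise

      c.Spawner.pre_spawn_hook = create_datadir_pvc

I haven’t battle-tested this, so treat it as a starting point rather than a drop-in replacement.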

For completeness here’s the config I ended up with:

hub:
  extraConfig:
    checkvolumes: |
      async def check_pvcs(spawner):
        """Remove nonexistent PVCs (except the default home dir) from the pod specification."""
        # Assumptions:
        #   The default home dir PVC volume name starts with "volume-"
        #   The default home dir is mounted at "/home/jovyan"
        spawner.log.info("check_pvcs: Doing a PVC check...")

        existing_pvcs = await spawner.api.list_namespaced_persistent_volume_claim(spawner.namespace)
        existing_pvc_names = [pvc.metadata.name for pvc in existing_pvcs.items]

        # Keep the default home dir volume, any non-PVC volume, and any volume whose PVC exists.
        # (Building a new list avoids the index shifting you get when deleting while iterating.)
        kept_volumes = []
        for volume in spawner.volumes:
          expanded_volume = spawner._expand_all(volume)
          volume_name = expanded_volume['name']
          claim_name = expanded_volume.get('persistentVolumeClaim', {}).get('claimName')
          if (claim_name is None
              or volume_name.startswith("volume-")
              or claim_name in existing_pvc_names):
            kept_volumes.append(volume)
          else:
            spawner.log.info("check_pvcs: Removing volume with nonexistent PVC %s" % str(volume))
        spawner.volumes = kept_volumes
        kept_volume_names = [spawner._expand_all(v)['name'] for v in spawner.volumes]

        # Keep the default home dir mount and any mount whose volume survived the filter above.
        kept_volume_mounts = []
        for volume_mount in spawner.volume_mounts:
          expanded_volume_mount = spawner._expand_all(volume_mount)
          mount_path = expanded_volume_mount['mountPath']
          mount_name = expanded_volume_mount['name']
          if mount_path == "/home/jovyan" or mount_name in kept_volume_names:
            kept_volume_mounts.append(volume_mount)
          else:
            spawner.log.info("check_pvcs: Nonexistent PVC, removing mount %s" % str(volume_mount))
        spawner.volume_mounts = kept_volume_mounts

      c.Spawner.pre_spawn_hook = check_pvcs

singleuser:
  image:
    name: jupyter/datascience-notebook
    tag: lab-4.0.2
  storage:
    extraVolumes:
      - name: user-datadir
        persistentVolumeClaim:
          claimName: private-datadir-{username}
    extraVolumeMounts:
      - name: user-datadir
        mountPath: /home/datadir
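
With this approach the extra PVC is created out of band, one per user, and only mounted when it exists. For reference, a minimal manifest for such a PVC could look like this (the namespace, storage class and size are placeholders; the name has to match the expanded claimName, e.g. private-datadir-alice for the user alice):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: private-datadir-alice     # private-datadir-{username}, expanded per user
  namespace: jhub                 # the namespace the hub is deployed in (placeholder)
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile     # placeholder Azure Files storage class
  resources:
    requests:
      storage: 1Ti                # placeholder size

Note that kubespawner may escape special characters in the username when expanding {username}, so the actual claim name can differ slightly from the raw username.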

Please note that Azure Files shares are quite slow, especially when working with many small files, as a conda environment or a Python repository would have. It's also not a good place to store shared writable files.

We later added a NetApp file share. It's much faster, though there is still a noticeable lag when running the first cell of a notebook whose conda environment lives on the NetApp share.

I don't have backups on the NetApp file share; I'm told it's not technically possible.

Yes, this is why I want to provide two areas per user: one fast and small (~10 GB) for $HOME using Azure Disks, and one slow and large (~10 TB, maybe 10× that) datadir using Azure Files. Azure Files is quite OK if you only need to access a few large files, but small-file performance is terrible; conda env create takes 40 minutes on Azure Files.
