Dynamically provision two PVCs for each user?

Is it possible to dynamically provision additional per-user storage in Z2JH?

The default home dir provisioning is fine, but in addition each user should get a larger, slower storage area, for instance mounted at /home/datadir and private to each user.

I’m working my way through the Z2JH docs (very good btw) and have found the extraVolumes config, but it seems to be geared more towards pre-created, shared storage.
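
For context, the extraVolumes examples I’ve come across point all users at a single pre-created PVC, roughly like this (the names here are placeholders, not taken from the docs):

singleuser:
  storage:
    extraVolumes:
      - name: shared-data
        persistentVolumeClaim:
          claimName: shared-data-pvc   # one PVC, created ahead of time, shared by everyone
    extraVolumeMounts:
      - name: shared-data
        mountPath: /home/shared

What I’m after is the same mechanism, but with one claim per user.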

I’m trying to migrate my current setup, based on a cluster of traditional virtual machines, to Azure Kubernetes Service. The plan is to keep the user home dir on Azure Disk, as per the default, and put /home/datadir on Azure Files.

Any guidance is greatly appreciated.

Thanks, but helm balks at the config in the linked post. I feel like I’m missing something obvious here, as it does not seem to like the keyword azureFile.
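
In case it’s a nesting problem: as far as I understand, an inline azureFile volume source is supposed to sit directly under the volume entry, alongside name, along these lines (the secret and share names are placeholders; the secret would hold the storage account name and key):

singleuser:
  storage:
    extraVolumes:
      - name: user-datadir
        azureFile:
          secretName: azure-files-secret   # Secret with azurestorageaccountname / azurestorageaccountkey
          shareName: user-datadir-share    # placeholder file share name
          readOnly: false
    extraVolumeMounts:
      - name: user-datadir
        mountPath: /home/datadir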

I found a way to achieve something that works quite well by using a pre_spawn_hook to strip non-existing PVCs from the pod specification. It’s somewhat the opposite of my initial plan: instead of provisioning the extra PVC at spawn time, you create the per-user PVCs yourself, and they get mounted whenever they match an extraVolumes entry. In this setup the default home dir creation is untouched, and if the PVC private-datadir-{username} exists it gets mounted at /home/datadir.

I feel this is rather hackish and a more structured approach should be taken. Maybe the feature request on kubespawner pans out: Expose PVC Provisioning Method · Issue #747 · jupyterhub/kubespawner (github.com)
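
If that issue lands, the cleaner route would presumably be per-user PVC provisioning in kubespawner itself. In the meantime, a rough sketch of the opposite direction, creating the datadir PVC from a pre_spawn_hook instead of filtering out missing ones, could look like this (the storage class, size and hook name are placeholders, and it assumes the async Kubernetes client that recent kubespawner versions use):

hub:
  extraConfig:
    createdatadir: |
      async def create_datadir_pvc(spawner):
        """Create the per-user datadir PVC if it doesn't exist yet."""
        from kubernetes_asyncio.client.rest import ApiException

        # Must match the claimName template used under singleuser.storage.extraVolumes.
        pvc_name = spawner._expand_user_properties("private-datadir-{username}")
        body = {
          "apiVersion": "v1",
          "kind": "PersistentVolumeClaim",
          "metadata": {"name": pvc_name},
          "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": "azurefile",                # placeholder storage class
            "resources": {"requests": {"storage": "1Ti"}},  # placeholder size
          },
        }
        try:
          await spawner.api.create_namespaced_persistent_volume_claim(spawner.namespace, body)
          spawner.log.info("create_datadir_pvc: created PVC %s" % pvc_name)
        except ApiException as e:
          if e.status != 409:  # 409 = PVC already exists, which is fine
            raise

      c.Spawner.pre_spawn_hook = create_datadir_pvc

I haven’t battle-tested this, so treat it as a starting point rather than a drop-in replacement.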

For completeness here’s the config I ended up with:

hub:
  extraConfig:
    checkvolumes: |
      async def check_pvcs(spawner):
        """Remove nonexistent PVCs (except the default home dir) from the pod specification."""
        # Assumptions:
        #   The default home dir PVC volume name starts with "volume-"
        #   The default home dir is mounted at "/home/jovyan"
        spawner.log.info("check_pvcs: Doing a PVC check...")

        existing_pvcs = await spawner.api.list_namespaced_persistent_volume_claim(spawner.namespace)
        existing_pvc_names = [pvc.metadata.name for pvc in existing_pvcs.items]

        # Keep the default home dir volume, any non-PVC volume, and any volume whose PVC exists.
        # (Building a new list avoids the index shifting you get when deleting while iterating.)
        kept_volumes = []
        for volume in spawner.volumes:
          expanded_volume = spawner._expand_all(volume)
          volume_name = expanded_volume['name']
          claim_name = expanded_volume.get('persistentVolumeClaim', {}).get('claimName')
          if (claim_name is None
              or volume_name.startswith("volume-")
              or claim_name in existing_pvc_names):
            kept_volumes.append(volume)
          else:
            spawner.log.info("check_pvcs: Removing volume with nonexistent PVC %s" % str(volume))
        spawner.volumes = kept_volumes
        kept_volume_names = [spawner._expand_all(v)['name'] for v in spawner.volumes]

        # Keep the default home dir mount and any mount whose volume survived the filter above.
        kept_volume_mounts = []
        for volume_mount in spawner.volume_mounts:
          expanded_volume_mount = spawner._expand_all(volume_mount)
          mount_path = expanded_volume_mount['mountPath']
          mount_name = expanded_volume_mount['name']
          if mount_path == "/home/jovyan" or mount_name in kept_volume_names:
            kept_volume_mounts.append(volume_mount)
          else:
            spawner.log.info("check_pvcs: Nonexistent PVC, removing mount %s" % str(volume_mount))
        spawner.volume_mounts = kept_volume_mounts

      c.Spawner.pre_spawn_hook = check_pvcs

singleuser:
  image:
    name: jupyter/datascience-notebook
    tag: lab-4.0.2
  storage:
    extraVolumes:
      - name: user-datadir
        persistentVolumeClaim:
          claimName: private-datadir-{username}
    extraVolumeMounts:
      - name: user-datadir
        mountPath: /home/datadir
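
With this approach the extra PVC is created out of band, one per user, and only mounted when it exists. For reference, a minimal manifest for such a PVC could look like this (the namespace, storage class and size are placeholders; the name has to match the expanded claimName, e.g. private-datadir-alice for the user alice):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: private-datadir-alice     # private-datadir-{username}, expanded per user
  namespace: jhub                 # the namespace the hub is deployed in (placeholder)
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile     # placeholder Azure Files storage class
  resources:
    requests:
      storage: 1Ti                # placeholder size

Note that kubespawner may escape special characters in the username when expanding {username}, so the actual claim name can differ slightly from the raw username.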

Please note that Azure Files shares are quite slow, especially when working with many small files, as a conda environment or a Python repository would have. It's also not a good place to store shared writable files.

We later added a NetApp file share. It's much faster, though there is still a noticeable lag when running the first cell of a notebook whose conda environment lives on the NetApp share.

I don't have backups on the NetApp file share; I'm told it's not technically possible.

Yes, this is why I want to provide two areas per user: one fast and small (~10 GB) for $HOME using Azure Disks, and one slow and large (~10 TB, maybe 10× that) datadir using Azure Files. Azure Files is quite OK if you only need to access a few large files, but small-file performance is terrible; conda env create takes 40 minutes on Azure Files.
