Mounting an SMB or NFS Azure File share onto JupyterHub on Kubernetes for a shared directory

Hey everyone! New user of discourse here, but at least mildly competent user of JupyterHub (or at least I’d like to think).

I’m essentially trying to use JupyterHub as a shared development environment for a team of about a dozen users (we may expand to larger/other teams later, but for now I just want this working for ours). One feature that would be extremely helpful, and looks doable, is a shared directory for notebooks, files, and data. I think I’m pretty close to getting this set up, but I’m running into a mounting issue that I can’t quite resolve. I’ll quickly explain my setup first and then the issue. I’d really appreciate any help/comments/hints that anyone has!

Setup

Currently, all of this runs on a Kubernetes cluster in Azure, alongside other Azure-hosted services. We have a resource group with a Kubernetes cluster, App Service Domain, DNS Zone, virtual network, container registry (for our custom Docker images), and storage account. Everything works fine, except that in the storage account I have an Azure SMB (server message block, as opposed to NFS) file share that I’ve tried mounting via a PV (persistent volume) and PVC (persistent volume claim) to a JupyterHub server, but to no avail.

I’ve created the PV and PVC successfully using the following YAML files.

For the PV:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: azurefile
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  azureFile:
    secretName: azure-secret
    shareName: aksshare
    readOnly: false
  mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=1000
  - gid=1000
  - mfsymlinks
  - nobrl

For the PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azurefile
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 10Gi

and running

kubectl apply -f azurefile-mount-options-pv.yaml
kubectl apply -f azurefile-mount-options-pvc.yaml

resulting in

$ kubectl get pvc azurefile
NAME        STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
azurefile   Bound    azurefile   10Gi       RWX                           19h

I also created the corresponding Kubernetes secret from the storage account key via

# Get storage account key
STORAGE_KEY=$(az storage account keys list --resource-group $resourceGroupName --account-name $storageAccountName --query "[0].value" -o tsv)

kubectl create secret generic azure-secret \
    --from-literal=azurestorageaccountname=$storageAccountName \
    --from-literal=azurestorageaccountkey=$STORAGE_KEY

I’m fairly confident the secret was properly created and is being read (mainly because I previously had a bug where it wasn’t being read, which I was able to track down and fix).
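For anyone else wanting to double-check this, one way to confirm the secret’s contents is to decode them straight from the cluster (the jsonpath keys below match the --from-literal names used above):

# Decoded value should match $storageAccountName exactly (no trailing whitespace)
kubectl get secret azure-secret -o jsonpath='{.data.azurestorageaccountname}' | base64 --decode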

Lastly, I added the following to my Helm config

singleuser:
  storage:
    extraVolumes:
      - name: azure
        persistentVolumeClaim:
          claimName: azurefile
    extraVolumeMounts:
      - name: azure
        mountPath: /home/shared

I can also post my entire Helm config if needed, but I think this is the only relevant portion, and I’m not currently testing on my custom Docker image (although I can if needed), so that shouldn’t be a complicating issue.

Issue

With this setup, I repeatedly get the following error when trying to spawn a server:

2021-07-14T00:56:00Z [Warning] MountVolume.SetUp failed for volume "azurefile" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/dbfc3ed8-9b4a-44bd-94a8-f6f1990c5285/volumes/kubernetes.io~azure-file/azurefile --scope -- mount -t cifs -o dir_mode=0777,file_mode=0777,gid=1000,mfsymlinks,nobrl,uid=1000,vers=3.0,actimeo=30,<masked> //wintermutehdd.file.core.windows.net/aksshare /var/lib/kubelet/pods/dbfc3ed8-9b4a-44bd-94a8-f6f1990c5285/volumes/kubernetes.io~azure-file/azurefile
Output: Running scope as unit: run-r952aaf8cbead4524b23895a7836cf070.scope
mount error(2): No such file or directory
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs)

I get about a dozen of these errors, with Output: Running scope as unit: run-<random number>.scope being the only part that changes each time. Eventually, the entire spawn fails with a 300-second timeout.
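(Side note for anyone hitting the same thing: mount error(2) from mount.cifs means “No such file or directory”, which usually points at a wrong share name/URL or a missing mount path rather than bad credentials. One way to take Kubernetes out of the loop is to retry the exact mount by hand from a node; the node name and storage key below are placeholders, and the debug pod may need to run privileged depending on your cluster:)

# Placeholder node name; pick a real one from `kubectl get nodes`
kubectl debug node/aks-nodepool1-12345678-vmss000000 -it --image=ubuntu
# Inside the debug pod, the node's filesystem is available at /host
chroot /host
# Retry the mount the kubelet attempted (username is the storage account name)
mkdir -p /tmp/aksshare
mount -t cifs //wintermutehdd.file.core.windows.net/aksshare /tmp/aksshare \
    -o vers=3.0,username=wintermutehdd,password='<storage-account-key>'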

Generally, I’m following a suggestion that I got on the JupyterHub Gitter

and the only real difference I can see is that I’m using an SMB file share instead of an NFS one. I could switch over to NFS (although this requires a “premium” file share), but I don’t see why this couldn’t work with SMB.

I’ve also looked through this post Scalable JupyterHub Deployments in Education (Teaching) - Special Topics / Education - Jupyter Community Forum and its linked posts, but I don’t think they exactly solve my problem.

I tried to include all the relevant info that I thought would be needed, but let me know if I missed anything. Again, thank you in advance!

UPDATE (7/14)

I caved and tried the premium file share with NFS, but I seem to have just moved the issue around. When trying to create a PV using the following YAML

apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  azureFile:
    secretName: azure-secret
    shareName: aksshare
    readOnly: false
  nfs:
    server: wintermutessd.file.core.windows.net:/wintermutessd/wintermutessdshare
    path: /home/shared
    readOnly: false
  storageClassName: premium-nfs
  mountOptions: 
  - dir_mode=0777
  - file_mode=0777
  - uid=1000
  - gid=1000
  - mfsymlinks
  - nobrl

I get the error: Failed to create the persistentvolume 'shared-nfs-pv'. Error: Invalid (422) : PersistentVolume "shared-nfs-pv" is invalid: spec.azureFile: Forbidden: may not specify more than 1 volume type.

Removing the azureFile options resolves this error, but I feel like I still need to specify the Kubernetes secret I created somewhere. Going along with the PV created by removing azureFile, I can create a PVC for it and then try to mount it on JupyterHub using extraVolumes and extraVolumeMounts as before. This seems to be slightly better, as I only get a single mounting error instead of many.
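For reference, a PV that passes validation would keep only the nfs block. With NFS the server and path are separate fields (the path comes from the mount command Azure suggests, not from the in-container mount point), no storage-key secret is needed because Azure Files NFS authenticates at the network level (private endpoint/VNet), and the dir_mode/file_mode/uid/gid/mfsymlinks/nobrl mount options are CIFS-specific so they have to go too. A sketch, not tested against this exact share:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: wintermutessd.file.core.windows.net
    path: /wintermutessd/wintermutessdshare
    readOnly: false
  storageClassName: premium-nfs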

Just in case this is relevant: the NFS Azure file share is only accessible via a private endpoint, but this should be fine since my Kubernetes cluster is running in the same virtual network. In fact, Azure tells me that I could just mount this NFS share on Linux with

sudo apt-get -y update
sudo apt-get install nfs-common
sudo mkdir -p /mount/wintermutessd/wintermutessdshare
sudo mount -t nfs wintermutessd.file.core.windows.net:/wintermutessd/wintermutessdshare /mount/wintermutessd/wintermutessdshare -o vers=4,minorversion=1,sec=sys

As always, any ideas welcome :slight_smile:

I don’t have time to read through this now, but I quickly verified that an azureFile kind of PVC has the ability to use ReadWriteMany. That is why I recommended using NFS for a shared folder - because it supports that.

Not sure what causes the issue, but I think this is a Kubernetes issue rather than a JupyterHub issue, so it’s probably reasonable to google it generally, with search keywords relating to k8s, azureFile, PVs, and PVCs.

Gtg!

Sure, don’t worry, I totally understand, and thanks for the help so far!

I actually did cave and got the NFS share (see the UPDATE that I added to my post above), but it either gives me the same error or doesn’t let me use the exact YAML file that I think I should be using. I’ve googled similar errors, but nothing quite matches exactly; I’ll keep at it.

Hey, so I actually managed to figure this out using a dynamically allocated Azure file share. I’m writing internal documentation for this, but I thought I’d post the relevant bit here. I hope this helps people!

Dynamically creating an Azure file share and storage account by defining a PVC and storage class

Here, we’re mainly following the documentation for dynamically creating a PV with Azure Files in AKS. The general idea is to create a storage class that defines what kind of Azure file share we want (premium vs. standard, and the different redundancy modes) and then create a PVC (persistent volume claim) that adheres to that storage class. Consequently, when JupyterHub tries to mount the PVC, Kubernetes automatically creates a PV (persistent volume) for the PVC to bind to, which in turn automatically creates a storage account and file share for the PV to actually store files in. This all happens in the resource group that backs the one we’re already using (these generally start with “MC_”). Here, we will be using the premium storage class with zone-redundant storage. First, create the storage class to be used (more info on the available tags here can be found in this repository) with the following YAML

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: shared-premium-azurefile
provisioner: kubernetes.io/azure-file
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=0
  - gid=0
  - mfsymlinks
  - cache=strict
  - actimeo=30
parameters:
  skuName: Premium_ZRS

Name this file azure-file-sc.yaml and run

kubectl apply -f azure-file-sc.yaml

Next, we will create a PVC that dynamically provisions our Azure file share (it automatically creates a PV for us). Create the file azure-file-pvc.yaml with the following code

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-premium-azurefile-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: shared-premium-azurefile
  resources:
    requests:
      storage: 100Gi

and apply it with

kubectl apply -f azure-file-pvc.yaml

This will create the file share and the corresponding PV. We can check that our PVC and storage class were successfully created with

kubectl get storageclass
kubectl get pvc

It might take a couple of minutes for the PVC to bind.
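Once it binds, the output should look something like this (the volume name and age here are illustrative):

$ kubectl get pvc shared-premium-azurefile-pvc
NAME                           STATUS   VOLUME             CAPACITY   ACCESS MODES   STORAGECLASS               AGE
shared-premium-azurefile-pvc   Bound    pvc-a1b2c3d4-...   100Gi      RWX            shared-premium-azurefile   2m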

On the Azure side, this is all that has to be done, and the dynamic allocation of the PV and file share are taken care of for us.

Mounting the PVC to JupyterHub in the home directory

JupyterHub, by default, creates a 10Gi PVC for each new user, but we can also tell it to mount existing PVCs as external volumes (think of this as plugging a shared USB drive into your computer). To mount our previously created PVC in the home folder of all of our JupyterHub users, we simply add the following to our Helm config:

singleuser:
  storage:
    extraVolumes:
      - name: azure
        persistentVolumeClaim:
          claimName: shared-premium-azurefile-pvc
    extraVolumeMounts:
      - name: azure
        mountPath: /home/jovyan/shared

Now, when user servers spawn, all users should have a shared directory in their home folders with read and write permissions.
