JupyterHub shared folder

Can we have a shared folder/volume between the pods created by JupyterHub users?

Sure you can. On our setup, we have something like

singleuser:
  # ...
  storage:
    capacity: 10Gi
    extraVolumes:
      - name: shm-volume
        emptyDir:
          medium: Memory
      
      - name: jupyterhub-shared
        persistentVolumeClaim:
          claimName: jupyterhub-shared-volume

      - name: jupyterhub-datasets
        persistentVolumeClaim:
          claimName: jupyterhub-datasets-volume

    extraVolumeMounts:
      # raises /dev/shm above the default 64 MB
      # beneficial for PyTorch's DataLoader
      - name: shm-volume
        mountPath: /dev/shm
      
      - name: jupyterhub-shared
        mountPath: /home/jovyan/shared

      - name: jupyterhub-datasets
        mountPath: /home/jovyan/datasets
        readOnly: true

But you need to create the PersistentVolumeClaims manually, with something like

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jupyterhub-datasets-volume
  namespace: jupyter
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Ti

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jupyterhub-shared-volume
  namespace: jupyter
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi

The only limitation I’m aware of is that your storage driver needs to support ReadWriteMany (RWX), but I might be wrong. We use storage on top of csi-driver-nfs without an issue.

EDIT #1
Each user will now have a shared/ and datasets/ folder. The content will be seen by all users, but only content in shared/ can be modified.

Thanks, but ReadWriteMany is not supported in our environment.
We only have ReadWriteOnce.

@kumaraselva, can you specify what storage system you use? It depends on the driver and whether it supports RWX. Check the table in Kubernetes docs on PVCs.

An alternative would be to have a parallel system with NFS or similar, add that as a new type of StorageClass in Kubernetes, and specify it as a storage type for shared volume.
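
As a rough sketch of that alternative, assuming csi-driver-nfs is installed in the cluster: the StorageClass below points at an NFS export, and the shared PVC requests that class by name. The class name `nfs-shared`, the server address `nfs.example.internal`, and the export path `/exports/jupyterhub` are placeholders for illustration, not values from this thread.

```yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-shared                # hypothetical class name
provisioner: nfs.csi.k8s.io       # provisioner registered by csi-driver-nfs
parameters:
  server: nfs.example.internal    # placeholder: your NFS server address
  share: /exports/jupyterhub      # placeholder: your exported directory
reclaimPolicy: Retain             # keep the data if the PVC is deleted
volumeBindingMode: Immediate

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jupyterhub-shared-volume
  namespace: jupyter
spec:
  storageClassName: nfs-shared    # must match the StorageClass name above
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
```

With this in place, the other volumes in the cluster can stay on the default ReadWriteOnce storage; only the shared PVC goes through NFS.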

Edit: PVC → StorageClass, for correctness

@gcerar we are using TKGI, and our support team says TKGI does not support ReadWriteMany.

I am exploring NFS, but I am getting a "not able to connect to the NFS server" error.

OK. There is documentation from vSphere [link] that states there is support for ReadWriteMany, which can be enabled but implies some security issues. From the docs, I understand their concerns.

Regarding the NFS server: from your description, it is not clear what the problem is. Going by Occam’s razor, I would guess that the NFS server is not in the same network as the K8s cluster (e.g., 192.168.0.0/24). Either they must be in the same subnet, or the server must explicitly permit access from other subnets. See this blog post for reference.

What have you found to be the easiest way to get data into the readonly shared disk?

Still looking; NFS seems to be the best option.

After creating the NFS PVC, I am trying to mount it via the YAML file. When I launch JupyterHub, I get an error like “Not able to connect the nfs server”.

  • Check firewall settings
  • NFS folder for PVC must be completely empty (not even dotfiles). Otherwise, k8s will complain.
  • NFS folder must have RW permissions for nobody:nogroup.
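
One way to narrow this down, sketched under assumptions: a throwaway pod that mounts the NFS export directly, bypassing both the PVC and JupyterHub. If this pod also gets stuck with a mount error, the problem is likely between the cluster nodes and the NFS server (network, firewall, export permissions) rather than in the JupyterHub config. The pod name, server address, and export path are placeholders, not values from this thread.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nfs-mount-test              # hypothetical name
  namespace: jupyter
spec:
  restartPolicy: Never
  containers:
    - name: probe
      image: busybox
      # List the export, then create and remove a file to confirm write access.
      command:
        - sh
        - -c
        - ls -la /mnt/nfs && touch /mnt/nfs/.write-test && rm /mnt/nfs/.write-test && echo OK
      volumeMounts:
        - name: nfs-export
          mountPath: /mnt/nfs
  volumes:
    - name: nfs-export
      nfs:
        server: nfs.example.internal  # placeholder: your NFS server address
        path: /exports/jupyterhub     # placeholder: your exported directory
```

After applying it, `kubectl describe pod nfs-mount-test -n jupyter` shows any mount failures under Events, and `kubectl logs nfs-mount-test -n jupyter` shows the probe output if the mount succeeded.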

Thanks, will check and update here.