I am just setting up a JupyterHub / Kubernetes cluster for our University, and I’m considering NFS for home directory storage.
I found the DataHub writeup - and that made me wonder: what is the current best advice for using NFS on Google Cloud? Is it the data8x approach of using Google Cloud Filestore? Or a hand-maintained server? What is the best client connection method? And where would I start in compiling a recipe?
Exploring more - I think I will need a separate NFS server, perhaps installed via Helm [1, 2].
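To make that concrete, here is the rough shape of the Helm route I have in mind - a sketch, not a tested recipe; the chart name, repo, and value keys below are my assumptions from skimming chart docs, and [1, 2] may point at something slightly different:

```yaml
# Hypothetical values.yaml for an in-cluster NFS server chart such as
# nfs-server-provisioner (keys are my reading of that chart's docs;
# check the documentation of whichever chart you actually pick).
# Install would look roughly like:
#
#   helm install nfs-server <some-repo>/nfs-server-provisioner \
#     --namespace jhub --values values.yaml
#
persistence:
  enabled: true
  storageClass: standard   # back the NFS export with a GCE persistent disk
  size: 200Gi
storageClass:
  name: nfs                # StorageClass that NFS-backed PVCs can request
```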
But - I’m struggling to orient myself, and I suspect this morass is one that many of y’all have already escaped. Just for example, where should I look for the meaning of parts of the datahub setup, such as the nfsPVC section and the storage section?
Do y’all set up the NFS server by hand, or do you use Helm? If you use Helm, how do y’all reclaim the storage when restarting the cluster?
The nfsPVC section is specific to the datahub setup; it abstracts out the creation of the NFS PersistentVolume and PersistentVolumeClaim from a handful of parameters.
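Concretely, as I understand it, that abstraction boils down to a pair of objects along these lines (a sketch only - the server address, namespace, path, and sizes are placeholders):

```yaml
# PersistentVolume pointing at the NFS export (all values are placeholders)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: home-nfs
spec:
  capacity:
    storage: 200Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs-server.jhub.svc.cluster.local
    path: /
---
# PersistentVolumeClaim that user pods bind to
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: home-nfs
  namespace: jhub
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""     # bind to the PV above rather than a dynamic class
  volumeName: home-nfs
  resources:
    requests:
      storage: 200Gi
```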
Adding NFS proved relatively straightforward. Our config is here:
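For anyone skimming, the relevant zero-to-jupyterhub piece typically looks something like the sketch below - the PVC name and subPath are placeholders for illustration, not necessarily the values in our linked config:

```yaml
# Sketch of zero-to-jupyterhub helm values for NFS-backed home directories
# (pvcName and subPath are placeholders; see the linked config for the real values)
singleuser:
  storage:
    type: static
    static:
      pvcName: home-nfs
      subPath: "home/{username}"
```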
Updating - I ran into trouble with the default NFS setup above: internal DNS started failing once a moderate number of users were requesting pods. That is, the DNS lookup for nfs-server.jhub.svc.cluster.local was failing under load.
In the end, I started the NFS service, detected the resulting IP, and wrote that IP into the PersistentVolume definition before starting my JHub cluster. This has proved stable so far:
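Roughly, the workaround amounts to this (a sketch with placeholder names and an example IP, not the literal config):

```yaml
# 1. Find the ClusterIP that the NFS Service was assigned, e.g.:
#      kubectl --namespace jhub get service nfs-server \
#        --output jsonpath='{.spec.clusterIP}'
# 2. Write that IP (10.0.0.51 here is just an example) into the
#    PersistentVolume instead of the cluster-internal DNS name:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: home-nfs
spec:
  capacity:
    storage: 200Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.0.0.51      # hard-coded Service ClusterIP, not the DNS name
    path: /
```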
But I worry that this means NFS may fail if the NFS service ever has to be recreated, because it might acquire another IP. I’ve set myself to look into starting my cluster in a Virtual Private Cloud and reserving a static internal IP for the NFS pod.
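A different, Kubernetes-level option that might sidestep the VPC reservation (sketched from my reading of the Service docs, not something I have tested): pin the Service’s ClusterIP explicitly when creating it, so the same address comes back even if the Service is recreated. The address must fall inside the cluster’s service IP range, and the selector below is a placeholder:

```yaml
# Sketch: NFS Service with an explicitly pinned ClusterIP
# (the address and selector label are placeholders)
apiVersion: v1
kind: Service
metadata:
  name: nfs-server
  namespace: jhub
spec:
  clusterIP: 10.0.0.51     # pinned so the PV's hard-coded IP stays valid
  ports:
    - name: nfs
      port: 2049
    - name: mountd
      port: 20048
    - name: rpcbind
      port: 111
  selector:
    role: nfs-server
```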