Current advice for NFS home directories on GKE?

I am just setting up a JupyterHub / Kubernetes cluster for our University, and I’m considering NFS for home directory storage.

I found the DataHub writeup - and that made me wonder - what is the current best advice for using NFS on Google Cloud? Is it the data8x approach of Google Cloud Filestore? Or a hand-maintained server? How about the best client connection method? Where would I start in compiling a recipe?

I also read this thread - thanks to dirkcgrunwald for posting that.


In my experience Google Cloud Filestore works well. The only real disadvantage is the high starting price for small teams or small disk usage.

Yes, good point, I’ve just been exploring. I see the minimum size is 1TB, which translates to a minimum cost to us of $240 / month.

On the other hand, I see that a dedicated g1-small instance, with 350GB of standard zonal disk, would cost about $35 per month.

Exploring more - I think I will need a separate NFS server - perhaps installed via Helm [1, 2].
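For anyone following along, the in-cluster approach boils down to running an NFS server as a Deployment plus a Service that the user pods mount. A rough sketch of what that looks like - names, namespace, and the backing disk are my own placeholders; the volume-nfs image is the one used in the upstream Kubernetes NFS example:

```yaml
# Sketch of an in-cluster NFS server: a single-replica Deployment exporting
# /exports, plus a Service so other pods can reach it. Not our exact setup.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-server
  namespace: jhub
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nfs-server
  template:
    metadata:
      labels:
        app: nfs-server
    spec:
      containers:
        - name: nfs-server
          image: gcr.io/google_containers/volume-nfs:0.8
          ports:
            - name: nfs
              containerPort: 2049
            - name: mountd
              containerPort: 20048
            - name: rpcbind
              containerPort: 111
          securityContext:
            privileged: true
          volumeMounts:
            - name: export
              mountPath: /exports
      volumes:
        - name: export
          # Back the export with a persistent disk (or a PVC) so home
          # directories survive pod restarts; "nfs-disk" is hypothetical.
          gcePersistentDisk:
            pdName: nfs-disk
            fsType: ext4
---
apiVersion: v1
kind: Service
metadata:
  name: nfs-server
  namespace: jhub
spec:
  selector:
    app: nfs-server
  ports:
    - name: nfs
      port: 2049
    - name: mountd
      port: 20048
    - name: rpcbind
      port: 111
```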

But - I’m struggling to orient myself, and I suspect this morass is one that many of y’all have already escaped. Just for example, where should I look for the meaning of parts of the datahub setup, such as the nfsPVC section, and the storage section?

Do y’all set up the NFS server by hand, or do you use Helm? If you use Helm, how do y’all reclaim the storage when restarting the cluster?

[1] https://github.com/dirkcgrunwald/zero-to-jupyterhub-k3s/tree/master/basic-with-nfs-volumes
[2] https://www.padok.fr/en/blog/readwritemany-nfs-kubernetes

To answer my own questions:

The nfsPVC section in the datahub setup is specific to the datahub setup, and abstracts out creation of NFS PersistentVolume and PersistentVolumeClaim parameters.
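Concretely, what it abstracts is a pair of objects like the following - a PersistentVolume pointing at the NFS export, and a PersistentVolumeClaim that the user pods bind to. This is only an illustrative sketch; names, sizes, and paths will differ in the real datahub config:

```yaml
# A PersistentVolume that points at the in-cluster NFS export, and a claim
# bound directly to it (storageClassName "" avoids dynamic provisioning).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: home-nfs
spec:
  capacity:
    storage: 100Gi          # nominal; NFS does not enforce this
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs-server.jhub.svc.cluster.local
    path: /exports/home
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: home-nfs
  namespace: jhub
spec:
  storageClassName: ""
  accessModes:
    - ReadWriteMany
  volumeName: home-nfs
  resources:
    requests:
      storage: 100Gi
```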

Adding NFS proved relatively straightforward. Our config is here:

with some notes on getting storage working in storage.md. The actual NFS setup is in init_nfs.sh and used in the config.yaml.
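For context, the part of config.yaml that points every user pod at the shared claim is the singleuser storage block of the zero-to-jupyterhub chart - roughly like this, where the pvcName and subPath are illustrative:

```yaml
# Mount one shared NFS-backed claim into every user pod, with a per-user
# subdirectory as the home directory.
singleuser:
  storage:
    type: static
    static:
      pvcName: home-nfs
      subPath: "home/{username}"
```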

Updating - I ran into trouble with the default NFS setup above, where internal DNS was failing with a moderate number of users requesting pods. That is, DNS lookup was failing for nfs-server.jhub.svc.cluster.local under load.

In the end, I started the NFS service, detected the resulting IP, and wrote that into the persistent volume information, before starting my JHub cluster. This has proved stable so far:
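In other words, the PersistentVolume now carries the Service's ClusterIP rather than the cluster-DNS name - something like this sketch, where the address is a placeholder for whatever `kubectl get svc nfs-server` reports:

```yaml
# Same PersistentVolume as before, but with the Service's ClusterIP written
# in directly so user pods never need a DNS lookup to mount their homes.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: home-nfs
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.3.240.20    # placeholder: the nfs-server Service ClusterIP
    path: /exports/home
```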

But I worry that this means the NFS mounts may fail if the NFS service has to be restarted, because it might acquire another IP. I've set myself the task of looking into starting my cluster in a Virtual Private Cloud and reserving a static internal IP for the NFS pod.
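One related option I still need to verify: Kubernetes lets you request a specific clusterIP in the Service spec, as long as it falls inside the cluster's service range, which should keep the address written into the PersistentVolume valid even if the Service is recreated. A sketch, with a placeholder address:

```yaml
# Ask for a fixed ClusterIP on the NFS Service so the address embedded in the
# PersistentVolume stays valid across restarts. The IP must lie inside the
# cluster's service CIDR and be otherwise unused.
apiVersion: v1
kind: Service
metadata:
  name: nfs-server
  namespace: jhub
spec:
  clusterIP: 10.3.240.20   # placeholder address within the service range
  selector:
    app: nfs-server
  ports:
    - name: nfs
      port: 2049
    - name: mountd
      port: 20048
    - name: rpcbind
      port: 111
```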