I’m bringing this question over from Gitter because it may be a longer-term discussion.
I have a question for people deploying Z2JH on Google GKE. I’ve deployed an (external) NFS server using U18.04 on a VM. I can mount the NFS shares on other instances in GCE. However, I cannot mount the shares on instances created in GKE node pools, much less mount them in pods. I can ping the NFS server, but the NFS mount requests appear to just hang. I’m doing this from the U18.04 nodes on which pods are deployed, in an attempt to debug why the pods themselves can’t mount NFS.
So, my question: If you’ve gotten NFS to work in such a situation, can you share your configurations and/or experience on how you got it to work?
In both cases, I can’t mount NFS on the nodes themselves. Clearly there’s a firewall involved, but I can’t seem to find a way to either disable it or allow the local connections.
Thought I would follow up on this. I don’t know if it’s useful to have a “best practices” section of the Z2JH docs, but I think that attaching information about practical deployment details there would save people a lot of time.
In our case, we’re trying to deploy JH to support general computing classes and light computing classes. Our default notebook image for students has Python, C++, etc., and Microsoft Visual Studio Code. We’ve been using a per-student PV solution since May 2018, but the costs are mounting. The motivation for moving to NFS was cost and improved startup times, and NFS appears to deliver both. We think storage costs will drop from $380/mo to $80/mo with similar or better performance.
We’re still working out a full solution, but some things we’ve found useful for our GCE / GKE deployment:

- We’re using an external NFS server backed by e.g. 2TB of standard persistent disk; a sketch of the corresponding Kubernetes PV/PVC wiring is shown after this list.
- We switched to using network-tag firewall rules, where the NFS server is tagged with “nfs-server” and the JH cluster is tagged as “nfs-client”. The firewall rule then allows access to nfs-server from nfs-client (see the gcloud sketch below). This is much easier to manage than a CIDR-based firewall rule.
- The NFS server exports using all_squash and sets anonuid=1000, anongid=100, which is the default user/group in our docker-stacks-derived containers (an example export line also follows this list). This simplifies container startup because you don’t need an initContainer running as root to chown the directory, since all file I/O is then done as the specified user. It also eliminates the need to use no_root_squash. However, it means we can’t enforce per-user filesystem quotas via NFS quotas.
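For concreteness, the storage wiring looks roughly like this (a sketch, not our exact manifests; the server IP, export path, namespace, and names are placeholders):

```yaml
# Expose the external NFS export to the cluster as a shared, pre-provisioned PV/PVC.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-home
spec:
  capacity:
    storage: 2Ti
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.128.0.2      # internal IP of the NFS VM (placeholder)
    path: /export/home
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-home
  namespace: jhub
spec:
  storageClassName: ""      # bind to the pre-created PV rather than a dynamic provisioner
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Ti
```

The chart is then pointed at the claim with `singleuser.storage.type: static`, `singleuser.storage.static.pvcName: nfs-home`, and a `subPath` like `home/{username}` so each student gets their own directory on the share.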
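The tag-based firewall rule amounts to something like this (again a sketch; NFSv4 only needs port 2049, while NFSv3 also needs the portmapper/mountd ports):

```bash
# Allow NFS traffic from instances tagged nfs-client to the VM tagged nfs-server.
gcloud compute firewall-rules create allow-nfs \
  --network=default \
  --allow=tcp:2049,udp:2049,tcp:111,udp:111 \
  --source-tags=nfs-client \
  --target-tags=nfs-server
```

The GKE nodes only carry the nfs-client tag if it is set on the cluster/node pool (the --tags flag at creation time), which is why mounts from the nodes just hang until both the tag and the rule are in place.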
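And the export itself is a single line in /etc/exports along these lines (the export path and client range are placeholders; the squash options are the ones described above):

```
# Squash every client uid/gid to 1000:100 (the jovyan user / users group in docker-stacks images)
/export/home  10.0.0.0/8(rw,sync,no_subtree_check,all_squash,anonuid=1000,anongid=100)
```

After editing /etc/exports, `exportfs -ra` reloads the exports without restarting the NFS server.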
We’re not certain this is the best way forward, but we want to roll this out before the start of the 2020 term.
Having more best-practices/“this is how we did it” content would be great. There is http://z2jh.jupyter.org/en/latest/community/index.html, which is meant as a lightweight way to link to resources created by community members.
The reasoning for linking to other people’s work instead of incorporating it directly into the docs is that it reduces the load on the Z2JH maintainers, and that many deployment setups require access to the kind of infrastructure being described. For example, you need access to AWS to work on the AWS instructions.
I think we can even link to this thread (and make it a wiki) as a quick way to get the content into the docs. It would probably need some more words/step-by-step guidance.
Tim - good idea. I’ll try to write up our experience later.
Another NFS-specific hack that proved useful is related to NFS shared folders. We wanted teaching assistants and instructors to be able to share, e.g., nbgrader databases.
To do this, we used the stanza shown below, adding it to hub::extraConfig. The configuration file is a JSON file stored on the hub (in the same PV as the hub’s sqlite database); the format is e.g. a mapping from username to the shared folders that user should get.
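As a rough sketch of how such a stanza can be wired up (the file path, the JSON layout of username mapped to a list of shares, and the NFS server address below are placeholders rather than our exact config):

```yaml
hub:
  extraConfig:
    sharedFolders: |
      # Sketch only: the file path, JSON layout, and NFS address are placeholders.
      # /srv/jupyterhub/shared-folders.json might look like:
      #   {"instructor1": ["course101"], "ta2": ["course101", "course202"]}
      import json

      def add_shared_volumes(spawner):
          try:
              with open("/srv/jupyterhub/shared-folders.json") as f:
                  shared = json.load(f)
          except FileNotFoundError:
              return
          for course in shared.get(spawner.user.name, []):
              vol_name = "shared-" + course
              if any(v.get("name") == vol_name for v in spawner.volumes):
                  continue  # already added on a previous spawn of this user
              spawner.volumes = spawner.volumes + [{
                  "name": vol_name,
                  "nfs": {"server": "10.128.0.2", "path": "/export/shared/" + course},
              }]
              spawner.volume_mounts = spawner.volume_mounts + [{
                  "name": vol_name,
                  "mountPath": "/home/jovyan/shared/" + course,
              }]

      c.KubeSpawner.pre_spawn_hook = add_shared_volumes
```

In the chart’s default layout the hub PV is mounted at /srv/jupyterhub (next to jupyterhub.sqlite), so the JSON file can be edited in place without rebuilding or redeploying anything.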
On a related note, if you’re interested in juicing up the login page, that can also be done using another stanza in hub::extraConfig; that’s how we styled our login page (we use an image per class to avoid tightly coupling course images).
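A minimal sketch of one way to do this, assuming a plain text announcement rather than our actual per-class image:

```yaml
hub:
  extraConfig:
    loginPage: |
      # Sketch only: the announcement text is a placeholder.
      c.JupyterHub.template_vars = {
          "announcement_login": "Welcome! Log in with your course account.",
      }
```

For heavier customization (per-class banners, extra CSS), `c.JupyterHub.template_paths` can point at a directory of overridden templates such as login.html.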