Success stories using NFS with Z2JH and K8s?

I’m bringing this question over from gitter because it may be a longer-term discussion.

I have a question for people deploying Z2JH on Google GKE. I’ve deployed an (external) NFS server using U18.04 on a VM. I can mount the NFS shares on other instances in GCE. However, I cannot mount the shares on instances created in GKE node-pools, much less mount them in pods. I can ping the NFS server, but the NFS mount requests appear to just hang. I’m doing this from the U18.04 nodes on which pods are deployed in an attempt to debug why pods themselves can’t mount NFS.

So, my question: If you’ve gotten NFS to work in such a situation, can you share your configurations and/or experience on how you got it to work?

I’m using the configuration at https://github.com/berkeley-dsep-infra/datahub/blob/22022e5cfbf6d610eb01fc49ac2277f9e0645f03/docs/topic/cluster-config.rst and also modified as below (disabling ip-alias and network policy).

In both cases, I can’t mount NFS on the nodes themselves. Clearly there’s a firewall involved, but I can’t seem to find a way to either disable it or allow the local connections.

gcloud beta container clusters create \
  --enable-ip-alias \
  --enable-autoscaling \
  --num-nodes 1 \
  --max-nodes=2 --min-nodes=1 \
  --region=us-central1 --node-locations=us-central1-b \
  --image-type=ubuntu \
  --disk-size=100 --disk-type=pd-ssd \
  --machine-type=n1-standard-2 \
  --release-channel regular \
  --enable-autoupgrade \
  --enable-autorepair \
  --no-enable-network-policy \
  --create-subnetwork="" \
  --tags=hub-cluster \
  --node-labels hub.jupyter.org/node-purpose=core \
  jhub2

gcloud container node-pools create \
  --machine-type n1-standard-4 \
  --num-nodes 1 \
  --enable-autoscaling \
  --min-nodes 0 --max-nodes 20 \
  --node-labels hub.jupyter.org/node-purpose=user \
  --node-taints hub.jupyter.org_dedicated=user:NoSchedule \
  --region=us-central1 \
  --image-type=ubuntu \
  --disk-size=100 --disk-type=pd-ssd \
  --enable-autoupgrade \
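
Since ping only exercises ICMP, a successful ping combined with a hanging mount is consistent with a firewall dropping TCP traffic to the NFS ports. A minimal sketch for checking this from a node, assuming the server listens on the standard ports (2049 for nfsd, 111 for rpcbind); the helper names are made up for illustration:

```python
import socket

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

def check_nfs_ports(host, ports=(2049, 111)):
    """Map each port (2049 = nfsd, 111 = rpcbind) to whether it is reachable."""
    return {port: tcp_port_open(host, port) for port in ports}

# Example use from a GKE node (replace the address with your NFS server's):
#   print(check_nfs_ports("10.128.0.5"))
```

If the same check succeeds from a plain GCE instance but times out from a GKE node, that points at the firewall rules rather than at the NFS server configuration.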

Thought I would follow up on this. I don’t know if it’s useful to have a “best practices” section of the Z2JH docs, but I think that attaching information about practical deployment details there would save people a lot of time.

In our case, we’re trying to deploy JH to support general computing classes and light computing classes. Our default notebook for students has Python, C++, etc. and Visual Studio Code. We’ve been using a per-student PV solution since May 2018, but the costs are mounting. The motivation for moving to NFS was lower cost and improved startup times, both of which NFS appears to deliver. We think storage cost will drop from $380/mo to $80/mo with similar or better performance.

We’re still working out a full solution, but some things we’ve found useful for our GCE / GKE deployment:

  • We’re using an external NFS server with e.g. 2TB of standard PV
  • We switched to network-tag firewall rules: the NFS server is tagged “nfs-server” and the JH cluster is tagged “nfs-client”, and the firewall rule allows access to nfs-server from nfs-client. This is much easier to manage than a CIDR-based firewall rule
  • We used Berkeley’s method of running a privileged DaemonSet to mount the NFS share once per node ( https://github.com/berkeley-dsep-infra/datahub/blob/a3f40164e3a1ea86d49d134d2f68adeb0d78ed67/hub/templates/nfs-mounter.yaml )
  • The NFS server exports using all_squash and sets anonuid=1000, anongid=100, which is the default user/group in our docker-stacks-derived containers. This simplifies container startup because you don’t need an initContainer running as root to chown the directory: all file I/O is done as the specified user. It also eliminates the need for no_root_squash. However, it also means we can’t enforce per-user file system quotas using NFS quotas.
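
The tag-based firewall rule and the squashed export above can be sketched as follows. This is only an illustration under assumptions: the rule name, network name, export path and client range are hypothetical, the `rw,sync` options are my own choice, and the single port assumes NFSv4 (NFSv3 needs additional ports).

```python
import shlex

def firewall_rule_command(rule_name, network, server_tag="nfs-server",
                          client_tag="nfs-client", ports=("tcp:2049",)):
    """Build a gcloud command letting client-tagged VMs reach server-tagged VMs."""
    return [
        "gcloud", "compute", "firewall-rules", "create", rule_name,
        "--network", network,
        "--allow", ",".join(ports),
        "--source-tags", client_tag,
        "--target-tags", server_tag,
    ]

def exports_line(path, clients, anonuid=1000, anongid=100):
    """Build an /etc/exports entry that maps every client user to one uid/gid."""
    options = f"rw,sync,all_squash,anonuid={anonuid},anongid={anongid}"
    return f"{path} {clients}({options})"

# Hypothetical names and addresses, for illustration only.
print(shlex.join(firewall_rule_command("allow-nfs", "default")))
print(exports_line("/export/home", "10.0.0.0/8"))
```

The GKE nodes only match the rule if the node pool carries the client tag, for example through the `--tags` flag used when creating the cluster.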

We’re not certain this is the best way forward, but we want to roll this out before the start of the 2020 term.


Having more best-practices/“this is how we did it” content would be great. There is http://z2jh.jupyter.org/en/latest/community/index.html which is meant as a lightweight way to link to resources created by community members.

The reasoning for linking to other people’s work instead of incorporating it in the docs directly is that it reduces the load on the Z2JH maintainers, and that maintaining instructions for a given deployment setup requires access to that setup. For example, you need access to AWS to work on the AWS instructions.

I think we can even link to this thread (and make a wiki) as a quick way to get the content into the docs. It would probably need some more words/step-by-step guidance.

Tim - good idea. I’ll try to write up our experience later.

Another NFS-specific hack that proved useful is related to NFS shared folders. We wanted teaching assistants and instructors to be able to share e.g. nbgrader databases.

To do this, we added the stanza shown below to hub::extraConfig. It reads a JSON configuration file stored on the hub (in the same PV as the hub sqlite database) that maps each shared folder to the users allowed to mount it. The format is e.g.

{
  "csci2400" : [ "grunwald", "jipa4409" ],
  "nbgrader" : [ "grunwald" ]
}

The code that modifies the pod specification on launch is in the stanza below.

kubevol: |
  import json
  from kubernetes.client.models import V1Volume, V1VolumeMount
  from kubespawner.utils import get_k8s_model

  def modify_pod_hook(spawner, pod):
      # Add a hostPath volume for every shared folder that lists this user.
      try:
          with open('/srv/jupyterhub/shared-mounts.json') as json_data:
              shared_vols = json.load(json_data)
          user = spawner.user.name
          for mnt in shared_vols:
              if user in shared_vols[mnt]:
                  pod.spec.volumes.append(
                      get_k8s_model(V1Volume, {
                          'name': mnt,
                          'hostPath': {
                              'path': '/home/data/shared/' + mnt,
                              'type': 'DirectoryOrCreate',
                          },
                      })
                  )
                  # Note: implicitly assumes the pod has only one container.
                  pod.spec.containers[0].volume_mounts.append(
                      get_k8s_model(V1VolumeMount, {
                          'name': mnt,
                          'mountPath': '/shared/' + mnt,
                      })
                  )
      except Exception as e:
          # Log and carry on so a bad config file never blocks a spawn.
          spawner.log.info("Exception in shared-mounts: " + str(e))
      return pod

  c.KubeSpawner.modify_pod_hook = modify_pod_hook
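
Because the hook uses each JSON key directly as a Kubernetes volume name, a key that is not a valid DNS-1123 label would presumably cause the pod spec to be rejected. A minimal sketch of a pre-flight check for the file, assuming the format shown earlier; `validate_shared_mounts` is a hypothetical helper, not part of KubeSpawner:

```python
import json
import re

# Kubernetes volume names must be DNS-1123 labels: lowercase alphanumerics
# and '-', starting and ending with an alphanumeric, at most 63 characters.
_DNS_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")

def validate_shared_mounts(shared_vols):
    """Return a list of human-readable problems; an empty list means OK."""
    if not isinstance(shared_vols, dict):
        return ["top level must be an object mapping folder name to a list of users"]
    problems = []
    for mnt, users in shared_vols.items():
        if len(mnt) > 63 or not _DNS_LABEL.match(mnt):
            problems.append(f"{mnt!r} is not a valid volume name")
        if not isinstance(users, list) or not all(isinstance(u, str) for u in users):
            problems.append(f"{mnt!r} must map to a list of user names")
    return problems

example = json.loads('{"csci2400": ["grunwald", "jipa4409"], "nbgrader": ["grunwald"]}')
print(validate_shared_mounts(example))  # prints []
```

Running a check like this before copying the file onto the hub PV catches mistakes early, since the hook itself only logs the exception and continues.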

On a related note, if you’re interested in juicing up the login page, that can also be done using another stanza in hub::extraConfig. The following image shows our login page (we use an image per class to avoid tightly coupling course images).

The stanza to do this was:

  extraConfig:                                                                                                            
    labhub: |                                                                                                             
      c.KubeSpawner.cmd = ['jupyter-labhub']                                                                              
    kubespawn: |                                                                                                          
      c.KubeSpawner.profile_form_template = """                                                                           
        <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.3/umd/popper.min.js" type="text/javascript" charset="utf-8"> </script>
        <script>                                                                                                          
         $(function () { $('[data-toggle="tooltip"]').tooltip({container:'body'}) })                                      
        // JupyterHub 0.8 applied form-control indiscriminately to all form elements.
        // Can be removed once we stop supporting JupyterHub 0.8                                                          
        $(document).ready(function() {                                                                                    
            $('#kubespawner-profiles-list input[type="radio"]').removeClass('form-control');                              
        });                                                                                                               
        function picksub(val) {
          // Uncheck every profile radio button, then check the chosen one.
          var pro=document.getElementsByName("profile");
          for (var i=0; i < pro.length; i++) { pro[i].checked = false }
          pro[val].checked = true;
          document.forms["spawn_form"].submit();
        }
        </script>                                                                                                         
        <style>                                                                                                           
        /* The profile description should not be bold, even though it is inside the <label> tag */                        
        #kubespawner-profiles-list label p { font-weight: normal; }                                                       
        .image-button { height:100px; width:150px; margin:25px; white-space:wrap;                                         
                        line-height:75px; text-align:center; }                                                            
        </style>                                                                                                          
        <div class='form-group d-flex flex-wrap' id='kubespawner-profiles-list'>                                          
        <div class="btn-group btn-group-lg btn-group-toggle" data-toggle="buttons">                                       
        {% for profile in profile_list %}                                                                                 
        <label class="btn btn-primary image-button text-wrap" autocomplete="off"                                          
                onclick="picksub({{loop.index0}})"                                                                        
                {% if profile.description %}                                                                              
                    data-toggle="tooltip" data-placement="top"                                                            
                    trigger="hover"                                                                                       
                    title="{{ profile.description }}" {% endif %}                                                         
        >                                                                                                                 
                <input type='radio' name='profile' id='profile-item-{{ loop.index0 }}'                                    
                 value='{{ loop.index0 }}' {% if profile.default %}checked{% endif %} />                                  
                <strong>{{ profile.display_name }}</strong>                                                               
        </label>                                                                                                          
        {% endfor %}                                                                                                      
        </div> </div>                                                                                                     
        """

Thanks for all these helpful tips.

I also got NFS working, in the end, description here.