It’s been quite a while since this was posted, and a lot of progress has been made since then. The option hub.config.GitHubOAuthenticator.populate_teams_in_auth_state has been added in the meantime, which helps a lot in retrieving the team and organization information of a user.
Here is a thorough example of how to create and mount team-based persistent volumes, so that users belonging to the same GitHub team can write simultaneously to a team-specific shared drive even when they are on distinct physical nodes in the cloud. This approach is suitable if your cluster runs with the cluster autoscaler enabled, which results in users spawning on distinct physical machines.
First, a service that provides networked file storage must be created, e.g. an NFS server. The Kubernetes project maintains its own example of how to set up an NFS server (the NFS example in the kubernetes/examples repository). Simply put, you need just three files to create the NFS server:
(File 1) Here is the PersistentVolumeClaim for the single large disk that backs the NFS server:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc-exports
  namespace: jhub # Use your own namespace
spec:
  accessModes: [ "ReadWriteOnce" ] # Only the NFS server pod mounts this disk directly; NFS handles concurrent reads and writes on top of it
  resources:
    requests:
      storage: 200Gi # This is the huge disk, size it as needed
(File 2) Here is the NFS server Deployment itself:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-server
  namespace: jhub
spec:
  replicas: 1
  selector:
    matchLabels:
      role: nfs-server
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      containers:
        - name: nfs-server
          image: k8s.gcr.io/volume-nfs:0.8
          ports:
            - name: nfs
              containerPort: 2049
            - name: mountd
              containerPort: 20048
            - name: rpcbind
              containerPort: 111
          securityContext:
            privileged: true
          volumeMounts:
            - name: export
              mountPath: /exports
      volumes:
        - name: export
          persistentVolumeClaim:
            claimName: nfs-pvc-exports
(File 3) Here is the Service that exposes the NFS server's ports to all other pods in the cluster:
kind: Service
apiVersion: v1
metadata:
  name: nfs-server
  namespace: jhub
spec:
  ports:
    - name: nfs
      port: 2049
    - name: mountd
      port: 20048
    - name: rpcbind
      port: 111
  selector:
    role: nfs-server
Now just kubectl apply these three files, and you have an NFS server up and running in your cluster.
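For example, assuming you saved the three manifests under the illustrative filenames below:

kubectl apply -f nfs-pvc.yaml      # File 1: the backing disk
kubectl apply -f nfs-server.yaml   # File 2: the NFS server Deployment
kubectl apply -f nfs-service.yaml  # File 3: the Service exposing the NFS ports

Next, to get the organization and team names of a user logging in, make sure your hub's config.yaml contains the following: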
hub:
  config:
    JupyterHub:
      authenticator_class: github
    GitHubOAuthenticator:
      # If you don't know what the next 3 lines are, read this link from the z2jh official guide:
      # https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/authentication.html#github
      client_id: <your-client-id>
      client_secret: <your-client-secret>
      oauth_callback_url: https://<your-jupyterhub-domain>/hub/oauth_callback
      enable_auth_state: true # This enables us to store user information.
      populate_teams_in_auth_state: true # This is what saves the team information of a user.
      allowed_organizations:
        - OrgA:TeamAlpha
        - OrgA:TeamBeta
        - OrgA:TeamGamma
      scope:
        - read:org # Required, otherwise the user's team information cannot be retrieved.
The above config of course implies that you have set up a GitHub OAuth App, but this is really simple to do and only takes 5-10 minutes. If in doubt, just follow the official z2jh guide linked in the comment above.
Now, when users log in, we get their team and organization information, so we can mount volumes specific to those teams. To create a persistent volume for each team, the following template file, org-team-template.yaml, will be used:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${ORG_TEAM_NAME}-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs-server.jhub.svc.cluster.local
    path: "/${ORG_TEAM_NAME}"
  mountOptions:
    - nfsvers=4.2
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${ORG_TEAM_NAME}-pvc
  namespace: jhub
spec:
  accessModes:
    - ReadWriteMany # Users can use this volume from distinct physical machines
  storageClassName: ""
  resources:
    requests:
      storage: 1Gi
  volumeName: ${ORG_TEAM_NAME}-pv
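As a quick sanity check, you can render the template for a single team with envsubst (part of GNU gettext) and a client-side dry run. The team name orga.teamalpha here is illustrative and lowercased on purpose: Kubernetes resource names must be lowercase DNS-1123 subdomains, which the script further below accounts for:

ORG_TEAM_NAME=orga.teamalpha envsubst < org-team-template.yaml | kubectl apply --dry-run=client -f -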
Now comes the only manual labor part of this process, which is creating the team volumes.
Human Labor
Copy your allowed org and team names into a file, and let's name the file allowed-teams.txt:
OrgA:TeamAlpha
OrgA:TeamBeta
OrgA:TeamGamma
Then get the name of your nfs-server pod with the command kubectl get pods -n <namespace>. The nfs-server pod has a semi-random name like nfs-server-asd38-alks3d.
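If you prefer not to copy the name by hand, it can also be captured into a shell variable via the role=nfs-server label from File 2 (the jhub namespace used above is assumed here):

NFS_POD=$(kubectl get pods -n jhub -l role=nfs-server -o jsonpath='{.items[0].metadata.name}')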
Now we just need to loop over the team names in allowed-teams.txt, create a PersistentVolume for each team, and create a correspondingly named folder on the NFS server. This is achieved by the following script, which reads the team names line by line from allowed-teams.txt:
#!/bin/bash
# Kubernetes resource names must be lowercase DNS-1123 subdomains, so replace
# ':' with '.' (matching the spawner code below) and lowercase the whole name.
# ORG_TEAM_NAME must be exported, otherwise envsubst cannot see it.
while read -r LINE; do
  export ORG_TEAM_NAME=$(echo "$LINE" | tr ':' '.' | tr '[:upper:]' '[:lower:]')
  envsubst < org-team-template.yaml | kubectl apply -f -
  kubectl exec <nfs-server-pod-name> -n <your-namespace> -- sh -c "mkdir -p /exports/$ORG_TEAM_NAME && chmod -R 777 /exports/$ORG_TEAM_NAME"
done < allowed-teams.txt
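After the script has run, you can confirm that one folder and one PV/PVC pair exist per team, for example:

kubectl exec <nfs-server-pod-name> -n <your-namespace> -- ls -l /exports   # one folder per team
kubectl get pv                                                             # one PV per team
kubectl get pvc -n <your-namespace>                                        # one PVC per team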
Now a folder with the name of the team exists on the NFS server, along with a matching persistent volume. The only part left to do is to add the following code to the config file:
hub:
  extraConfig:
    # A post was made about deeply customizing the Spawner, which gives an example of overriding the start function:
    # Discourse post: https://discourse.jupyter.org/t/advanced-z2jh-deeply-customizing-the-spawner/8432
    # Github link: github.com/berkeley-dsep-infra/datahub/blob/21e4a45c9f694578ec297c2947a0537f3bdcaa5b/hub/values.yaml#L296
    00-custom_spawner.py: |
      from kubespawner import KubeSpawner

      class CustomSpawner(KubeSpawner):
          async def start(self):
              # auth_state is only populated when enable_auth_state is true.
              auth_state = await self.user.get_auth_state()
              for name_index, team in enumerate((auth_state or {}).get('teams', [])):
                  # Build the same lowercase "<org>.<team>" name the provisioning script used.
                  # (team['slug'] may be safer than team['name'] if your team names contain spaces.)
                  nfs_pv_name = (team['organization']['login'] + '.' + team['name']).lower() + '-pv'
                  nfs_pvc_name = nfs_pv_name + 'c'  # "...-pv" + "c" == "...-pvc"
                  self.volumes += [{'name': str(name_index), 'persistentVolumeClaim': {'claimName': nfs_pvc_name}}]
                  # Mount each team volume at /home/<org>.<team> (strip the trailing "-pv").
                  self.volume_mounts += [{'mountPath': '/home/' + nfs_pv_name[:-3], 'name': str(name_index)}]
              return await super().start()

      c.JupyterHub.spawner_class = CustomSpawner
The for loop reads as “for each team the user belongs to, mount the corresponding team volume”, and this code runs every time a user's server is spawned. Done.
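To verify, once a user belonging to e.g. OrgA:TeamAlpha has started their server, you can check the mount from outside the pod (the jupyter-<username> pod naming and the path below follow the conventions used above; adjust to your setup):

kubectl exec jupyter-<username> -n jhub -- df -h /home/orga.teamalpha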
Finally, a question for the Z2JH community and maintainers: could something like this configuration be added to the helm chart, so that it can be easily enabled in the config file?