JupyterHub hub-db-dir PV Question

Hi there!

Fairly new administrator of a JupyterHub cluster here. We run several JupyterHub clusters, one of which is on AWS using Amazon’s managed Kubernetes service (EKS). We’ve started running into an issue with autoscaling nodes: the hub-db-dir persistent volume attached to the hub container gets created in one availability zone, and the hub pod then can’t be scheduled because the nodes may be in a different availability zone. Here’s a GitHub thread where this is discussed:

Has anyone experimented with a more persistent storage option, such as pointing this at an NFS share the way user storage can be?
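
(As an aside, for the AZ mismatch itself: my understanding is that a topology-aware StorageClass with volumeBindingMode: WaitForFirstConsumer delays provisioning until the hub pod is scheduled, so the EBS volume gets created in whichever zone the pod lands in. Something like the sketch below; the class name is a placeholder and I haven’t verified this on our clusters.)

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-wait-for-consumer          # placeholder name
provisioner: kubernetes.io/aws-ebs
volumeBindingMode: WaitForFirstConsumer  # provision only after the pod is scheduled
reclaimPolicy: Retain
parameters:
  type: gp2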

Hello,

Had the same desire, but for a different reason: Azure disks are slow to provision, and I got tired of waiting a minute or two for the disk to mount every time I upgraded the cluster. It turns out the trick is to create a persistent volume and persistent volume claim, then override the hub-db-dir volume in config.yaml:

hub:
  extraVolumes:
    - name: hub-db-dir
      persistentVolumeClaim:
        claimName: userdata-pvc 

See https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/421#issuecomment-542981479 for creating the persistent volume and persistent volume claim.
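
If you don’t want to click through, the gist of that comment is roughly the following. The names, diskURI, and sizes here are placeholders rather than the exact manifests from the issue; the idea is a pre-provisioned Azure managed disk bound explicitly to the claim that config.yaml references:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: userdata-pv               # placeholder name
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: ""
  azureDisk:
    kind: Managed
    diskName: hub-db-disk         # placeholder: an existing Azure managed disk
    diskURI: /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Compute/disks/hub-db-disk
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: userdata-pvc              # the claimName used in config.yaml above
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""
  volumeName: userdata-pv         # bind explicitly to the PV above
  resources:
    requests:
      storage: 1Gi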

Thanks for the reply @akaszynski! Great to see that I’m not trying to reinvent the wheel. Are you deploying this on Azure using Helm? Looking at your post on GitHub, I have something similar set up; however, I’m still having issues with the deployment.

I have two config files that I pass in for the deployment. First, the PV and PVC for Kubernetes:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-deploymentname
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  storageClassName: ""            # empty, so only an explicit claim binds it
  nfs:
    server: fs-*.efs.*.amazonaws.com
    path: "/"

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfs-deploymentname-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""            # must match the PV above
  volumeName: nfs-deploymentname  # bind explicitly to the PV above
  resources:
    requests:
      storage: 1Gi

Then I have the values.yaml that I pass to Helm:

hub:
  extraVolumes:
    - name: hub-db-dir
      persistentVolumeClaim:
        claimName: nfs-deploymentname-pvc  # the PVC created above
  resources:
    requests:
      cpu: 0.1
      memory: 256Mi

I’ve included just the hub portion to keep things tidy. When I go to deploy this, Helm complains because a hub-db-dir volume is already defined in the jupyterhub/jupyterhub Helm chart. Did you happen to overcome this? Or did you deploy in a different fashion to circumvent it?

Something must have changed, because when I went to create a second cluster using an identical config.yaml, Helm complained about hub-db-dir as well.
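
In case it helps anyone who lands here later: an alternative that sidesteps the duplicate volume name entirely is to leave the chart’s own hub-db-dir PVC in place and have it bind to a pre-created PV. If your chart version exposes hub.db.pvc.storageClassName and hub.db.pvc.selector (check the values.yaml shipped with your release), something like the sketch below should work; the label and paths are arbitrary placeholders:

# pv.yaml -- pre-created NFS PV, labelled so the chart-managed PVC can select it
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hub-db-nfs
  labels:
    usage: hub-db                 # arbitrary label, matched below
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce               # the chart's PVC requests this by default
    - ReadWriteMany
  storageClassName: ""
  nfs:
    server: <efs-dns-name>        # placeholder
    path: "/hub-db"

# config.yaml -- Helm values
hub:
  db:
    type: sqlite-pvc              # chart default; keeps the chart-managed PVC
    pvc:
      storageClassName: ""        # match only pre-provisioned PVs
      selector:
        matchLabels:
          usage: hub-db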