JupyterHub hub-db-dir PV Question

Hi there!

Newer administrator of a JupyterHub cluster here. We are running a few different JupyterHub clusters, one of which is in AWS using Amazon's hosted Kubernetes service (EKS). There's an issue we've begun to hit with nodes that autoscale: the hub-db-dir persistent volume attached to the hub container gets created in one availability zone, and the hub pod then can't be scheduled on nodes the autoscaler brings up in a different availability zone. There's a GitHub issue where this is discussed as well.
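
For what it's worth, the zone pinning is visible on the PV itself and in the scheduler events. Roughly what I look at (the namespace is a placeholder, and component=hub is the label the zero-to-jupyterhub chart puts on the hub pod, if I remember right):

# An EBS-backed PV is pinned to one zone; it shows up as a zone label and/or node affinity on the PV
kubectl describe pv
# The stuck hub pod usually reports a "volume node affinity conflict" in its events
kubectl describe pod --namespace namespace -l component=hub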

Has anyone experimented with using a more portable storage option for this, for example pointing it at an NFS share the same way user storage can be?

Hello,

Had the same desire, but for a different reason: Azure disks are slow to provision, and I got tired of waiting a minute or two for the disk to mount every time I upgraded the cluster. It turns out the trick is to create a persistent volume and a persistent volume claim, and then override the hub-db-dir volume in config.yaml:

hub:
  extraVolumes:
    - name: hub-db-dir
      persistentVolumeClaim:
        claimName: userdata-pvc 

See https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/421#issuecomment-542981479 for creating the persistent volume and persistent volume claim.
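
In case it saves a click, here's a rough sketch of what a pre-created PV and PVC for the hub database can look like on Azure (not necessarily exactly what's in the linked comment). I'm using the in-tree azureFile volume type as an illustration; the secret name and share name are placeholders:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: hub-db-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  azureFile:
    secretName: azure-storage-secret  # secret holding the storage account name and key (placeholder)
    shareName: hubdb                  # Azure Files share backing the hub database (placeholder)
    readOnly: false
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: userdata-pvc  # matches the claimName in the hub.extraVolumes snippet above
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""  # empty class so it binds to the pre-created PV rather than a provisioner
  resources:
    requests:
      storage: 1Gi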

Thanks for the reply @akaszynski! Great to see that I'm not trying to reinvent the wheel. Are you deploying this on Azure using Helm? Looking at your post on GitHub, I have something similar set up; however, I'm still having issues with the deployment.

I have two config files that I pass in to run the deployment. The first is the PV and PVC for Kubernetes:

apiVersion: v1 
kind: PersistentVolume 
metadata: 
  name: nfs-deploymentname
spec: 
  capacity: 
    storage: 1Gi 
  accessModes: 
    - ReadWriteMany 
  nfs: 
    server: fs-*.efs.*.amazonaws.com
    path: "/" 
 
--- 
kind: PersistentVolumeClaim 
apiVersion: v1 
metadata: 
  name: nfs-deploymentname-pvc
spec: 
  accessModes: 
    - ReadWriteMany 
  storageClassName: "" 
  resources: 
    requests: 
      storage: 1Gi 
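
For completeness, I apply that file with kubectl before running Helm and check that the claim binds; the file name and namespace below are just placeholders from my setup:

kubectl apply -f pvc.yaml --namespace namespace
# The PVC should show STATUS "Bound" against the nfs-deploymentname PV before moving on
kubectl get pv,pvc --namespace namespace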

Then I have the values.yaml that I pass to Helm for the config:

hub:  
  extraVolumes:
    - name: hub-db-dir
      persistentVolumeClaim:
        claimName: nfs-deploymentname-pvc
  resources:
    requests:
      cpu: 0.1
      memory: 256Mi

I’ve put in just the hub portion to keep things tidy. When I go to deploy this, Helm has an issue because hub-db-dir is already specified in the jupyterhub/jupyterhub Helm chart. Did you happen to overcome this, or did you deploy in a different fashion to circumvent it?

Something must have changed, because when I went to create a second cluster using an identical config.yaml, Helm complained about hub-db-dir as well.

@akaszynski I was able to get around this after some troubleshooting throughout the day. The workflow I found is that you need to do the initial deployment without changing hub-db-dir at all, and then do an upgrade to apply the override. So for me it looked like this:

  1. Deploy basic values chart
  2. Update pvc.yaml with PV and PVC for shared data, and PV and PVC for hub-db-dir
  3. Mount the hub-db-dir share on an EC2 instance and create the directory; set permissions to 777 (I haven’t had a chance to narrow down the least permissions needed yet)
  4. Run helm upgrade --install deployment jupyterhub/jupyterhub --namespace namespace -f values.yaml --debug and it should just move things over to the updated PVC (rough command sketch after this list)
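
To spell that out, here is roughly the command sequence I ran; the release name, namespace, EFS hostname, mount point, and file names are placeholders from my setup, and your EFS mount options may differ:

# 1. Initial deployment with a values file that does NOT touch hub-db-dir yet
helm upgrade --install deployment jupyterhub/jupyterhub --namespace namespace -f values-basic.yaml

# 2. Create the PVs/PVCs for the shared data and for hub-db-dir
kubectl apply -f pvc.yaml --namespace namespace

# 3. On an EC2 instance, mount the EFS share, create the hub directory, and open up permissions
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 fs-XXXX.efs.REGION.amazonaws.com:/ /mnt/efs   # EFS mount options may differ for you
sudo mkdir -p /mnt/efs/hub-db-dir
sudo chmod 777 /mnt/efs/hub-db-dir

# 4. Upgrade with the values.yaml that overrides hub-db-dir via hub.extraVolumes
helm upgrade --install deployment jupyterhub/jupyterhub --namespace namespace -f values.yaml --debug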

It would be amazing if we didn’t have to deploy and then redeploy in the future, but it’s such a small addition to the workflow that it’s well worth it compared to the problems we had before.

Hope that helps!


That makes sense now. On my second cluster deployment I ran into the issue, but when developing on the first cluster I didn’t, since I was upgrading rather than installing.

Good find, and thanks for your help!