Newer administrator of a JupyterHub cluster here. We are running a few different JupyterHub clusters, one of which is in AWS using Amazon’s hosted Kubernetes service. There’s an issue we’ve begun to hit with nodes that autoscale: the hub-db-dir persistent volume attached to the hub container gets created in one availability zone, and then the hub can’t spin up on any of the available nodes, since they may be in a different availability zone. Here’s a GitHub thread where this is discussed:
Has anyone experimented with a more persistent storage option, such as pointing this at an NFS share the way user storage can be?
Had the same desire but for a different reason: Azure disks are slow to provision and I got tired of waiting a minute or two for it to mount each time I upgraded the cluster. Turns out the trick is to create a persistent volume, persistent volume claim, and override the hub-db-dir value in config.yaml.
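For anyone following along, a minimal sketch of what that override can look like with the zero-to-jupyterhub chart’s hub.db.pvc values (the exact keys depend on your chart version, and the size here is a placeholder):

```yaml
# config.yaml (excerpt) -- illustrative only.
# Setting storageClassName to "" keeps Kubernetes from dynamically
# provisioning a disk, so the hub-db-dir claim binds to a pre-created PV.
hub:
  db:
    type: sqlite-pvc
    pvc:
      storageClassName: ""
      storage: 1Gi
```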
Thanks for the reply @akaszynski! Great to see that I’m not trying to reinvent the wheel. Are you deploying this on Azure using Helm? Looking at your post on GitHub, I have something similar set up; however, I’m still having issues with the deployment.
I have two config files that I pass to run the deployment. First is the PVC for Kubernetes:
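Roughly along these lines (this is an illustrative sketch rather than my exact file; the NFS server address, path, namespace, and size are placeholders):

```yaml
# pvc.yaml (sketch) -- a pre-created PV backed by an NFS/EFS export,
# plus a claim for the hub database directory.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hub-db-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: fs-12345678.efs.us-east-1.amazonaws.com
    path: /hub-db
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hub-db-dir        # must match the name the chart expects
  namespace: jhub
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""    # bind to the pre-created PV, don't provision
  volumeName: hub-db-pv
  resources:
    requests:
      storage: 1Gi
```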
I’ve included just the hub portion to keep things tidy. When I go to deploy this, Helm has an issue because hub-db-dir is already specified in the jupyterhub/jupyterhub Helm chart. Did you happen to overcome this, or did you deploy in a different fashion to circumvent it?
@akaszynski I was able to get around this after some troubleshooting throughout the day. The workflow I found is that you need to do the initial deployment without changing hub-db-dir at all, and then do an upgrade to override it. So for me it looked like this:
Deploy basic values chart
Update pvc.yaml with PV and PVC for shared data, and PV and PVC for hub-db-dir
Mount hub-db-dir on an EC2 instance and create the directory. Set permissions to 777 (haven’t had a chance to narrow down the least permissions needed yet); see the sketch after this list.
Run helm upgrade --install deployment jupyterhub/jupyterhub --namespace namespace -f values.yaml --debug and it should just move the hub over to the updated PVC.
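For the mount-and-permissions step, a rough sketch (assuming an NFS/EFS-backed volume; the endpoint, mount point, and directory name are placeholders, not my actual values):

```bash
# Illustrative only -- mount the share backing hub-db-dir, create the
# directory the hub database will live in, and open up permissions.
sudo mkdir -p /mnt/hub-db
sudo mount -t nfs4 -o nfsvers=4.1 fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/hub-db
sudo mkdir -p /mnt/hub-db/hub-db
sudo chmod 777 /mnt/hub-db/hub-db   # TODO: narrow down to least privilege
```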
It would be amazing if we didn’t have to deploy and then redeploy in the future, but it’s such a small addition to the workflow that it’s well worth it compared to the problems we had before.
That makes sense now. On my second cluster deployment I ran into the issue, but when developing on the first cluster I didn’t, since I was upgrading rather than installing.