Accessing a Google Storage Bucket mounted as a Persistent Volume

I’m new to both JupyterHub and Kubernetes, but so far I’ve managed to follow the Zero to JupyterHub guide and get a cluster up and running on GKE (Autopilot). However, I’d like to mount a GCS bucket as a shared volume (via extraVolumes in my config) and have hit a roadblock.

Following the instructions in “Access Cloud Storage buckets with the Cloud Storage FUSE CSI driver” (Google Kubernetes Engine documentation), I have been able to mount the storage bucket as a PersistentVolume and create an associated PersistentVolumeClaim. I’ve also created the sidecar container for running the GCS FUSE CSI driver, and as far as I can tell (testing it outside of JupyterHub) it works correctly.

My problem comes when I add the extraVolumeMounts entry to my config.yaml. Whenever I enable it, the spawner hangs and then times out with this error message:

[Warning] MountVolume.MountDevice failed for volume "gcs-fuse-csi-pv" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name gcsfuse.csi.storage.gke.io not found in the list of registered CSI drivers
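
The message says the driver gcsfuse.csi.storage.gke.io is not in the node's list of registered CSI drivers. One rough way to look at that list from outside the node is the node's CSINode object (kubectl get csinode <node-name> -o json), which records registered drivers under spec.drivers. The snippet below is only a sketch under that assumption: the helper names are hypothetical, and the sample input is illustrative, not real cluster output.

```python
import json

# Driver name copied from the error message above.
DRIVER = "gcsfuse.csi.storage.gke.io"


def registered_drivers(csinode: dict) -> list:
    """Return the driver names listed in a CSINode object's spec.drivers."""
    drivers = csinode.get("spec", {}).get("drivers") or []
    return [d["name"] for d in drivers]


def driver_registered(csinode: dict, driver: str = DRIVER) -> bool:
    """True if the given driver name appears in the CSINode object."""
    return driver in registered_drivers(csinode)


# Illustrative input only: a node that lists the persistent disk driver
# but not the GCS FUSE driver. Real input would come from
# `kubectl get csinode <node-name> -o json`.
sample = json.loads(
    '{"metadata": {"name": "example-node"},'
    ' "spec": {"drivers": [{"name": "pd.csi.storage.gke.io", "nodeID": "example"}]}}'
)
print(registered_drivers(sample))
print(driver_registered(sample))
```

If the driver is missing on the node where the user pod was scheduled but present on a node where a test pod works, that would point at the node rather than at the JupyterHub config.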

Searching for this error brings up the following issue in the gcs-fuse-csi-driver GitHub repository: Unable to mount on initContainer (Issue #38, GoogleCloudPlatform/gcs-fuse-csi-driver).

My best guess as to what is happening is that KubeSpawner is using an initContainer to mount volumes, which the gcs-fuse-csi-driver does not yet support. Is there a workaround for this?

Barring a workaround, what is the current best practice for mounting a storage bucket?

My config.yaml looks like this:

hub:
  config:
    AzureAdOAuthenticator:
      client_id: <redacted>
      client_secret: <redacted>
      oauth_callback_url: <redacted>
      tenant_id: <redacted>
      allow_all: true
    JupyterHub:
      authenticator_class: azuread

singleuser:
  image:
    # You should replace the "latest" tag with a fixed version from:
    # https://hub.docker.com/r/jupyter/datascience-notebook/tags/
    # Inspect the Dockerfile at:
    # https://github.com/jupyter/docker-stacks/tree/HEAD/datascience-notebook/Dockerfile
    name: jupyter/datascience-notebook
    tag: latest
  cpu:
    guarantee: 4
#    limit: 16
  memory:
    guarantee: 16G
#    limit: 64G
  # `cmd: null` allows the custom CMD of the Jupyter docker-stacks to be used
  # which performs further customization on startup.
  cmd: null
  storage:
    homeMountPath: /home/{username}
    dynamic:
      storageClass: premium-rwo
    extraVolumes:
      - name: scg-datascience-shared
        persistentVolumeClaim:
          claimName: scg-datascience-shared
    extraVolumeMounts:
      - name: scg-datascience-shared
        mountPath: /home/shared
  extraFiles:
    # jupyter_notebook_config reference: https://jupyter-notebook.readthedocs.io/en/stable/config.html
    jupyter_notebook_config.json:
      mountPath: /etc/jupyter/jupyter_notebook_config.json
      # data is a YAML structure here but will be rendered to JSON file as our
      # file extension is ".json".
      data:
        MappingKernelManager:
          # cull_idle_timeout: timeout (in seconds) after which an idle kernel is
          # considered ready to be culled
          cull_idle_timeout: 3600 # default: 0

          # cull_interval: the interval (in seconds) on which to check for idle
          # kernels exceeding the cull timeout value
          cull_interval: 300 # default: 300

          # cull_connected: whether to consider culling kernels which have one
          # or more connections
          cull_connected: true # default: false

          # cull_busy: whether to consider culling kernels which are currently
          # busy running some code
          cull_busy: false # default: false
  serviceAccountName: scg-hub-spawner
  extraAnnotations:
    gke-gcsfuse/volumes: "true"
  cloudMetadata:
    blockWithIptables: false

scheduling:
  userScheduler:
    enabled: true
  podPriority:
    enabled: true
  userPlaceholder:
    enabled: true
    replicas: 1

cull:
  enabled: true
  timeout: 3600
  every: 300

KubeSpawner doesn’t use an initContainer for mounting storage. If you run
kubectl get pod <jupyter-podname> -o yaml
you can verify that.

Can you compare the output of that command with what you’d expect, based on your testing outside JupyterHub?
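
If the full YAML is too long to compare by eye, one option is to dump both pods with -o json and diff only a handful of fields. The following is a minimal sketch, not part of JupyterHub or KubeSpawner: the function names are hypothetical, and the choice of fields (node, scheduler, init containers, images, volume mounts) is just a guess at what is worth comparing.

```python
def summarize_pod(pod: dict) -> dict:
    """Pick a few fields from a pod parsed from `kubectl get pod -o json`."""
    spec = pod.get("spec", {})
    containers = spec.get("containers", [])
    return {
        "nodeName": spec.get("nodeName"),
        "schedulerName": spec.get("schedulerName"),
        "initContainers": [c["name"] for c in spec.get("initContainers", [])],
        "images": {c["name"]: c.get("image") for c in containers},
        "volumeMounts": {
            c["name"]: sorted(m["name"] for m in c.get("volumeMounts", []))
            for c in containers
        },
    }


def diff_pods(a: dict, b: dict) -> dict:
    """Return {field: (value_in_a, value_in_b)} for summary fields that differ."""
    sa, sb = summarize_pod(a), summarize_pod(b)
    return {key: (sa[key], sb[key]) for key in sa if sa[key] != sb[key]}


# Hypothetical, heavily trimmed pods; real input would be json.load() of
# the two `kubectl get pod <name> -o json` dumps.
user_pod = {"spec": {"nodeName": "node-a", "schedulerName": "custom-scheduler",
                     "containers": [{"name": "notebook", "image": "example:1"}]}}
test_pod = {"spec": {"nodeName": "node-b", "schedulerName": "default-scheduler",
                     "containers": [{"name": "notebook", "image": "example:1"}]}}

for field, (left, right) in diff_pods(user_pod, test_pod).items():
    print(f"{field}: {left!r} != {right!r}")
```

An empty initContainers list in the summary is also a quick way to confirm that the spawned pod has no init containers.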

That’s good to know; it probably means my error is somewhere in my config. I’ll try to dump the relevant settings.

Output of kubectl get pod while starting a JupyterHub user server:

NAME                              READY   STATUS    RESTARTS       AGE
continuous-image-puller-972b7     1/1     Running   0              24m
continuous-image-puller-tz5ls     1/1     Running   0              26h
gcs-fuse-csi-static-pvc           2/2     Running   0              3h24m
hub-55974b8bf-rwvff               1/1     Running   0              24m
jupyter-jason-20bates             0/2     Pending   0              8s
proxy-599477cc7c-445qk            1/1     Running   0              24m
user-placeholder-0                1/1     Running   0              26h
user-scheduler-5f9468589b-8v75j   1/1     Running   1 (143m ago)   3h39m
user-scheduler-5f9468589b-xkz5c   1/1     Running   0              3h37m

Output of kubectl get pod jupyter-jason-20bates --namespace=scg-datascience -o yaml from just after the driver error message appears on the spawner screen:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    autopilot.gke.io/resource-adjustment: '{"input":{"containers":[{"limits":{"cpu":"250m","ephemeral-storage":"5Gi","memory":"256Mi"},"requests":{"cpu":"250m","ephemeral-storage":"5Gi","memory":"256Mi"},"name":"gke-gcsfuse-sidecar"},{"requests":{"cpu":"4","memory":"17179869184"},"name":"notebook"}]},"output":{"containers":[{"limits":{"cpu":"250m","ephemeral-storage":"5Gi","memory":"256Mi"},"requests":{"cpu":"250m","ephemeral-storage":"5Gi","memory":"256Mi"},"name":"gke-gcsfuse-sidecar"},{"limits":{"cpu":"4","ephemeral-storage":"1Gi","memory":"17179869184"},"requests":{"cpu":"4","ephemeral-storage":"1Gi","memory":"17179869184"},"name":"notebook"}]},"modified":true}'
    autopilot.gke.io/warden-version: 2.8.16
    gke-gcsfuse/volumes: "true"
    hub.jupyter.org/username: jason bates
  creationTimestamp: "2023-10-26T22:31:47Z"
  labels:
    app: jupyterhub
    chart: jupyterhub-3.1.0
    component: singleuser-server
    heritage: jupyterhub
    hub.jupyter.org/network-access-hub: "true"
    hub.jupyter.org/servername: ""
    hub.jupyter.org/username: jason-20bates
    release: scg-datascience-hub
  name: jupyter-jason-20bates
  namespace: scg-datascience
  resourceVersion: "30407279"
  uid: 3fce8e86-5ecd-45f8-95f7-1747dec089b4
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: hub.jupyter.org/node-purpose
            operator: In
            values:
            - user
        weight: 100
  containers:
  - args:
    - --v=5
    image: gke.gcr.io/gcs-fuse-csi-driver-sidecar-mounter:v0.1.6-gke.2@sha256:e77e4a4bf012f899079f8be6a4c4652500d5b25b23b3f79af6d5a215ab9292e3
    imagePullPolicy: IfNotPresent
    name: gke-gcsfuse-sidecar
    resources:
      limits:
        cpu: 250m
        ephemeral-storage: 5Gi
        memory: 256Mi
      requests:
        cpu: 250m
        ephemeral-storage: 5Gi
        memory: 256Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
      runAsGroup: 65534
      runAsNonRoot: true
      runAsUser: 65534
      seccompProfile:
        type: RuntimeDefault
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /gcsfuse-tmp
      name: gke-gcsfuse-tmp
    - mountPath: /gcsfuse-cache
      name: gke-gcsfuse-cache
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-7svjx
      readOnly: true
  - env:
    - name: CPU_GUARANTEE
      value: "4.0"
    - name: JPY_API_TOKEN
      value: bc85780b1c024df394ceb5b78da11326
    - name: JUPYTERHUB_ACTIVITY_URL
      value: http://hub:8081/hub/api/users/jason%20bates/activity
    - name: JUPYTERHUB_ADMIN_ACCESS
      value: "1"
    - name: JUPYTERHUB_API_TOKEN
      value: bc85780b1c024df394ceb5b78da11326
    - name: JUPYTERHUB_API_URL
      value: http://hub:8081/hub/api
    - name: JUPYTERHUB_BASE_URL
      value: /
    - name: JUPYTERHUB_CLIENT_ID
      value: jupyterhub-user-jason%20bates
    - name: JUPYTERHUB_HOST
    - name: JUPYTERHUB_OAUTH_ACCESS_SCOPES
      value: '["access:servers!server=jason bates/", "access:servers!user=jason bates"]'
    - name: JUPYTERHUB_OAUTH_CALLBACK_URL
      value: /user/jason%20bates/oauth_callback
    - name: JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES
      value: '[]'
    - name: JUPYTERHUB_OAUTH_SCOPES
      value: '["access:servers!server=jason bates/", "access:servers!user=jason bates"]'
    - name: JUPYTERHUB_SERVER_NAME
    - name: JUPYTERHUB_SERVICE_PREFIX
      value: /user/jason%20bates/
    - name: JUPYTERHUB_SERVICE_URL
      value: http://0.0.0.0:8888/user/jason%20bates/
    - name: JUPYTERHUB_USER
      value: jason bates
    - name: JUPYTER_IMAGE
      value: jupyter/datascience-notebook:latest
    - name: JUPYTER_IMAGE_SPEC
      value: jupyter/datascience-notebook:latest
    - name: MEM_GUARANTEE
      value: "17179869184"
    image: jupyter/datascience-notebook:latest
    imagePullPolicy: IfNotPresent
    lifecycle: {}
    name: notebook
    ports:
    - containerPort: 8888
      name: notebook-port
      protocol: TCP
    resources:
      limits:
        cpu: "4"
        ephemeral-storage: 1Gi
        memory: "17179869184"
      requests:
        cpu: "4"
        ephemeral-storage: 1Gi
        memory: "17179869184"
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - NET_RAW
      runAsUser: 1000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /home/jason-20bates
      name: volume-jason-20bates
    - mountPath: /etc/jupyter/jupyter_notebook_config.json
      name: files
      subPath: jupyter_notebook_config.json
    - mountPath: /home/shared
      name: scg-datascience-shared
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-7svjx
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: gk3-scg-datascience-cluster1-pool-3-e2aeda54-gpq2
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  priorityClassName: scg-datascience-hub-default-priority
  restartPolicy: OnFailure
  schedulerName: scg-datascience-hub-user-scheduler
  securityContext:
    fsGroup: 100
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: scg-hub-spawner
  serviceAccountName: scg-hub-spawner
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: hub.jupyter.org/dedicated
    operator: Equal
    value: user
  - effect: NoSchedule
    key: hub.jupyter.org_dedicated
    operator: Equal
    value: user
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: kubernetes.io/arch
    operator: Equal
    value: amd64
  volumes:
  - emptyDir: {}
    name: gke-gcsfuse-tmp
  - emptyDir: {}
    name: gke-gcsfuse-cache
  - name: volume-jason-20bates
    persistentVolumeClaim:
      claimName: claim-jason-20bates
  - name: files
    secret:
      defaultMode: 420
      items:
      - key: jupyter_notebook_config.json
        path: jupyter_notebook_config.json
      secretName: singleuser
  - name: scg-datascience-shared
    persistentVolumeClaim:
      claimName: scg-datascience-shared
  - name: kube-api-access-7svjx
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-10-26T22:33:09Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-10-26T22:33:09Z"
    message: 'containers with unready status: [gke-gcsfuse-sidecar notebook]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-10-26T22:33:09Z"
    message: 'containers with unready status: [gke-gcsfuse-sidecar notebook]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-10-26T22:33:09Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: gke.gcr.io/gcs-fuse-csi-driver-sidecar-mounter:v0.1.6-gke.2@sha256:e77e4a4bf012f899079f8be6a4c4652500d5b25b23b3f79af6d5a215ab9292e3
    imageID: ""
    lastState: {}
    name: gke-gcsfuse-sidecar
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        reason: ContainerCreating
  - image: jupyter/datascience-notebook:latest
    imageID: ""
    lastState: {}
    name: notebook
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        reason: ContainerCreating
  hostIP: 10.255.255.52
  phase: Pending
  qosClass: Guaranteed
  startTime: "2023-10-26T22:33:09Z"

And here’s the output from kubectl get pod gcs-fuse-csi-static-pvc --namespace=scg-datascience -o yaml:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    autopilot.gke.io/resource-adjustment: '{"input":{"containers":[{"limits":{"cpu":"250m","ephemeral-storage":"5Gi","memory":"256Mi"},"requests":{"cpu":"250m","ephemeral-storage":"5Gi","memory":"256Mi"},"name":"gke-gcsfuse-sidecar"},{"name":"busybox"}]},"output":{"containers":[{"limits":{"cpu":"250m","ephemeral-storage":"5Gi","memory":"256Mi"},"requests":{"cpu":"250m","ephemeral-storage":"5Gi","memory":"256Mi"},"name":"gke-gcsfuse-sidecar"},{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"busybox"}]},"modified":true}'
    autopilot.gke.io/warden-version: 2.7.41
    gke-gcsfuse/volumes: "true"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{"gke-gcsfuse/volumes":"true"},"name":"gcs-fuse-csi-static-pvc","namespace":"scg-datascience"},"spec":{"containers":[{"args":["infinity"],"command":["sleep"],"image":"busybox","name":"busybox","volumeMounts":[{"mountPath":"/data","name":"gcs-fuse-csi-static","readOnly":true}]}],"serviceAccountName":"scg-hub-spawner","volumes":[{"name":"gcs-fuse-csi-static","persistentVolumeClaim":{"claimName":"scg-datascience-shared"}}]}}
  creationTimestamp: "2023-10-26T18:58:17Z"
  name: gcs-fuse-csi-static-pvc
  namespace: scg-datascience
  resourceVersion: "30233527"
  uid: 0591c4d8-dc67-4dc2-ad54-96fab4773700
spec:
  containers:
  - args:
    - --v=5
    image: gke.gcr.io/gcs-fuse-csi-driver-sidecar-mounter:v0.1.4-gke.1@sha256:442969f1e565ba63ff22837ce7a530b6cbdb26330140b7f9e1dc23f53f1df335
    imagePullPolicy: IfNotPresent
    name: gke-gcsfuse-sidecar
    resources:
      limits:
        cpu: 250m
        ephemeral-storage: 5Gi
        memory: 256Mi
      requests:
        cpu: 250m
        ephemeral-storage: 5Gi
        memory: 256Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - all
      readOnlyRootFilesystem: true
      runAsGroup: 65534
      runAsNonRoot: true
      runAsUser: 65534
      seccompProfile:
        type: RuntimeDefault
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /gcsfuse-tmp
      name: gke-gcsfuse-tmp
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-w4k9r
      readOnly: true
  - args:
    - infinity
    command:
    - sleep
    image: busybox
    imagePullPolicy: Always
    name: busybox
    resources:
      limits:
        cpu: 500m
        ephemeral-storage: 1Gi
        memory: 2Gi
      requests:
        cpu: 500m
        ephemeral-storage: 1Gi
        memory: 2Gi
    securityContext:
      capabilities:
        drop:
        - NET_RAW
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /data
      name: gcs-fuse-csi-static
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-w4k9r
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: gk3-scg-datascience-cluster1-pool-3-9839970e-tsfk
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: gke.io/optimize-utilization-scheduler
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: scg-hub-spawner
  serviceAccountName: scg-hub-spawner
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: kubernetes.io/arch
    operator: Equal
    value: amd64
  volumes:
  - emptyDir: {}
    name: gke-gcsfuse-tmp
  - name: gcs-fuse-csi-static
    persistentVolumeClaim:
      claimName: scg-datascience-shared
  - name: kube-api-access-w4k9r
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-10-26T18:58:17Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-10-26T18:58:21Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-10-26T18:58:21Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-10-26T18:58:17Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://0a47ea8f4b62743c8f353914b47b1165d03ae12271440b4c5aac2bc1a33bad5f
    image: docker.io/library/busybox:latest
    imageID: docker.io/library/busybox@sha256:3fbc632167424a6d997e74f52b878d7cc478225cffac6bc977eedfe51c7f4e79
    lastState: {}
    name: busybox
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-10-26T18:58:20Z"
  - containerID: containerd://99393c0c097d937a8fea5548aa667956e15bfc34042ef9da640496c5b36466af
    image: sha256:ab595553446669662c0347f4aede77ceb34928db257d77db11971092fca21f2a
    imageID: gke.gcr.io/gcs-fuse-csi-driver-sidecar-mounter@sha256:442969f1e565ba63ff22837ce7a530b6cbdb26330140b7f9e1dc23f53f1df335
    lastState: {}
    name: gke-gcsfuse-sidecar
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-10-26T18:58:18Z"
  hostIP: 10.255.255.30
  phase: Running
  podIP: 10.24.1.153
  podIPs:
  - ip: 10.24.1.153
  qosClass: Guaranteed
  startTime: "2023-10-26T18:58:17Z"