EFS Configuration in dynamic mode with JupyterHub

Hello everyone, I have a question. My team and I are running Zero to JupyterHub on Kubernetes (k8s) on Amazon Elastic Kubernetes Service (EKS) in AWS. We are having trouble with the storage configuration when using Amazon Elastic File System (EFS) in dynamic mode. We are currently testing with 1000 concurrent users, and as soon as we exceed that number we run into the following problem:

EFS has a limit of 1,000 access points per file system, and in dynamic mode each user volume is backed by its own access point.

This is my singleuser storage configuration:

singleuser:
  storage:
    type: dynamic
    dynamic:
      storageClass: sc-jupyterhub
      pvcNameTemplate: efs-pvc-{username}
      volumeNameTemplate: efs-pv-{username}
      storageAccessModes: ["ReadWriteOnce"]

This is my storageclass configuration:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc-jupyterhub
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: "---"
  directoryPerms: "700"

And this is my EFS PersistentVolumeClaim configuration:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: #{PersistentVolumeClaimName}#
spec:
  storageClassName: sc-jupyterhub
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi

Do you have any suggestions or recommendations on what we should review?

Z2JH uses the Kubernetes API to create resources such as pods, storage, etc., which means it's limited to whatever the underlying infrastructure supports.

Is using a single EFS file system and mounting subfolders an option?
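As a rough sketch of what that could look like with the EFS CSI driver's static provisioning (untested against your setup; the names efs-home-pv and efs-home and the file system ID are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-home-pv                      # placeholder name
spec:
  capacity:
    storage: 1Gi                         # required by Kubernetes, not enforced by EFS
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany                      # one volume shared by all user pods
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-0123456789abcdef0   # your EFS file system ID (placeholder)
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-home                         # placeholder name
spec:
  storageClassName: ""                   # bind directly to the PV above, not via a StorageClass
  volumeName: efs-home-pv
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi

Since this mounts the file system directly, no access points are created, so the 1,000 access point limit shouldn't come into play.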


Thanks for your answer. Yes, mounting subfolders is an option; however, I don't have documentation for my team to make this change. Do you have any suggestions?

I think the Setting up EFS storage on AWS page of the Zero to JupyterHub with Kubernetes documentation contains all the information you need. What's missing?

EFS has a limit of 1,000 access points per file system.

Oh, so you created PVCs per user, mapping to “access points” and then ran into this limit.

At 2i2c.org, our solution is to use a single PVC per JupyterHub installation and have each user's volumeMount include a subPath with their username. That way, I think you won't hit per-user access point scaling issues, at least.

You can see the configuration for this in the 2i2c-org/infrastructure repository on GitHub (infrastructure for configuring and deploying our community JupyterHubs): look under helm-charts/basehub/values.yaml at the key nfs, combined with, for example, the config under config/clusters/2i2c-aws-us.
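As a minimal sketch of that pattern on the Z2JH side (assuming a shared, pre-created PVC; home-nfs is a placeholder name):

singleuser:
  storage:
    type: static
    static:
      pvcName: home-nfs            # shared PVC backed by EFS/NFS, created outside the chart
      subPath: "home/{username}"   # each user gets their own subfolder on the same volume

With this, the hub mounts a single shared volume and only exposes each user's subfolder to their pod, so no per-user PVCs or EFS access points are created.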
