Issue with PVC Creation After JupyterHub Helm Chart Upgrade

I recently updated the JupyterHub Helm chart in our Dev environment from version 3.3.7 to 4.1.0. After the upgrade, when an existing user tries to launch an instance, a new Persistent Volume Claim (PVC) is created with the same name as the Pod, instead of the existing PVC being reused.

For example, the PVC for an existing user would be something like claim-user-40test-2ecom, but after the upgrade, the new PVC is created with a name like claim-user-test-com---ca189761. The Pod name also appears as jupyter-user-test-com---ca189761.

Interestingly, when I tested the same Helm chart upgrade on our test environment, the behavior was different. In the test environment, when an existing user launches a new instance after the upgrade, it successfully uses the existing PVC, as expected.

One key difference between the Dev and test environments is that the Dev environment was previously upgraded from 3.3.7 to 4.0.0 back in November. However, due to the same issue, we had to revert the changes and continue using 3.3.7. Now that I’ve updated the Dev environment directly to 4.1.0, the issue persists, whereas the test environment (with a fresh upgrade to 4.1.0) works as expected.

Could anyone provide guidance on why this discrepancy is happening, and how I can ensure that the existing PVC is used correctly on the Dev environment after the upgrade?

Thanks in advance for your help!

Is there a chance that, back when the dev environment was upgraded, a new PVC was created for this user/server and then not deleted after the downgrade?

For a little background, the problem stems from the fact that:

  1. the default pvc naming scheme was changed, and
  2. the previous version of kubespawner did not persist the pvc name

So in the upgrade, kubespawner has to guess whether to use the old name or the new name. The way the guess works is basically:

  1. if a pvc name is persisted (i.e. this is not the first launch since the upgrade), use it; no guessing is required, and this problem shouldn’t happen again in the future
  2. if no pvc name is persisted (i.e. first launch after upgrade) and a pvc with the new name doesn’t exist, check if the old pvc name exists, and use that
  3. persist whatever we got, so no more guessing after this

So one way you might get a different result: the previous upgrade created the new pvc, you then downgraded to get the old name back, and the new pvc created during that first upgrade was never deleted, so upgrading again picks up that leftover new pvc. Deleting the unused new pvc should have fixed it. But now that you’ve upgraded, it helpfully remembers the pvc name it found. It’s a bit tricky.
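Roughly, that decision looks like this (an illustrative sketch of the logic described above, not kubespawner’s actual code; the function and argument names are made up):

def choose_pvc_name(persisted_name, new_name, old_name, pvc_exists):
    # Illustrative sketch of the guess described above, not kubespawner's real implementation.
    if persisted_name:
        # 1. a name was persisted on a previous launch: use it, no guessing required
        return persisted_name
    if not pvc_exists(new_name) and pvc_exists(old_name):
        # 2. first launch after the upgrade and no new-style pvc yet: fall back to the old name
        return old_name
    # 3. otherwise use the new name; whatever is returned gets persisted, so no more guessing later
    return new_name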

I don’t know how many users you have who haven’t launched yet in the upgraded dev environment, but if you can, check:

  1. whether the old pvc name exists, and
  2. whether the new pvc name exists

at upgrade time, and relate that to which pvc gets mounted (a rough way to check both is sketched below).
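For example, something like this can list both (a rough sketch using the kubernetes Python client; the namespace and the example names are assumptions, adjust them to your deployment):

from kubernetes import client, config

# Rough sketch: list PVCs in the hub namespace and separate old-style from new-style names.
# Assumptions: kubeconfig access to the cluster, namespace "sandbox", and the example names from this thread.
config.load_kube_config()
v1 = client.CoreV1Api()

pvc_names = [pvc.metadata.name for pvc in v1.list_namespaced_persistent_volume_claim("sandbox").items]

old_style = [n for n in pvc_names if n == "claim-user-40test-2ecom"]           # legacy escaped name
new_style = [n for n in pvc_names if n.startswith("claim-user-test-com---")]   # new scheme with hash suffix
print("old-style PVCs:", old_style)
print("new-style PVCs:", new_style)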

I also wrote this script to try to help remedy the situation. I haven’t found the best way to fix it reliably and robustly for all the ways people might configure JupyterHub.


Thanks so much for your detailed explanation @minrk! I wanted to provide some more context to clarify the situation further and see if we can pinpoint what might be going wrong in the Dev environment.

Context:
After upgrading our Dev environment to JupyterHub 4.0.0 in November 2024, about 10 users tried launching JupyterHub instances, which resulted in new Persistent Volume Claims (PVCs) being created instead of the existing ones being used. We then downgraded back to 3.3.7 and deleted the newly created PVCs, after which users were able to use their old PVCs without any issues.

During the downgrade process, we had to delete the old JupyterHub database and create a new one, as downgrades are not allowed due to the database schema changes between versions. After the downgrade, I can confirm that no new PVCs are left in the Dev environment.

The Issue:
I recently upgraded the Dev environment to JupyterHub 4.0.1 (last week). I specifically asked a user who had never launched JupyterHub on Dev after the November 4.0.0 upgrade to launch an instance. Unfortunately, the issue persists: a new PVC is being created instead of the existing PVC being used.

What I Expected:
Based on the behavior described in the JupyterHub chart 4.0.1, when an existing user tries to launch a JupyterHub instance, they should be able to use their existing PVC. For a new user, a new PVC should be created following the updated naming scheme.

Confusion:
I am unsure why the behavior in the test environment is different. In the test environment, this issue does not occur, and users are able to use their existing PVCs after upgrading. The problem seems specific to the Dev environment, even though we followed the same steps.

Can you diff your config between test and dev? Maybe there’s some pvc or naming-related setting that’s set in one but not the other, causing the difference. It would be especially useful to know whether e.g. pvc_name_template is set, and to what.

Thank you @minrk, I couldn’t find any significant differences in the configurations between the test and dev environments. The only notable distinction is that in the dev environment there are some extra volumes attached to the single-user instances, whereas in the test environment there is only the single user volume. Do you think this could be relevant? The additional volumes in dev might be influencing the PVC creation behavior differently from the test environment, although to me it doesn’t seem like it would be the cause.

I don’t think so, but if you share what your volume config looks like, it might help. Any volume-related config and/or singleuser.storage config might be relevant.

Hello @minrk , here’s my volume and singleuser.storage configuration for reference:

default_storage = {
              'volumes': [
                {
                  'name' : 'volume-{username}',
                  'persistentVolumeClaim' : {
                    'claimName' : 'claim-{username}'
                  }
                }
              ],
              'volume_mounts': [
                {
                  'mountPath' : '/home/jovyan',
                  'name' : 'volume-{username}'
                }
              ]
            }


    singleuser:
      storage:
        homeMountPath: /home/jovyan
        dynamic:
          storageClass: encrypted-gp2
          pvcNameTemplate: claim-{username}
          volumeNameTemplate: volume-{username}

Can you provide more context on the default_storage = config and where/how that is applied? So these are the two different configs for the two different deployments?

The legacy template handling would not be applied to the default_storage = ... version, assuming that is setting volumes and volume_mounts directly on KubeSpawner, not the template variables. I think if you change

claimName: 'claim-{username}'

to

claimName: '{pvc_name}'

it should resolve to the correct pvc name even when the template changes. The reason is that the legacy detection is applied specifically to pvc_name, not to all templates everywhere.
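In your default_storage dict, that would look something like this (a sketch based on the snippet you shared earlier; only the claimName changes, everything else stays as you have it):

'volumes': [
  {
    'name' : 'volume-{username}',
    'persistentVolumeClaim' : {
      # {pvc_name} expands to whatever pvc name kubespawner decided to use (old or new)
      'claimName' : '{pvc_name}'
    }
  }
],
'volume_mounts': [
  {
    'mountPath' : '/home/jovyan',
    'name' : 'volume-{username}'
  }
]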

Hello @minrk, I’m using the same configuration for both Development and testing. Here is the configuration I’m using:

## Helm Charts: https://github.com/jupyterhub/helm-chart
# Source Repository: https://github.com/jupyterhub/jupyterhub
# https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/master/jupyterhub/values.yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: jupyterhub
  namespace: sandbox
spec:
  interval: 5m
  releaseName: jupyterhub
  chart:
    spec:
      chart: jupyterhub
      version: 4.1.0
      sourceRef:
        kind: HelmRepository
        name: jupyterhub-repository
  valuesFrom:
    - kind: Secret
      name: jupyterhub
      valuesKey: values.yaml
      optional: false
  values:
    # hub relates to the hub pod, responsible for running JupyterHub, its configured Authenticator class,
    # its configured Spawner class KubeSpawner, and its configured Proxy class ConfigurableHTTPProxy
    hub:
      #cookieSecret - Injected by Flux
      image:
        name: k8s-hub
        tag: 4.1.0
      resources:
        requests:
          cpu: 500m # 0m - 1000m
          memory: 2Gi # 200Mi - 4Gi
      pdb:
        enabled: false
        minAvailable: 1
      # Injected by Flux - Authentication and extraEnv
      networkPolicy:
        enabled: false
      authenticatePrometheus: false # disable authentication for Prometheus endpoint
      initContainers:
        - name: git-clone-templates
          image: alpine/git:latest
          args:
            - clone
            - --single-branch
            - --branch=main
            - --depth=1
            - --
            - https://github.com/earth/sandbox-templates.git
            - /etc/jupyterhub/custom
          securityContext:
            runAsUser: 1000
          volumeMounts:
            - name: custom-templates
              mountPath: /etc/jupyterhub/custom
      extraVolumes:
        - name: custom-templates
          emptyDir: {}
      extraVolumeMounts:
        - name: custom-templates
          mountPath: /etc/jupyterhub/custom
      templatePaths: ['/etc/jupyterhub/custom/templates']
      # can be overridden - env specific
      templateVars: {}
      config:
        KubeSpawner:
          delete_pvc: false
      # can be overridden - env specific
      extraConfig: {}
    # proxy relates to the proxy pod, the proxy-public service, and the autohttps pod and proxy-http service.
    proxy:
      #secretToken - Injected by Flux
      chp:
        image:
          name: jupyterhub/configurable-http-proxy
          tag: 4.6.3
        resources:
          requests:
            cpu: 500m # 0m - 1000m
            memory: 256Mi # 100Mi - 600Mi
        networkPolicy:
          enabled: false
        pdb:
          enabled: false
          minAvailable: 1
      traefik:
        image:
          name: traefik
          tag: v2.4.11
        resources:
          requests:
            cpu: 500m # 0m - 1000m
            memory: 512Mi # 100Mi - 1.1Gi
        networkPolicy:
          enabled: false
        pdb:
          enabled: false
          minAvailable: 1
      service:
        type: ClusterIP
      https:
        enabled: true
        type: offload

    #ingress - Injected by Flux

    scheduling:
      userScheduler:
        enabled: true
        resources:
          requests:
            cpu: 30m # 8m - 45m
            memory: 512Mi # 100Mi - 1.5Gi
      podPriority:
        enabled: true
      userPlaceholder:
        enabled: false
      corePods:
        nodeAffinity:
          matchNodePurpose: require
      userPods:
        nodeAffinity:
          matchNodePurpose: require

    # prePuller relates to the hook|continuous-image-puller DaemonSets
    prePuller:
      continuous:
        enabled: false
      # hook relates to the hook-image-awaiter Job and hook-image-puller DaemonSet
      hook:
        enabled: false
        pullOnlyOnChanges: true
        image:
          name: jupyterhub/k8s-image-awaiter
          tag: 4.1.0

    # cull relates to the jupyterhub-idle-culler service, responsible for evicting inactive singleuser pods.
    # The settings below correspond to command-line flags for jupyterhub-idle-culler, as documented here:
    # https://github.com/jupyterhub/jupyterhub-idle-culler#as-a-standalone-script
    cull:
      enabled: true
      users: true               # --cull-users
      removeNamedServers: false # --remove-named-servers
      timeout: 10800            # --timeout - 3 hours
      every: 600                # --cull-every - 10 mins
      maxAge: 0                 # --max-age

    # singleuser relates to the configuration of KubeSpawner which runs in the hub pod,
    # and its spawning of user pods such as jupyter-myusername.
    singleuser:
      networkTools:
        image:
          name: jupyterhub/k8s-network-tools
          tag: 4.1.0
      networkPolicy:
        enabled: false
      nodeSelector:
        nodesize: 'L'
      defaultUrl: "/lab"
      memory:
        limit: 15G
        guarantee: 14G
      cpu:
        limit: 1.7
        guarantee: 1.5
      cloudMetadata:
        # blockWithIptables set to true will append a privileged initContainer that uses
        # iptables to block the sensitive metadata server at the provided ip.
        blockWithIptables: true
        ip: 169.256.169.255
      image:
        name: earth/sandbox
        tag: 0.0.9
      startTimeout: 600
#      Injected by Flux - using secrets
#      extraEnv:
#        DB_HOSTNAME: ${db_hostname}
#        DB_USERNAME: ${db_username}
#        DB_PASSWORD: ${db_password}
#        DB_DATABASE: ${db_name}
#        AWS_DEFAULT_REGION: ${region}
#        AWS_NO_SIGN_REQUEST: "YES"
      # can be overridden - env specific
      storage:
        homeMountPath: /home/jovyan
        dynamic:
          storageClass: encrypted-gp2
          pvcNameTemplate: claim-{username}
          volumeNameTemplate: volume-{username}
        extraVolumes:
          - name: notebooks
            emptyDir: {}
          - name: jupyter-notebook-config
            configMap:
              name: jupyter-notebook-config
        extraVolumeMounts:
          - name: notebooks
            mountPath: /notebooks
          - name: jupyter-notebook-config
            mountPath: /etc/jupyter/jupyter_notebook_config.py
            subPath: jupyter_notebook_config.py

and here’s my extra config section:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: jupyterhub
  namespace: sandbox
spec:
  values:
    hub:
      templateVars:

      extraConfig:
        spawner: |
          #!/usr/bin/env python3

          import json
          import os
          import sys
          import base64
          import time
          import requests
          from jupyterhub.handlers import LogoutHandler
          from tornado import web
          from kubernetes_asyncio import client, config
          # install the 'cognitojwt' package in the hub container - required to validate user claims
          try:
            import cognitojwt
          except ImportError:
            import subprocess
            subprocess.call([sys.executable, "-m", "pip", "install", "wheel"])
            subprocess.call([sys.executable, "-m", "pip", "install", "--user", "cognitojwt[sync]"])
          finally:
            sys.path.append(os.path.expanduser('~') + "/.local/lib/python3.11/site-packages")
            import cognitojwt

          def enum(**enums):
            return type('Enum', (), enums)

          async def verify_claims(self, user):
            # Retrieve user authentication info, decode, and verify claims
            try:
              auth_state = await user.get_auth_state()
              # self.log.info(f"auth_state: {auth_state}")
              if auth_state is None:
                raise ValueError("auth_state is empty")

              verified_claims = cognitojwt.decode(
                auth_state['access_token'],
                os.getenv('COGNITO_REGION', 'us-west-2'),
                os.getenv('JUPYTERHUB_USERPOOL_ID'),
                testmode=False  # Enable token expiration check
              )
              return verified_claims
            except cognitojwt.CognitoJWTException as err:
              self.log.error(f"Cliam verification issue: {err}")
              raise web.HTTPError(401, "Session is expired!")

          async def custom_options_form(self):
            self.log.info(f"logged in user: {self.user.name}")

            cognito_user_groups = enum(
              DEVELOPMENT='dev-group',
              POWER_USER='power-user-group',
              DEFAULT='default-group',
              NONE='None'
            )

            default_storage = {
              'volumes': [
                {
                  'name' : 'model-2-volume',
                  'persistentVolumeClaim' : {
                    'claimName' : 'model-2-pvc',
                    'readOnly' : True
                  }
                },
                {
                  'name' : 'data',
                  'persistentVolumeClaim' : {
                    'claimName' : 'data-pvc',
                    'readOnly': True
                  }
                },
                {
                  'name' : 'jupyter-notebook-config',
                  'configMap' : {
                    'name' : 'jupyter-notebook-config'
                  }
                },
                {
                  'name' : 'volume-{username}',
                  'persistentVolumeClaim' : {
                    'claimName' : 'claim-{username}'
                  }
                }
              ],
              'volume_mounts': [
                {
                  'mountPath' : '/var/share/models/',
                  'name' : 'model-2-volume'
                },
                {
                  'mountPath' : '/var/share/compact/',
                  'name' : 'data'
                },
                {
                  'mountPath' : '/etc/jupyter/jupyter_notebook_config.py',
                  'subPath' : 'jupyter_notebook_config.py',
                  'name' : 'jupyter-notebook-config'
                },
                {
                  'mountPath' : '/home/jovyan',
                  'name' : 'volume-{username}'
                }
              ]
            }

            def customise_storage_for_profile(kubespawner_profile, storage_override):
                profile = kubespawner_profile.copy()

                for key in ['volumes', 'volume_mounts']:
                    # set storage_override
                    if key not in profile['kubespawner_override']:
                        profile['kubespawner_override'][key] = storage_override[key]
                    else:
                      existing_volume_names = {entry['name'] for entry in profile['kubespawner_override'][key]}
                      for entry in storage_override[key]:
                          # set storage_override if not exists
                          if entry['name'] not in existing_volume_names:
                              profile['kubespawner_override'][key].append(entry)

                    # set default_storage
                    existing_volume_names = {entry['name'] for entry in profile['kubespawner_override'][key]}
                    for entry in default_storage[key]:
                        # set default_storage if not exists
                        if entry['name'] not in existing_volume_names:
                            profile['kubespawner_override'][key].append(entry)

                return profile

            # Add extra labels - labels are used for cilium network policy and cost
            extra_labels = {
              'username': '{username}',
              'hub.jupyter.org/network-access-hub': 'true'
            }

            # setup default profile_list for all users
            default_profile_list = [
              {
                'default': True,
                'display_name': 'Default environment',
                'description': '2 Cores, 16 GB Memory',
                'kubespawner_override': {
                  'mem_guarantee': '12G',
                  'mem_limit': '14G',
                  'cpu_guarantee': 1.2,
                  'cpu_limit': 1.7,
                  'node_selector': {'nodesize': 'L'}
                }
              },
              {
                'default': False,
                'display_name': 'Large environment',
                'description': '4 Cores, 32 GB Memory',
                'kubespawner_override': {
                  'mem_guarantee': '24G',
                  'mem_limit': '29G',
                  'cpu_guarantee': 3.0,
                  'cpu_limit': 3.5,
                  'node_selector': {'nodesize': 'XL'}
                }
              },
            ]
            self.profile_list = default_profile_list

            power_user_profile_list = [
              {
                'default': False,
                'display_name': '2XL default environment - test',
                'description': '7 Cores, 60G Memory',
                'kubespawner_override': {
                  'mem_guarantee': '60G',
                  'mem_limit': '62G',
                  'cpu_guarantee': 7,
                  'cpu_limit': 7,
                  'node_selector': {'nodesize': '2XL'},
                  'image': 'earth/sandbox:latest',
                  'image_pull_policy': 'Always'
                }
              },
              {
                'default': False,
                'display_name': '4XL default environment - test',
                'description': '15 Cores, 100G Memory',
                'kubespawner_override': {
                  'mem_guarantee': '100G',
                  'mem_limit': '100G',
                  'cpu_guarantee': 15,
                  'cpu_limit': 15,
                  'node_selector': {'nodesize': '4XL'},
                  'image': 'earth/sandbox:latest',
                  'image_pull_policy': 'Always'
                }
              },
            ]

            dev_profile_list = [
              {
                'default': False,
                'display_name': 'Unstable environment',
                'description': '2 Cores, 16G Memory',
                'kubespawner_override': {
                  'image': 'earth/sandbox:latest',
                  'image_pull_policy': 'Always',
                  'node_selector': {'nodesize': 'L'}
                }
              },
              {
                'default': False,
                'display_name': 'Unstable environment with sudo',
                'description': '2 Cores, 16G Memory',
                'kubespawner_override': {
                  'image': 'earth/sandbox:sudo-latest',
                  'image_pull_policy': 'Always',
                  'node_selector': {'nodesize': 'L'},
                  'environment': {
                    'EXTRA_REPO': 'https://github.com/e-sensing/sitsnotebooks.git'
                  }
                }
              },
              {
                'default': False,
                'display_name': 'Unstable environment | DockerHub',
                'description': '2 Cores, 16G Memory',
                'kubespawner_override': {
                  'image': 'earth/sandbox:latest',
                  'image_pull_policy': 'Always',
                  'node_selector': {'nodesize': 'L'}
                }
              },
              {
                'default': False,
                'display_name': 'Unstable environment | SITS Jupyter',
                'description': '2 Cores, 16G Memory',
                'kubespawner_override': {
                  'mem_guarantee': '14G',
                  'mem_limit': '14G',
                  'cpu_guarantee': 1.4,
                  'cpu_limit': 1.7,
                  'image': 'brazildatacube/sits-jupyter:latest',
                  'image_pull_policy': 'Always',
                  'node_selector': {'nodesize': 'L'},
                  'environment': {
                    'EXTRA_REPO': 'https://github.com/e-sensing/sitsnotebooks.git',
                    'EXTRA_REPO_PATH': '/tmp/test_repo'
                  }
                }
              },
            ]


            try:
              # Read user access token to collect user group info
              verified_claims = await verify_claims(self, self.user)
              user_group_info = verified_claims.get('cognito:groups', [])
              self.log.info(f"{self.user.name} user belongs to group(s): {(','.join(user_group_info))}")

              # Use logic here to decide how to configure user profile_list based on user-group
              if cognito_user_groups.POWER_USER in user_group_info:
                self.profile_list.extend(power_user_profile_list)

              if cognito_user_groups.DEVELOPMENT in user_group_info:
                self.profile_list.extend(dev_profile_list)

              # Set extra labels
              self.extra_labels = extra_labels

              # Return options_form - Let KubeSpawner inspect profile_list and decide what to return
              return self._options_form_default()
            except (TypeError, IndexError, ValueError, KeyError) as err:
              self.log.error(f"Syntaxt error: {err}")
              raise web.HTTPError(400, "Something went wrong. Coud not load profiles")

          # Set the log level by value or name
          c.JupyterHub.log_level = 'DEBUG'

          # Set cookies - jupyterhub-session-id and jupyterhub-hub-login - to less than a day
          c.JupyterHub.cookie_max_age_days = 0.90
          c.JupyterHub.tornado_settings['cookie_options'] = dict(expires_days=0.90)

          # Enable debug-logging of the single-user server
          c.Spawner.debug = True

          # Disable debug-logging of LocalProcessSpawner
          c.LocalProcessSpawner.debug = False
          c.Spawner.cmd = ['jupyterhub-singleuser']

          # displays a notebook with information when launching
          c.Spawner.default_url = '/user/{username}/lab/tree/LandingPage.ipynb'

          # Override spawner timeout - in seconds
          c.KubeSpawner.start_timeout = 600
          c.KubeSpawner.http_timeout = 90

          # Override options_form
          c.KubeSpawner.options_form = custom_options_form

          # all users are allowed to log in
          c.GenericOAuthenticator.allow_all = True

        templates: |
          c.JupyterHub.logo_file = u'/etc/jupyterhub/custom/branding/logo-inline.svg'


    singleuser:
      image:
        name: earth/sandbox
        tag: 0.0.9
      storage:
        extraVolumes:
          - name: notebooks
            emptyDir: {}
          - name: jupyter-notebook-config
            configMap:
              name: jupyter-notebook-config
          - name: model-2-volume
            persistentVolumeClaim:
              claimName: model-2-pvc
              readOnly: true
          - name: data-read
            persistentVolumeClaim:
              claimName: data-pvc
              readOnly: true
        extraVolumeMounts:
          - name: notebooks
            mountPath: /notebooks
          - name: jupyter-notebook-config
            mountPath: /etc/jupyter/jupyter_notebook_config.py
            subPath: jupyter_notebook_config.py
          - name: model-2-volume
            mountPath: /var/share/models
          - name: data-read
            mountPath: /var/share/compact

Hope this helps.