KubeSpawner with idle-culler incorrect volume mount based on recent culled pod

aptroost · October 21, 2020, 6:26am

Context: The JupyterHub config that we are developing uses KubeSpawner to create pods for individual users. Currently each user gets a volume mount based on an Amazon EFS volume with a subpath for that particular user.

Problem: We use the idle-culler to kill pods after a period of inactivity. When a user pod is culled, the next user who logs in gets the correct user details for their pod, except that the pod mounts the subPath of the most recently culled user. That is a serious problem, so we disabled the idle-culler.
We validated the subPath that is fed to KubeSpawner.volume_mounts, and it is correct. In addition, every variable that holds user details is deleted after KubeSpawner applies it, to ensure all variables are cleared for the next user. Still, with the idle-culler enabled, the volume mount in the created pod ends up with the subPath of the most recently culled user.

Question: How can we find out what exactly is causing KubeSpawner to behave this way?

The applicable jupyterhub_config.py is as follows:

...
class authHandler(BaseHandler):
    def get(self):
        userid = self.user_authenticated(user)
        ...
        c.KubeSpawner.storage_class = 'jhub-sc'
        pvc_name_template = 'claim-jhub-users'
        volume_name_template = 'volume-jhub-users'
        c.KubeSpawner.pvc_name_template = pvc_name_template
        c.KubeSpawner.volumes = [{
            'name': volume_name_template,
            'persistentVolumeClaim': {
                'claimName': pvc_name_template
            }
        }]
        c.KubeSpawner.volume_mounts = [{
            'mountPath': '/mnt',
            'name': volume_name_template,
            'subPath': 'users/' + str(userid),
            'readOnly': False
        }]
        ...
    ...
...
c.JupyterHub.services = [
    {
        'name': 'idle-culler',
        'admin': True,
        'command': [
            sys.executable,
            '-m', 'jupyterhub_idle_culler',
            '--remove-named-servers=True',
            '--timeout=600'
        ],
    }
]

aptroost · October 23, 2020, 5:37pm

Hypothesis: The only explanation I can think of is that KubeSpawner recognizes the name volume_name_template = 'volume-jhub-users' in c.KubeSpawner.volume_mounts as identical to a previous request, and mounts it with a previously remembered subPath.

Method: We changed volume_name_template = 'volume-jhub-users' to volume_name_template = 'volume-jhub-user-' + str(userid). We also added volume_subpath_template = 'users/' + str(userid), so the config is now as follows:

volume_name_template = 'volume-jhub-user-' + str(userid)
volume_subpath_template = 'users/' + str(userid)
c.KubeSpawner.volume_mounts = [{
    'mountPath': '/mnt',
    'name': volume_name_template,
    'subPath': volume_subpath_template,
    'readOnly': False
}]

Question: We will test this thoroughly and post the results back to this thread. If anyone has experienced a similar problem, has another hypothesis, or can confirm this one, please reply.

manics · October 24, 2020, 10:54am

The configuration is only loaded once, at startup; you can’t dynamically change c in a function.

However, volume_mounts has special handling to expand {username} to the actual username, so maybe you can use that instead? See the documentation for c.KubeSpawner.volume_mounts = List() at https://jupyterhub-kubespawner.readthedocs.io/en/latest/spawner.html
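For illustration, the {username} suggestion might look roughly like the sketch below. The volume and claim names are taken from the first post. preview_expansion is a hypothetical helper, not part of KubeSpawner; it only imitates the substitution with str.format so the result can be inspected without a cluster. KubeSpawner does its own expansion and may escape special characters in usernames, so real values can differ. Note also that {username} is the Hub username, which may not be the same as the userid used in the original config.

```python
# Hypothetical fragment for jupyterhub_config.py; names reuse those from the thread.
VOLUME_NAME = 'volume-jhub-users'
PVC_NAME = 'claim-jhub-users'

volumes = [{
    'name': VOLUME_NAME,
    'persistentVolumeClaim': {'claimName': PVC_NAME},
}]
volume_mounts = [{
    'mountPath': '/mnt',
    'name': VOLUME_NAME,
    'subPath': 'users/{username}',  # template, expanded per user at spawn time
    'readOnly': False,
}]

# In the real config file these are assigned once, at module level:
# c.KubeSpawner.volumes = volumes
# c.KubeSpawner.volume_mounts = volume_mounts


def preview_expansion(obj, username):
    """Illustration only: imitate per-user template expansion on nested data."""
    if isinstance(obj, str):
        return obj.format(username=username)
    if isinstance(obj, dict):
        return {key: preview_expansion(value, username) for key, value in obj.items()}
    if isinstance(obj, list):
        return [preview_expansion(item, username) for item in obj]
    return obj  # booleans, numbers and None are left untouched


print(preview_expansion(volume_mounts, 'alice')[0]['subPath'])  # users/alice
```

The point of this shape is that nothing user-specific is written into c after startup: the config holds only a template, and the per-user value is derived for each spawn.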

aptroost · October 26, 2020, 8:07am

Makes sense that c cannot be changed in a function after startup, thanks!
This made me rework the function, and I ended up with the following, which is tested and resolves the issue described above.

...
def my_pre_spawn_hook(spawner):
    spawner.volumes.extend([{
        'name': 'volume-jhub-user-' + str(spawner.user.name),
        'persistentVolumeClaim': {
            'claimName': 'claim-jhub-users'
        }
    }])
    spawner.volume_mounts.extend([{
        'mountPath': '/mntdir',
        'name': 'volume-jhub-user-' + str(spawner.user.name),
        'subPath': 'users/' + str(spawner.user.name)
    }])
...
class authHandler(BaseHandler):
    def get(self):
        ...
    ...
...
c.JupyterHub.authenticator_class = authHandler
c.KubeSpawner.pre_spawn_hook = my_pre_spawn_hook
...
c.JupyterHub.services = [
    {
        'name': 'idle-culler',
        'admin': True,
        'command': [
            sys.executable,
            '-m', 'jupyterhub_idle_culler',
            '--remove-named-servers=True',
            '--timeout=600'
        ],
    }
]
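As a possible refinement of the hook above, the per-user dictionaries can be built in a small pure function so they can be checked locally before deploying. This is only a sketch: build_user_storage and the SimpleNamespace stand-in for a spawner are hypothetical names introduced here, and the duplicate guard is a defensive assumption (I have not verified whether the Hub ever calls the hook twice on the same spawner object). Keep in mind that Kubernetes volume names must be valid DNS labels, so usernames containing characters such as @ or uppercase letters would need escaping first.

```python
from types import SimpleNamespace


def build_user_storage(username, claim_name='claim-jhub-users'):
    """Return the (volume, volume_mount) pair for one user."""
    volume_name = 'volume-jhub-user-' + str(username)
    volume = {
        'name': volume_name,
        'persistentVolumeClaim': {'claimName': claim_name},
    }
    mount = {
        'mountPath': '/mntdir',
        'name': volume_name,
        'subPath': 'users/' + str(username),
    }
    return volume, mount


def my_pre_spawn_hook(spawner):
    volume, mount = build_user_storage(spawner.user.name)
    # Defensive guard (an assumption, not something reported in the thread):
    # skip entries that are already present so a repeated call adds nothing.
    if all(v['name'] != volume['name'] for v in spawner.volumes):
        spawner.volumes.append(volume)
    if all(m['name'] != mount['name'] for m in spawner.volume_mounts):
        spawner.volume_mounts.append(mount)


# Stand-in for a real spawner, only to exercise the hook locally.
fake_spawner = SimpleNamespace(
    user=SimpleNamespace(name='alice'),
    volumes=[],
    volume_mounts=[],
)
my_pre_spawn_hook(fake_spawner)
my_pre_spawn_hook(fake_spawner)  # second call leaves the lists unchanged
print(fake_spawner.volume_mounts[0]['subPath'])  # users/alice
```

In jupyterhub_config.py the hook would still be registered with c.KubeSpawner.pre_spawn_hook = my_pre_spawn_hook, exactly as in the post above.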
