How to use env vars like JUPYTERHUB_USER in initContainers

Background

I am using the JupyterHub Helm chart to deploy on Kubernetes. I would like to generate a hash based on the authenticated username that each user can access from their Jupyter notebooks and terminals. This hash will be unique to their username, but they will not know how to generate it. The purpose of this hash is to serve as an authentication token that they can pass to an independent job management API server we have deployed. This job system mounts the same PersistentVolume that backs the user's Jupyter home folder (/home/jovyan) on the notebook server, so that their spawned jobs can access their user files. I therefore need to ensure that one user cannot mount another user's home folder volume. For example, if user alice launches a job but sets the username parameter to bob, then alice would gain access to bob's files because his Jupyter home folder volume would be attached to alice's job containers (which run as Kubernetes Jobs). If, however, alice must also provide her token, then the job API server can run the secret hash function on the submitted username parameter and validate her claim. If the API server cannot reproduce her token from the username parameter she specified, it will reject her request.
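
To make the validation step concrete, here is a minimal sketch of the check the job API server could run. It assumes the "secret hash function" is a salted SHA-1 of the username, which is what the snippets later in this thread use. The names `SECRET_SALT`, `expected_token`, and `is_valid_claim` are hypothetical and not part of any JupyterHub API.

```python
import hashlib
import hmac

# Placeholder salt (assumption); a real deployment would load this from a secret store.
SECRET_SALT = "secret-salt-string"


def expected_token(username: str, salt: str = SECRET_SALT) -> str:
    """Return the hex SHA-1 digest of '<username>-<salt>'."""
    return hashlib.sha1(f"{username}-{salt}".encode("utf-8")).hexdigest()


def is_valid_claim(username: str, token: str) -> bool:
    """Return True only if the token matches the one derived from the claimed username."""
    expected = expected_token(username).encode("utf-8")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, token.encode("utf-8"))


alice_token = expected_token("alice")
print(is_valid_claim("alice", alice_token))  # alice presenting her own token
print(is_valid_claim("bob", alice_token))    # alice claiming to be bob
```

With this scheme, a request that names bob but carries alice's token fails the check, because the server derives a different digest from "bob".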

Problem

The only way I know how to generate this token is to define an initContainer with an emptyDir volume shared between the initContainer and the Jupyter server and populate a file with the value at a known location, such as /var/local/jupyterhub_user_token. This shared volume works perfectly; however, I do not know how to pass my custom hash function the JupyterHub username. I tried using the $JUPYTERHUB_USER environment variable in the initContainer command, but that failed.

The jupyterhub Helm chart values section looks like:

singleuser:
  storage:
    extraVolumes:
    - name: user-token
      emptyDir: {}
    extraVolumeMounts:
    - name: user-token
      mountPath: /var/local
  initContainers:
  - name: init
    image: my-image:latest
    # This command causes the Jupyter server pod to fail because 
    # there is no JUPYTERHUB_USER defined in the initContainer environment:
    command: ['/bin/bash', '-c', 'echo "${JUPYTERHUB_USER}-secret-salt-string" | shasum | cut -f1 -d" " > /var/local/jupyterhub_user_token']
    volumeMounts:
    - name: user-token
      mountPath: /var/local

Any help would be appreciated.

Instead of using initContainers, I found that I can use the lifecycleHooks to execute the hash function and achieve an adequate solution:

    lifecycleHooks:
      postStart:
        exec:
          command:
          - "/bin/bash"
          - "-c"
          - >
            echo "${JUPYTERHUB_USER}-secret-salt-string" | shasum | cut -f1 -d" " > /home/jovyan/.jupyterhub_user_token

It would still be nice to get this added as an environment variable so that the user can use os.environ['JUPYTERHUB_USER_TOKEN'] instead of having to read this file to obtain their token.
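
Until the environment variable exists, notebook code can read the file that the postStart hook writes. The sketch below is only an illustration: `load_token` and `shell_equivalent_token` are hypothetical helper names, and the default path and variable name are taken from the snippets above. One detail worth noting is that `echo` appends a newline, so `shasum` hashes the string plus a trailing "\n"; any server-side code that has to reproduce the token from the shell pipeline must include that newline.

```python
import hashlib
import os
from pathlib import Path


def shell_equivalent_token(username: str, salt: str = "secret-salt-string") -> str:
    """Reproduce `echo "<username>-<salt>" | shasum | cut -f1 -d" "` in Python."""
    # echo adds a trailing newline, and shasum includes it in the digest.
    return hashlib.sha1(f"{username}-{salt}\n".encode("utf-8")).hexdigest()


def load_token(env_var: str = "JUPYTERHUB_USER_TOKEN",
               path: str = "/home/jovyan/.jupyterhub_user_token") -> str:
    """Prefer the environment variable; fall back to the file written by the hook."""
    token = os.environ.get(env_var)
    if token:
        return token
    # The shell redirect leaves a trailing newline in the file, so strip it.
    return Path(path).read_text().strip()
```

A notebook would then call `load_token()` once and pass the result to the job API along with the username.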

You can create a subclass of your authenticator and define a pre_spawn_start hook that can modify the environment. For example see Authenticators — JupyterHub 1.3.0 documentation
Note the example uses auth-state but you can ignore that and also ignore the authenticate method.

You can define that subclass in-line in your Z2JH config using hub.extraConfig: Advanced Topics — Zero to JupyterHub with Kubernetes documentation
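
As a rough sketch of that suggestion, the hook below derives the token from the username alone, so it needs neither auth_state nor a custom authenticate method. `TokenEnvMixin`, `SECRET_SALT`, and the `JUPYTERHUB_USER_TOKEN` variable name are assumptions for illustration. The demo uses simple stand-in objects so it runs without JupyterHub installed; in the hub, `user.name` and `spawner.environment` come from JupyterHub itself.

```python
import asyncio
import hashlib
from types import SimpleNamespace

# Placeholder salt (assumption); keep the real value out of version control.
SECRET_SALT = "secret-salt-string"


class TokenEnvMixin:
    """Mix into an Authenticator subclass to inject a per-user token at spawn time."""

    async def pre_spawn_start(self, user, spawner):
        # user.name is the JupyterHub username; spawner.environment is a dict of
        # extra environment variables passed to the single-user server.
        digest = hashlib.sha1(f"{user.name}-{SECRET_SALT}".encode("utf-8")).hexdigest()
        spawner.environment["JUPYTERHUB_USER_TOKEN"] = digest


# In hub.extraConfig this would be combined with the real authenticator, e.g.
# (assumption, not tested against a live hub):
#   class MyAuthenticator(TokenEnvMixin, GitLabOAuthenticator): pass
#   c.JupyterHub.authenticator_class = MyAuthenticator

# Stand-ins that mimic only the attributes the hook touches.
demo_user = SimpleNamespace(name="alice")
demo_spawner = SimpleNamespace(environment={})
asyncio.run(TokenEnvMixin().pre_spawn_start(demo_user, demo_spawner))
print(sorted(demo_spawner.environment))
```

Note that the class only takes effect once it is assigned to `c.JupyterHub.authenticator_class`; defining it in extraConfig is not enough on its own.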

Hey thanks! Y’all have thought of everything. So far each time I encounter some customization that we need, there is already support for it. That is impressive. I’ll give your suggestion a shot and mark your answer as the solution when I get it working.

After iterating an embarrassing number of times, I feel like I'm close to a valid configuration, but the environment variable is still not being set. In other words, when I spawn a new JupyterLab server and open a terminal, env | grep UPSTREAM shows that the UPSTREAM_TOKEN env var is missing.

Here is the relevant part of my Helm values file:


hub:
  config:
    # See https://zero-to-jupyterhub.readthedocs.io/en/stable/administrator/authentication.html?highlight=CryptKeeper#enable-auth-state
    Authenticator:
      enable_auth_state: true
    CryptKeeper:
      keys:
        - 7d9adbfe2ca4eb9c233a8f2a775f1fe13b13be0185a6b78faa71959bcfe86f81
    JupyterHub:
      # See 
      #   https://docs.gitlab.com/ce/integration/oauth_provider.html
      #   https://oauthenticator.readthedocs.io/en/latest/getting-started.html#gitlab-setup
      authenticator_class: oauthenticator.gitlab.GitLabOAuthenticator
      GitLabOAuthenticator:
        oauth_callback_url: 'https://example.com/jupyter/hub/oauth_callback'
        client_id: '6...3'
        client_secret: '7...a'
        scope:
        - 'read_user'
        - 'read_api'
        allowed_gitlab_groups:
        - '12345678'
  extraConfig:
    # See https://zero-to-jupyterhub.readthedocs.io/en/stable/administrator/advanced.html?highlight=extraconfig#hub-extraconfig 
    uwsAuthConfig: |
      from oauthenticator.gitlab import GitLabOAuthenticator
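      class CustomAuthTokenGenerator(GitLabOAuthenticator):
          # NOTE: identify_user() and token_for_user() are placeholders copied from the
          # JupyterHub docs example; they are not defined anywhere in this config.
          async def authenticate(self, handler, data=None):
              username = await identify_user(handler, data)
              upstream_token = await token_for_user(username)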
              return {
                  'name': username,
                  'auth_state': {
                      'upstream_token': upstream_token,
                  },
              }
          async def pre_spawn_start(self, user, spawner):
              """Pass upstream_token to spawner via environment variable"""
              auth_state = await user.get_auth_state()
              if not auth_state:
                  # auth_state not enabled
                  return
              spawner.environment['UPSTREAM_TOKEN'] = auth_state['upstream_token']

I finally got it working! I got some help from this post too.

I have a feeling that the way I’ve done it is not ideal, and I still do not understand certain things. For example, I set enable_auth_state to true in hub.config.Authenticator but that is not sufficient; I had to set this in the hub.extraConfig Python code. Also, where is the hub.auth.custom stuff documented? I just tried inserting the auth config block at different levels of the YAML hierarchy until it worked.

If anyone can help me clean this up so we have a solid example of how to do this for the GitLab authenticator, hopefully it will save someone the hours I just spent!

Here is the relevant part of my Helm values file:


hub:
  auth:
    type: custom
    custom:
      className: "CustomAuthTokenGenerator"
  config:
    # See https://zero-to-jupyterhub.readthedocs.io/en/stable/administrator/authentication.html?highlight=CryptKeeper#enable-auth-state
    Authenticator:
      enable_auth_state: true
    CryptKeeper:
      keys:
        - ff...68
    JupyterHub:
      admin_access: true
      # See 
      #   https://docs.gitlab.com/ce/integration/oauth_provider.html
      #   https://oauthenticator.readthedocs.io/en/latest/getting-started.html#gitlab-setup
      CustomAuthTokenGenerator:
        oauth_callback_url: 'https://example.com/jupyter/hub/oauth_callback'
        client_id: '6c..23'
        client_secret: '76..9a'
        scope:
        - 'read_user'
        - 'read_api'
        allowed_gitlab_groups:
        - '12345678'
  extraConfig:
    # See https://zero-to-jupyterhub.readthedocs.io/en/stable/administrator/advanced.html?highlight=extraconfig#hub-extraconfig 
    uwsAuthConfig.py: |
      from oauthenticator.gitlab import GitLabOAuthenticator
      import hashlib
      class CustomAuthTokenGenerator(GitLabOAuthenticator):
          async def pre_spawn_start(self, user, spawner):
              """Pass upstream_token to spawner via environment variable"""
              auth_state = await user.get_auth_state()
              if not auth_state:
                  # auth_state not enabled
                  return
              try:
                  spawner.environment['GITLAB_ACCESS_TOKEN'] = auth_state['access_token']
                  spawner.environment['GITLAB_USERNAME'] = auth_state['gitlab_user']['username']
              except Exception as e:
                  print('ERROR setting env vars from auth_state')
                  print(str(e))
              try:
                  spawner.environment['UWS_AUTH_TOKEN'] = hashlib.sha1(bytes(f'{auth_state["gitlab_user"]["username"]}-secret-salt-string', 'utf-8')).hexdigest()
              except Exception as e:
                  print('ERROR setting UWS_AUTH_TOKEN from GitLab auth_state: {}'.format(str(e)))
      c.JupyterHub.authenticator_class = CustomAuthTokenGenerator
      # Need to persist auth state in database.
      c.Authenticator.enable_auth_state = True

I’m guessing that you needed to set c.Authenticator.enable_auth_state = True where you did because you assigned authenticator_class there also?

I’ve created an issue to document this: Add example of overiding a built-in authenticator to use auth_State · Issue #2087 · jupyterhub/zero-to-jupyterhub-k8s · GitHub

Exactly what I needed, and it works just fine! Thank you so much for this!