I am using the JupyterHub Helm chart to deploy on Kubernetes. I would like to generate a hash based on the authenticated username that each user can access from their Jupyter notebooks and terminals. This hash will be unique to their username, but they will not know how to generate it. The purpose of this hash is to serve as an authentication token that they can pass to an independent job management API server we have deployed.
This job system mounts the same PersistentVolume that backs the user’s Jupyter home folder (/home/jovyan) on the notebook server, so that their spawned jobs can access their user files. I therefore need to ensure that one user cannot mount another user’s home folder volume. For example, if user alice launches a job but sets the username parameter to bob, then alice would gain access to bob's files because his Jupyter home folder volume would be attached to alice's job containers (which run as Kubernetes Jobs). If, however, alice must also provide her token, then the job API server can run the secret hash function on the submitted username parameter and validate her claim. If the API server cannot reproduce her token from the username parameter she specified, it will reject her request.
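To make the alice/bob scenario concrete, here is a minimal sketch of the check the job API server would perform. The post does not say which hash function is used; this sketch assumes HMAC-SHA256 from the Python standard library, and the names `SECRET_KEY`, `make_token`, and `verify_claim` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret, known only to the hub and the job API server.
SECRET_KEY = b"replace-with-a-long-random-secret"


def make_token(username: str, key: bytes = SECRET_KEY) -> str:
    """Derive a per-user token that cannot be recomputed without the key."""
    return hmac.new(key, username.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_claim(username: str, token: str, key: bytes = SECRET_KEY) -> bool:
    """Return True only if the token matches the claimed username."""
    expected = make_token(username, key)
    # Constant-time comparison; both sides are encoded to bytes first.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))


alice_token = make_token("alice")
print(verify_claim("alice", alice_token))  # alice submitting her own username
print(verify_claim("bob", alice_token))    # alice submitting bob's username
```

With this arrangement the first call is accepted and the second is rejected, because alice's token cannot be reproduced from the username bob.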
Problem
The only way I know how to generate this token is to define an initContainer with an emptyDir volume shared between the initContainer and the Jupyter server and populate a file with the value at a known location, such as /var/local/jupyterhub_user_token. This shared volume works perfectly; however, I do not know how to pass my custom hash function the JupyterHub username. I tried using the $JUPYTERHUB_USER environment variable in the initContainer command, but that failed.
The jupyterhub Helm chart values section looks like:
singleuser:
  storage:
    extraVolumes:
      - name: user-token
        emptyDir: {}
    extraVolumeMounts:
      - name: user-token
        mountPath: /var/local
  initContainers:
    - name: init
      image: my-image:latest
      # This command causes the Jupyter server pod to fail because
      # there is no JUPYTERHUB_USER defined in the initContainer environment:
      command: ['/bin/bash', '-c', 'echo "${JUPYTERHUB_USER}-secret-salt-string" | shasum | cut -f1 -d" " > /var/local/jupyterhub_user_token']
      volumeMounts:
        - name: user-token
          mountPath: /var/local
It would still be nice to get this added as an environment variable so that the user can use os.environ['JUPYTERHUB_USER_TOKEN'] instead of having to read this file to obtain their token.
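One detail of the shell pipeline above matters if the job API server recomputes the token in another language: `echo` appends a trailing newline, and `shasum` (SHA-1 by default) hashes that newline along with the text. A rough Python equivalent of the pipeline therefore has to include it. The function names below are hypothetical, and the salt is the placeholder string from the command above.

```python
import hashlib

SALT = "secret-salt-string"  # placeholder salt from the initContainer command


def shell_style_token(username: str) -> str:
    """Mimic: echo "<user>-<salt>" | shasum | cut -f1 -d" " (note the newline)."""
    data = f"{username}-{SALT}\n".encode("utf-8")
    return hashlib.sha1(data).hexdigest()


def plain_token(username: str) -> str:
    """Hash the same string without the trailing newline that echo adds."""
    data = f"{username}-{SALT}".encode("utf-8")
    return hashlib.sha1(data).hexdigest()


# The two digests differ, so both sides must agree on one convention.
print(shell_style_token("alice") == plain_token("alice"))
```

Using `echo -n` or `printf '%s'` in the shell command would be one way to make the two conventions agree.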
You can create a subclass of your authenticator and define a pre_spawn_start hook that can modify the environment. For example see Authenticators — JupyterHub 1.3.0 documentation
Note the example uses auth-state but you can ignore that and also ignore the authenticate method.
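A minimal sketch of that suggestion, without auth_state or a custom authenticate method, could look like the following. It assumes the token is a SHA-1 of the username plus the placeholder salt from the question; `UserTokenMixin` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins so the hook can be exercised without a running hub. In a real deployment the mixin would be combined with the actual authenticator class.

```python
import asyncio
import hashlib
from types import SimpleNamespace

SALT = "secret-salt-string"  # placeholder salt from the question


class UserTokenMixin:
    """Hypothetical mixin: combine with a real Authenticator subclass."""

    async def pre_spawn_start(self, user, spawner):
        # Derive the token from the authenticated username and expose it
        # to the single-user server through the spawner environment.
        data = f"{user.name}-{SALT}".encode("utf-8")
        spawner.environment["JUPYTERHUB_USER_TOKEN"] = hashlib.sha1(data).hexdigest()


# Stand-ins for the hub's User and Spawner objects (illustration only).
user = SimpleNamespace(name="alice")
spawner = SimpleNamespace(environment={})
asyncio.run(UserTokenMixin().pre_spawn_start(user, spawner))
print(sorted(spawner.environment))
```

Because the token depends only on `user.name`, this variant does not need `enable_auth_state` or a CryptKeeper key.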
Hey thanks! Y’all have thought of everything. So far each time I encounter some customization that we need, there is already support for it. That is impressive. I’ll give your suggestion a shot and mark your answer as the solution when I get it working.
After iterating an embarrassing number of times, I feel like I’m close to a valid configuration, but the environment variable is still not being set. In other words, when I spawn a new JupyterLab server and open a terminal, env | grep UPSTREAM shows that the UPSTREAM_TOKEN env var is missing.
Here is the relevant part of my Helm values file:
hub:
  config:
    # See https://zero-to-jupyterhub.readthedocs.io/en/stable/administrator/authentication.html?highlight=CryptKeeper#enable-auth-state
    Authenticator:
      enable_auth_state: true
    CryptKeeper:
      keys:
        - 7d9adbfe2ca4eb9c233a8f2a775f1fe13b13be0185a6b78faa71959bcfe86f81
    JupyterHub:
      # See
      # https://docs.gitlab.com/ce/integration/oauth_provider.html
      # https://oauthenticator.readthedocs.io/en/latest/getting-started.html#gitlab-setup
      authenticator_class: oauthenticator.gitlab.GitLabOAuthenticator
    GitLabOAuthenticator:
      oauth_callback_url: 'https://example.com/jupyter/hub/oauth_callback'
      client_id: '6...3'
      client_secret: '7...a'
      scope:
        - 'read_user'
        - 'read_api'
      allowed_gitlab_groups:
        - '12345678'
  extraConfig:
    # See https://zero-to-jupyterhub.readthedocs.io/en/stable/administrator/advanced.html?highlight=extraconfig#hub-extraconfig
    uwsAuthConfig: |
      from oauthenticator.gitlab import GitLabOAuthenticator
      class CustomAuthTokenGenerator(GitLabOAuthenticator):
          async def authenticate(self, handler, data=None):
              username = await identify_user(handler, data)
              upstream_token = await token_for_user(username)
              return {
                  'name': username,
                  'auth_state': {
                      'upstream_token': upstream_token,
                  },
              }
          async def pre_spawn_start(self, user, spawner):
              """Pass upstream_token to spawner via environment variable"""
              auth_state = await user.get_auth_state()
              if not auth_state:
                  # auth_state not enabled
                  return
              spawner.environment['UPSTREAM_TOKEN'] = auth_state['upstream_token']
I have a feeling that the way I’ve done it is not ideal, and I still do not understand certain things. For example, I set enable_auth_state to true in hub.config.Authenticator but that is not sufficient; I had to set this in the hub.extraConfig Python code. Also, where is the hub.auth.custom stuff documented? I just tried inserting the auth config block at different levels of the YAML hierarchy until it worked.
If anyone can help me clean this up so we have a solid example of how to do this for the GitLab authenticator, hopefully it will save someone the hours I just spent!
Here is the relevant part of my Helm values file:
hub:
  auth:
    type: custom
    custom:
      className: "CustomAuthTokenGenerator"
  config:
    # See https://zero-to-jupyterhub.readthedocs.io/en/stable/administrator/authentication.html?highlight=CryptKeeper#enable-auth-state
    Authenticator:
      enable_auth_state: true
    CryptKeeper:
      keys:
        - ff...68
    JupyterHub:
      admin_access: true
    # See
    # https://docs.gitlab.com/ce/integration/oauth_provider.html
    # https://oauthenticator.readthedocs.io/en/latest/getting-started.html#gitlab-setup
    CustomAuthTokenGenerator:
      oauth_callback_url: 'https://example.com/jupyter/hub/oauth_callback'
      client_id: '6c..23'
      client_secret: '76..9a'
      scope:
        - 'read_user'
        - 'read_api'
      allowed_gitlab_groups:
        - '12345678'
  extraConfig:
    # See https://zero-to-jupyterhub.readthedocs.io/en/stable/administrator/advanced.html?highlight=extraconfig#hub-extraconfig
    uwsAuthConfig.py: |
      from oauthenticator.gitlab import GitLabOAuthenticator
      import hashlib
      class CustomAuthTokenGenerator(GitLabOAuthenticator):
          async def pre_spawn_start(self, user, spawner):
              """Pass upstream_token to spawner via environment variable"""
              auth_state = await user.get_auth_state()
              if not auth_state:
                  # auth_state not enabled
                  return
              try:
                  spawner.environment['GITLAB_ACCESS_TOKEN'] = auth_state['access_token']
                  spawner.environment['GITLAB_USERNAME'] = auth_state['gitlab_user']['username']
              except Exception as e:
                  print('ERROR setting env vars from auth_state')
                  print(str(e))
              try:
                  spawner.environment['UWS_AUTH_TOKEN'] = hashlib.sha1(bytes(f'{auth_state["gitlab_user"]["username"]}-secret-salt-string', 'utf-8')).hexdigest()
              except Exception as e:
                  print('ERROR setting UWS_AUTH_TOKEN from GitLab auth_state: {}'.format(str(e)))
      c.JupyterHub.authenticator_class = CustomAuthTokenGenerator
      # Need to persist auth state in database.
      c.Authenticator.enable_auth_state = True
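For completeness, here is a rough sketch of how a notebook user might pick up these variables when submitting a job. The payload field names and the `build_job_request` helper are hypothetical, since the job API itself is not described in this thread. The `setdefault` calls only supply demo values when the code runs outside a spawned server.

```python
import os

# Demo values for running outside a spawned server; inside one, the
# variables set by pre_spawn_start are already present and are left as-is.
os.environ.setdefault("GITLAB_USERNAME", "alice")
os.environ.setdefault("UWS_AUTH_TOKEN", "demo-token")


def build_job_request(command: str) -> dict:
    """Assemble a job submission payload from the spawner-provided environment."""
    return {
        "username": os.environ["GITLAB_USERNAME"],  # claimed identity
        "token": os.environ["UWS_AUTH_TOKEN"],      # proof for that identity
        "command": command,
    }


payload = build_job_request("python train.py")
print(sorted(payload))
```

The API server would then recompute the token from the submitted username and reject the request if it does not match.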