GitLab OAuthenticator - Set single user image config

We are using the GitLab OAuthenticator, which provides an access token. On successful auth, is it possible to set this access token as config in the single-user image spawned by JupyterHub? Something like

c.GitLabConfig.access_token = "< YOUR_ACCESS_TOKEN >"

set on the single user image. If not that, maybe set it as an environment variable on the single user image?

Yes, this is possible. There are two mechanisms involved. The first is auth state, which JupyterHub Authenticators can use to store data that will be encrypted in the database, such as credentials. GitLabOAuthenticator already supports this. Auth state is not enabled by default, however. You need to specify an encryption key to start storing potentially sensitive info. In the helm chart, this is:

auth:
  state:
    enabled: true
    cryptoKey: 'output of e.g. openssl rand -hex 32'

At this point, the access token and other info are stored in the database when the user logs in. The last step is to retrieve this value and load it into your user environment. This can be done with the pre_spawn_hook:

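async def add_auth_env(spawner):
    # get_auth_state() is a coroutine, so it must be awaited
    auth_state = await spawner.user.get_auth_state()
    if not auth_state:
        # nothing stored, e.g. auth state is not enabled
        spawner.log.warning("No auth state for %s", spawner.user)
        return
    # expose the token to the single-user server as an environment variable
    spawner.environment['GITLAB_ACCESS_TOKEN'] = auth_state['access_token']

# run the hook before each spawn
c.Spawner.pre_spawn_hook = add_auth_env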

Now that should do it.
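
On the single-user side, the token then shows up as an ordinary environment variable. A minimal sketch of reading it, assuming the variable name GITLAB_ACCESS_TOKEN from the hook above; the helper name read_gitlab_token is made up for illustration and is not part of any library:

```python
import os


def read_gitlab_token(environ=os.environ):
    """Return the token injected by the pre_spawn_hook, or None if absent."""
    # GITLAB_ACCESS_TOKEN is the name chosen in the hook above;
    # an empty value is treated the same as a missing one.
    token = environ.get("GITLAB_ACCESS_TOKEN", "")
    return token or None


# In a config file inside the image, this could feed the setting from the
# question, e.g.: c.GitLabConfig.access_token = read_gitlab_token() or ""
```

Returning None when the variable is missing makes it easy to tell "auth state was not available" apart from a real token.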

Thanks @minrk, that makes sense. Just a couple of follow-up questions:

  1. Why async/await (just curious)?
  2. Can we make do without persisting the access token in the Hub? In our case it’s only needed one time, when the user logs in via JH and the spawner spawns the single-user image right after. Any time after that, the login and spawn flows would be repeated together, right? So what’s the need for persistence? Maybe I’m not aware of a flow where only the spawner is kicked off.

For anyone else trying to achieve the same: in addition to Min’s snippet above, there’s good documentation with an example here.

  1. Why async/await (just curious)?

It’s async because get_auth_state is async, and any caller of an async function must also be async. get_auth_state is async because it’s run in a thread to avoid the possibly-slow decryption of auth data from blocking the main application thread.
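
To illustrate the pattern (this is not JupyterHub's actual implementation, just a sketch of the same idea), here is a blocking call pushed to a worker thread and awaited from an async hook. The names slow_decrypt, get_auth_state_sketch and hook_sketch are invented for the example:

```python
import asyncio
import time


def slow_decrypt(blob):
    """Stand-in for a blocking, possibly slow decryption step."""
    time.sleep(0.05)
    return {"access_token": blob}


async def get_auth_state_sketch(blob):
    # Run the blocking call in a worker thread so the event loop stays free.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, slow_decrypt, blob)


async def hook_sketch(environment, blob):
    # The caller has to be async as well, because it awaits the coroutine above.
    auth_state = await get_auth_state_sketch(blob)
    environment["GITLAB_ACCESS_TOKEN"] = auth_state["access_token"]


env = {}
asyncio.run(hook_sketch(env, "example-token"))
```

While slow_decrypt sleeps in its thread, the event loop is free to handle other requests, which is the point of making the call async.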

  2. Can we make do without persisting the access token in the Hub?

Not reliably, no. You aren’t guaranteed that the Hub process is the same one that ran the authentication, so it must persist somehow. Logging in and spawning do not occur on the same interval. Take this sequence for example:

  1. user logs in, this triggers oauth handshake and sets a cookie for future auth with the hub
  2. user is still logged in with cookie, spawns server
  3. server stops
  4. hub restarts
  5. user still logged in with cookie, spawns server again

Steps 2 and 5 require the access token, which was only retrieved in step 1. This has to survive across process restarts in order to work. Spawn is not necessarily part of the login flow, nor is login part of the spawn flow most of the time.
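
The sequence above can be sketched with a toy model. This is not JupyterHub code; HubSketch and its methods are invented purely to show why an in-memory copy of the token is lost across a restart while a persisted copy is not:

```python
class HubSketch:
    """Toy model of a Hub process; not JupyterHub code."""

    def __init__(self, database):
        self.database = database  # stands in for the Hub database, survives restarts
        self.memory = {}          # process memory, lost on restart

    def login(self, user, token):
        # Step 1: the OAuth handshake yields a token; keep it in both places.
        self.memory[user] = token
        self.database[user] = token

    def spawn_from_memory(self, user):
        return self.memory.get(user)

    def spawn_from_database(self, user):
        return self.database.get(user)


database = {}
hub = HubSketch(database)
hub.login("alice", "token-from-oauth")   # step 1
first = hub.spawn_from_memory("alice")   # step 2: works either way

hub = HubSketch(database)                # step 4: hub restarts, memory is gone
after_restart_memory = hub.spawn_from_memory("alice")      # step 5, no persistence
after_restart_database = hub.spawn_from_database("alice")  # step 5, with persistence
```

After the restart, the in-memory lookup finds nothing, while the lookup against the persisted store still returns the token from step 1.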

Now, JupyterHub 1.0 introduces new features for managing auth state, which may allow you to keep this in-memory and force a re-login on spawn if the last login was from a previous process. This would let you keep this only in-memory, instead of in the database. We’re working on putting the 1.0 beta out, hopefully this week.

@minrk I am trying to debug why:

spawner.user.get_auth_state()

is always returning None for me and this thread seems somewhat related. Do you have any ideas or other threads you can point me to?

get_auth_state() will always return None if auth state is not enabled. Auth state is disabled by default and needs an encryption key to work, so you need two bits of configuration to enable it:

c.Authenticator.enable_auth_state = True
c.CryptKeeper.keys = [b'some-secret-key']  # placeholder, use your own secret key; needed for encryption
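
For the key itself, the openssl rand -hex 32 command mentioned earlier in the thread has a standard-library equivalent. A small sketch; the variable name hex_key is arbitrary, and how you store the key is up to you:

```python
import secrets

# 32 random bytes, hex-encoded to 64 characters, like `openssl rand -hex 32`.
hex_key = secrets.token_hex(32)
print(hex_key)
```

Generate it once and keep it stable: if the key changes, auth state that was encrypted with the old key can no longer be decrypted.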