Notebook disconnection after a few minutes; custom authenticator and spawner

Hi!

I am deploying a JupyterHub instance with a custom authenticator (it authenticates users from cookies created by an external service, by querying that service) and a custom spawner (podman-based, hand-made to handle many use cases specific to my organisation, such as access levels and shared directories).

Everything is fine except for one thing: users get disconnected every few minutes.

Base problem

After 3 minutes on her personal server (a single-user notebook, run with podman from an image based on quay.io/jupyter/datascience-notebook), whenever the user tries to load or save a file, the operation fails with a popup: the bold title is “File Save Error for demo.ipynb”, the only text is “Forbidden”, and it can be closed with a Dismiss button.
When trying to load a file, the bold title is “File Load Error for demo.ipynb”.

Refreshing the page makes the user lose her unsaved work but, on the bright side, the problem is gone for another 3 minutes.

The problem affects all users, and each user is affected independently of the others (each user gets 3 minutes, whatever the others are experiencing).

Workarounds

We soon noticed that if the user opens her lab in another tab of her browser, the problem disappears in the original tab. Files can be saved again, and everything is fine for another 3 minutes.

The workaround was simple: use a web extension to automatically refresh the /hub/home endpoint every 3 minutes. I am not happy with it, as it requires an external program that is installed in a very dirty way to bypass the organization's cybersecurity rules. I cannot rely on that solution for a would-be official tool.

Observations

I investigated this problem, and I’m running out of ideas on how to make progress. But I have some observations that may help someone identify the root cause.

Cookies

When the bug appears, in the notebook (/user/0147user/lab/tree/demo.ipynb for instance), I observe 4 cookies (5 with the one set by the external authentication service):

  • 2 _xsrf cookies, slightly different in value. Both have their domain equal to jupyter.domain.tld, but their paths differ (/hub/ and /user/0147user), and so do their expirations (around one day from now, and Session).
  • jupyter-hub-login (jupyter.domain.tld and /hub/)
  • jupyter-hub-session-id (jupyter.domain.tld and /)

But when the notebook is in a functional state (before the problem arises, or after applying the workaround and until the problem arises again), an additional cookie exists:

  • jupyterhub-user-0147user (jupyter.domain.tld and /user/0147user, expiration set in about one day)

That particular cookie is deleted at the very moment the bug arises, and reappears when the workaround is applied.

I would guess that the cookie is necessary for the notebook to identify itself to the hub, but that it is destroyed before having a chance to be updated.
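
To make the path scoping of these cookies concrete, here is a minimal sketch (an illustration only, not JupyterHub code) of the RFC 6265 path-match rule applied to the cookie names and paths listed above. The two request paths are assumptions: /user/0147user/api/contents/demo.ipynb stands for a save request issued by the lab, and /hub/home is the page used by the workaround.

```python
def path_matches(cookie_path: str, request_path: str) -> bool:
    """Cookie path-match rule from RFC 6265, section 5.1.4."""
    if cookie_path == request_path:
        return True
    if request_path.startswith(cookie_path):
        # The prefix must end on a path-segment boundary.
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


# (name, path) pairs copied from the lists above; values are omitted.
COOKIES = [
    ("_xsrf", "/hub/"),
    ("_xsrf", "/user/0147user"),
    ("jupyter-hub-login", "/hub/"),
    ("jupyter-hub-session-id", "/"),
    ("jupyterhub-user-0147user", "/user/0147user"),
]


def cookies_sent(request_path: str, cookies=COOKIES) -> list:
    """Names of the cookies a browser would attach to a request for request_path."""
    return [name for name, path in cookies if path_matches(path, request_path)]


# Assumed save request issued by the lab UI:
print(cookies_sent("/user/0147user/api/contents/demo.ipynb"))
# -> ['_xsrf', 'jupyter-hub-session-id', 'jupyterhub-user-0147user']

# Page refreshed by the workaround:
print(cookies_sent("/hub/home"))
# -> ['_xsrf', 'jupyter-hub-login', 'jupyter-hub-session-id']
```

Under these assumptions, jupyterhub-user-0147user is only attached to requests under /user/0147user, which would be consistent with saves failing as soon as that cookie is deleted.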

Logs

The cookie disappears at the same time as the following two lines appear in the JupyterHub logs:

[W 2024-07-19 13:25:21.780 JupyterHub log:192] 403 GET /hub/api/user (@127.0.0.1) 3.10ms
[W 2024-07-19 13:25:21.782 JupyterHub log:192] 403 GET /hub/api/user (@127.0.0.1) 1.07ms

(even in debug mode, I don’t get more log lines at that moment)

A 403 on /hub/api/user, with @127.0.0.1 as host. That kind of request behaves normally in many other cases, such as:

[I 2024-07-19 13:20:19.821 JupyterHub log:192] 200 POST /hub/api/oauth2/token (0147user@127.0.0.1) 19.98ms
[I 2024-07-19 13:20:19.791 JupyterHub log:192] 302 GET /hub/api/oauth2/authorize?client_id=jupyterhub-user-0147usera&redirect_uri=%2Fuser%2F0147user%2Foauth_callback&response_type=code&state=[secret] -> /user/0147user/oauth_callback?code=[secret]&state=[secret] (0147user@10.67.70.73) 11.82ms
[I 2024-07-19 13:59:38.799 JupyterHub log:192] 200 GET /hub/api/user (0136user@127.0.0.1) 2.77ms

The difference is in the user part of the user@host field: <user>@127.0.0.1 works, but @127.0.0.1 does not. I have not found any counter-example.
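
For anyone wanting to look for the same pattern in their own logs, here is a small sketch that parses the request lines shown above and keeps only the rejected requests that carry no user name. The regular expression is an assumption based solely on the lines pasted in this post; other JupyterHub versions or log formats may need adjustments.

```python
import re

# Matches lines such as:
# [W 2024-07-19 13:25:21.780 JupyterHub log:192] 403 GET /hub/api/user (@127.0.0.1) 3.10ms
LOG_RE = re.compile(
    r"^\[(?P<level>[A-Z]) (?P<time>\S+ \S+) JupyterHub log:\d+\] "
    r"(?P<status>\d{3}) (?P<method>[A-Z]+) (?P<url>\S+)"
    r"(?: -> \S+)? "                          # optional redirect target
    r"\((?P<user>[^@()]*)@(?P<host>[^)]+)\) "  # (user@host), user may be empty
    r"(?P<ms>[\d.]+)ms$"
)


def anonymous_rejections(lines):
    """Yield (time, method, url) for 403 responses to requests with no user name."""
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m and m["status"] == "403" and not m["user"]:
            yield m["time"], m["method"], m["url"]


SAMPLE = [
    "[W 2024-07-19 13:25:21.780 JupyterHub log:192] 403 GET /hub/api/user (@127.0.0.1) 3.10ms",
    "[W 2024-07-19 13:25:21.782 JupyterHub log:192] 403 GET /hub/api/user (@127.0.0.1) 1.07ms",
    "[I 2024-07-19 13:20:19.821 JupyterHub log:192] 200 POST /hub/api/oauth2/token (0147user@127.0.0.1) 19.98ms",
    "[I 2024-07-19 13:59:38.799 JupyterHub log:192] 200 GET /hub/api/user (0136user@127.0.0.1) 2.77ms",
]

for hit in anonymous_rejections(SAMPLE):
    print(hit)
```

With the sample above, only the two 13:25:21 lines are reported.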

My uneducated guess is that JupyterHub wants to know about the user (probably before calling the refresh_user function), and for that tries to reach the dedicated endpoint. But it does so without indicating any explicit user, which for one reason or another gets the request refused (hence the 403 Forbidden).

I do not know how to test or investigate that hypothesis further.
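
One way this might be tested (a sketch under assumptions, not something verified against this deployment) is to replay the same request by hand with a token whose state is known. JupyterHub's REST API accepts an "Authorization: token <value>" header, and GET /hub/api/user describes the owner of the token. The API URL in the usage comment is the JUPYTERHUB_API_URL visible in the spawn logs further down; the helper names and the way the token is obtained are hypothetical.

```python
import json
import urllib.error
import urllib.request


def build_whoami_request(api_url: str, token: str) -> urllib.request.Request:
    """Build (without sending) a GET <api_url>/user request authenticated by token."""
    url = api_url.rstrip("/") + "/user"
    return urllib.request.Request(url, headers={"Authorization": f"token {token}"})


def whoami(api_url: str, token: str):
    """Send the request; return (status, user model), or (status, None) on an HTTP error."""
    req = build_whoami_request(api_url, token)
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        return err.code, None


# Usage on the Hub host (needs a running Hub, hence left as a comment):
# status, model = whoami("http://127.0.0.1:6911/hub/api", "<some token>")
# print(status, model and model.get("name"))
```

If a token that is known to be valid yields 200 with a user name while an expired or unknown one yields 403, that would point at the token carried by the request rather than at the endpoint itself.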

A piece of code

I cannot easily share the whole codebase, but I can provide some lines that are very similar.

VaultAuth

from jupyterhub.auth import Authenticator
from traitlets import Unicode
from org import UserInfo


class VaultAuthenticator(Authenticator):
    vault_addr = Unicode('netaddr', help="address of the remote vault instance", config=True)
    vault_token_cookie = Unicode('vault_token', help="name of the cookie containing the vault token", config=True)

    def _user_info(self, user: UserInfo) -> dict:
        return {
            'name': user.username.lower(),
            'auth_state': user.as_json()
        }

    async def authenticate(self, handler, data):
        if auth_token := handler.get_cookie(self.vault_token_cookie):
            if user := UserInfo.from_token(auth_token, self.vault_addr):
                if user.is_valid():
                    self.log.info(f'AUTH USER: {user.name} {user.token[:5] + "*"*len(user.token[5:-5]) + user.token[-5:]} until {user.expire_at}')
                    return self._user_info(user)
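        # not logged in or not authorized: send the browser to the external login page
        # (absolute URL; without a scheme the target is treated as a relative path)
        handler.redirect('https://auth.domain.tld/user')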

    def is_admin(self, handler, authentication) -> bool:
        "No admin here"
        return False

    async def refresh_user(self, user, handler=None) -> bool:
        """disconnect the user if her vault token has expired

        Returns
           True -- user is valid as-is
           dict -- user needs update with the following values
           False -- user needs to login again

        """
        assert handler, handler
        auth_token, vaultuser = None, None
        if auth_token := handler.get_cookie(self.vault_token_cookie):
            if vaultuser := UserInfo.from_token(auth_token, self.vault_addr):
                if vaultuser.is_valid():
                    self.log.info(f"USER {user.name} IS OK: {vaultuser}")
                    state = await user.get_auth_state()
                    assert isinstance(state, dict), state
                    curr_token = state.get('token')
                    if curr_token == auth_token:
                        self.log.info("USER token did not change and is still valid ; everything is ok")
                        return True
                    else:
                        self.log.info("USER token has changed ; will update user now")
                        return self._user_info(vaultuser)
        self.log.debug(f"USER {user.name} IS INVALID: {auth_token}, {vaultuser}, {vaultuser.is_valid() if vaultuser else None}")
        return False
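
The inline masking expression used in the authenticate log call can be factored into a helper. This is a minimal sketch, assuming the same 5-character window as above; mask_token is a hypothetical name, and the short-token branch is an addition (with the inline expression, a token shorter than 10 characters would be printed with overlapping, unmasked characters).

```python
def mask_token(token: str, visible: int = 5) -> str:
    """Keep the first and last `visible` characters and replace the rest with '*'."""
    if len(token) <= 2 * visible:
        # Too short to reveal anything safely: mask everything.
        return "*" * len(token)
    return token[:visible] + "*" * (len(token) - 2 * visible) + token[-visible:]


print(mask_token("abcdefghijklmno"))  # -> abcde*****klmno
print(mask_token("short"))            # -> *****
```

The log call would then read something like self.log.info(f'AUTH USER: {user.name} {mask_token(user.token)} until {user.expire_at}').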

jupyterhub_config.py

c = get_config()  #noqa

c.JupyterHub.ip = 'localhost'
c.JupyterHub.port = 6910
c.JupyterHub.hub_port = 6911
c.JupyterHub.base_url = '/'
c.ConfigurableHTTPProxy.api_url = 'http://127.0.0.1:6912'

c.JupyterHub.named_server_limit_per_user = 1
c.JupyterHub.oauth_token_expires_in = 1  # TODO: cannot set less ?
c.Application.log_level = 'DEBUG'
c.Spawner.oauth_client_allowed_scopes = ['self']
c.Authenticator.delete_invalid_users = True

import os
os.environ['JUPYTERHUB_CRYPT_KEY'] = 'd5747730725530898054802519259896211519792c9a8f0ce5d27e3f229001a4'
c.JupyterHub.cookie_secret = 'd5747730725530898054802519259896211519792c9a8f0ce5d27e3f229001a5'
c.JupyterHub.cookie_max_age_days = 0.2
c.ConfigurableHTTPProxy.auth_token = 'd5747730725530898054802519259896211519792c9a8f0ce5d27e3f229001a6'

c.JupyterHub.authenticator_class = 'org.VaultAuthenticator'
c.Authenticator.allow_all = True
c.Authenticator.refresh_pre_spawn = True  # force login at spawn
c.Authenticator.auto_login = True
c.Authenticator.enable_auth_state = True
c.Authenticator.vault_addr = 'https://auth.domain.tld'
c.Authenticator.vault_token_cookie = 'org-id-token-data'
c.Authenticator.auth_page = 'https://auth.domain.tld/user'
c.Authenticator.auth_refresh_age = 3600*4  # number of seconds between two refresh_user call to ensure authentication  # TODO: not working really ; refresh_user got called only if user is visiting /hub :/
c.Authenticator.username_pattern = '[0-9]{4}user'
c.Authenticator.base_url = c.JupyterHub.base_url  # we need that info, and couldn't find it elsewhere  TODO: find it dynamically

c.NotebookApp.password = ''
c.NotebookApp.token = ''


############### SPAWNER ###############
c.JupyterHub.spawner_class = 'org.PodmanSpawner'
c.Spawner.base_url = c.JupyterHub.base_url  # we need that info, and couldn't find it elsewhere  TODO: find it dynamically
c.Spawner.image = 'gitlab-registry.domain.tld/org/notebook'
c.Spawner.http_timeout = 30

Logs regarding the starting of the jupyter notebook

Here is an example of the podman (docker-like) command that runs a notebook server:

[D 2024-07-19 13:43:52.163 JupyterHub scopes:1010] Checking access to /hub/spawn/0136user via scope servers!server=0136user/
[D 2024-07-19 13:43:52.164 JupyterHub pages:216] Triggering spawn with default options for 0136user
[D 2024-07-19 13:43:52.164 JupyterHub base:411] Refreshing auth for 0136user
[I 2024-07-19 13:43:52.197 JupyterHub vaultauth:91] USER 0136user IS OK: <0136user is user until 2024-07-19T17:32:47.187448+02:00>
[I 2024-07-19 13:43:52.198 JupyterHub vaultauth:96] USER token did not change and is still valid ; everything is ok
[D 2024-07-19 13:43:52.198 JupyterHub base:1095] Initiating spawn for 0136user
[D 2024-07-19 13:43:52.198 JupyterHub base:1099] 0/100 concurrent spawns
[D 2024-07-19 13:43:52.198 JupyterHub base:1104] 1 active servers
[I 2024-07-19 13:43:52.205 JupyterHub provider:661] Creating oauth client jupyterhub-user-0136user
[D 2024-07-19 13:43:52.210 JupyterHub vaultauth:61] PRE SPAWN for <User(0136user 1/1 running)> spawning by <podmanspawner.PodmanSpawner object at 0x7f233048c4c0>:
[D 2024-07-19 13:43:52.212 JupyterHub user:912] Calling Spawner.start for 0136user
[I 2024-07-19 13:43:52.213 JupyterHub podmanspawner:222] For user <0136user is user until 2024-07-19T17:32:47.187448+02:00>, starting image gitlab-registry.domain.tld/org/notebook at port 14001, mounting /data/jupyterhub/homes/0136user
[I 2024-07-19 13:43:52.340 JupyterHub podmanspawner:295] Spawning via Podman command: podman run -d --rm --user 1065:1065 --group-add users --security-opt label=disable -v /data/jupyterhub/homes/0136user:/home/jovyan/work:rw,U,Z -w /home/jovyan/work --net host --name jnb-0136user-E4TT1AXQ1PEVZMCA1BP7 --env JUPYTERHUB_API_TOKEN=[secret] --env JPY_API_TOKEN=[secret] --env JUPYTERHUB_CLIENT_ID=jupyterhub-user-0136user --env JUPYTERHUB_COOKIE_OPTIONS={"SameSite": "None", "expires_days": 1} --env JUPYTERHUB_COOKIE_HOST_PREFIX_ENABLED=0 --env JUPYTERHUB_HOST= --env JUPYTERHUB_OAUTH_CALLBACK_URL=/user/0136user/oauth_callback --env JUPYTERHUB_OAUTH_SCOPES=["access:servers!server=0136user/", "access:servers!user=0136user"] --env JUPYTERHUB_OAUTH_ACCESS_SCOPES=["access:servers!server=0136user/", "access:servers!user=0136user"] --env JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES=["self"] --env JUPYTERHUB_USER=0136user --env JUPYTERHUB_SERVER_NAME= --env JUPYTERHUB_API_URL=http://127.0.0.1:6911/hub/api --env JUPYTERHUB_ACTIVITY_URL=http://127.0.0.1:6911/hub/api/users/0136user/activity --env JUPYTERHUB_BASE_URL=/ --env JUPYTERHUB_SERVICE_PREFIX=/user/0136user/ --env JUPYTERHUB_SERVICE_URL=http://127.0.0.1:14001/user/0136user/ --env JUPYTERHUB_PUBLIC_URL= --env JUPYTERHUB_PUBLIC_HUB_URL= --env LANG=fr_FR.utf8 --env JUPYTER_ENABLE_LAB=yes --env JUPYTER_IMAGE_SPEC=gitlab-registry.domain.tld/org/notebook --cgroup-manager=cgroupfs --memory=8G gitlab-registry.domain.tld/org/notebook start-singleuser.py --ServerApp.port=14001 --ServerApp.base_url=/user/0136user/ --ServerApp.root_dir=/home/jovyan/work --ServerApp.password= --ServerApp.token= --JupyterHub.oauth_token_expires_in=14400 --Authenticator.allowed_user=['0136user']
[I 2024-07-19 13:43:52.486 JupyterHub podmanspawner:310] PodmanSpawner.start cid: c29f3ea29f819312a29016b8a4ddf1e5abc12d9663a18c3a8de2d2a09211d22 at port 14001
[D 2024-07-19 13:43:52.489 JupyterHub spawner:1432] Polling subprocess every 30s
[D 2024-07-19 13:43:52.491 JupyterHub utils:292] Waiting 30s for server at http://127.0.0.1:14001/user/0136user/api
[I 2024-07-19 13:43:53.165 JupyterHub log:192] 302 GET /hub/spawn/0136user -> /hub/spawn-pending/0136user (0136user@10.140.140.42) 1004.87ms
[D 2024-07-19 13:43:53.181 JupyterHub scopes:1010] Checking access to /hub/spawn-pending/0136user via scope servers!server=0136user/
[I 2024-07-19 13:43:53.182 JupyterHub pages:397] 0136user is pending spawn
[I 2024-07-19 13:43:53.182 JupyterHub _xsrf_utils:125] Setting new xsrf cookie for b'c008dc696f8749748cc34b8aa68b6fa7:0bdf8ff10a30456aad36821f551d3f06' {'SameSite': 'None', 'expires_days': 1, 'path': '/hub/'}
[I 2024-07-19 13:43:53.183 JupyterHub log:192] 200 GET /hub/spawn-pending/0136user (0136user@10.140.140.42) 6.32ms
[D 2024-07-19 13:43:53.233 JupyterHub scopes:1010] Checking access to /hub/api/users/0136user/server/progress via scope read:servers!server=0136user/
[I 2024-07-19 13:43:54.243 JupyterHub log:192] 200 GET /hub/api (@127.0.0.1) 0.51ms

nginx.conf

JupyterHub is served behind an nginx reverse proxy. Other services are served too, but here is the final configuration regarding JupyterHub. Note that JupyterHub and its servers are not expected to go through nginx to talk to each other; as seen in the logs, they access the API and the instances directly through their local ports.

http {
    sendfile on;
    tcp_nopush on;
    types_hash_max_size 2048;
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
        '$status $body_bytes_sent "$http_referer" '
        '"$http_user_agent" "$http_x_forwarded_for"';
    gzip on;
    disable_symlinks off;
    proxy_headers_hash_max_size 512;

    # more security
    server_tokens off;
    client_body_buffer_size 1k;
    client_header_buffer_size 1k;
    client_max_body_size 4M;
    large_client_header_buffers 2 1k;
    add_header X-Frame-Options "SAMEORIGIN";
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    add_header Content-Security-Policy "default-src 'self' http: https: data: blob: 'unsafe-inline' 'unsafe-eval'" always;  # cross site scripting protection
    add_header X-XSS-Protection "1; mode=block";
    add_header Referrer-Policy "origin-when-cross-origin" always;  # Referrer policy
    add_header X-Content-Type-Options "nosniff" always;  # prevention of MIME confusion-based attacks
    proxy_hide_header X-Powered-By;


    map $http_upgrade $connection_upgrade {
        default upgrade;
        ''      close;
    }


    server {
        # SSL configuration
        listen 443 ssl http2;
        listen [::]:443 ssl http2;
        include params_ssl;

        server_name jupyter.domain.tld;

        client_max_body_size 0;
        proxy_buffering off;

        # proxy headers
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
        proxy_read_timeout 86400;

        # websocket headers
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header X-Scheme $scheme;

        ssl_stapling on;
        ssl_stapling_verify on;

        location / {
            proxy_pass http://localhost:3019;
        }

    }

    server {
        listen 80;
        server_name jupyter.domain.tld;
        return 302 https://$host$request_uri;
    }
}

Other info

Reproduced with :

  • python 3.9
  • jupyterhub 5.0.0 (and the latest 4.x.x version before that)
  • RHEL9

Bibliography

(only two links allowed in total, so it will be quick)


Thank you very much for your time ; any help is greatly appreciated :slight_smile:

There’s a good explanation of the JupyterHub login process in

From Application configuration — JupyterHub documentation

These are the tokens stored in cookies when you visit a single-user server or service. When they expire, you must re-authenticate with the Hub, even if your Hub authentication is still valid. If your Hub authentication is valid, logging in may be a transparent redirect as you refresh the page.

What’s your intention behind setting c.JupyterHub.oauth_token_expires_in to 1 second?

Hi! Thank you for your reply!

Hmmmm. I remember seeing somewhere that it was in days, not seconds, but I can’t find proof anywhere. That could explain the whole problem.

I will try that this afternoon, and get back to you.

I just tested. It seems to work properly now, it’s awesome!

Thank you very, very much !
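
For later readers, a sketch of the unit mismatch discussed above: per the JupyterHub documentation, oauth_token_expires_in is expressed in seconds, whereas cookie_max_age_days is expressed in days. The 4-hour lifetime is only an assumption for illustration (it matches the 14400 visible in the spawn command line); the value actually chosen for the fix is not stated in this thread.

```python
from datetime import timedelta

# Assumed target lifetime, for illustration only.
token_lifetime = timedelta(hours=4)

# The same duration expressed in the unit each option expects.
oauth_token_expires_in = int(token_lifetime.total_seconds())  # seconds
cookie_max_age_days = token_lifetime / timedelta(days=1)      # days (float)

print(oauth_token_expires_in)          # -> 14400
print(round(cookie_max_age_days, 3))   # -> 0.167

# In jupyterhub_config.py (where c comes from get_config()) this would become:
# c.JupyterHub.oauth_token_expires_in = oauth_token_expires_in
```

Writing the duration once and converting it makes it harder to mix up seconds and days, as happened with the value 1 in the original config.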
