Hi!
I am deploying a JupyterHub instance with a custom authenticator (it authenticates users from cookies created by an external service, by querying that service) and a custom spawner (Podman-based, hand-made to handle use cases specific to my organisation such as access levels and shared directories).
Everything works, except for one thing: users get disconnected every few minutes.
Base problem
After 3 minutes of using their personal server (a single-user notebook run with Podman, from an image built on top of quay.io/jupyter/datascience-notebook), whenever the user tries to load or save a file, the operation fails with a popup: the bold title is “File Save Error for demo.ipynb”, the only text is “Forbidden”, and it can be closed with a dismiss button.
When trying to load a file, the bold title is “File Load Error for demo.ipynb”.
Refreshing the page makes the user lose their unsaved work but, on the bright side, the problem is gone for another 3 minutes.
The problem concerns all users, and is independent for each user (each user gets their own 3 minutes, whatever the others are experiencing).
Workarounds
We soon noticed that if the user opens their lab in another browser tab, the problem disappears in the original tab: files can be saved again and everything is fine for another 3 minutes.
The workaround was simple: use a web extension that automatically refreshes the /hub/home endpoint every 3 minutes. That does not please me, as it requires an external program installed in a very dirty way that bypasses our organisation's cybersecurity rules. I cannot rely on that solution for a tool meant to become official.
Observations
I investigated this problem and I am running out of ideas on how to make progress. But I have some observations that may help someone identify the root cause.
Cookies
When the bug appears, in the notebook (/user/0147user/lab/tree/demo.ipynb for instance), I observe 4 cookies (5 with the one set by the external authentication service):
- 2 _xsrf cookies, slightly different in value. Both have jupyter.domain.tld as domain, but the paths differ (/hub/ and /user/0147user) and so do the expirations (around one day from now, and Session).
- jupyterhub-hub-login (jupyter.domain.tld and /hub/)
- jupyterhub-session-id (jupyter.domain.tld and /)
But when the notebook is in a functional state (before the problem arises, or after applying the workaround and until the problem arises again), an additional cookie exists:
- jupyterhub-user-0147user (jupyter.domain.tld and /user/0147user, expiration set to about one day from now)
That particular cookie is deleted at the very moment the bug arises, and reappears when the workaround is applied.
I would guess that this cookie is necessary for the notebook to identify itself to the hub, but that it gets destroyed before having a chance to be renewed.
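If it helps to confirm that, one way to correlate the cookie's disappearance with token expiry on the hub side is to list the user's tokens through the hub REST API. This is only a sketch: it assumes an admin-scoped API token (ADMIN_TOKEN below is a placeholder) and uses the hub API address from my config; the exact field names may vary slightly between JupyterHub versions.

# Sketch: list a user's tokens to see whether one of them expires around the 3-minute mark.
# ADMIN_TOKEN is a placeholder for a token with admin rights (e.g. from a hub service).
import requests

HUB_API = "http://127.0.0.1:6911/hub/api"   # JUPYTERHUB_API_URL seen in the spawner env
ADMIN_TOKEN = "REPLACE_ME"
USER = "0147user"

resp = requests.get(
    f"{HUB_API}/users/{USER}/tokens",
    headers={"Authorization": f"token {ADMIN_TOKEN}"},
)
resp.raise_for_status()
tokens = resp.json()
for tok in tokens.get("oauth_tokens", []) + tokens.get("api_tokens", []):
    print(tok.get("kind"), tok.get("oauth_client"), tok.get("created"), tok.get("expires_at"))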
Logs
The cookie disappears at the same moment the following two lines appear in the JupyterHub logs:
[W 2024-07-19 13:25:21.780 JupyterHub log:192] 403 GET /hub/api/user (@127.0.0.1) 3.10ms
[W 2024-07-19 13:25:21.782 JupyterHub log:192] 403 GET /hub/api/user (@127.0.0.1) 1.07ms
(even in debug mode, I do not get more log lines at that moment)
A 403 on /hub/api/user, logged as @127.0.0.1 with nothing before the @. That kind of request behaves normally in many other cases, such as:
[I 2024-07-19 13:20:19.821 JupyterHub log:192] 200 POST /hub/api/oauth2/token (0147user@127.0.0.1) 19.98ms
[I 2024-07-19 13:20:19.791 JupyterHub log:192] 302 GET /hub/api/oauth2/authorize?client_id=jupyterhub-user-0147usera&redirect_uri=%2Fuser%2F0147user%2Foauth_callback&response_type=code&state=[secret] -> /user/0147user/oauth_callback?code=[secret]&state=[secret] (0147user@10.67.70.73) 11.82ms
[I 2024-07-19 13:59:38.799 JupyterHub log:192] 200 GET /hub/api/user (0136user@127.0.0.1) 2.77ms
The difference is in the part before the @ (the authenticated user): requests logged as <user>@127.0.0.1 work, but those logged as @127.0.0.1 do not. I have not found any counter-example.
My uneducated guess is that JupyterHub wants to know about the user (probably before calling the refresh_user function) and queries the dedicated endpoint for that, but does so without identifying any explicit user, which for one reason or another gets the request refused (hence the 403 Forbidden).
I do not know how to test or investigate that hypothesis further.
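One way to probe it (a sketch, not something I have validated) would be to replay that request by hand from inside the single-user container, using the environment the spawner injects, and compare what the hub answers with and without credentials once the bug has appeared:

# Sketch: replay the notebook -> hub request by hand from inside the container,
# e.g. `podman exec -it jnb-0147user-... python3 probe.py`.
# Assumes the standard variables injected by the spawner (JUPYTERHUB_API_URL,
# JUPYTERHUB_API_TOKEN) and the requests package shipped in the notebook image.
import os
import requests

api_url = os.environ["JUPYTERHUB_API_URL"]     # e.g. http://127.0.0.1:6911/hub/api
token = os.environ["JUPYTERHUB_API_TOKEN"]

# With the server's API token: the hub should identify the request's owner.
with_token = requests.get(f"{api_url}/user",
                          headers={"Authorization": f"token {token}"})
print("with token   :", with_token.status_code, with_token.text[:200])

# Without any credentials: expected to be refused, like the anonymous @127.0.0.1 lines.
without_token = requests.get(f"{api_url}/user")
print("without token:", without_token.status_code, without_token.text[:200])

If the first call also returned 403 while the bug is present, that would point at the token itself (expired or revoked) rather than at a missing user in the request.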
A piece of code
I cannot easily share the whole codebase, but here are some lines that are very similar to it.
VaultAuth
from jupyterhub.auth import Authenticator
from traitlets import Unicode, Dict
from org import UserInfo


class VaultAuthenticator(Authenticator):
    vault_addr = Unicode('netaddr', help="address of the remote vault instance", config=True)
    vault_token_cookie = Unicode('vault_token', help="name of the cookie containing the vault token", config=True)

    def _user_info(self, user: UserInfo) -> dict:
        return {
            'name': user.username.lower(),
            'auth_state': user.as_json()
        }

    async def authenticate(self, handler, data):
        if auth_token := handler.get_cookie(self.vault_token_cookie):
            if user := UserInfo.from_token(auth_token, self.vault_addr):
                if user.is_valid():
                    self.log.info(f'AUTH USER: {user.name} {user.token[:5] + "*"*len(user.token[5:-5]) + user.token[-5:]} until {user.expire_at}')
                    return self._user_info(user)
        # not logged in or not authorized
        handler.redirect('auth.domain.tld/user')

    def is_admin(self, handler, authentication) -> bool:
        "No admin here"
        return False

    async def refresh_user(self, user, handler=None) -> bool:
        """Disconnect the user if her vault token has expired.

        Returns:
            True  -- user is valid as-is
            dict  -- user needs an update with the returned values
            False -- user needs to log in again
        """
        assert handler, handler
        auth_token, vaultuser = None, None
        if auth_token := handler.get_cookie(self.vault_token_cookie):
            if vaultuser := UserInfo.from_token(auth_token, self.vault_addr):
                if vaultuser.is_valid():
                    self.log.info(f"USER {user.name} IS OK: {vaultuser}")
                    state = await user.get_auth_state()
                    assert isinstance(state, dict), state
                    curr_token = state.get('token')
                    if curr_token == auth_token:
                        self.log.info("USER token did not change and is still valid; everything is ok")
                        return True
                    else:
                        self.log.info("USER token has changed; will update user now")
                        return self._user_info(vaultuser)
        self.log.debug(f"USER {user.name} IS INVALID: {auth_token}, {vaultuser}, {vaultuser.is_valid() if vaultuser else None}")
        return False
jupyterhub_config.py
c = get_config() #noqa
c.JupyterHub.ip = 'localhost'
c.JupyterHub.port = 6910
c.JupyterHub.hub_port = 6911
c.JupyterHub.base_url = '/'
c.ConfigurableHTTPProxy.api_url = 'http://127.0.0.1:6912'
c.JupyterHub.named_server_limit_per_user = 1
c.JupyterHub.oauth_token_expires_in = 1 # TODO: cannot set less ?
c.Application.log_level = 'DEBUG'
c.Spawner.oauth_client_allowed_scopes = ['self']
c.Authenticator.delete_invalid_users = True
import os
os.environ['JUPYTERHUB_CRYPT_KEY'] = 'd5747730725530898054802519259896211519792c9a8f0ce5d27e3f229001a4'
c.JupyterHub.cookie_secret = 'd5747730725530898054802519259896211519792c9a8f0ce5d27e3f229001a5'
c.JupyterHub.cookie_max_age_days = 0.2
c.ConfigurableHTTPProxy.auth_token = 'd5747730725530898054802519259896211519792c9a8f0ce5d27e3f229001a6'
c.JupyterHub.authenticator_class = 'org.VaultAuthenticator'
c.Authenticator.allow_all = True
c.Authenticator.refresh_pre_spawn = True # force login at spawn
c.Authenticator.auto_login = True
c.Authenticator.enable_auth_state = True
c.Authenticator.vault_addr = 'https://auth.domain.tld'
c.Authenticator.vault_token_cookie = 'org-id-token-data'
c.Authenticator.auth_page = 'https://auth.domain.tld/user'
c.Authenticator.auth_refresh_age = 3600*4  # number of seconds between two refresh_user calls to ensure authentication # TODO: not really working; refresh_user only gets called when the user visits /hub :/
c.Authenticator.username_pattern = '[0-9]{4}user'
c.Authenticator.base_url = c.JupyterHub.base_url # we need that info, and couldn't find it elsewhere TODO: find it dynamically
c.NotebookApp.password = ''
c.NotebookApp.token = ''
############### SPAWNER ###############
c.JupyterHub.spawner_class = 'org.PodmanSpawner'
c.Spawner.base_url = c.JupyterHub.base_url # we need that info, and couldn't find it elsewhere TODO: find it dynamically
c.Spawner.image = 'gitlab-registry.domain.tld/org/notebook'
c.Spawner.http_timeout = 30
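As a quick sanity check, here is the arithmetic for the various lifetimes configured above (values taken verbatim from this file); as far as I can tell, none of them corresponds to the 3 minutes I observe:

# Quick arithmetic on the lifetimes configured in jupyterhub_config.py.
lifetimes = {
    "JupyterHub.oauth_token_expires_in": 1,              # 1 s
    "JupyterHub.cookie_max_age_days": 0.2 * 24 * 3600,   # 0.2 days = 17280 s
    "Authenticator.auth_refresh_age": 3600 * 4,          # 14400 s
    "Spawner.http_timeout": 30,                          # 30 s
}
for name, seconds in lifetimes.items():
    print(f"{name:40s} {seconds:>8.0f} s (~{seconds / 60:.1f} min)")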
Logs regarding the startup of the Jupyter notebook server
Here is an example of the Podman (Docker-like) command that runs a notebook server, along with the surrounding hub logs:
[D 2024-07-19 13:43:52.163 JupyterHub scopes:1010] Checking access to /hub/spawn/0136user via scope servers!server=0136user/
[D 2024-07-19 13:43:52.164 JupyterHub pages:216] Triggering spawn with default options for 0136user
[D 2024-07-19 13:43:52.164 JupyterHub base:411] Refreshing auth for 0136user
[I 2024-07-19 13:43:52.197 JupyterHub vaultauth:91] USER 0136user IS OK: <0136user is user until 2024-07-19T17:32:47.187448+02:00>
[I 2024-07-19 13:43:52.198 JupyterHub vaultauth:96] USER token did not change and is still valid ; everything is ok
[D 2024-07-19 13:43:52.198 JupyterHub base:1095] Initiating spawn for 0136user
[D 2024-07-19 13:43:52.198 JupyterHub base:1099] 0/100 concurrent spawns
[D 2024-07-19 13:43:52.198 JupyterHub base:1104] 1 active servers
[I 2024-07-19 13:43:52.205 JupyterHub provider:661] Creating oauth client jupyterhub-user-0136user
[D 2024-07-19 13:43:52.210 JupyterHub vaultauth:61] PRE SPAWN for <User(0136user 1/1 running)> spawning by <podmanspawner.PodmanSpawner object at 0x7f233048c4c0>:
[D 2024-07-19 13:43:52.212 JupyterHub user:912] Calling Spawner.start for 0136user
[I 2024-07-19 13:43:52.213 JupyterHub podmanspawner:222] For user <0136user is user until 2024-07-19T17:32:47.187448+02:00>, starting image gitlab-registry.domain.tld/org/notebook at port 14001, mounting /data/jupyterhub/homes/0136user
[I 2024-07-19 13:43:52.340 JupyterHub podmanspawner:295] Spawning via Podman command: podman run -d --rm --user 1065:1065 --group-add users --security-opt label=disable -v /data/jupyterhub/homes/0136user:/home/jovyan/work:rw,U,Z -w /home/jovyan/work --net host --name jnb-0136user-E4TT1AXQ1PEVZMCA1BP7 --env JUPYTERHUB_API_TOKEN=[secret] --env JPY_API_TOKEN=[secret] --env JUPYTERHUB_CLIENT_ID=jupyterhub-user-0136user --env JUPYTERHUB_COOKIE_OPTIONS={"SameSite": "None", "expires_days": 1} --env JUPYTERHUB_COOKIE_HOST_PREFIX_ENABLED=0 --env JUPYTERHUB_HOST= --env JUPYTERHUB_OAUTH_CALLBACK_URL=/user/0136user/oauth_callback --env JUPYTERHUB_OAUTH_SCOPES=["access:servers!server=0136user/", "access:servers!user=0136user"] --env JUPYTERHUB_OAUTH_ACCESS_SCOPES=["access:servers!server=0136user/", "access:servers!user=0136user"] --env JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES=["self"] --env JUPYTERHUB_USER=0136user --env JUPYTERHUB_SERVER_NAME= --env JUPYTERHUB_API_URL=http://127.0.0.1:6911/hub/api --env JUPYTERHUB_ACTIVITY_URL=http://127.0.0.1:6911/hub/api/users/0136user/activity --env JUPYTERHUB_BASE_URL=/ --env JUPYTERHUB_SERVICE_PREFIX=/user/0136user/ --env JUPYTERHUB_SERVICE_URL=http://127.0.0.1:14001/user/0136user/ --env JUPYTERHUB_PUBLIC_URL= --env JUPYTERHUB_PUBLIC_HUB_URL= --env LANG=fr_FR.utf8 --env JUPYTER_ENABLE_LAB=yes --env JUPYTER_IMAGE_SPEC=gitlab-registry.domain.tld/org/notebook --cgroup-manager=cgroupfs --memory=8G gitlab-registry.domain.tld/org/notebook start-singleuser.py --ServerApp.port=14001 --ServerApp.base_url=/user/0136user/ --ServerApp.root_dir=/home/jovyan/work --ServerApp.password= --ServerApp.token= --JupyterHub.oauth_token_expires_in=14400 --Authenticator.allowed_user=['0136user']
[I 2024-07-19 13:43:52.486 JupyterHub podmanspawner:310] PodmanSpawner.start cid: c29f3ea29f819312a29016b8a4ddf1e5abc12d9663a18c3a8de2d2a09211d22 at port 14001
[D 2024-07-19 13:43:52.489 JupyterHub spawner:1432] Polling subprocess every 30s
[D 2024-07-19 13:43:52.491 JupyterHub utils:292] Waiting 30s for server at http://127.0.0.1:14001/user/0136user/api
[I 2024-07-19 13:43:53.165 JupyterHub log:192] 302 GET /hub/spawn/0136user -> /hub/spawn-pending/0136user (0136user@10.140.140.42) 1004.87ms
[D 2024-07-19 13:43:53.181 JupyterHub scopes:1010] Checking access to /hub/spawn-pending/0136user via scope servers!server=0136user/
[I 2024-07-19 13:43:53.182 JupyterHub pages:397] 0136user is pending spawn
[I 2024-07-19 13:43:53.182 JupyterHub _xsrf_utils:125] Setting new xsrf cookie for b'c008dc696f8749748cc34b8aa68b6fa7:0bdf8ff10a30456aad36821f551d3f06' {'SameSite': 'None', 'expires_days': 1, 'path': '/hub/'}
[I 2024-07-19 13:43:53.183 JupyterHub log:192] 200 GET /hub/spawn-pending/0136user (0136user@10.140.140.42) 6.32ms
[D 2024-07-19 13:43:53.233 JupyterHub scopes:1010] Checking access to /hub/api/users/0136user/server/progress via scope read:servers!server=0136user/
[I 2024-07-19 13:43:54.243 JupyterHub log:192] 200 GET /hub/api (@127.0.0.1) 0.51ms
nginx.conf
JupyterHub is served behind an nginx reverse proxy. Other services are served by the same nginx, but here is the final configuration relevant to JupyterHub. Note that JupyterHub and its servers are not expected to go through nginx to talk to each other; as seen in the logs, they access the API and the instances directly through their local ports.
http {
    sendfile on;
    tcp_nopush on;
    types_hash_max_size 2048;
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    gzip on;
    disable_symlinks off;
    proxy_headers_hash_max_size 512;

    # more security
    server_tokens off;
    client_body_buffer_size 1k;
    client_header_buffer_size 1k;
    client_max_body_size 4M;
    large_client_header_buffers 2 1k;
    add_header X-Frame-Options "SAMEORIGIN";
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    add_header Content-Security-Policy "default-src 'self' http: https: data: blob: 'unsafe-inline' 'unsafe-eval'" always; # cross-site scripting protection
    add_header X-XSS-Protection "1; mode=block";
    add_header Referrer-Policy "origin-when-cross-origin" always; # referrer policy
    add_header X-Content-Type-Options "nosniff" always; # prevention of MIME confusion-based attacks
    proxy_hide_header X-Powered-By;

    map $http_upgrade $connection_upgrade {
        default upgrade;
        '' close;
    }

    server {
        # SSL configuration
        listen 443 ssl http2;
        listen [::]:443 ssl http2;
        include params_ssl;
        server_name jupyter.domain.tld;

        client_max_body_size 0;
        proxy_buffering off;

        # proxy headers
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
        proxy_read_timeout 86400;

        # websocket headers
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header X-Scheme $scheme;

        ssl_stapling on;
        ssl_stapling_verify on;

        location / {
            proxy_pass http://localhost:3019;
        }
    }

    server {
        listen 80;
        server_name jupyter.domain.tld;
        return 302 https://$host$request_uri;
    }
}
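To help rule the reverse proxy in or out, a possible check (again just a sketch, with placeholder values) is to send the exact same authenticated request twice from the JupyterHub host, once through nginx and once directly to the upstream it proxies to (localhost:3019 in the configuration above), and compare the results. COOKIES below is a placeholder for the Cookie header copied from the browser's developer tools, and the path corresponds to the contents-API request that fails with Forbidden.

# Sketch: compare the same request sent through nginx and sent directly to the
# upstream, to see whether the proxy changes the status code or the cookies.
# Run on the JupyterHub host; COOKIES is a placeholder copied from the browser.
import requests

COOKIES = "jupyterhub-session-id=...; _xsrf=...; jupyterhub-user-0147user=..."  # placeholder
PATH = "/user/0147user/api/contents/demo.ipynb"

for label, base in [("via nginx", "https://jupyter.domain.tld"),
                    ("direct   ", "http://localhost:3019")]:
    resp = requests.get(base + PATH,
                        headers={"Cookie": COOKIES},
                        allow_redirects=False)
    print(label, resp.status_code, resp.headers.get("Set-Cookie"))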
Other info
Reproduced with:
- Python 3.9
- JupyterHub 5.0.0 (and the latest 4.x.x version before that)
- RHEL 9
Bibliography
(only two links in total are authorized, so this will be quick)
- explanations on jupyterhub cookies
- some leads in the official docs, notably the NO_PROXY environment variable, which did not change anything in my case
Thank you very much for your time; any help is greatly appreciated.