We’re running JupyterHub in EKS, and we find deploying updates or configuration changes to JupyterHub difficult. When JupyterHub goes down, it ends up in CrashLoopBackOff because it has trouble communicating with the Proxy pod. This is also a problem when the JupyterHub service goes down and an infrastructure engineer isn’t around to knock over the Proxy pod as well.
Would someone be able to point me to documentation, or otherwise explain, why the Hub and Proxy pods need to go down and come back up together?
I assume that the Proxy is generating a key that the Hub fetches? Is it possible to hard-code this key with environment variables? That would allow us to use external-secrets to store the value in SSM and then set the same value in both pods, which would make life so much easier.
To add more detail: in general, restarting the hub and proxy separately should be fine, as long as these config values are stable (see the sketch after the list):
the auth token, by default read from $CONFIGPROXY_AUTH_TOKEN env in both the proxy and hub
the URL for the proxy API (e.g. the hostname of the proxy service)
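For reference, a minimal sketch of where those two values surface in the helm chart’s values. The proxy-api service name and port are assumptions based on the chart’s defaults, and the chart normally wires the proxy API URL itself, so in practice you only ever set the token:

proxy:
  secretToken: "<stable-token>"          # exported as CONFIGPROXY_AUTH_TOKEN in both the hub and proxy pods
hub:
  config:
    ConfigurableHTTPProxy:
      api_url: "http://proxy-api:8001"   # the proxy API endpoint the hub calls (normally set by the chart)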
I’m not really sure how you have them running separately in a way that only works sometimes unless one of these values isn’t stable, but perhaps some detailed error logs would help, if the above notes aren’t sufficient to figure it out.
I assume that the Proxy is generating a key that the Hub fetches?
Not quite. If the proxy auth token is not specified, the Hub will generate a key and use it, but that only works if the proxy is launched as a subprocess of the Hub. If CONFIGPROXY_AUTH_TOKEN is specified, both the Hub and the proxy use that value.
Are you using the jupyterhub helm chart? If so, it ought to Just Work®. If it doesn’t, that seems like a bug in the chart (make sure to share some config, the chart version, and info about your deployment). To specify the proxy token in the helm chart config, use proxy.secretToken:
proxy:
  secretToken: "some-secret-value"
which sets $CONFIGPROXY_AUTH_TOKEN for both the hub and proxy.
Wow this is so incredibly helpful! Thank you for the explanation. Looking forward to setting that environment variable ASAP.
We are using the helm chart and are running the Proxy and the Hub separately.
I guess that when just the hub goes down, there’s a mismatch between the proxy and the hub. This causes the hub to go down again while trying to get a new token that matches what the proxy has, which unfortunately puts the Hub in CrashLoopBackOff. However, that environment variable will let us ensure that both the hub and the proxy always use the same token, which will come from SSM via external-secrets.
It would be helpful to see the errors causing the CrashLoopBackOff so we can tell whether it’s the URL that’s wrong or the token.
The default token lookup happens in the chart’s templates, which means the default behavior is to:
generate a value if there isn’t one already defined, and
look up the previously generated value (so its value is stable for deployments after the first)
That lookup doesn’t work for all deployment tools, and some weird ones that don’t allow helm access to the existing state will regenerate the token on each deploy. In those cases (I still don’t fully understand what they are), you must specify the proxy.secretToken (aka hub.config.ConfigurableHTTPProxy.auth_token).
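To illustrate that lookup-or-generate pattern (a simplified sketch, not the chart’s actual template; the Secret name and key here are assumptions), a chart can keep a generated token stable across upgrades along these lines:

{{- /* reuse the token from the existing Secret if helm can see it, otherwise generate one */}}
{{- $token := randAlphaNum 64 }}
{{- $existingData := (lookup "v1" "Secret" .Release.Namespace "hub").data | default dict }}
{{- if hasKey $existingData "hub.config.ConfigurableHTTPProxy.auth_token" }}
{{- $token = index $existingData "hub.config.ConfigurableHTTPProxy.auth_token" | b64dec }}
{{- end }}
apiVersion: v1
kind: Secret
metadata:
  name: hub
stringData:
  hub.config.ConfigurableHTTPProxy.auth_token: {{ $token | quote }}

If the deployment tool renders templates without access to cluster state, lookup comes back empty and a fresh token gets generated on every deploy, which is the regeneration problem described above.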
But if that were the issue, I’d expect the same errors to occur on deploy, not just when the hub pod restarts due to a crash, since the env value still gets set for both pods from the same source on each helm upgrade: the value lives in a secret and is set in both pods.
Hmm. The issue has suddenly stopped occurring, which is odd since I was encountering it yesterday. I’m currently configuring GoogleOAuthenticator GBAC, so I’ll share the errors and logs if I encounter them again. Perhaps it’s because I haven’t actually done a helm deploy while trying to reproduce the issue; I bet that’s why I wasn’t able to force the behavior.
This is what I get when I update the Argo CD application that is managing JupyterHub:
Going to the hub service returns a bare “Service Unavailable” HTML page.
The logs then reveal that this is a proxy issue:
$ kubectl logs hub-5fb54c5f7-rrvvn
Loading /usr/local/etc/jupyterhub/secret/values.yaml
No config at /usr/local/etc/jupyterhub/existing-secret/values.yaml
Loading /usr/local/etc/jupyterhub/jupyterhub_config.d config: foo_jupyterhub_config.py
[I 2023-11-14 18:49:53.066 JupyterHub app:2859] Running JupyterHub version 4.0.2
[I 2023-11-14 18:49:53.066 JupyterHub app:2889] Using Authenticator: oauthenticator.google.GoogleOAuthenticator-16.1.1
[I 2023-11-14 18:49:53.066 JupyterHub app:2889] Using Spawner: builtins.FooSpawner
[I 2023-11-14 18:49:53.066 JupyterHub app:2889] Using Proxy: jupyterhub.proxy.ConfigurableHTTPProxy-4.0.2
[I 2023-11-14 18:49:53.259 JupyterHub app:1984] Not using allowed_users. Any authenticated user will be allowed.
[I 2023-11-14 18:49:53.370 JupyterHub provider:661] Updating oauth client service-announcement
[I 2023-11-14 18:49:53.412 JupyterHub provider:661] Updating oauth client service-gallery
[W 2023-11-14 18:49:53.506 JupyterHub app:2726] Allowing service announcement to complete OAuth without confirmation on an authorization web page
[W 2023-11-14 18:49:53.506 JupyterHub app:2726] Allowing service gallery to complete OAuth without confirmation on an authorization web page
[I 2023-11-14 18:49:53.521 JupyterHub app:2928] Initialized 0 spawners in 0.011 seconds
[I 2023-11-14 18:49:53.536 JupyterHub metrics:278] Found 0 active users in the last ActiveUserPeriods.twenty_four_hours
[I 2023-11-14 18:49:53.537 JupyterHub metrics:278] Found 0 active users in the last ActiveUserPeriods.seven_days
[I 2023-11-14 18:49:53.539 JupyterHub metrics:278] Found 0 active users in the last ActiveUserPeriods.thirty_days
[I 2023-11-14 18:49:53.539 JupyterHub app:3142] Not starting proxy
[E 2023-11-14 18:49:53.541 JupyterHub proxy:906] api_request to proxy failed: HTTP 403: Forbidden
[E 2023-11-14 18:49:53.541 JupyterHub app:3382]
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/jupyterhub/app.py", line 3380, in launch_instance_async
    await self.start()
  File "/usr/local/lib/python3.9/site-packages/jupyterhub/app.py", line 3146, in start
    await self.proxy.get_all_routes()
  File "/usr/local/lib/python3.9/site-packages/jupyterhub/proxy.py", line 946, in get_all_routes
    resp = await self.api_request('', client=client)
  File "/usr/local/lib/python3.9/site-packages/jupyterhub/proxy.py", line 910, in api_request
    result = await exponential_backoff(
  File "/usr/local/lib/python3.9/site-packages/jupyterhub/utils.py", line 221, in exponential_backoff
    ret = await maybe_future(pass_func(*args, **kwargs))
  File "/usr/local/lib/python3.9/site-packages/jupyterhub/proxy.py", line 893, in _wait_for_api_request
    return await client.fetch(req)
tornado.httpclient.HTTPClientError: HTTP 403: Forbidden
Will update with the environment variable and see if the situation improves.
I think it’s a known issue that helm state-retrieval doesn’t work with ArgoCD, so values get regenerated over and over again and you have to set the value explicitly.
Yes, setting proxy.secretToken is what ultimately sets CONFIGPROXY_AUTH_TOKEN. That’s just where the helm chart gets the value, so that’s where you should put it.
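In values terms (per the “aka” note above), either of these pins the token to the same value; a minimal sketch with a placeholder value:

proxy:
  secretToken: "some-secret-value"

hub:
  config:
    ConfigurableHTTPProxy:
      auth_token: "some-secret-value"

Whichever form you use, the chart ends up exporting it as CONFIGPROXY_AUTH_TOKEN in both pods.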
Picking this back up after having gotten caught up with other stuff.
You are correct, configuring proxy.secretToken did do the job!
I wonder if proxy.secretToken could also be accompanied by something like a proxy.secretTokenRef? This would work as follows:
... # omitted for brevity
proxy:
  secretTokenRef:
    valueFrom:
      secretKeyRef:
        name: hub-secrets # created by the user outside of the helm chart
        key: hub.config.proxy-auth-token
It would be nice to be able to utilize a static Proxy token that isn’t plaintext (and committed to git).
As I understand things currently, there’s no way to skip the lookup behavior in favor of a secret (as far as I can tell looking at the templates).
The lookup occurs in the chart’s templates when the hub’s Secret is rendered.
The hub’s deployment is then where the CONFIGPROXY_AUTH_TOKEN environment variable is populated from that Secret (the logic is equivalent for the proxy), roughly like this:
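(A sketch of roughly what the rendered hub container env looks like; the Secret name and key here are assumptions based on the chart’s conventions, not copied from the chart.)

env:
  - name: CONFIGPROXY_AUTH_TOKEN
    valueFrom:
      secretKeyRef:
        name: hub                                         # chart-managed Secret (name assumed)
        key: hub.config.ConfigurableHTTPProxy.auth_token  # key assumed from the values path above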
So the presence of a proxy.secretTokenRef would cause that specific lookup not to occur, and would instead configure CONFIGPROXY_AUTH_TOKEN from the referenced secret.
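The hub-secrets Secret in that example could then be managed entirely outside of git, for example with external-secrets pulling the token out of SSM Parameter Store. A rough sketch, assuming a SecretStore named aws-parameter-store and an SSM parameter at /jupyterhub/proxy-auth-token (both names are hypothetical):

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: hub-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: SecretStore
    name: aws-parameter-store                  # hypothetical SecretStore backed by AWS SSM Parameter Store
  target:
    name: hub-secrets                          # the Secret the proposed secretTokenRef points at
  data:
    - secretKey: hub.config.proxy-auth-token   # key name matching the example above
      remoteRef:
        key: /jupyterhub/proxy-auth-token      # hypothetical SSM parameter name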