JupyterHub Proxy and Hub Key

We’re running JupyterHub in EKS and find deploying updates or configuration changes to JupyterHub difficult. When JupyterHub goes down, it ends up in CrashLoopBackOff because it has trouble communicating with the Proxy pod. This also causes problems if the JupyterHub service goes down and an infrastructure engineer isn’t around to knock over the Proxy pod as well.

  1. Would someone be able to point me to documentation or otherwise explain what is going on such that the Hub and Proxy pods need to go down and up together?
  2. I assume that the Proxy is generating a key that the Hub fetches? Is it possible to hard-code this key with environment variables? That would allow us to use external-secrets to store the value in SSM and then set it in both pods, which would make life so much easier (rough sketch of what we mean below).
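
For illustration, this is roughly the shape we have in mind (conceptual Kubernetes env config only, not tied to any particular chart; the variable, Secret, and key names here are hypothetical):

env:
  - name: PROXY_AUTH_TOKEN              # placeholder name; whatever variable carries the shared key
    valueFrom:
      secretKeyRef:
        name: jupyterhub-proxy-token    # hypothetical Secret synced from SSM by external-secrets
        key: token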

Thank you!!

You can run JupyterHub and Proxy as separate services. Here are the docs


To add more detail: in general, restarting the hub and proxy separately should be fine as long as these config values are stable (see the sketch after this list):

  1. the auth token, by default read from $CONFIGPROXY_AUTH_TOKEN env in both the proxy and hub
  2. the URL for the proxy API (e.g. the hostname of the proxy service)
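
For an externally run proxy, both of those correspond to JupyterHub settings. A rough sketch in z2jh hub.config form, with placeholder values (in the helm chart the api_url is normally set for you, pointing at its proxy-api Service):

hub:
  config:
    ConfigurableHTTPProxy:
      auth_token: "<stable token shared with the proxy>"  # the proxy reads the same value from CONFIGPROXY_AUTH_TOKEN
      api_url: "http://proxy-api:8001"                    # assuming the chart's default proxy-api Service and port
      should_start: false                                 # the proxy runs as its own Deployment, not a Hub subprocess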

I’m not really sure how you have them running separately in a way that works only some of the time without both of these already being stable, but detailed error logs should help if the notes above aren’t enough to figure it out.

I assume that the Proxy is generating a key that the Hub fetches?

Not quite. If the proxy auth token is not specified, the Hub will generate one and use it. That only works if the proxy is launched as a subprocess of the Hub, though. If CONFIGPROXY_AUTH_TOKEN is specified, both the Hub and the separately running proxy use that shared value.

Are you using the jupyterhub helm chart? If so, it ought to Just Work®; if it doesn’t, that seems like a bug in the chart (make sure to share some config, the chart version, and info about your deployment). To specify the proxy token in the helm chart config, use proxy.secretToken:

proxy:
  secretToken: "some-secret-value"

which sets $CONFIGPROXY_AUTH_TOKEN for both the hub and proxy.

Wow this is so incredibly helpful! Thank you for the explanation. Looking forward to setting that environment variable ASAP.

We are using the helm chart and are running the Proxy and the Hub separately.

I guess that when just the hub goes down, there’s a token mismatch between the proxy and the hub. The hub restarts to try to get a new token that matches what the proxy has, and that lands it in CrashLoopBackOff. However, that environment variable will allow us to ensure that both the hub and the proxy always use the same token, which will come from SSM via external-secrets.
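
For reference, the plan on our side is roughly the ExternalSecret below; the store name, Secret name, and SSM parameter path are all hypothetical:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: jupyterhub-proxy-token
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-parameter-store            # hypothetical SecretStore backed by SSM Parameter Store
    kind: ClusterSecretStore
  target:
    name: jupyterhub-proxy-token         # Kubernetes Secret the chart values can then reference
  data:
    - secretKey: token
      remoteRef:
        key: /jupyterhub/proxy-auth-token   # hypothetical SSM parameter holding the shared token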

It would be helpful to see the errors causing the CrashLoopBackOff, so we know whether it’s the URL or the token that’s wrong.

The default token lookup is here, which means the default behavior is to:

  1. generate a value if there isn’t one already defined, and
  2. look up the previously generated value (so its value is stable for deployments after the first)

That lookup doesn’t work with all deployment tools, and some odd ones that don’t give helm access to the existing release state will regenerate the token on each deploy. In those cases (I still don’t fully understand which tools are affected), you must specify proxy.secretToken (aka hub.config.ConfigurableHTTPProxy.auth_token) explicitly.
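
For completeness, the hub.config form of the same setting looks like this (same placeholder value as the earlier example):

hub:
  config:
    ConfigurableHTTPProxy:
      auth_token: "some-secret-value"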

But if that were the issue, I’d expect the same errors to occur on every deploy, not just when the hub pod restarts after a crash, since on each helm upgrade the value is stored in a secret and set in both pods from that same source.


Hmm. The issue has suddenly stopped occurring, which is odd, as I was encountering it yesterday. I’m currently configuring the GoogleOAuthenticator GBAC, so I’ll share the errors and logs if I run into them again. Perhaps it’s because I haven’t actually done a helm deploy while trying to reproduce the issue; I bet that’s why I wasn’t able to force the behavior.

This is what I get when I update the ArgoCD application that manages JupyterHub:

Going to the hub service gives a bare HTML “Service Unavailable” page.

The logs then reveal that this is a proxy issue:

$ kubectl logs hub-5fb54c5f7-rrvvn
Loading /usr/local/etc/jupyterhub/secret/values.yaml
No config at /usr/local/etc/jupyterhub/existing-secret/values.yaml
Loading /usr/local/etc/jupyterhub/jupyterhub_config.d config: foo_jupyterhub_config.py
[I 2023-11-14 18:49:53.066 JupyterHub app:2859] Running JupyterHub version 4.0.2
[I 2023-11-14 18:49:53.066 JupyterHub app:2889] Using Authenticator: oauthenticator.google.GoogleOAuthenticator-16.1.1
[I 2023-11-14 18:49:53.066 JupyterHub app:2889] Using Spawner: builtins.FooSpawner
[I 2023-11-14 18:49:53.066 JupyterHub app:2889] Using Proxy: jupyterhub.proxy.ConfigurableHTTPProxy-4.0.2
[I 2023-11-14 18:49:53.259 JupyterHub app:1984] Not using allowed_users. Any authenticated user will be allowed.
[I 2023-11-14 18:49:53.370 JupyterHub provider:661] Updating oauth client service-announcement
[I 2023-11-14 18:49:53.412 JupyterHub provider:661] Updating oauth client service-gallery
[W 2023-11-14 18:49:53.506 JupyterHub app:2726] Allowing service announcement to complete OAuth without confirmation on an authorization web page
[W 2023-11-14 18:49:53.506 JupyterHub app:2726] Allowing service gallery to complete OAuth without confirmation on an authorization web page
[I 2023-11-14 18:49:53.521 JupyterHub app:2928] Initialized 0 spawners in 0.011 seconds
[I 2023-11-14 18:49:53.536 JupyterHub metrics:278] Found 0 active users in the last ActiveUserPeriods.twenty_four_hours
[I 2023-11-14 18:49:53.537 JupyterHub metrics:278] Found 0 active users in the last ActiveUserPeriods.seven_days
[I 2023-11-14 18:49:53.539 JupyterHub metrics:278] Found 0 active users in the last ActiveUserPeriods.thirty_days
[I 2023-11-14 18:49:53.539 JupyterHub app:3142] Not starting proxy
[E 2023-11-14 18:49:53.541 JupyterHub proxy:906] api_request to proxy failed: HTTP 403: Forbidden
[E 2023-11-14 18:49:53.541 JupyterHub app:3382]
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/app.py", line 3380, in launch_instance_async
        await self.start()
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/app.py", line 3146, in start
        await self.proxy.get_all_routes()
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/proxy.py", line 946, in get_all_routes
        resp = await self.api_request('', client=client)
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/proxy.py", line 910, in api_request
        result = await exponential_backoff(
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/utils.py", line 221, in exponential_backoff
        ret = await maybe_future(pass_func(*args, **kwargs))
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/proxy.py", line 893, in _wait_for_api_request
        return await client.fetch(req)
    tornado.httpclient.HTTPClientError: HTTP 403: Forbidden

Will update with the environment variable and see if the situation improves.

Forgot to add, the 403 errors are resolved once the proxy pod is deleted.

That definitely sounds like the proxy token is being changed (e.g. on a helm deploy, if the persist-the-previously-generated-value logic doesn’t work).

I would expect setting proxy.secretToken explicitly to fix this.

How are you deploying upgrades? Is it a straightforward helm upgrade?

We’re deploying via the helm chart with ArgoCD.

proxy.secretToken is the same as configuring CONFIGPROXY_AUTH_TOKEN, correct?

I think it’s a known issue that helm state retrieval doesn’t work with ArgoCD, so generated values get regenerated over and over again and you have to set the value explicitly.

Yes, setting proxy.secretToken is what ultimately sets CONFIGPROXY_AUTH_TOKEN. That’s just where the helm chart gets the value, so that’s where you should put it.