GenericOAuthenticator: 500 Internal Server Error

I have Nextcloud configured as an OAuth2 identity provider, allowing my JupyterHub users to authenticate against their Nextcloud accounts through the GenericOAuthenticator, using a configuration like this one (sketched at the end of this post). This setup works smoothly for a number of users as they log in, but then inexplicably I start receiving this error on the JupyterHub login page:

500 : Internal Server Error

This can even happen for the same Nextcloud accounts that authenticated properly just minutes before. The Nextcloud log only says something like:

{"reqId":"C3D8JoWxk3Lw6atnul7Y","level":2,"time":"2021-07-04T01:01:03+00:00","remoteAddr":"141.142.216.48","user":"--","app":"core","method":"POST","url":"/apps/oauth2/api/v1/token","message":"Login failed: 'C5UWHxTlbk7RwvDhsPoL8oZ640mORcJq5po6H4s9zyMy9xyOgoLOL0e66VZqUDKp' (Remote IP: '1.2.3.4')","userAgent":"JupyterHub","version":"21.0.2.1"}

The JupyterHub log shows this:

    result = await result
  File "/usr/local/lib/python3.8/dist-packages/oauthenticator/oauth2.py", line 224, in get
    user = await self.login_user()
  File "/usr/local/lib/python3.8/dist-packages/jupyterhub/handlers/base.py", line 749, in login_user
    authenticated = await self.authenticate(data)
  File "/usr/local/lib/python3.8/dist-packages/jupyterhub/auth.py", line 462, in get_authenticated_user
    authenticated = await maybe_future(self.authenticate(handler, data))
  File "/usr/local/lib/python3.8/dist-packages/oauthenticator/generic.py", line 155, in authenticate
    token_resp_json = await self._get_token(http_client, headers, params)
  File "/usr/local/lib/python3.8/dist-packages/oauthenticator/generic.py", line 97, in _get_token
    resp = await http_client.fetch(req)
tornado.curl_httpclient.CurlError: HTTP 599: Operation timed out after 20000 milliseconds with 0 bytes received

[E 2021-07-04 01:00:58.488 JupyterHub log:173] {
"X-Real-Ip": "2.3.4.5",
"X-Forwarded-Server": "traefik-5684766988-sj49c",
"X-Forwarded-Proto": "https,http",
"X-Forwarded-Port": "443,80",
"X-Forwarded-Host": "jupyter.example.com",
"X-Forwarded-For": "****",
"Upgrade-Insecure-Requests": "1",
"Te": "trailers",
"Sec-Gpc": "1",
"Dnt": "1",
"Cookie": "_fbp=[secret]; OptanonConsent=[secret]; OptanonAlertBoxClosed=[secret]; _ga_F6CG3GSQTQ=[secret]; _ga=[secret]; _xsrf=[secret]; oauthenticator-state=[secret]",
"Accept-Language": "en-US,en;q=0.5",
"Accept-Encoding": "gzip, deflate, br",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
"Host": "jupyter.example.com",
"Connection": "close"
}
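That 599 with 0 bytes received looks like tornado's HTTP client giving up on the token request after its default 20-second timeout, i.e. Nextcloud never answers /apps/oauth2/api/v1/token in time. As a sanity check, something like the following (run from inside the hub pod) can time the token endpoint directly; the URL and credentials here are placeholders, and a bogus grant is fine because the point is only to see how long Nextcloud takes to respond, not to obtain a token:

    # Time Nextcloud's OAuth2 token endpoint from the hub pod.
    # Placeholder URL and credentials -- a bogus grant is fine since we
    # only care about the response time, not a successful exchange.
    import time
    import requests

    TOKEN_URL = "https://cloud.example.com/apps/oauth2/api/v1/token"

    start = time.monotonic()
    try:
        resp = requests.post(
            TOKEN_URL,
            data={
                "grant_type": "authorization_code",
                "code": "dummy",
                "redirect_uri": "https://jupyter.example.com/jupyter/hub/oauth_callback",
                "client_id": "dummy-client-id",
                "client_secret": "dummy-client-secret",
            },
            timeout=60,
        )
        print(f"HTTP {resp.status_code} after {time.monotonic() - start:.1f}s")
    except requests.RequestException as exc:
        print(f"Request failed after {time.monotonic() - start:.1f}s: {exc}")

If that also takes 20+ seconds around the time logins fail, the delay is on the Nextcloud side rather than in JupyterHub.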

I have not changed any configuration between when the auth works and when it fails, making this problem particularly challenging to diagnose. Any ideas from the JupyterHub community would be appreciated.
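For reference, the authenticator part of the configuration is roughly the following; hostnames and credentials are placeholders, and in Z2JH these settings live under hub.extraConfig / hub.config rather than a standalone jupyterhub_config.py:

    # GenericOAuthenticator pointed at Nextcloud's bundled oauth2 app.
    # Hostnames, client id, and client secret below are placeholders.
    c.JupyterHub.authenticator_class = 'oauthenticator.generic.GenericOAuthenticator'

    c.GenericOAuthenticator.client_id = '<nextcloud oauth2 client id>'
    c.GenericOAuthenticator.client_secret = '<nextcloud oauth2 client secret>'
    c.GenericOAuthenticator.oauth_callback_url = 'https://jupyter.example.com/jupyter/hub/oauth_callback'

    # Endpoints exposed by Nextcloud's bundled oauth2 app.
    c.GenericOAuthenticator.authorize_url = 'https://cloud.example.com/apps/oauth2/authorize'
    c.GenericOAuthenticator.token_url = 'https://cloud.example.com/apps/oauth2/api/v1/token'

    # Label shown on the JupyterHub login button.
    c.GenericOAuthenticator.login_service = 'Nextcloud'

    # userdata_url / username_key (how the Nextcloud account is mapped to a
    # JupyterHub username) are omitted here.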

Hi! Could you turn on debug logging in Z2JH and show us the hub logs for the whole login attempt? It would be particularly useful to see the failing URL (you can redact any private info, but use different placeholders for different secrets).
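If it's easier than re-applying the chart's debug option, the hub-side equivalent (e.g. dropped into hub.extraConfig) is roughly:

    # Make the hub log at DEBUG level -- roughly what the Z2JH debug
    # option does for the hub pod.
    c.JupyterHub.log_level = 'DEBUG'
    # Optionally have spawned single-user servers log at debug level too.
    c.Spawner.debug = True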

@manics It took more than five logins to finally see the failure again. I’m starting to suspect that the Nextcloud or JupyterHub pod is having some kind of networking issue that randomly causes a timeout. I saved all the log messages in this gist for you to examine, with a section for the successful logins followed by a section with the failed login. (By the way, I thought I had enabled debug mode, but I don’t see any additional log messages, so maybe I need to try applying that config again.)


I improved the availability of the Nextcloud deployment by scaling from one replica to three, hoping that the OAuth workflow timeout would be solved by having more replicas available to handle requests. Unfortunately the behavior is the same as before.

By the way, scaling from 1 to 3 Nextcloud replicas was non-trivial. I had to enable and configure Redis to handle session management.

I’m starting to suspect a caching issue. After some amount of time elapses, I am able to resume authenticating against the Nextcloud server; I only noticed this by chance after many hours had elapsed overnight. Here is what the logs look like for a successful attempt by a user who was unable to authenticate last night but can suddenly authenticate without problems this morning. I find it odd that the Nextcloud log shows a failed login even when the login succeeds.

Nextcloud:

{
    "reqId": "2GFShKmLIEpifLkcEElD",
    "level": 2,
    "time": "2021-07-05T12:56:13+00:00",
    "remoteAddr": "x.y.216.48",
    "user": "--",
    "app": "core",
    "method": "POST",
    "url": "/apps/oauth2/api/v1/token",
    "message": "Login failed: 'C5UWHxTlbk7RwvDhsPoL8oZ640mORcJq5po6H4s9zyMy9xyOgoLOL0e66VZqUDKp' (Remote IP: 'x.y.216.48')",
    "userAgent": "JupyterHub",
    "version": "21.0.2.1"
}

JupyterHub:

[I 2021-07-05 12:56:00.422 JupyterHub log:181] 302 GET /jupyter -> /jupyter/hub/ (@x.y.234.164) 0.94ms
[I 2021-07-05 12:56:00.452 JupyterHub log:181] 302 GET /jupyter/hub/ -> /jupyter/hub/login?next=%2Fjupyter%2Fhub%2F (@x.y.234.164) 0.77ms
[I 2021-07-05 12:56:00.480 JupyterHub log:181] 200 GET /jupyter/hub/login?next=%2Fjupyter%2Fhub%2F (@x.y.234.164) 1.50ms
[I 2021-07-05 12:56:05.614 JupyterHub oauth2:104] OAuth redirect: 'https://gasc.ncsa.illinois.edu/jupyter/hub/oauth_callback'
[I 2021-07-05 12:56:05.615 JupyterHub log:181] 302 GET /jupyter/hub/oauth_login?next=%2Fjupyter%2Fhub%2F -> https://cloud.gasc.ncsa.illinois.edu/apps/oauth2/authorize?response_type=code&redirect_uri=https%3A%2F%2Fgasc.ncsa.illinois.edu%2Fjupyter%2Fhub%2Foauth_callback&client_id=C5UWHxTlbk7RwvDhsPoL8oZ640mORcJq5po6H4s9zyMy9xyOgoLOL0e66VZqUDKp&state=[secret] (@x.y.234.164) 1.67ms
[I 2021-07-05 12:56:08.282 JupyterHub proxy:319] Checking routes
[I 2021-07-05 12:56:14.002 JupyterHub base:757] User logged in: andrea.ghez
[I 2021-07-05 12:56:14.003 JupyterHub log:181] 302 GET /jupyter/hub/oauth_callback?state=[secret]&code=[secret] -> /jupyter/hub/ (@x.y.234.164) 729.31ms
[I 2021-07-05 12:56:14.066 JupyterHub log:181] 302 GET /jupyter/hub/ -> /jupyter/hub/spawn (andrea.ghez@x.y.234.164) 20.72ms
[I 2021-07-05 12:56:14.193 JupyterHub spawner:1875] Attempting to create pvc claim-andrea-2eghez, with timeout 3
[I 2021-07-05 12:56:14.194 JupyterHub log:181] 302 GET /jupyter/hub/spawn -> /jupyter/hub/spawn-pending/andrea.ghez (andrea.ghez@x.y.234.164) 84.84ms
[I 2021-07-05 12:56:14.201 JupyterHub spawner:1887] PVC claim-andrea-2eghez already exists, so did not create new pvc.
[I 2021-07-05 12:56:14.204 JupyterHub spawner:1840] Attempting to create pod jupyter-andrea-2eghez, with timeout 3
[I 2021-07-05 12:56:14.254 JupyterHub pages:402] andrea.ghez is pending spawn
[I 2021-07-05 12:56:14.255 JupyterHub log:181] 200 GET /jupyter/hub/spawn-pending/andrea.ghez (andrea.ghez@x.y.234.164) 11.98ms
[I 2021-07-05 12:56:18.074 JupyterHub log:181] 200 GET /jupyter/hub/api (@10.42.160.24) 0.93ms
[I 2021-07-05 12:56:18.130 JupyterHub log:181] 200 POST /jupyter/hub/api/users/andrea.ghez/activity (andrea.ghez@10.42.160.24) 42.62ms
[I 2021-07-05 12:56:18.297 JupyterHub base:904] User andrea.ghez took 4.177 seconds to start
[I 2021-07-05 12:56:18.297 JupyterHub proxy:257] Adding user andrea.ghez to proxy /jupyter/user/andrea.ghez/ => http://10.42.160.24:8888
[I 2021-07-05 12:56:18.301 JupyterHub users:664] Server andrea.ghez is ready
[I 2021-07-05 12:56:18.302 JupyterHub log:181] 200 GET /jupyter/hub/api/users/andrea.ghez/server/progress (andrea.ghez@x.y.234.164) 3933.52ms
[I 2021-07-05 12:56:18.342 JupyterHub log:181] 302 GET /jupyter/hub/spawn-pending/andrea.ghez -> /jupyter/user/andrea.ghez/ (andrea.ghez@x.y.234.164) 8.70ms
[I 2021-07-05 12:56:18.450 JupyterHub log:181] 302 GET /jupyter/hub/api/oauth2/authorize?client_id=jupyterhub-user-andrea.ghez&redirect_uri=%2Fjupyter%2Fuser%2Fandrea.ghez%2Foauth_callback&response_type=code&state=[secret] -> /jupyter/user/andrea.ghez/oauth_callback?code=[secret]&state=[secret] (andrea.ghez@x.y.234.164) 34.58ms
[I 2021-07-05 12:56:18.539 JupyterHub log:181] 200 POST /jupyter/hub/api/oauth2/token (andrea.ghez@10.42.160.24) 56.86ms
[I 2021-07-05 12:56:18.576 JupyterHub log:181] 200 GET /jupyter/hub/api/authorizations/token/[secret] (andrea.ghez@10.42.160.24) 31.43ms
[I 2021-07-05 12:56:20.508 JupyterHub log:181] 200 GET /jupyter/hub/api/authorizations/token/[secret] (andrea.ghez@10.42.160.24) 29.31ms
[I 2021-07-05 12:56:39.838 JupyterHub log:181] 200 POST /jupyter/hub/api/users/vera.rubin/activity (vera.rubin@10.42.160.20) 43.00ms

In testing this morning I noticed that successive JupyterHub logins (same user logged into Nextcloud, but logging out of and back in to JupyterHub repeatedly) took increasing amounts of time to complete, and then failed after seven successful logins.

This is probably a coincidence, but I also noticed that the response times (in seconds) of the /hub/oauth_callback page loads roughly followed the Fibonacci sequence 1, 1, 2, 3, 5, 8, 13. If this were somehow a throttling mechanism by design, the next delay would be over 20 seconds, which would explain the 500 error. That seems far-fetched, though 🙂 I’m clutching at straws here…
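In case it's useful, here is a quick way to pull those timings out of a saved hub log; it just scrapes the request-log lines for /jupyter/hub/oauth_callback like the ones shown above (the log path is passed as the first argument):

    # Print the /hub/oauth_callback response times (in seconds) from a
    # saved JupyterHub log, to see how the delay grows across logins.
    import re
    import sys

    # Matches request-log lines like:
    # [I ... JupyterHub log:181] 302 GET /jupyter/hub/oauth_callback?... (...) 729.31ms
    PATTERN = re.compile(r"GET /jupyter/hub/oauth_callback\S* .*?([\d.]+)ms")

    with open(sys.argv[1]) as log_file:
        for line in log_file:
            match = PATTERN.search(line)
            if match:
                print(f"{float(match.group(1)) / 1000:.1f}s")

For the successful login above it prints 0.7s.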

It looks like your debug setting didn’t take effect; you should see a lot more logs. Is this the setting you used: Debugging — Zero to JupyterHub with Kubernetes documentation?

Are you able to share your Z2JH config (with secrets redacted), your Z2JH version, and details about your K8s cluster, e.g. is it a standard public cloud offering or your own servers?

Do you have any other applications that use Nextcloud OAuth in a similar way to JupyterHub?

I did eventually turn debugging on by putting the config at the correct position in the YAML file.

No, I do not. I am hoping that this project will pave the way for more of our web services authenticating to Nextcloud servers.

The k8s cluster is running on a non-public OpenStack cloud computing platform with one control plane node and three worker nodes, each with 32 cores and 64 GB RAM. The Helm chart in use is v0.11.1 (https://jupyterhub.github.io/helm-chart/). The detailed deployment information is captured in a public git repo here. This is a temporary deployment for a student camp occurring next week, after which we will tear it down. However, this configuration will inform our other JupyterHub deployments.

The latest config is confusing because, for lack of time before people need to use the system, I had to revert to manually provisioned local accounts. Here is the most recent config where Nextcloud OAuth was used.

If you get time, a couple of things you could try are:

  • Spin up another Z2JH instance with a different OAuthenticator (e.g. GitHub; a minimal config sketch follows this list). If you still have problems, that implies the issue is in JupyterHub or Kubernetes.
  • Spin up another Z2JH instance also connected to Nextcloud. If both Z2JH deployments have problems at the same time, the problem is more likely to be in Nextcloud or Kubernetes.
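For the first test, a throwaway GitHub-backed deployment only needs a handful of settings; something like this (placeholder client id/secret from a GitHub OAuth App you register, e.g. under hub.extraConfig) should be enough:

    # Minimal GitHub OAuth login for a comparison Z2JH deployment.
    # Client id/secret are placeholders from a registered GitHub OAuth App.
    c.JupyterHub.authenticator_class = 'oauthenticator.github.GitHubOAuthenticator'
    c.GitHubOAuthenticator.client_id = '<github oauth app client id>'
    c.GitHubOAuthenticator.client_secret = '<github oauth app client secret>'
    c.GitHubOAuthenticator.oauth_callback_url = 'https://test-hub.example.com/hub/oauth_callback'

If logins through GitHub keep working while the Nextcloud-backed hub starts failing, that points away from JupyterHub and Kubernetes.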