We've been seeing 503s occasionally for user servers

See the attached screenshot. The server shows on the admin screen as running, but a sequence of stop server/start server returns to a 503. It appears all requests to the server (including stop/start/access) are generating 503s. Stopping and restarting the Hub clears the problem, but is there a less drastic solution? And anyplace in the Hub code we should be looking at?

Thanks
Rick

1 Like

I have also experienced this issue whenever a user times out. It is related to Pod Evictions (memory, node shutodowns) - requires manual hub restart! #223 and 503: service unavailable #1528. This is the hub pod log for user jamesbond in the vicinity of this scenario:

[I 2020-07-12 03:37:51.779 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 0.77ms
[I 2020-07-12 03:38:01.778 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 0.74ms
[I 2020-07-12 03:38:11.778 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 0.72ms
[I 2020-07-12 03:38:21.778 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 0.74ms
[I 2020-07-12 03:38:31.779 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 0.75ms
[I 2020-07-12 03:38:41.779 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.35ms
[I 2020-07-12 03:38:41.853 JupyterHub proxy:320] Checking routes
[I 2020-07-12 03:38:48.442 JupyterHub log:174] 200 POST /hub/api/users/jamesbond/activity (jamesbond@10.4.17.7) 18.65ms
[I 2020-07-12 03:38:51.778 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.02ms
[I 2020-07-12 03:39:01.782 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.94ms
[I 2020-07-12 03:39:11.778 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.08ms
[I 2020-07-12 03:39:17.269 JupyterHub log:174] 200 GET /hub/api/authorizations/token/[secret] (jamesbond@10.4.17.7) 12.72ms
[I 2020-07-12 03:39:21.778 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.07ms
[I 2020-07-12 03:39:31.778 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 0.72ms
[I 2020-07-12 03:39:41.780 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 0.74ms
[I 2020-07-12 03:39:41.825 JupyterHub proxy:320] Checking routes
[I 2020-07-12 03:39:51.779 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.09ms
[I 2020-07-12 03:40:01.778 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.05ms
[I 2020-07-12 03:40:11.778 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.09ms
[I 2020-07-12 03:40:21.779 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.09ms
[I 2020-07-12 03:40:31.778 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.03ms
[I 2020-07-12 03:40:41.779 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.06ms
[I 2020-07-12 03:40:51.779 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 0.94ms
[I 2020-07-12 03:41:01.779 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.10ms
[E 2020-07-12 03:41:01.814 JupyterHub ioloop:763] Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOMainLoop object at 0x7fd09c7f94a8>>, <Task finished coro=<JupyterHub.update_last_activity() done, defined at /usr/local/lib/python3.6/dist-packages/jupyterhub/app.py:2428> exception=HTTP 599: Connection timed out after 20000 milliseconds>)
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/tornado/ioloop.py", line 743, in _run_callback
        ret = callback()
      File "/usr/local/lib/python3.6/dist-packages/tornado/ioloop.py", line 767, in _discard_future_result
        future.result()
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/app.py", line 2430, in update_last_activity
        routes = await self.proxy.get_all_routes()
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/proxy.py", line 806, in get_all_routes
        resp = await self.api_request('', client=client)
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/proxy.py", line 774, in api_request
        result = await client.fetch(req)
    tornado.curl_httpclient.CurlError: HTTP 599: Connection timed out after 20000 milliseconds

[I 2020-07-12 03:41:11.778 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.08ms
[I 2020-07-12 03:41:21.778 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.03ms
[I 2020-07-12 03:41:31.779 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.23ms
[I 2020-07-12 03:41:38.842 JupyterHub spawner:1866] Deleting pod jupyter-jamesbond
[W 2020-07-12 03:41:39.010 JupyterHub base:1012] User jamesbond server stopped, with exit code: 1
[I 2020-07-12 03:41:39.010 JupyterHub proxy:282] Removing user jamesbond from proxy (/user/jamesbond/)
[I 2020-07-12 03:41:41.778 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.07ms
[I 2020-07-12 03:41:51.779 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 0.72ms
[E 2020-07-12 03:41:59.012 JupyterHub spawner:1138] Unhandled error in poll callback for <kubespawner.spawner.KubeSpawner object at 0x7fd09437ecc0>
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/spawner.py", line 1136, in poll_and_notify
        await maybe_future(callback())
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/handlers/base.py", line 1016, in user_stopped
        await self.proxy.delete_user(user, server_name)
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/proxy.py", line 283, in delete_user
        await self.delete_route(routespec)
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/proxy.py", line 787, in delete_route
        await self.api_request(path, method='DELETE')
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/proxy.py", line 774, in api_request
        result = await client.fetch(req)
    tornado.curl_httpclient.CurlError: HTTP 599: Connection timed out after 20000 milliseconds

[I 2020-07-12 03:42:01.778 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.07ms
[E 2020-07-12 03:42:01.813 JupyterHub ioloop:763] Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOMainLoop object at 0x7fd09c7f94a8>>, <Task finished coro=<JupyterHub.update_last_activity() done, defined at /usr/local/lib/python3.6/dist-packages/jupyterhub/app.py:2428> exception=HTTP 599: Connection timed out after 20000 milliseconds>)
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/tornado/ioloop.py", line 743, in _run_callback
        ret = callback()
      File "/usr/local/lib/python3.6/dist-packages/tornado/ioloop.py", line 767, in _discard_future_result
        future.result()
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/app.py", line 2430, in update_last_activity
        routes = await self.proxy.get_all_routes()
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/proxy.py", line 806, in get_all_routes
        resp = await self.api_request('', client=client)
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/proxy.py", line 774, in api_request
        result = await client.fetch(req)
    tornado.curl_httpclient.CurlError: HTTP 599: Connection timed out after 20000 milliseconds

[I 2020-07-12 03:42:11.778 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 0.71ms
[I 2020-07-12 03:42:21.778 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.03ms
[I 2020-07-12 03:42:31.779 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.09ms
[I 2020-07-12 03:42:41.779 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.09ms
[I 2020-07-12 03:42:42.486 JupyterHub log:174] 200 GET /hub/api/users (cull-idle@127.0.0.1) 39.39ms
[I 2020-07-12 03:42:51.779 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.13ms
[I 2020-07-12 03:43:01.779 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.09ms
[E 2020-07-12 03:43:01.815 JupyterHub ioloop:763] Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOMainLoop object at 0x7fd09c7f94a8>>, <Task finished coro=<JupyterHub.update_last_activity() done, defined at /usr/local/lib/python3.6/dist-packages/jupyterhub/app.py:2428> exception=HTTP 599: Connection timed out after 20000 milliseconds>)
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/tornado/ioloop.py", line 743, in _run_callback
        ret = callback()
      File "/usr/local/lib/python3.6/dist-packages/tornado/ioloop.py", line 767, in _discard_future_result
        future.result()
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/app.py", line 2430, in update_last_activity
        routes = await self.proxy.get_all_routes()
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/proxy.py", line 806, in get_all_routes
        resp = await self.api_request('', client=client)
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/proxy.py", line 774, in api_request
        result = await client.fetch(req)
    tornado.curl_httpclient.CurlError: HTTP 599: Connection timed out after 20000 milliseconds

[I 2020-07-12 03:43:11.779 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.15ms
[I 2020-07-12 03:43:21.783 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 0.74ms
[I 2020-07-12 03:43:31.778 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.06ms
[I 2020-07-12 03:43:36.458 JupyterHub log:174] 302 GET /user/jamesbond/metrics?1594525413795 -> /hub/user/jamesbond/metrics?1594525413795 (jamesbond@10.3.16.55) 26.49ms
[I 2020-07-12 03:43:36.556 JupyterHub log:174] 302 GET /hub/user/jamesbond/metrics?1594525413795 -> /user/jamesbond/metrics?1594525413795=&redirects=1 (jamesbond@10.3.16.56) 4.22ms
[I 2020-07-12 03:43:36.624 JupyterHub log:174] 302 GET /user/jamesbond/metrics?1594525413795=&redirects=1 -> /hub/user/jamesbond/metrics?1594525413795=&redirects=1 (jamesbond@10.3.16.54) 2.68ms
[W 2020-07-12 03:43:36.692 JupyterHub base:1487] Redirect loop detected on /hub/user/jamesbond/metrics?1594525413795=&redirects=1
[I 2020-07-12 03:43:38.695 JupyterHub log:174] 302 GET /hub/user/jamesbond/metrics?1594525413795=&redirects=1 -> /user/jamesbond/metrics?redirects=2 (jamesbond@10.3.16.56) 2005.57ms
[I 2020-07-12 03:43:38.751 JupyterHub log:174] 302 GET /user/jamesbond/metrics?redirects=2 -> /hub/user/jamesbond/metrics?redirects=2 (jamesbond@10.3.16.54) 3.26ms
[W 2020-07-12 03:43:38.812 JupyterHub base:1487] Redirect loop detected on /hub/user/jamesbond/metrics?redirects=2
[I 2020-07-12 03:43:41.779 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.29ms
[I 2020-07-12 03:43:41.817 JupyterHub proxy:320] Checking routes
[I 2020-07-12 03:43:41.818 JupyterHub proxy:400] Adding default route for Hub: / => http://10.4.23.166:8081
[W 2020-07-12 03:43:41.818 JupyterHub proxy:345] Adding missing route for /user/jamesbond/ (Server(url=http://10.4.17.7:8888/user/jamesbond/, bind_url=http://10.4.17.7:8888/user/jamesbond/))
[I 2020-07-12 03:43:41.819 JupyterHub proxy:262] Adding user jamesbond to proxy /user/jamesbond/ => http://10.4.17.7:8888
[I 2020-07-12 03:43:42.814 JupyterHub log:174] 302 GET /hub/user/jamesbond/metrics?redirects=2 -> /user/jamesbond/metrics?redirects=3 (jamesbond@10.3.16.54) 4004.69ms
[I 2020-07-12 03:43:42.942 JupyterHub log:174] 200 GET /hub/error/503?url=%2Fuser%2Fjamesbond%2Fmetrics%3Fredirects%3D3 (@10.4.19.2) 3.79ms
[I 2020-07-12 03:43:43.314 JupyterHub log:174] 200 GET /hub/error/503?url=%2Fuser%2Fjamesbond%2Fapi%2Fcontents%2Fprojects%2Frethinking-numpyro%2Fnotebooks%3Fcontent%3D1%261594525420685 (@10.4.19.2) 1.54ms
[I 2020-07-12 03:43:48.013 JupyterHub log:174] 200 GET /hub/error/503?url=%2Fuser%2Fjamesbond%2Fmetrics%3F1594525425384 (@10.4.19.2) 1.45ms
[I 2020-07-12 03:43:51.779 JupyterHub log:174] 200 GET /hub/health (@10.3.16.54) 1.12ms
[I 2020-07-12 03:43:53.075 JupyterHub log:174] 200 GET /hub/error/503?url=%2Fuser%2Fjamesbond%2Fmetrics%3F1594525430447 (@10.4.19.2) 1.45ms
[I 2020-07-12 03:43:53.400 JupyterHub log:174] 200 GET /hub/error/503?url=%2Fuser%2Fjamesbond%2Fapi%2Fcontents%2Fprojects%2Frethinking-numpyro%2Fnotebooks%3Fcontent%3D1%261594525430754 (@10.4.19.2) 1.42ms

As suggested in the issue referenced above, restarting the hub deployment with

kubectl rollout restart deployment hub

does allow the user to restart their pod via the interface.

Name:               hub
Namespace:          default
CreationTimestamp:  Thu, 02 Jul 2020 21:34:15 -0400
Labels:             app=jupyterhub
                    chart=jupyterhub-0.9.0
                    component=hub
                    heritage=Helm
                    release=jhub

Containers:
   hub:
    Image:      jupyterhub/k8s-hub:0.9.0

Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.10-gke.8", GitCommit:"fd29f95f189cb16dd8fe0bd5dc8b485dfd049d2c", GitTreeState:"clean", BuildDate:"2020-06-23T23:44:30Z", GoVersion:"go1.13.9b4", Compiler:"gc", Platform:"linux/amd64"}

jupyterhub==1.1.0
2 Likes