Unable to spawn server after login to JupyterHub on an EKS cluster

Hi,
I have installed JupyterHub on a newly created EKS cluster with helm install. The installation was successful and I can reach the login page, but when I log in as the admin user I get the error message below:

Your server is starting up.

You will be redirected automatically when it's ready for you.

100% Complete
Spawn failed: Server at http://10.0.39.146:8888/user/admin/ didn't respond in 30 seconds

Event log
Server requested
2023-10-04T19:22:37.890336Z [Normal] Successfully assigned default/jupyter-admin to ip-10-0-43-55.eu-central-1.compute.internal
2023-10-04T19:22:38Z [Normal] Container image "jupyterhub/k8s-network-tools:3.0.0" already present on machine
2023-10-04T19:22:38Z [Normal] Created container block-cloud-metadata
2023-10-04T19:22:39Z [Normal] Started container block-cloud-metadata
2023-10-04T19:22:39Z [Normal] Container image "jupyterhub/k8s-singleuser-sample:3.0.0" already present on machine
2023-10-04T19:22:39Z [Normal] Created container notebook
2023-10-04T19:22:39Z [Normal] Started container notebook
Spawn failed: Server at http://10.0.39.146:8888/user/admin/ didn't respond in 30 seconds

The container logs are:

[I 2023-10-04 19:22:40.877 ServerApp] jupyter_lsp | extension was successfully loaded.
[I 2023-10-04 19:22:40.879 ServerApp] jupyter_server_terminals | extension was successfully loaded.
[I 2023-10-04 19:22:40.897 ServerApp] Skipped non-installed server(s): bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, pyright, python-language-server, python-lsp-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server
[E 2023-10-04 19:23:00.898 JupyterHubSingleUser] Failed to connect to my Hub at http://hub:8081/hub/api (attempt 1/5). Is it running?
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/singleuser/extension.py", line 336, in check_hub_version
        resp = await client.fetch(self.hub_auth.api_url)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    tornado.simple_httpclient.HTTPTimeoutError: Timeout while connecting
[E 2023-10-04 19:23:00.900 ServerApp] Exception in callback functools.partial(<function _HTTPConnection.__init__.<locals>.<lambda> at 0x7f6eff001080>, <Task finished name='Task-3' coro=<_HTTPConnection.run() done, defined at /usr/local/lib/python3.11/site-packages/tornado/simple_httpclient.py:290> exception=gaierror(-3, 'Temporary failure in name resolution')>)
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/tornado/ioloop.py", line 738, in _run_callback
        ret = callback()
              ^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/tornado/simple_httpclient.py", line 287, in <lambda>
        gen.convert_yielded(self.run()), lambda f: f.result()
                                                   ^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/tornado/simple_httpclient.py", line 340, in run
        stream = await self.tcp_client.connect(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/tornado/tcpclient.py", line 269, in connect
        addrinfo = await self.resolver.resolve(host, port, af)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/tornado/netutil.py", line 433, in resolve
        for fam, _, _, _, address in await asyncio.get_running_loop().getaddrinfo(
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/asyncio/base_events.py", line 867, in getaddrinfo
        return await self.run_in_executor(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
        result = self.fn(*self.args, **self.kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/socket.py", line 962, in getaddrinfo
        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    socket.gaierror: [Errno -3] Temporary failure in name resolution
[C 2023-10-04 19:23:22.283 ServerApp] received signal 15, stopping
[I 2023-10-04 19:23:22.288 ServerApp] Shutting down 7 extensions
[E 2023-10-04 19:23:22.289 JupyterHubSingleUser] Failed to load JupyterHubSingleUser server extension
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/singleuser/extension.py", line 274, in wrapped
        r = f(self, *args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/singleuser/extension.py", line 633, in initialize
        app.io_loop.run_sync(self.check_hub_version)
      File "/usr/local/lib/python3.11/site-packages/tornado/ioloop.py", line 526, in run_sync
        raise TimeoutError("Operation timed out after %s seconds" % timeout)
    TimeoutError: Operation timed out after None seconds

Helm installation
helm upgrade --cleanup-on-fail --install jupyterhub jupyterhub/jupyterhub --version=3.0.0 --values config.yaml

config.yaml

hub:
  config:
    Authenticator:
      admin_users:
        - admin
    JupyterHub:
      admin_access: true
      authenticator_class: dummy
  networkPolicy:
    enabled: false
proxy:
  chp:
    networkPolicy:
      enabled: false
  traefik:
    networkPolicy:
      enabled: false
singleuser:
  startTimeout: 3600
  networkPolicy:
    enabled: false
  serviceAccountName: default

How can I fix this issue? Has anyone else faced it?

It seems like the pod where the hub is running is not reachable from the single-user server pods (where JupyterLab/Notebook runs). The "Temporary failure in name resolution" in your traceback means the user pod cannot even resolve the hub service name. You need to check the networking in your EKS cluster.
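One way to confirm this from inside the cluster is to run one-off checks from a throwaway pod in the same namespace as the hub. This is a rough sketch assuming the chart's default service name hub on port 8081 and an installation in the default namespace; adjust -n if you installed the chart elsewhere:

```shell
# Does the "hub" service name resolve from inside the cluster?
kubectl run dns-test --rm -i --image=busybox:1.36 -n default \
  --restart=Never -- nslookup hub

# Does the hub API answer on port 8081?
kubectl run hub-test --rm -i --image=busybox:1.36 -n default \
  --restart=Never -- wget -qO- http://hub:8081/hub/api
```

If the nslookup fails with the same "Temporary failure in name resolution", the problem is usually cluster DNS (CoreDNS not running on, or not reachable from, the node where the user pod was scheduled) or pod-to-pod traffic being blocked between node groups, e.g. by security groups.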


Thanks @mahendrapaipuri for your response. Yes, the user pods were being scheduled on my spot instance node group; that was the issue. I applied taints to the spot instance nodes and everything works now.
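For anyone hitting the same thing: besides tainting the spot nodes (kubectl taint nodes <node> <key>=<value>:NoSchedule), an alternative is to pin the user pods to on-demand nodes from the chart config. A sketch of a config.yaml fragment, assuming EKS managed node groups, which carry the eks.amazonaws.com/capacityType label; verify the labels on your nodes with kubectl get nodes --show-labels first:

```yaml
# config.yaml fragment: schedule single-user pods only on on-demand nodes,
# so they never land on the spot node group. The label below is set by EKS
# on managed node groups; self-managed node groups may use different labels.
singleuser:
  nodeSelector:
    eks.amazonaws.com/capacityType: ON_DEMAND
```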
