Pod startup errors on OpenShift: "[Errno -3] Temporary failure in name resolution"

Hello there,
I have deployed JupyterHub on OpenShift by rendering the chart with helm template and applying the output, using a customized values.yaml. I went through the documentation, configured hub, proxy, services, and singleuser as needed, and applied the required changes to the pod and container security contexts. The deployment was successful and I am able to log in as the admin user.
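
For reference, the chart was rendered and applied roughly like this (the chart reference, namespace, and values file path are placeholders, not the exact commands):

helm template jupyterhub/jupyterhub \
  --namespace daut \
  --values values.yaml \
  | oc apply -f -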

But when I attempt to start the Minimal Notebook environment, I get the following error. I am using Quay for the singleuser image.

Can someone share some pointers? I have been stuck on this issue for the last 48 hours and have browsed through all the online sources I could find.

socket.gaierror: [Errno -3] Temporary failure in name resolution
[E 2024-01-29 23:53:24.252 JupyterHubSingleUser] Failed to connect to my Hub at http://daut-jupyterhub-hub:8081/hub/api (attempt 1/5). Is it running?

Entered start.sh with args: jupyterhub-singleuser
Running hooks in: /usr/local/bin/start-notebook.d as uid: 1001120000 gid: 1001120000
Done running hooks in: /usr/local/bin/start-notebook.d
There is no entry in /etc/passwd for our UID=1001120000. Attempting to fix...
WARNING: unable to fix missing /etc/passwd entry because we don't have write permission. Try setting gid=0 with "--user=1001120000:0".
Running hooks in: /usr/local/bin/before-notebook.d as uid: 1001120000 gid: 1001120000
Sourcing shell script: /usr/local/bin/before-notebook.d/10activate-conda-env.sh
Done running hooks in: /usr/local/bin/before-notebook.d
Executing the command: jupyterhub-singleuser
[W 2024-01-29 23:53:04.057 ServerApp] A `_jupyter_server_extension_points` function was not found in nbclassic. Instead, a `_jupyter_server_extension_paths` function was found and will be used for now. This function name will be deprecated in future releases of Jupyter Server.
[W 2024-01-29 23:53:04.058 ServerApp] A `_jupyter_server_extension_points` function was not found in notebook_shim. Instead, a `_jupyter_server_extension_paths` function was found and will be used for now. This function name will be deprecated in future releases of Jupyter Server.
[I 2024-01-29 23:53:04.059 ServerApp] jupyter_lsp | extension was successfully linked.
[I 2024-01-29 23:53:04.062 ServerApp] jupyter_server_terminals | extension was successfully linked.
[I 2024-01-29 23:53:04.062 JupyterHubSingleUser] Starting jupyterhub single-user server extension version 4.0.2
[I 2024-01-29 23:53:04.062 JupyterHubSingleUser] Using default url from environment $JUPYTERHUB_DEFAULT_URL: /lab
[I 2024-01-29 23:53:04.064 ServerApp] jupyterhub | extension was successfully linked.
[W 2024-01-29 23:53:04.065 LabApp] 'extra_template_paths' was found in both NotebookApp and ServerApp. This is likely a recent change. This config will only be set in NotebookApp. Please check if you should also config these traits in ServerApp for your purpose.
[I 2024-01-29 23:53:04.068 ServerApp] jupyterlab | extension was successfully linked.
[W 2024-01-29 23:53:04.069 NotebookApp] 'extra_template_paths' was found in both NotebookApp and ServerApp. This is likely a recent change. This config will only be set in NotebookApp. Please check if you should also config these traits in ServerApp for your purpose.
[I 2024-01-29 23:53:04.070 ServerApp] nbclassic | extension was successfully linked.
[W 2024-01-29 23:53:04.071 JupyterNotebookApp] 'extra_template_paths' was found in both NotebookApp and ServerApp. This is likely a recent change. This config will only be set in NotebookApp. Please check if you should also config these traits in ServerApp for your purpose.
[I 2024-01-29 23:53:04.073 ServerApp] notebook | extension was successfully linked.
[I 2024-01-29 23:53:04.218 ServerApp] notebook_shim | extension was successfully linked.
[I 2024-01-29 23:53:04.230 ServerApp] notebook_shim | extension was successfully loaded.
[I 2024-01-29 23:53:04.231 ServerApp] jupyter_lsp | extension was successfully loaded.
[I 2024-01-29 23:53:04.232 ServerApp] jupyter_server_terminals | extension was successfully loaded.
[I 2024-01-29 23:53:04.588 ServerApp] Skipped non-installed server(s): bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, pyright, python-language-server, python-lsp-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server
[E 2024-01-29 23:53:24.251 ServerApp] Exception in callback functools.partial(<function _HTTPConnection.__init__.<locals>.<lambda> at 0x7fd96f468a40>, <Task finished name='Task-3' coro=<_HTTPConnection.run() done, defined at /opt/conda/lib/python3.11/site-packages/tornado/simple_httpclient.py:290> exception=gaierror(-3, 'Temporary failure in name resolution')>)
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.11/site-packages/tornado/ioloop.py", line 738, in _run_callback
        ret = callback()
              ^^^^^^^^^^
      File "/opt/conda/lib/python3.11/site-packages/tornado/simple_httpclient.py", line 287, in <lambda>
        gen.convert_yielded(self.run()), lambda f: f.result()
                                                   ^^^^^^^^^^
      File "/opt/conda/lib/python3.11/site-packages/tornado/simple_httpclient.py", line 340, in run
        stream = await self.tcp_client.connect(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/conda/lib/python3.11/site-packages/tornado/tcpclient.py", line 269, in connect
        addrinfo = await self.resolver.resolve(host, port, af)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/conda/lib/python3.11/site-packages/tornado/netutil.py", line 433, in resolve
        for fam, _, _, _, address in await asyncio.get_running_loop().getaddrinfo(
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 867, in getaddrinfo
        return await self.run_in_executor(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/conda/lib/python3.11/concurrent/futures/thread.py", line 58, in run
        result = self.fn(*self.args, **self.kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/conda/lib/python3.11/socket.py", line 962, in getaddrinfo
        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    socket.gaierror: [Errno -3] Temporary failure in name resolution
[E 2024-01-29 23:53:24.252 JupyterHubSingleUser] Failed to connect to my Hub at http://daut-jupyterhub-hub:8081/hub/api (attempt 1/5). Is it running?
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.11/site-packages/jupyterhub/singleuser/extension.py", line 336, in check_hub_version
        resp = await client.fetch(self.hub_auth.api_url)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    tornado.simple_httpclient.HTTPTimeoutError: Timeout while connecting

Ideas:

  • Test disabling the NetworkPolicy resource for singleuser pods via singleuser.networkPolicy.enabled=false; that can rule out issues with it (do you know whether NetworkPolicy resources are enforced in your cluster, and by what?). See the sketch after this list.
  • Learn and clarify where the in-cluster DNS server resides in your k8s cluster, and whether some special port or similar is involved.
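
For the first point, that is roughly this in values.yaml (a minimal sketch; the chart option is singleuser.networkPolicy.enabled):

singleuser:
  networkPolicy:
    enabled: false

For the second point, something like the following shows where the in-cluster DNS service lives and what the pods actually resolve against (service and namespace names below are typical defaults, not verified for your cluster):

oc get svc -n openshift-dns                         # OpenShift 4.x usually runs dns-default here
oc get svc -n kube-system kube-dns                  # upstream clusters typically use kube-dns/CoreDNS here
oc exec <any-running-pod> -- cat /etc/resolv.conf   # which nameserver the pods are pointed at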

Note that the k8s version and more details of your k8s cluster setup may be relevant.

Thanks for the quick reply. I have disabled the networkPolicy for singleuser and redeployed, but no success. The following are the network policies I can see applied in the namespace.

oc get networkpolicies 
NAME                                       POD-SELECTOR                                                           AGE
daut-jupyterhub-hub                        app=daut-jupyterhub,component=hub,release=release-name                 22m
daut-jupyterhub-proxy                      app=daut-jupyterhub,component=proxy,release=release-name               22m
daut-jupyterhub-singleuser                 app=daut-jupyterhub,component=singleuser-server,release=release-name   165m
daut-network-policy-allow-same-namespace   <none>                                                                 2d1h
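
The specs below were pulled with something like this (output trimmed to the spec):

oc get networkpolicy daut-jupyterhub-hub -o yaml
oc get networkpolicy daut-network-policy-allow-same-namespace -o yaml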

daut-network-policy-allow-same-namespace.yaml

spec:
  ingress:
  - from:
    - podSelector: {}
  podSelector: {}
  policyTypes:
  - Ingress
status: {}

daut-jupyterhub-hub.yaml

spec:
  egress:
  - ports:
    - port: 8001
      protocol: TCP
    to:
    - podSelector:
        matchLabels:
          app: daut-jupyterhub
          component: proxy
          release: release-name
  - ports:
    - port: 8888
      protocol: TCP
    to:
    - podSelector:
        matchLabels:
          app: daut-jupyterhub
          component: singleuser-server
          release: release-name
  - ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
    to:
    - ipBlock:
        cidr: 169.254.169.254/32
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    - ipBlock:
        cidr: 10.0.0.0/8
    - ipBlock:
        cidr: 172.16.0.0/12
    - ipBlock:
        cidr: 192.168.0.0/16
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 10.0.0.0/8
        - 172.16.0.0/12
        - 192.168.0.0/16
        - 169.254.169.254/32
  - to:
    - ipBlock:
        cidr: 10.0.0.0/8
    - ipBlock:
        cidr: 172.16.0.0/12
    - ipBlock:
        cidr: 192.168.0.0/16
  - to:
    - ipBlock:
        cidr: 169.254.169.254/32
  ingress:
  - from:
    - podSelector:
        matchLabels:
          hub.jupyter.org/network-access-hub: "true"
    ports:
    - port: http
      protocol: TCP
  - from:
    - podSelector: {}
  podSelector:
    matchLabels:
      app: daut-jupyterhub
      component: hub
      release: release-name
  policyTypes:
  - Ingress
  - Egress

Btw, thank you so much for the clue. I have cleaned up all the network policies that were created by the Helm chart.

I then applied the workaround provided in one of the solutions to temporarily allow communication within the namespace (see the note after the YAML below):

spec:
  ingress:
  - from:
    - podSelector: {}
  podSelector: {}
  policyTypes:
  - Ingress
status: {}
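
To keep the Helm chart from recreating those policies on the next deploy, the plan is to also disable them in values.yaml (a sketch based on the chart's documented options; not yet applied here):

hub:
  networkPolicy:
    enabled: false
proxy:
  networkPolicy:
    enabled: false
singleuser:
  networkPolicy:
    enabled: false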