Liveness & readiness probes failing in z2jh

Hello,

I deployed z2jh on a multi-node k3s cluster (one master node and one worker node). My hub pod shows the following errors in kubectl describe pod hub-<id>:

Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  71s                default-scheduler  0/2 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling..
  Normal   Scheduled         70s                default-scheduler  Successfully assigned default/hub-56599f97f9-t46qz to k3s-worker-01
  Normal   Pulled            55s                kubelet            Successfully pulled image "bpfrd/nbgrader-hub:latest" in 15.577s (15.577s including waiting)
  Normal   Pulling           11s (x2 over 70s)  kubelet            Pulling image "bpfrd/nbgrader-hub:latest"
  Normal   Pulled            10s                kubelet            Successfully pulled image "bpfrd/nbgrader-hub:latest" in 936ms (937ms including waiting)
  Normal   Created           10s (x2 over 55s)  kubelet            Created container hub
  Normal   Started           10s (x2 over 55s)  kubelet            Started container hub
  Warning  Unhealthy         0s (x4 over 40s)   kubelet            Readiness probe failed: Get "http://10.42.1.7:8081/hub/health": dial tcp 10.42.1.7:8081: connect: connection refused
  Warning  Unhealthy         0s (x4 over 40s)   kubelet            Liveness probe failed: Get "http://10.42.1.7:8081/hub/health": dial tcp 10.42.1.7:8081: connect: connection refused
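"Connection refused" here means nothing is listening on port 8081 inside the hub pod at probe time. The kubelet's probe is just an HTTP GET against that IP and port, so you can replicate the TCP part of the check by hand from any node. A minimal sketch, assuming bash with /dev/tcp support and coreutils timeout; the IP and port are the ones from the events above, so substitute your own pod's values:

```shell
# probe_tcp HOST PORT - hand-rolled version of the TCP half of the
# kubelet probe; reports OK only if something accepts the connection.
probe_tcp() {
  local host="$1" port="$2"
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "TCP connect to $host:$port OK"
  else
    echo "TCP connect to $host:$port FAILED"
  fi
}

# The pod IP and port from the probe events above:
probe_tcp 10.42.1.7 8081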

I have already opened the ports/sources below on all nodes, which cover the IP specified in the liveness & readiness probes:

sudo firewall-cmd --permanent --add-port=6443/tcp #apiserver
sudo firewall-cmd --permanent --add-port=10250/tcp #metrics
sudo firewall-cmd --permanent --zone=trusted --add-source=10.42.0.0/16 #pods
sudo firewall-cmd --permanent --zone=trusted --add-source=10.43.0.0/16 #services
sudo firewall-cmd --reload
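One thing the rules above don't open: if k3s is running its default flannel CNI with the VXLAN backend, cross-node pod traffic is encapsulated in UDP on port 8472, and the trusted-zone source rules alone won't admit it unless the node IPs themselves fall in the trusted zone. A hedged addition, assuming firewalld and the default flannel backend:

```shell
# k3s with the default flannel VXLAN backend sends cross-node pod
# traffic as UDP 8472 between node IPs; open it on every node.
sudo firewall-cmd --permanent --add-port=8472/udp   # flannel VXLAN
sudo firewall-cmd --reload
```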

Why do I get this error?

Here is also the output of kubectl logs hub-<id>:

Initialized 0 spawners in 0.004 seconds
[I 2024-01-24 16:51:07.436 JupyterHub metrics:278] Found 0 active users in the last ActiveUserPeriods.twenty_four_hours
[I 2024-01-24 16:51:07.437 JupyterHub metrics:278] Found 0 active users in the last ActiveUserPeriods.seven_days
[I 2024-01-24 16:51:07.438 JupyterHub metrics:278] Found 0 active users in the last ActiveUserPeriods.thirty_days
[I 2024-01-24 16:51:07.438 JupyterHub app:3142] Not starting proxy
[D 2024-01-24 16:51:07.439 JupyterHub proxy:880] Proxy: Fetching GET http://proxy-api:8001/api/routes
[W 2024-01-24 16:51:27.461 JupyterHub proxy:899] api_request to the proxy failed with status code 599, retrying...
[W 2024-01-24 16:51:47.591 JupyterHub proxy:899] api_request to the proxy failed with status code 599, retrying...
[E 2024-01-24 16:51:47.591 JupyterHub app:3382]
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/app.py", line 3380, in launch_instance_async
        await self.start()
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/app.py", line 3146, in start
        await self.proxy.get_all_routes()
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/proxy.py", line 946, in get_all_routes
        resp = await self.api_request('', client=client)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/proxy.py", line 910, in api_request
        result = await exponential_backoff(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/utils.py", line 237, in exponential_backoff
        raise asyncio.TimeoutError(fail_message)
    TimeoutError: Repeated api_request to proxy path "" failed.

[D 2024-01-24 16:51:47.593 JupyterHub application:1028] Exiting application: jupyterhub

and kubectl logs proxy-79bf6b858f-xpgsn:

16:41:32.881 [ConfigProxy] info: Adding route / -> http://hub:8081
16:41:32.904 [ConfigProxy] info: Proxying http://*:8000 to http://hub:8081
16:41:32.904 [ConfigProxy] info: Proxy API at http://*:8001/api/routes
16:41:32.906 [ConfigProxy] info: Route added / -> http://hub:8081

Does anybody have any idea?

best

The pods are all running on the worker node and I have no network policies:

kubectl get po -o wide
NAME                                               READY   STATUS             RESTARTS      AGE    IP           NODE            NOMINATED NODE   READINESS GATES
nfs-subdir-external-provisioner-5f68c587bf-r4xxh   1/1     Running            0             34m    10.42.1.5    k3s-worker-01   <none>           <none>
proxy-79bf6b858f-7zjr9                             1/1     Running            0             100s   10.42.1.10   k3s-worker-01   <none>           <none>
hub-7558f84589-x25gf                               0/1     CrashLoopBackOff   1 (10s ago)   100s   10.42.1.11   k3s-worker-01   <none>           <none>

This sounds like a networking problem at the cluster level; I'm not sure there's much we can do to help debug custom deployments of Kubernetes itself. All I can say is that the hub pod cannot connect to the proxy pod at the specified hostname. You can try to narrow down whether it's DNS or something else preventing connections between pods, but there's not a lot we can do from here.
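For anyone wanting to narrow this down, one quick check is to run a throwaway pod and test DNS resolution and TCP reachability of the proxy API service from inside the cluster network. A sketch, assuming busybox is pullable; proxy-api (port 8001) is the service name from the hub logs above:

```shell
# Disposable pod: resolve the proxy-api service and try a TCP connect.
kubectl run nettest --rm -it --image=busybox:1.36 --restart=Never -- \
  sh -c 'nslookup proxy-api && nc -zv -w 3 proxy-api 8001'
# If DNS resolves but the connect fails, suspect the CNI/overlay
# (e.g. blocked VXLAN traffic between nodes); if DNS itself fails,
# suspect CoreDNS or the service network.
```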


Thanks for your response. The problem is that the k3s multi-node cluster was not installed properly and there is no connectivity between pods on different nodes. The issue persists, and apparently many people have experienced it. So, for the time being, I will stick to a single-node cluster.