Kubernetes - Api_request to the proxy failed with status code 599, retrying

I have a Kubernetes cluster built using Rancher Kubernetes Engine. I’m trying to deploy JupyterHub into this K8s cluster via the Helm chart.

My hub pod is crashing continuously. I’ve been stuck on this for the past two days and it just won’t come up.

kubectl -n jupyter get pods
NAME                             READY   STATUS             RESTARTS   AGE
hub-59bb8dcf64-2s7m9             0/1     CrashLoopBackOff   6          12m
proxy-66bf4f7f84-rgb8p           1/1     Running            0          12m
user-scheduler-b9774b9fd-cq4kd   1/1     Running            0          12m
user-scheduler-b9774b9fd-n24mw   1/1     Running            0          12m

The error I’m getting is -

kubectl -n jupyter logs hub-59bb8dcf64-2s7m9
No config at /etc/jupyterhub/config/values.yaml
Loading /etc/jupyterhub/secret/values.yaml
[I 2021-02-19 06:38:11.439 JupyterHub app:2349] Running JupyterHub version 1.3.0
[I 2021-02-19 06:38:11.439 JupyterHub app:2379] Using Authenticator: jupyterhub.auth.DummyAuthenticator-1.3.0
[I 2021-02-19 06:38:11.440 JupyterHub app:2379] Using Spawner: kubespawner.spawner.KubeSpawner-0.15.0
[I 2021-02-19 06:38:11.440 JupyterHub app:2379] Using Proxy: jupyterhub.proxy.ConfigurableHTTPProxy-1.3.0
[I 2021-02-19 06:38:11.442 JupyterHub app:1420] Loading cookie_secret from /srv/jupyterhub/jupyterhub_cookie_secret
[W 2021-02-19 06:38:11.487 JupyterHub app:1695] No admin users, admin interface will be unavailable.
[W 2021-02-19 06:38:11.487 JupyterHub app:1696] Add any administrative users to `c.Authenticator.admin_users` in config.
[I 2021-02-19 06:38:11.487 JupyterHub app:1725] Not using allowed_users. Any authenticated user will be allowed.
[I 2021-02-19 06:38:11.570 JupyterHub app:2416] Initialized 0 spawners in 0.002 seconds
[I 2021-02-19 06:38:11.572 JupyterHub app:2628] Not starting proxy
[W 2021-02-19 06:38:14.575 JupyterHub proxy:814] api_request to the proxy failed with status code 599, retrying...
[W 2021-02-19 06:38:17.696 JupyterHub proxy:814] api_request to the proxy failed with status code 599, retrying...
[W 2021-02-19 06:38:20.875 JupyterHub proxy:814] api_request to the proxy failed with status code 599, retrying...
[W 2021-02-19 06:38:23.945 JupyterHub proxy:814] api_request to the proxy failed with status code 599, retrying...
[W 2021-02-19 06:38:27.811 JupyterHub proxy:814] api_request to the proxy failed with status code 599, retrying...
[W 2021-02-19 06:38:32.194 JupyterHub proxy:814] api_request to the proxy failed with status code 599, retrying...
[W 2021-02-19 06:38:39.533 JupyterHub proxy:814] api_request to the proxy failed with status code 599, retrying...
[E 2021-02-19 06:38:39.533 JupyterHub app:2859]
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/app.py", line 2857, in launch_instance_async
        await self.start()
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/app.py", line 2632, in start
        await self.proxy.get_all_routes()
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/proxy.py", line 861, in get_all_routes
        resp = await self.api_request('', client=client)
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/proxy.py", line 825, in api_request
        result = await exponential_backoff(
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/utils.py", line 183, in exponential_backoff
        raise TimeoutError(fail_message)
    TimeoutError: Repeated api_request to proxy path "" failed.

Below are my configs -

helm upgrade --cleanup-on-fail --install jhub jupyterhub/jupyterhub \
		--namespace jupyter  \
		--version=0.11.1 \
		--values values_rke.yaml

values_rke.yaml:

proxy:
  secretToken: 'xxxx'
  service:
    type: ClusterIP

prePuller:
  hook:
    enabled: false
  continuous:
    enabled: false
  extraImages: {}

hub:
  strategy:
    type: Recreate
  db:
    pvc:
      storageClassName: rook-ceph-block-ext
  service:
    type: ClusterIP

Can someone help me debug this? I have been using Z2JH for more than two years on AWS EKS and never hit this issue. I also tested the exact same config on AWS EKS and Minikube, and it works fine there. However, running the same thing on the on-premise Rancher Kubernetes platform fails with the error above.

My proxy pod logs are -

kubectl -n jupyter logs proxy-66bf4f7f84-rgb8p
06:29:02.150 [ConfigProxy] info: Adding route / -> http://hub:8081
06:29:02.156 [ConfigProxy] info: Proxying http://:::8000 to http://hub:8081
06:29:02.156 [ConfigProxy] info: Proxy API at http://:::8001/api/routes
06:29:02.159 [ConfigProxy] info: Route added / -> http://hub:8081

Can you run kubectl describe pod ... on your hub and proxy pods?

Does your on-prem Rancher come with any non-standard networking configuration, such as network controllers with built-in restrictions or default NetworkPolicies that might interfere?
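
For example, something along these lines (just a sketch - proxy-api:8001 is the service the Z2JH chart normally creates for the proxy API, and you’d substitute your current hub pod name) would show whether any policies exist and whether the hub pod can reach the proxy API at all:

# List any NetworkPolicies that might restrict hub <-> proxy traffic
kubectl get networkpolicy --all-namespaces

# Check whether the hub pod can reach the proxy API service
# (an HTTP 403 error here would still prove the network path works;
#  a timeout points at the CNI or a policy blocking the traffic;
#  you may need to catch the hub container while it is running, since it is crash-looping)
kubectl -n jupyter exec hub-59bb8dcf64-2s7m9 -- \
  python3 -c "import urllib.request as u; u.urlopen('http://proxy-api:8001/api/routes', timeout=5)"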

Thanks so much for replying @manics

I ran a describe earlier too, and I realised it’s failing both the readiness and liveness probes.

kubectl -n jupyter describe pod hub-59bb8dcf64-fvk8c
Name:         hub-59bb8dcf64-fvk8c
Namespace:    jupyter
Priority:     0
Node:         bda8/10.77.60.48
Start Time:   Fri, 19 Feb 2021 10:51:40 +0000
Labels:       app=jupyterhub
              component=hub
              hub.jupyter.org/network-access-proxy-api=true
              hub.jupyter.org/network-access-proxy-http=true
              hub.jupyter.org/network-access-singleuser=true
              pod-template-hash=59bb8dcf64
              release=jhub
Annotations:  checksum/config-map: ce8928e0f18c6133a264af378e22ef5e6ca57203ac904e305aef32d64bc55668
              checksum/secret: 71cfe74c6f1f313b9a22d4a4235cb9950ff35b3ebd326d01cf9ec48a4ef45766
              cni.projectcalico.org/podIP: 10.42.23.124/32
              cni.projectcalico.org/podIPs: 10.42.23.124/32
Status:       Running
IP:           10.42.23.124
IPs:
  IP:           10.42.23.124
Controlled By:  ReplicaSet/hub-59bb8dcf64
Containers:
  hub:
    Container ID:  docker://d9ef16ffbfbd25cc293b2e58c7b5cdd3ced48ccc4744e0163ec79acae200b499
    Image:         jupyterhub/k8s-hub:0.11.1
    Image ID:      docker-pullable://jupyterhub/k8s-hub@sha256:b6b4a1a34bf00524533de536eb34c92f9f873e0455ec38526a4cd3d55cccb64e
    Port:          8081/TCP
    Host Port:     0/TCP
    Args:
      jupyterhub
      --config
      /etc/jupyterhub/jupyterhub_config.py
      --upgrade-db
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 19 Feb 2021 17:19:14 +0000
      Finished:     Fri, 19 Feb 2021 17:19:46 +0000
    Ready:          False
    Restart Count:  72
    Requests:
      cpu:      200m
      memory:   512Mi
    Liveness:   http-get http://:http/hub/health delay=300s timeout=3s period=10s #success=1 #failure=30
    Readiness:  http-get http://:http/hub/health delay=0s timeout=1s period=2s #success=1 #failure=1000
    Environment:
      PYTHONUNBUFFERED:        1
      HELM_RELEASE_NAME:       jhub
      POD_NAMESPACE:           jupyter (v1:metadata.namespace)
      CONFIGPROXY_AUTH_TOKEN:  <set to the key 'proxy.token' in secret 'hub-secret'>  Optional: false
    Mounts:
      /etc/jupyterhub/config/ from config (rw)
      /etc/jupyterhub/jupyterhub_config.py from config (rw,path="jupyterhub_config.py")
      /etc/jupyterhub/secret/ from secret (rw)
      /etc/jupyterhub/z2jh.py from config (rw,path="z2jh.py")
      /srv/jupyterhub from hub-db-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from hub-token-7wbnm (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      hub-config
    Optional:  false
  secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hub-secret
    Optional:    false
  hub-db-dir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  hub-db-dir
    ReadOnly:   false
  hub-token-7wbnm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hub-token-7wbnm
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Warning  Unhealthy  25m (x1267 over 6h30m)  kubelet  Readiness probe failed: Get "http://10.42.23.124:8081/hub/health": dial tcp 10.42.23.124:8081: connect: connection refused
  Normal   Pulled     20m (x70 over 6h30m)    kubelet  Container image "jupyterhub/k8s-hub:0.11.1" already present on machine
  Warning  BackOff    12s (x1629 over 6h28m)  kubelet  Back-off restarting failed container

kubectl -n jupyter describe pod proxy-66bf4f7f84-rgb8p
Name:         proxy-66bf4f7f84-rgb8p
Namespace:    jupyter
Priority:     0
Node:         bda9/10.77.60.49
Start Time:   Fri, 19 Feb 2021 06:29:00 +0000
Labels:       app=jupyterhub
              component=proxy
              hub.jupyter.org/network-access-hub=true
              hub.jupyter.org/network-access-singleuser=true
              pod-template-hash=66bf4f7f84
              release=jhub
Annotations:  checksum/hub-secret: 5cf263d47604780f8c85b0a52b156428291b25c291a67e9cd1b33461d94c4dcb
              checksum/proxy-secret: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
              cni.projectcalico.org/podIP: 10.42.25.127/32
              cni.projectcalico.org/podIPs: 10.42.25.127/32
Status:       Running
IP:           10.42.25.127
IPs:
  IP:           10.42.25.127
Controlled By:  ReplicaSet/proxy-66bf4f7f84
Containers:
  chp:
    Container ID:  docker://3ebecddef359ecb01d41c9288f5cfcf89d1c1f7fe05a58e6a2be069008443efc
    Image:         jupyterhub/configurable-http-proxy:4.2.2
    Image ID:      docker-pullable://jupyterhub/configurable-http-proxy@sha256:81bd96729c14110aae677bd603854cab01107be18534d07b97a882e716bcdf7a
    Ports:         8000/TCP, 8001/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      configurable-http-proxy
      --ip=::
      --api-ip=::
      --api-port=8001
      --default-target=http://hub:$(HUB_SERVICE_PORT)
      --error-target=http://hub:$(HUB_SERVICE_PORT)/hub/error
      --port=8000
    State:          Running
      Started:      Fri, 19 Feb 2021 06:29:01 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      200m
      memory:   512Mi
    Liveness:   http-get http://:http/_chp_healthz delay=60s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:http/_chp_healthz delay=0s timeout=1s period=2s #success=1 #failure=3
    Environment:
      CONFIGPROXY_AUTH_TOKEN:  <set to the key 'proxy.token' in secret 'hub-secret'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-sbc5t (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-sbc5t:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-sbc5t
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

I don’t think we use any non-standard networking in the cluster, as Rancher Kubernetes Engine is CNCF certified and it’s deployed via the Rancher platform (Rancher Docs: Overview of RKE).

It uses Canal as the network provider (CNI), and most options are simply left disabled by default.

These are the running pods in the kube-system namespace:

kubectl -n kube-system get pods
NAME                                       READY   STATUS      RESTARTS   AGE
calico-kube-controllers-7fbff695b4-rr2vq   1/1     Running     2          16d
canal-2kv9t                                2/2     Running     0          16d
canal-5qfnj                                2/2     Running     2          16d
canal-6jrd7                                2/2     Running     4          16d
canal-6kw8r                                2/2     Running     0          16d
canal-b99pr                                2/2     Running     0          16d
canal-bjkfs                                2/2     Running     0          16d
canal-ftdts                                2/2     Running     0          15d
canal-kf52p                                2/2     Running     2          16d
canal-ktfbz                                2/2     Running     4          16d
canal-n54gl                                2/2     Running     0          9d
canal-wxn2j                                2/2     Running     0          16d
canal-xdkjb                                2/2     Running     0          16d
coredns-6f85d5fb88-24h5x                   1/1     Running     0          8m25s
coredns-6f85d5fb88-6wfmn                   1/1     Running     0          8m24s
coredns-6f85d5fb88-gh6qq                   1/1     Running     0          8m25s
coredns-6f85d5fb88-ppnd6                   1/1     Running     0          9d
coredns-6f85d5fb88-s99ww                   1/1     Running     0          8m24s
coredns-6f85d5fb88-vnpg5                   1/1     Running     0          8m25s
coredns-autoscaler-79599b9dc6-8ztbq        1/1     Running     0          8m29s
metrics-server-8449844bf-z4dkv             1/1     Running     1          16d
rke-coredns-addon-deploy-job-fppkj         0/1     Completed   0          8m31s
rke-metrics-addon-deploy-job-9m6tz         0/1     Completed   0          16d
rke-network-plugin-deploy-job-l4244        0/1     Completed   0          16d

I’m not a networking expert, but I see these read-timeout errors in the kube-flannel container within the Canal pods:

kubectl -n kube-system logs canal-n54gl -c kube-flannel
I0210 13:53:07.310566       1 main.go:518] Determining IP address of default interface
I0210 13:53:07.310848       1 main.go:531] Using interface with name ens160 and address 10.77.30.114
I0210 13:53:07.310863       1 main.go:548] Defaulting external address to interface address (10.77.30.114)
W0210 13:53:07.310921       1 client_config.go:517] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0210 13:53:07.321935       1 kube.go:119] Waiting 10m0s for node controller to sync
I0210 13:53:07.322324       1 kube.go:306] Starting kube subnet manager
I0210 13:53:08.324178       1 kube.go:126] Node controller sync successful
I0210 13:53:08.324236       1 main.go:246] Created subnet manager: Kubernetes Subnet Manager - traefik
I0210 13:53:08.324249       1 main.go:249] Installing signal handlers
I0210 13:53:08.324590       1 main.go:390] Found network config - Backend type: vxlan
I0210 13:53:08.324997       1 vxlan.go:121] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
I0210 13:53:08.362389       1 main.go:355] Current network or subnet (10.42.0.0/16, 10.42.26.0/24) is not equal to previous one (0.0.0.0/0, 0.0.0.0/0), trying to recycle old iptables rules
I0210 13:53:08.376398       1 iptables.go:167] Deleting iptables rule: -s 0.0.0.0/0 -d 0.0.0.0/0 -j RETURN
I0210 13:53:08.377194       1 iptables.go:167] Deleting iptables rule: -s 0.0.0.0/0 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully
I0210 13:53:08.377955       1 iptables.go:167] Deleting iptables rule: ! -s 0.0.0.0/0 -d 0.0.0.0/0 -j RETURN
I0210 13:53:08.378717       1 iptables.go:167] Deleting iptables rule: ! -s 0.0.0.0/0 -d 0.0.0.0/0 -j MASQUERADE --random-fully
I0210 13:53:08.379639       1 main.go:305] Setting up masking rules
I0210 13:53:08.380186       1 main.go:313] Changing default FORWARD chain policy to ACCEPT
I0210 13:53:08.380318       1 main.go:321] Wrote subnet file to /run/flannel/subnet.env
I0210 13:53:08.380326       1 main.go:325] Running backend.
I0210 13:53:08.380341       1 main.go:343] Waiting for all goroutines to exit
I0210 13:53:08.380413       1 vxlan_network.go:60] watching for new subnet leases
I0210 13:53:08.384685       1 iptables.go:145] Some iptables rules are missing; deleting and recreating rules
I0210 13:53:08.384694       1 iptables.go:167] Deleting iptables rule: -s 10.42.0.0/16 -d 10.42.0.0/16 -j RETURN
I0210 13:53:08.385093       1 iptables.go:145] Some iptables rules are missing; deleting and recreating rules
I0210 13:53:08.385107       1 iptables.go:167] Deleting iptables rule: -s 10.42.0.0/16 -j ACCEPT
I0210 13:53:08.385826       1 iptables.go:167] Deleting iptables rule: -s 10.42.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully
I0210 13:53:08.386228       1 iptables.go:167] Deleting iptables rule: -d 10.42.0.0/16 -j ACCEPT
I0210 13:53:08.386879       1 iptables.go:167] Deleting iptables rule: ! -s 10.42.0.0/16 -d 10.42.26.0/24 -j RETURN
I0210 13:53:08.387572       1 iptables.go:155] Adding iptables rule: -s 10.42.0.0/16 -j ACCEPT
I0210 13:53:08.388086       1 iptables.go:167] Deleting iptables rule: ! -s 10.42.0.0/16 -d 10.42.0.0/16 -j MASQUERADE --random-fully
I0210 13:53:08.389207       1 iptables.go:155] Adding iptables rule: -s 10.42.0.0/16 -d 10.42.0.0/16 -j RETURN
I0210 13:53:08.389894       1 iptables.go:155] Adding iptables rule: -d 10.42.0.0/16 -j ACCEPT
I0210 13:53:08.391620       1 iptables.go:155] Adding iptables rule: -s 10.42.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully
I0210 13:53:08.393469       1 iptables.go:155] Adding iptables rule: ! -s 10.42.0.0/16 -d 10.42.26.0/24 -j RETURN
I0210 13:53:08.394826       1 iptables.go:155] Adding iptables rule: ! -s 10.42.0.0/16 -d 10.42.0.0/16 -j MASQUERADE --random-fully
E0215 21:29:44.935305       1 streamwatcher.go:109] Unable to decode an event from the watch stream: read tcp 10.77.30.114:41060->10.43.0.1:443: read: connection timed out
E0219 09:27:49.735516       1 streamwatcher.go:109] Unable to decode an event from the watch stream: read tcp 10.77.30.114:49652->10.43.0.1:443: read: connection timed out

I just checked my calico-kube-controllers logs and see the errors below. Do you think this has something to do with it?

kubectl -n kube-system logs calico-kube-controllers-7fbff695b4-rr2vq
2021-02-03 15:33:53.745 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0203 15:33:53.746969       1 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2021-02-03 15:33:53.747 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2021-02-03 15:33:53.756 [INFO][1] main.go 149: Getting initial config snapshot from datastore
2021-02-03 15:33:53.806 [INFO][1] main.go 152: Got initial config snapshot
2021-02-03 15:33:53.807 [INFO][1] watchersyncer.go 89: Start called
2021-02-03 15:33:53.807 [INFO][1] main.go 169: Starting status report routine
2021-02-03 15:33:53.807 [INFO][1] main.go 402: Starting controller ControllerType="Node"
2021-02-03 15:33:53.807 [INFO][1] node_controller.go 138: Starting Node controller
2021-02-03 15:33:53.807 [INFO][1] watchersyncer.go 127: Sending status update Status=wait-for-ready
2021-02-03 15:33:53.807 [INFO][1] node_syncer.go 40: Node controller syncer status updated: wait-for-ready
2021-02-03 15:33:53.807 [INFO][1] watchersyncer.go 147: Starting main event processing loop
2021-02-03 15:33:53.807 [INFO][1] watchercache.go 174: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2021-02-03 15:33:53.807 [INFO][1] resources.go 349: Main client watcher loop
2021-02-03 15:33:53.812 [INFO][1] watchercache.go 271: Sending synced update ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2021-02-03 15:33:53.812 [INFO][1] watchersyncer.go 127: Sending status update Status=resync
2021-02-03 15:33:53.812 [INFO][1] node_syncer.go 40: Node controller syncer status updated: resync
2021-02-03 15:33:53.812 [INFO][1] watchersyncer.go 209: Received InSync event from one of the watcher caches
2021-02-03 15:33:53.812 [INFO][1] watchersyncer.go 221: All watchers have sync'd data - sending data and final sync
2021-02-03 15:33:53.812 [INFO][1] watchersyncer.go 127: Sending status update Status=in-sync
2021-02-03 15:33:53.812 [INFO][1] node_syncer.go 40: Node controller syncer status updated: in-sync
2021-02-03 15:33:53.819 [INFO][1] hostendpoints.go 90: successfully synced all hostendpoints
2021-02-03 15:33:53.907 [INFO][1] node_controller.go 151: Node controller is now running
2021-02-03 15:33:53.907 [INFO][1] ipam.go 45: Synchronizing IPAM data
2021-02-03 15:33:53.943 [INFO][1] ipam.go 190: Node and IPAM data is in sync
2021-02-03 15:35:31.390 [INFO][1] watchercache.go 96: Watch channel closed by remote - recreate watcher ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2021-02-03 15:35:31.390 [INFO][1] resources.go 377: Terminating main client watcher loop
2021-02-03 15:35:31.391 [INFO][1] watchercache.go 243: Failed to create watcher ListRoot="/calico/resources/v3/projectcalico.org/nodes" error=Get "https://10.43.0.1:443/api/v1/nodes?resourceVersion=1504&watch=true": dial tcp 10.43.0.1:443: connect: connection refused performFullResync=false
2021-02-03 15:35:31.391 [INFO][1] watchercache.go 174: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2021-02-03 15:35:31.393 [ERROR][1] customresource.go 136: Error updating resource Key=KubeControllersConfiguration(default) Name="default" Resource="KubeControllersConfigurations" Value=&v3.KubeControllersConfiguration{TypeMeta:v1.TypeMeta{Kind:"KubeControllersConfiguration", APIVersion:"projectcalico.org/v3"}, ObjectMeta:v1.ObjectMeta{Name:"default", GenerateName:"", Namespace:"", SelfLink:"", UID:"baf3b3a3-8aee-4e3a-86e5-5c42f49d8b37", ResourceVersion:"1228", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63747963233, loc:(*time.Location)(0x26a4140)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:v3.KubeControllersConfigurationSpec{LogSeverityScreen:"Info", HealthChecks:"Enabled", EtcdV3CompactionPeriod:(*v1.Duration)(0xc0003a5ee8), Controllers:v3.ControllersConfig{Node:(*v3.NodeControllerConfig)(0xc000435540), Policy:(*v3.PolicyControllerConfig)(0xc00000e948), WorkloadEndpoint:(*v3.WorkloadEndpointControllerConfig)(0xc00000e968), ServiceAccount:(*v3.ServiceAccountControllerConfig)(0xc00000e958), Namespace:(*v3.NamespaceControllerConfig)(0xc00000e930)}}, Status:v3.KubeControllersConfigurationStatus{RunningConfig:v3.KubeControllersConfigurationSpec{LogSeverityScreen:"Info", HealthChecks:"Enabled", EtcdV3CompactionPeriod:(*v1.Duration)(0xc0000ca1d0), Controllers:v3.ControllersConfig{Node:(*v3.NodeControllerConfig)(0xc00044ad40), Policy:(*v3.PolicyControllerConfig)(nil), WorkloadEndpoint:(*v3.WorkloadEndpointControllerConfig)(nil), ServiceAccount:(*v3.ServiceAccountControllerConfig)(nil), Namespace:(*v3.NamespaceControllerConfig)(nil)}}, EnvironmentVars:map[string]string{"DATASTORE_TYPE":"kubernetes", "ENABLED_CONTROLLERS":"node"}}} error=Put "https://10.43.0.1:443/apis/crd.projectcalico.org/v1/kubecontrollersconfigurations/default": dial tcp 10.43.0.1:443: connect: connection refused
2021-02-03 15:35:31.393 [WARNING][1] runconfig.go 170: unable to perform status update on KubeControllersConfiguration(default) error=Put "https://10.43.0.1:443/apis/crd.projectcalico.org/v1/kubecontrollersconfigurations/default": dial tcp 10.43.0.1:443: connect: connection refused

I Googled some of your Calico error messages and found this GitHub issue:

Thanks so much for the help, @manics. The issue was indeed in calico-kube-controllers, and once you mentioned the network plugin I too landed on the Rancher GitHub issue above.

It’s resolved now, after I restarted the Calico kube-controllers pod. It seems it was unable to reach internal pod IPs and etcd. After the restart it recovered, which automatically fixed my JupyterHub service.
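
In case it helps anyone else, the restart was just a matter of recreating the calico-kube-controllers pod - something like this (the deployment name below is the RKE default; adjust if yours differs):

# Roll the deployment so the controller pod gets recreated
kubectl -n kube-system rollout restart deployment calico-kube-controllers

# Or just delete the pod and let its ReplicaSet bring it back
kubectl -n kube-system delete pod calico-kube-controllers-7fbff695b4-rr2vq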

Thanks again!

Thanks for reporting back about the solution, and @manics thanks for your very helpful presence! :heart: :tada:

Thank you @dprateek1991 for describing what you did to resolve the issue, and @manics for helping to resolve it!

I think this is also the solution for:

@consideRatio, @manics - For higher stability and seamless networking between the JupyterHub and proxy pods, we also applied the following NetworkPolicy to our namespace.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  creationTimestamp: "2021-02-25T01:30:52Z"
  generation: 1
  managedFields:
  - apiVersion: networking.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:ingress: {}
        f:policyTypes: {}
    manager: agent
    operation: Update
    time: "2021-02-25T01:30:52Z"
  name: allow-same-namespace
  namespace: jupyter
  resourceVersion: "9643752"
  selfLink: /apis/networking.k8s.io/v1/namespaces/jupyter/networkpolicies/allow-same-namespace
  uid: 7702a1b9-3e52-4f75-920d-2160cc1f93eb
spec:
  ingress:
  - from:
    - podSelector: {}
  podSelector: {}
  policyTypes:
  - Ingress

This NetworkPolicy allows all pods within the same namespace to talk to each other. In my experience of running JupyterHub on AWS EKS and Azure AKS, this is not required.

However, for on-premise Kubernetes clusters, I would also recommend applying this NetworkPolicy in the namespace where JupyterHub is running.
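
For reference, here is the same policy trimmed down to just the fields needed to apply it (the server-managed metadata in the output above - managedFields, uid, resourceVersion and so on - can be dropped); change the namespace if your release does not live in jupyter:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: jupyter
spec:
  podSelector: {}
  ingress:
  - from:
    - podSelector: {}
  policyTypes:
  - Ingress

Save it to a file (the filename is up to you, e.g. allow-same-namespace.yaml) and apply it with kubectl apply -f allow-same-namespace.yaml.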

Hello,

Thanks for your post. I get the same error as you in kubectl describe pod hub-6448cc6597-mbrp5:

  Warning  Unhealthy  9m40s (x6 over 10m)   kubelet            Readiness probe failed: Get "http://10.42.2.25:8081/hub/health": dial tcp 10.42.2.25:8081: connect: connection refused
  Warning  Unhealthy  9m40s (x6 over 10m)   kubelet            Liveness probe failed: Get "http://10.42.2.25:8081/hub/health": dial tcp 10.42.2.25:8081: connect: connection refused

But I’m running Kubernetes on a k3s cluster with one master and two worker nodes, which I set up myself. I don’t have any such Canal pods in the kube-system namespace:

kubectl -n kube-system get pods
NAME                                            READY   STATUS    RESTARTS       AGE
local-path-provisioner-84db5d44d9-ngc7v         1/1     Running   2 (155m ago)   5h30m
svclb-ingress-nginx-controller-6c4c1b36-42pz9   2/2     Running   0              122m
svclb-ingress-nginx-controller-6c4c1b36-6dsfr   2/2     Running   0              122m
svclb-ingress-nginx-controller-6c4c1b36-5hh9d   2/2     Running   0              122m
coredns-6799fbcd5-w98xb                         1/1     Running   1 (155m ago)   5h30m
metrics-server-67c658944b-mzxjq                 1/1     Running   0              29m

I was wondering, would you have any idea how to fix this?

Best regards,