HTTP 503 Socket Hang Up Error when Z2JK tries to Launch a user's Single User (Jupyter Lab) Server in GKE

I am seeing a strange problem when deploying Z2JK version 0.10.6 to a GKE cluster managed by Rancher, with ISTIO MTLS enabled between the ISTIO Ingress Gateway and the Configurable HTTP Proxy (CHP).

The problem occurs when a user tries to log in to JupyterHub (the Hub login page displays successfully):

  1. Login request is received by the ISTIO Ingress Gateway.
  2. ISTIO Ingress Gateway routes the request to the ISTIO side car container running in the Proxy pod.
  3. ISTIO side car container – running in the Proxy pod – routes the request to the CHP container.
  4. CHP Container routes the request to the Hub container running in the Hub pod. There is no ISTIO side car container installed in the Hub pod – just the Hub container is installed in the Hub pod.
  5. Hub container authenticates the user using an oauth spawner.
  6. Hub container spawns the Single User (jupyter lab) Server pod.
  7. Notebook container within the Single User (jupyter lab) server pod starts. See attached log file snippet from the spawned Single User Pod.
  8. Notebook container tries to send a redirect /user//lab? back to the browser.
  9. Browser NEVER gets the redirect request. Log file from CHP container reports a 503 socket hang-up error. See attached log file snippet from the CHP container.

I am also attaching the relevant log file snippet from the Hub container and an image of the browser page with the 503 (notice that the address bar does not show the redirected URL from step 8).

Any help debugging this problem is greatly appreciated.

This is a tricky one with all the components involved. But here’s what I can discern so far from the logs you shared. The following connections appear to be working correctly:

  • chp→hub (serves login page)
  • hub→chp (add route succeeds)
  • hub→singleuser (not 100% sure since it’s not in the logs, but 99%, as it is indicated by the redirect to /user/authzuser6 and GET /user/authzuser6 in the singleuser logs)
  • singleuser→hub (successful POST activity)

But the following doesn’t work:

  • chp→singleuser (socket hangup)

In particular, because the hub can talk to singleuser and chp cannot, one major source of error (the connect url being wrong and/or server failure) is probably not the problem: the hub and proxy talk to singleuser in the same way, with the same info. Instead, it seems there’s something special about the relationship of [proxy, singleuser] that’s different from the relationship of [proxy, hub] and [hub, singleuser], both of which have confirmed bidirectional communication. The reason is a bit of a mystery to me, but it does make me suspect network policies. You can try disabling the chart’s network policy for the singleuser pods:

singleuser:
  networkPolicy:
    enabled: false

(and maybe the same for proxy as well). Also verify that network policies or other enforcement mechanisms allow communication from the proxy pod to the singleuser pod, and check any additional network policies you might have added. Unfortunately, Kubernetes doesn’t seem to let you ask a pod which network policies apply to it; you just have to read the policies yourself and trust the system to do what it says it should (spoiler: it may not!).
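If it does turn out to be a network policy issue, one way to test the proxy→singleuser path without disabling everything is an explicit allow rule. The following is only a sketch: the policy name, the namespace, and the `component` labels are assumptions (check the labels on your own pods with `kubectl get pods --show-labels`), and 8888 is assumed to be the singleuser server port.

```yaml
# Hypothetical NetworkPolicy: allow the proxy pod to reach singleuser pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-proxy-to-singleuser   # hypothetical name
  namespace: jhub                   # assumption: namespace of the JupyterHub release
spec:
  # Pods this policy applies to (assumed label on singleuser pods).
  podSelector:
    matchLabels:
      component: singleuser-server
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Traffic source (assumed label on the proxy pod).
        - podSelector:
            matchLabels:
              component: proxy
      ports:
        - protocol: TCP
          port: 8888                # assumption: singleuser server port
```

Network policies are additive, so this only ever allows more traffic when other policies already select the singleuser pods. If no policy selected them before, though, adding this one would start restricting their ingress to the proxy alone, which would cut off hub→singleuser, so treat it as a debugging aid rather than a drop-in fix.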

One possible correction:

  8. Notebook container tries to send a redirect /user/lab? back to the browser.

I believe this request originates at the Hub, which is checking whether the server has started; it is not actually coming from the browser, which is why the redirect isn’t followed. The Hub only cares about getting a response, so it has no need to make a second request. I could be wrong, though. You should also be able to verify that 10.77.29.239 is the IP of the hub or proxy (I’m guessing hub).

Solved the problem by adding these lines to the Proxy pod section of the values.yaml file:

proxy:
  secretToken: REDACTED
  annotations:
    sidecar.istio.io/inject: "true"
    traffic.sidecar.istio.io/excludeInboundPorts: "8001"
    traffic.sidecar.istio.io/excludeOutboundPorts: "8081,8888"
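For anyone reading later, here is roughly how I understand those annotations once they land on the proxy pod. This is a sketch of just the annotations part of the pod metadata, not the full rendered manifest, and the port meanings in the comments are assumptions based on the usual chart defaults (8001 for the CHP API, 8081 for the Hub, 8888 for the single-user server), so check them against your own deployment.

```yaml
# Fragment of the proxy pod's metadata (sketch; other fields omitted).
metadata:
  annotations:
    # Keep the ISTIO side car in the proxy pod.
    sidecar.istio.io/inject: "true"
    # Inbound 8001 bypasses the side car.
    # Assumption: this is the CHP API port the Hub uses to add routes,
    # and the Hub pod has no side car to speak MTLS with.
    traffic.sidecar.istio.io/excludeInboundPorts: "8001"
    # Outbound 8081 and 8888 bypass the side car.
    # Assumption: 8081 is the Hub and 8888 is the single-user server,
    # neither of which runs a side car.
    traffic.sidecar.istio.io/excludeOutboundPorts: "8081,8888"
```

The idea seems to be that only traffic from the ISTIO Ingress Gateway to CHP goes through the side car, while traffic between CHP and the pods outside the mesh goes direct.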