Only One of Multiple Named Servers Working

Problem Description

I’m following the Zero to JupyterHub with Kubernetes guide and I’ve run into an issue trying to get named servers to work. While the hub lets me spawn multiple named servers, it appears that only the last one started actually responds; all the other servers return a 503. The logs don’t appear to be telling me much, so I’m not sure where else to turn.

Steps to Reproduce:

I’m using k3s as my kubernetes cluster on Ubuntu 20.04 LTS.

I’m only slightly modifying the official 1.2.0 Helm chart:

helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
helm show values jupyterhub/jupyterhub > values.yaml

I’ve modified the values.yaml in the following ways (most of it is omitted for brevity):

hub:
  allowNamedServers: true
  namedServerLimitPerUser: 10
## ...
cull:
  enabled: false
## ...
debug:
  enabled: true
global:
  safeToShowValues: true

I installed jupyterhub with the following command:

helm upgrade \
  --cleanup-on-fail \
  --install jupyterhub jupyterhub/jupyterhub \
  --namespace jupyter \
  --create-namespace \
  --version=1.2.0 \
  --values=values.yaml

All the pods come up as expected:

$ kubectl get pod -n jupyter
NAME                              READY   STATUS    RESTARTS   AGE
proxy-7478f74f4-b64zw             1/1     Running   0          17s
continuous-image-puller-4lbtq     1/1     Running   0          17s
user-scheduler-6795c686f5-8xh7f   1/1     Running   0          17s
user-scheduler-6795c686f5-psc4p   1/1     Running   0          17s
hub-7865b575cf-xrtg7              1/1     Running   0          17s

For the most part, everything works until I start a second server for the same user. I log in to the UI using a dummy username/password, which automatically creates a server for my user as expected.

If I then go to the hub control panel and create a named server test1, it comes up successfully and I can access it as normal. However, my singleuser server stops working; the only thing I see is a 503 page.

If I stop my named server, my singleuser server starts working again. As soon as I start the named server again, the singleuser server fails with the same error.

This error also happens whenever I start two named servers: whichever one I started last works, and the others respond with 503s. The only real log I see out of the hub is:


[D 2022-09-06 00:47:13.946 JupyterHub pages:652] No template for 503
[I 2022-09-06 00:47:13.954 JupyterHub log:189] 200 GET /hub/error/503?url=%2Fuser%2Ftest_user (@10.42.0.158) 9.17ms

The proxy logs are complaining about a connection refused:

00:51:02.611 [ConfigProxy] error: 503 GET /user/test_user connect ECONNREFUSED 10.42.0.164:8888

The failing server doesn’t log anything additional, presumably because the connection never reaches it. All of the pods are still running, though:

$ kubectl get pod -n jupyter
NAME                              READY   STATUS    RESTARTS   AGE
proxy-7478f74f4-b64zw             1/1     Running   0          20m
continuous-image-puller-4lbtq     1/1     Running   0          20m
user-scheduler-6795c686f5-8xh7f   1/1     Running   0          20m
user-scheduler-6795c686f5-psc4p   1/1     Running   0          20m
hub-7865b575cf-xrtg7              1/1     Running   0          20m
jupyter-test-5fuser--test1        1/1     Running   0          8m54s
jupyter-test-5fuser               1/1     Running   0          8m11s

If I exec into one of the notebook pods, the jupyter process is still running and appears to be listening on the correct port.
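
Roughly what I checked, for reference (the pod name comes from the listing above; whether ps and netstat exist inside the container depends on the notebook image):

# exec into the default singleuser pod
kubectl exec -it -n jupyter jupyter-test-5fuser -- /bin/bash

# inside the container: confirm the notebook process is running
ps aux | grep jupyter

# and that it is listening on 8888, the port the proxy is trying to reach
netstat -tlnp | grep 8888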

I’m at a bit of a loss as to how to continue troubleshooting. Any help would be greatly appreciated.

For some reason I can’t attach the actual log files, so please let me know if there’s anything additional you’d like to see.

It’ll be helpful to see the full debug logs for the hub and the singleuser pods. You should be able to paste in the logs as code blocks in this forum, but you could also upload them to a site such as https://gist.github.com/
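
For example, something along these lines should capture them (pod names taken from your kubectl output above, adjust as needed):

# hub logs (debug.enabled: true makes these verbose)
kubectl logs -n jupyter deploy/hub > hub.log

# logs for the default and named singleuser servers
kubectl logs -n jupyter jupyter-test-5fuser > singleuser.log
kubectl logs -n jupyter jupyter-test-5fuser--test1 > test1.log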

Thanks for the quick response. I went ahead and uploaded the full logs to this gist; please let me know if there’s anything else you’d like to see.

Very strange indeed! I’m having a little trouble parsing the sequence of events. Here’s what I get:

  • 00:50:39 /user/test_user/test1 route is added, pointing to 10.42.0.170 and you can talk to the test1 server just fine.
  • 00:51:22 /user/test_user/test2 is added, pointing to 10.42.0.172
  • 00:51:32 /user/test_user is stopped from the button on /hub/home
  • 00:51:34 requests to /user/test_user/test1 start failing with 503 Connection refused from proxy to 10.42.0.170

This suggests that something went wrong with the test1 server, or with the network routing requests to it, but we don’t have logs or pod events from this time frame.

A similar sequence occurred around 53:47 - 54:33 with test_user/test1 and test_user/

Here’s a question: do you see the same or similar issue with any two servers (not just named servers, but e.g. a second default server for a second user)? That would tell me whether there’s something conflicting in the pods, i.e. either the pod spec isn’t sufficiently unique, or there’s something funky in k3s networking.

If you can dump kubectl get pod -o yaml for both pods when one is working and the other has just started to fail, that might be illuminating.
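
For example, something like this, run while one server is working and the other has just started failing (pod names taken from your earlier output; the output file names are just placeholders):

kubectl get pod -n jupyter jupyter-test-5fuser -o yaml > singleuser-pod.yaml
kubectl get pod -n jupyter jupyter-test-5fuser--test1 -o yaml > named-server-test1-pod.yaml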


aside:

FWIW, you don’t need to download the full values.yaml and then modify it. The chart’s defaults are applied automatically, so you can create a config.yaml containing only your changes:

helm upgrade ... jupyterhub/jupyterhub --values config.yaml

That way you don’t have a file full of inherited defaults, so it’s easier to be sure you’re sharing the whole relevant configuration, and easier to pick up the right defaults when you move to a newer chart.
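
For this deployment, that config.yaml would only need the overrides listed earlier in the thread, roughly:

cat > config.yaml <<EOF
hub:
  allowNamedServers: true
  namedServerLimitPerUser: 10
cull:
  enabled: false
debug:
  enabled: true
global:
  safeToShowValues: true
EOF

helm upgrade \
  --cleanup-on-fail \
  --install jupyterhub jupyterhub/jupyterhub \
  --namespace jupyter \
  --create-namespace \
  --version=1.2.0 \
  --values=config.yaml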


Thanks for the response.

Yeah, the sequence of events is something like this:

  • Boot up the hub
  • Log in to the hub as test_user
  • I can access my singleuser pod just fine
    • file: working-singleuser-pod.yaml
  • Click Control Panel
  • Add a named server called test1
  • I can now access the named server test1 just fine
    • file: working-named-server-1-pod.yaml
  • I have lost access to the singleuser server; it 503s as described
    • file: failing-singleuser-pod.yaml
  • Click Control Panel
  • Add a named server called test2
  • I can now access the named server test2 just fine
    • file: working-named-server-2-pod.yaml
  • I have lost access to both the singleuser and test1 servers
    • file: failing-named-server-1-pod.yaml
  • If I restart either test1 or the singleuser server I can access it again, but I lose access to test2

What you described is correct: I can access any of the servers, but only whichever one was started last. Each of these files is now available in a gist.

Since this is very mysterious, you might try upgrading to z2jh 2.0 if that’s an option. It’s possible this is related to an issue that has been fixed, as jupyterhub 1.5 is from quite a while ago.

I don’t mind giving that a shot later tonight. Thanks for the help so far.

Another debugging option might be to disable network policies. Since this is k3s, networking may not be as fully functional as in a standard Kubernetes deployment.
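
A rough sketch of what I mean, as an extra values file layered over your existing one (netpol-off.yaml is just a made-up name, and the exact networkPolicy keys can vary between chart versions, so double-check them against helm show values):

# extra values file turning off the chart's NetworkPolicies
cat > netpol-off.yaml <<EOF
hub:
  networkPolicy:
    enabled: false
proxy:
  chp:
    networkPolicy:
      enabled: false
singleuser:
  networkPolicy:
    enabled: false
EOF

# later --values files override earlier ones
helm upgrade --install jupyterhub jupyterhub/jupyterhub \
  --namespace jupyter \
  --version=1.2.0 \
  --values=values.yaml --values=netpol-off.yaml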

Upgrading to 2.0 did not solve the problem, but disabling network policies seems to have done the trick. Specifically, I had to disable the network policies for the singleuser portion of the chart.

OK, we can investigate our network policies, but I suspect this means an incorrect implementation of network policies in the cluster. If you can share your k3s cluster info and chart config, we should hopefully be able to isolate it.

I didn’t change anything from the default k3s setup, so you should be able to replicate my cluster with:

curl -sfL https://get.k3s.io | sh -
# Check for Ready node, takes maybe 30 seconds
k3s kubectl get node

Or you can see more detailed information on the k3s website. If you have something specific you’d like to know about the cluster, I’m happy to provide it.

The chart config is listed above; the only change I’ve made to make things work is:

singleuser:
  networkPolicy:
    enabled: false
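
In case it’s useful, this is how I can list which NetworkPolicies the chart created in the namespace (and what’s left after this change):

kubectl get networkpolicy -n jupyter
kubectl describe networkpolicy -n jupyter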

Thanks for the fast responses!