Only One of Multiple Named Servers Working

Problem Description

I’m following the Zero to JupyterHub with Kubernetes guide and I’ve run into an issue trying to get named servers to work. While the hub lets me spawn multiple named servers, it appears that only the last one started actually responds; all the other servers return a 503. The logs don’t appear to be telling me much, so I’m not sure where else to turn.

Steps to Reproduce:

I’m using k3s as my kubernetes cluster on Ubuntu 20.04 LTS.

I’m only slightly modifying the official 1.2.0 Helm chart:

helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
helm show values jupyterhub/jupyterhub > values.yaml

I’ve modified the values.yaml in the following ways (most of it is omitted for brevity):

hub:
  allowNamedServers: true
  namedServerLimitPerUser: 10
## ...
cull:
  enabled: false
## ...
debug:
  enabled: true
global:
  safeToShowValues: true

I installed jupyterhub with the following command:

helm upgrade \
  --cleanup-on-fail \
  --install jupyterhub jupyterhub/jupyterhub \
  --namespace jupyter \
  --create-namespace \
  --version=1.2.0 \
  --values=values.yaml

All the pods come up as expected:

$ kubectl get pod -n jupyter
NAME                              READY   STATUS    RESTARTS   AGE
proxy-7478f74f4-b64zw             1/1     Running   0          17s
continuous-image-puller-4lbtq     1/1     Running   0          17s
user-scheduler-6795c686f5-8xh7f   1/1     Running   0          17s
user-scheduler-6795c686f5-psc4p   1/1     Running   0          17s
hub-7865b575cf-xrtg7              1/1     Running   0          17s

For the most part, everything works until I start a second server for the same user. I log in to the UI using a dummy username/password, which automatically creates a server for my user as expected.

If I then go to the hub control panel and create a named server test1, it comes up successfully and I can access it as normal. However, my singleuser server stops working; the only thing I see is a 503 page.

If I stop my named server, my singleuser server starts working again. As soon as I start the named server again, the singleuser server fails with the same error.

This error also happens whenever I start two named servers: whichever one I started last works, and the others respond with 503s. The only real log I see out of the hub is:


[D 2022-09-06 00:47:13.946 JupyterHub pages:652] No template for 503
[I 2022-09-06 00:47:13.954 JupyterHub log:189] 200 GET /hub/error/503?url=%2Fuser%2Ftest_user (@10.42.0.158) 9.17ms

The proxy logs are complaining about a connection refused:

00:51:02.611 [ConfigProxy] error: 503 GET /user/test_user connect ECONNREFUSED 10.42.0.164:8888

The failing server doesn’t log anything additional, presumably because the connection never reaches it. All of the pods are still running, though:

$ kubectl get pod -n jupyter
NAME                              READY   STATUS    RESTARTS   AGE
proxy-7478f74f4-b64zw             1/1     Running   0          20m
continuous-image-puller-4lbtq     1/1     Running   0          20m
user-scheduler-6795c686f5-8xh7f   1/1     Running   0          20m
user-scheduler-6795c686f5-psc4p   1/1     Running   0          20m
hub-7865b575cf-xrtg7              1/1     Running   0          20m
jupyter-test-5fuser--test1        1/1     Running   0          8m54s
jupyter-test-5fuser               1/1     Running   0          8m11s

If I exec into one of the notebook pods, the jupyter process is still running and appears to be listening on the correct port.
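
Roughly what I checked, for reference (the pod name comes from the listing above; whether ps and netstat exist inside the container depends on the notebook image):

# exec into the default singleuser pod
kubectl exec -it -n jupyter jupyter-test-5fuser -- /bin/bash

# inside the container: confirm the notebook process is running
ps aux | grep jupyter

# and that it is listening on 8888, the port the proxy is trying to reach
netstat -tlnp | grep 8888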

I’m at a bit of a loss as to how to continue troubleshooting. Any help would be greatly appreciated.

For some reason I can’t attach the actual log files, so please let me know if there’s anything additional you’d like to see.

It’ll be helpful to see the full debug logs for the hub and the singleuser pods. You should be able to paste in the logs as code blocks in this forum, but you could also upload them to a site such as https://gist.github.com/
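
For example, something along these lines should capture them (pod names taken from your kubectl output above, adjust as needed):

# hub logs (debug.enabled: true makes these verbose)
kubectl logs -n jupyter deploy/hub > hub.log

# logs for the default and named singleuser servers
kubectl logs -n jupyter jupyter-test-5fuser > singleuser.log
kubectl logs -n jupyter jupyter-test-5fuser--test1 > test1.log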

Thanks for the quick response. I went ahead and uploaded the full logs to this gist; please let me know if there’s anything else you’d like to see.

Very strange indeed! I’m having a little trouble parsing the sequence of events. Here’s what I get:

  • 00:50:39 /user/test_user/test1 route is added, pointing to 10.42.0.170 and you can talk to the test1 server just fine.
  • 00:51:22 /user/test_user/test2 is added, pointing to 10.42.0.172
  • 00:51:32 /user/test_user is stopped from the button on /hub/home
  • 00:51:34 requests to /user/test_user/test1 start failing with 503 Connection refused from proxy to 10.42.0.170

This suggests that something went wrong with the test1 server, or with the network routing requests to it, but we don’t have logs or pod events from this time frame.

A similar sequence occurred around 53:47 - 54:33 with test_user/test1 and test_user/

Here’s a question: do you see the same or similar issue with any two servers (not just named servers, but e.g. a second default server for a second user)? That would tell me whether there’s something conflicting in the pods, i.e. either the pod spec isn’t sufficiently unique, or there’s something funky in k3s networking.

If you can dump kubectl get pod -o yaml for both pods when one is working and the other has just started to fail, that might be illuminating.
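
For example, something like this, run while one server is working and the other has just started failing (pod names taken from your earlier output; the output file names are just placeholders):

kubectl get pod -n jupyter jupyter-test-5fuser -o yaml > singleuser-pod.yaml
kubectl get pod -n jupyter jupyter-test-5fuser--test1 -o yaml > named-server-test1-pod.yaml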


aside:

FWIW, you don’t need to download the full values.yaml and then modify it. The chart’s defaults are applied automatically, so you can create a config.yaml containing only your changes:

helm upgrade ... jupyterhub/jupyterhub --values config.yaml

That way you don’t have a file full of inherited defaults, so it’s easier to be sure you’re sharing the whole relevant configuration, and easier to pick up the right defaults when you move to a newer chart.
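
For this deployment, that config.yaml would only need the overrides listed earlier in the thread, roughly:

cat > config.yaml <<EOF
hub:
  allowNamedServers: true
  namedServerLimitPerUser: 10
cull:
  enabled: false
debug:
  enabled: true
global:
  safeToShowValues: true
EOF

helm upgrade \
  --cleanup-on-fail \
  --install jupyterhub jupyterhub/jupyterhub \
  --namespace jupyter \
  --create-namespace \
  --version=1.2.0 \
  --values=config.yaml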


Thanks for the response.

Yeah, the sequence of events is something like this:

  • Boot up the hub
  • Log in to the hub as test_user
  • I can access my singleuser pod just fine
    • file: working-singleuser-pod.yaml
  • Click Control Panel
  • Add a named server called test1
  • I can now access the named server test1 just fine
    • file: working-named-server-1-pod.yaml
  • I have lost access to the singleuser server; it 503s as described
    • file: failing-singleuser-pod.yaml
  • Click Control Panel
  • Add a named server called test2
  • I can now access the named server test2 just fine
    • file: working-named-server-2-pod.yaml
  • I have lost access to both the singleuser and test1 servers
    • file: failing-named-server-1-pod.yaml
  • If I restart either test1 or the singleuser server I can access it again, but I lose access to test2

What you described is correct: I can access any of the servers, but only whichever one was started last. Each of these files is now available in a gist.

Since this is very mysterious, you might try upgrading to z2jh 2.0 if that’s an option. It’s possible this is related to an issue that has been fixed, as jupyterhub 1.5 is from quite a while ago.

I don’t mind giving that a shot later tonight. Thanks for the help so far.

Another debugging option might be to disable network policies. Since this is k3s, networking may not be as fully functional as in a standard Kubernetes deployment.
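
A rough sketch of what I mean, as an extra values file layered over your existing one (netpol-off.yaml is just a made-up name, and the exact networkPolicy keys can vary between chart versions, so double-check them against helm show values):

# extra values file turning off the chart's NetworkPolicies
cat > netpol-off.yaml <<EOF
hub:
  networkPolicy:
    enabled: false
proxy:
  chp:
    networkPolicy:
      enabled: false
singleuser:
  networkPolicy:
    enabled: false
EOF

# later --values files override earlier ones
helm upgrade --install jupyterhub jupyterhub/jupyterhub \
  --namespace jupyter \
  --version=1.2.0 \
  --values=values.yaml --values=netpol-off.yaml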

Upgrading to 2.0 did not solve the problem, but disabling network policies seems to have done the trick. Specifically, I had to disable the network policies for the singleuser portion of the chart.

OK, we can investigate our network policies, but I suspect this means an incorrect implementation of network policies in the cluster. If you can share your k3s cluster info and chart config, we should hopefully be able to isolate it.

I didn’t change anything from the default k3s setup, so you should be able to replicate my cluster with:

curl -sfL https://get.k3s.io | sh -
# Check for Ready node, takes maybe 30 seconds
k3s kubectl get node

Or you can see more detailed information on the k3s website. If you have something specific you’d like to know about the cluster, I’m happy to provide it.

The chart config is listed above; the only change I’ve made to make things work is:

singleuser:
  networkPolicy:
    enabled: false
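
In case it’s useful, this is how I can list which NetworkPolicies the chart created in the namespace (and what’s left after this change):

kubectl get networkpolicy -n jupyter
kubectl describe networkpolicy -n jupyter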

Thanks for the fast responses!