Error 503 socket hang up during single-notebook server start up

Hi all,

I recently started looking into JupyterHub and experimenting with different setups. My current setup is a Docker Swarm cluster on 2 VMs, with Traefik as the reverse proxy (on the master node) and a separate JupyterHub instance (also on the master node) using the Docker Swarm spawner.
This works fine, but I noticed that during single-user notebook server startup JupyterHub logs "503 GET ... socket hang up" errors. The server starts regardless, but I am concerned this could lead to problems once more users start using it (currently I am the only user while I set it up).

My logs and docker-compose files are below:

Jupyterhub logs
jupyterhub_jupyterhub.1.x0uy0ekegjgd@vm    | [I 2024-12-06 08:30:18.614 JupyterHub base:1124] User my_user took 13.344 seconds to start
jupyterhub_jupyterhub.1.x0uy0ekegjgd@vm    | [I 2024-12-06 08:30:18.614 JupyterHub proxy:331] Adding user my_user to proxy /user/my_user/ => http://jupyter-my_user:8888
jupyterhub_jupyterhub.1.x0uy0ekegjgd@vm    | 08:30:18.617 [ConfigProxy] info: Adding route /user/my_user -> http://jupyter-my_user:8888
jupyterhub_jupyterhub.1.x0uy0ekegjgd@vm    | 08:30:18.618 [ConfigProxy] info: Route added /user/my_user -> http://jupyter-my_user:8888
jupyterhub_jupyterhub.1.x0uy0ekegjgd@vm    | [I 2024-12-06 08:30:18.619 JupyterHub users:899] Server my_user is ready
jupyterhub_jupyterhub.1.x0uy0ekegjgd@vm    | 08:30:18.619 [ConfigProxy] info: 201 POST /api/routes/user/my_user 
jupyterhub_jupyterhub.1.x0uy0ekegjgd@vm    | [I 2024-12-06 08:30:18.620 JupyterHub log:192] 200 GET /hub/api/users/my_user/server/progress?_xsrf=[secret] (my_user@MY_IP) 12279.89ms
jupyterhub_jupyterhub.1.x0uy0ekegjgd@vm    | [I 2024-12-06 08:30:18.655 JupyterHub log:192] 302 GET /hub/spawn-pending/my_user?_xsrf=[secret] -> /user/my_user/ (my_user@MY_IP) 11.84ms
jupyterhub_jupyterhub.1.x0uy0ekegjgd@vm    | [I 2024-12-06 08:30:18.702 JupyterHub log:192] 302 GET /hub/api/oauth2/authorize?client_id=jupyterhub-user-my_user&redirect_uri=%2Fuser%2Fmy_user%2Foauth_callback&response_type=code&state=[secret] -> /user/my_user/oauth_callback?code=[secret]&state=[secret] (my_user@MY_IP) 21.35ms
jupyterhub_jupyterhub.1.x0uy0ekegjgd@vm    | [I 2024-12-06 08:30:18.747 JupyterHub log:192] 200 POST /hub/api/oauth2/token (my_user@10.0.1.252) 36.20ms
jupyterhub_jupyterhub.1.x0uy0ekegjgd@vm    | [I 2024-12-06 08:30:18.756 JupyterHub log:192] 200 GET /hub/api/user (my_user@10.0.1.252) 7.37ms
jupyterhub_jupyterhub.1.x0uy0ekegjgd@vm    | 08:30:18.863 [ConfigProxy] error: 503 GET /user/my_user/static/lab/7730.7e3a9fb140d2d55a51fc.js socket hang up
jupyterhub_jupyterhub.1.x0uy0ekegjgd@vm    | 08:30:18.866 [ConfigProxy] error: 503 GET /user/my_user/static/lab/2160.8e96aa5b6f6d451bf57d.js socket hang up
jupyterhub_jupyterhub.1.x0uy0ekegjgd@vm    | [I 2024-12-06 08:30:18.867 JupyterHub _xsrf_utils:125] Setting new xsrf cookie for b'None:ZUCz-XldA2vbVNEM9pOfZ0zzkE3aj1fcvOPoODRatSI=' {'path': '/hub/', 'max_age': 3600}
jupyterhub_jupyterhub.1.x0uy0ekegjgd@vm    | [I 2024-12-06 08:30:18.877 JupyterHub log:192] 200 GET /hub/error/503?url=%2Fuser%2Fmy_user%2Fstatic%2Flab%2F7730.7e3a9fb140d2d55a51fc.js%3Fv%3D7e3a9fb140d2d55a51fc (@10.0.1.250) 11.56ms
jupyterhub_jupyterhub.1.x0uy0ekegjgd@vm    | [I 2024-12-06 08:30:18.878 JupyterHub _xsrf_utils:125] Setting new xsrf cookie for b'None:ZUCz-XldA2vbVNEM9pOfZ0zzkE3aj1fcvOPoODRatSI=' {'path': '/hub/', 'max_age': 3600}
jupyterhub_jupyterhub.1.x0uy0ekegjgd@vm    | [I 2024-12-06 08:30:18.880 JupyterHub log:192] 200 GET /hub/error/503?url=%2Fuser%2Fmy_user%2Fstatic%2Flab%2F2160.8e96aa5b6f6d451bf57d.js%3Fv%3D8e96aa5b6f6d451bf57d (@10.0.1.250) 1.59ms

Traefik logs
traefik_traefik.1.ie97nxehqrsb@vm    | 2024-12-06T08:30:18Z DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:196 > Service selected by WRR: 2ec410e6d17c3d32
traefik_traefik.1.ie97nxehqrsb@vm    | MY_IP - - [06/Dec/2024:08:30:18 +0000] "GET /hub/spawn-pending/my_user?_xsrf=MnwxOjB8MTA6MTczMzQ3Mzc3M3w1Ol94c3JmfDg4Ok5UUXpPR1JqTW1KbU5UVTROR0UwTVdJM1ptRmhOR0l5WWpJNE1qQTVOREU2WVRNd1kyTmtaVGMxTlRjMk5ETXdaR0kyT1dWbE9Ua3dNbVZpTkRReE9EST18YTdmMDQ1MDYwOWVlYWU0ODI1MzllOWNiNzg5ZjJkMWI2Y2U2YTgxN2Q3MjQ3NjUyMjBjOWRkNjgzZjg3YTlhNQ HTTP/2.0" 302 0 "-" "-" 219 "jupyterhub-https@swarm" "http://10.0.1.250:8000" 17ms
traefik_traefik.1.ie97nxehqrsb@vm    | 2024-12-06T08:30:18Z DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:196 > Service selected by WRR: 2ec410e6d17c3d32
traefik_traefik.1.ie97nxehqrsb@vm    | MY_IP - - [06/Dec/2024:08:30:18 +0000] "GET /user/my_user/ HTTP/2.0" 302 0 "-" "-" 220 "jupyterhub-https@swarm" "http://10.0.1.250:8000" 8ms
traefik_traefik.1.ie97nxehqrsb@vm    | 2024-12-06T08:30:18Z DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:196 > Service selected by WRR: 2ec410e6d17c3d32
traefik_traefik.1.ie97nxehqrsb@vm    | MY_IP - - [06/Dec/2024:08:30:18 +0000] "GET /user/my_user/lab? HTTP/2.0" 302 0 "-" "-" 221 "jupyterhub-https@swarm" "http://10.0.1.250:8000" 4ms
traefik_traefik.1.ie97nxehqrsb@vm    | 2024-12-06T08:30:18Z DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:196 > Service selected by WRR: 2ec410e6d17c3d32
traefik_traefik.1.ie97nxehqrsb@vm    | MY_IP - - [06/Dec/2024:08:30:18 +0000] "GET /hub/api/oauth2/authorize?client_id=jupyterhub-user-my_user&redirect_uri=%2Fuser%2Fmy_user%2Foauth_callback&response_type=code&state=A7AwPYMieX8mp5LM8i0mQQ HTTP/2.0" 302 0 "-" "-" 222 "jupyterhub-https@swarm" "http://10.0.1.250:8000" 24ms
traefik_traefik.1.ie97nxehqrsb@vm    | 2024-12-06T08:30:18Z DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:196 > Service selected by WRR: 2ec410e6d17c3d32
traefik_traefik.1.ie97nxehqrsb@vm    | MY_IP - - [06/Dec/2024:08:30:18 +0000] "GET /user/my_user/oauth_callback?code=pBjDbEP5nvBlFpUkWVgXhR8Mm1wCUe&state=A7AwPYMieX8mp5LM8i0mQQ HTTP/2.0" 302 0 "-" "-" 223 "jupyterhub-https@swarm" "http://10.0.1.250:8000" 51ms
traefik_traefik.1.ie97nxehqrsb@vm    | 2024-12-06T08:30:18Z DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:196 > Service selected by WRR: 2ec410e6d17c3d32
traefik_traefik.1.ie97nxehqrsb@vm    | MY_IP - - [06/Dec/2024:08:30:18 +0000] "GET /user/my_user/lab? HTTP/2.0" 200 4572 "-" "-" 224 "jupyterhub-https@swarm" "http://10.0.1.250:8000" 10ms
traefik_traefik.1.ie97nxehqrsb@vm    | 2024-12-06T08:30:18Z DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:196 > Service selected by WRR: 2ec410e6d17c3d32
traefik_traefik.1.ie97nxehqrsb@vm    | 2024-12-06T08:30:18Z DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:196 > Service selected by WRR: 2ec410e6d17c3d32
traefik_traefik.1.ie97nxehqrsb@vm    | MY_IP - - [06/Dec/2024:08:30:18 +0000] "GET /user/my_user/lab/extensions/jupyterlab_pygments/static/remoteEntry.5cbb9d2323598fbda535.js HTTP/2.0" 304 0 "-" "-" 225 "jupyterhub-https@swarm" "http://10.0.1.250:8000" 7ms
traefik_traefik.1.ie97nxehqrsb@vm    | MY_IP - - [06/Dec/2024:08:30:18 +0000] "GET /user/my_user/lab/extensions/@jupyter-notebook/lab-extension/static/remoteEntry.04dfa589925e7e7c6a3d.js HTTP/2.0" 304 0 "-" "-" 226 "jupyterhub-https@swarm" "http://10.0.1.250:8000" 10ms
traefik_traefik.1.ie97nxehqrsb@vm    | 2024-12-06T08:30:18Z DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:196 > Service selected by WRR: 2ec410e6d17c3d32
traefik_traefik.1.ie97nxehqrsb@vm    | 2024-12-06T08:30:18Z DBG github.com/traefik/traefik/v3/pkg/server/service/proxy.go:100 > 499 Client Closed Request error="context canceled"
traefik_traefik.1.ie97nxehqrsb@vm    | 2024-12-06T08:30:18Z DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:196 > Service selected by WRR: 2ec410e6d17c3d32
traefik_traefik.1.ie97nxehqrsb@vm    | MY_IP - - [06/Dec/2024:08:30:18 +0000] "GET /user/my_user/static/lab/7730.7e3a9fb140d2d55a51fc.js?v=7e3a9fb140d2d55a51fc HTTP/2.0" 499 21 "-" "-" 227 "jupyterhub-https@swarm" "http://10.0.1.250:8000" 1ms
traefik_traefik.1.ie97nxehqrsb@vm    | 2024-12-06T08:30:18Z DBG github.com/traefik/traefik/v3/pkg/server/service/proxy.go:100 > 499 Client Closed Request error="context canceled"

Traefik docker compose
version: '3.3'

services:

  traefik:
    # Use the latest v3.0.x Traefik image available
    image: traefik:v3.0
    ports:
      # Listen on port 80, default for HTTP, necessary to redirect to HTTPS
      - target: 80
        published: 80
        mode: host
      # Listen on port 443, default for HTTPS
      - target: 443
        published: 443
        mode: host
    deploy:
      placement:
        constraints:
          # Make the traefik service run only on the node with this label
          # as the node with it has the volume for the certificates
          - node.labels.traefik.main-node == true
      labels:
        # Enable Traefik for this service, to make it available in the public network
        - traefik.enable=true
        
        # Use the docker swarm overlay network (declared below)
        - traefik.docker.network=jupyterhub_net
        
        # Use the custom label "traefik.constraint-label=traefik-public"
        # This public Traefik will only use services with this label
        # That way you can add other internal Traefik instances per stack if needed
        - traefik.constraint-label=traefik-public
          
        # https-redirect middleware to redirect HTTP to HTTPS
        # It can be re-used by other stacks in other Docker Compose files
        - traefik.http.middlewares.https-redirect.redirectscheme.scheme=https
        - traefik.http.middlewares.https-redirect.redirectscheme.permanent=true
        
        # traefik-http set up only to use the middleware to redirect to https
        - traefik.http.routers.traefik-public-http.rule=Host(`MY_IP`)
        - traefik.http.routers.traefik-public-http.entrypoints=http
        - traefik.http.routers.traefik-public-http.middlewares=https-redirect
        
        # traefik-https the actual router using HTTPS
        - traefik.http.routers.traefik-public-https.rule=Host(`MY_IP`)
        - traefik.http.routers.traefik-public-https.entrypoints=https
        - traefik.http.routers.traefik-public-https.tls=true
        
        # Use the special Traefik service api@internal with the web UI/Dashboard
        - traefik.http.routers.traefik-public-https.service=api@internal
        
        # Define the port inside of the Docker service to use
        - traefik.http.services.traefik-public.loadbalancer.server.port=8080

    volumes:
      # Add Docker as a mounted volume, so that Traefik can read the labels of other services
      - /var/run/docker.sock:/var/run/docker.sock:ro
      # Mount the certificates
      - /etc/letsencrypt/archive/jupyterhub/fullchain1.pem:/certificates/fullchain.pem
      - /etc/letsencrypt/archive/jupyterhub/privkey1.pem:/certificates/privkey.pem
      - /home/vmadmin/docker_swarm/traefik/certs-traefik.yml:/etc/traefik/dynamic/certs-traefik.yml

    command:
      # Enable Docker in Traefik, so that it reads labels from Docker services
      - --providers.docker
      # Add a constraint to only use services with the label "traefik.constraint-label=traefik-public"
      - --providers.docker.constraints=Label(`traefik.constraint-label`, `traefik-public`)
      # Do not expose all Docker services, only the ones explicitly exposed
      - --providers.docker.exposedbydefault=false
      # Enable Docker Swarm mode
      - --providers.swarm.endpoint=unix:///var/run/docker.sock

      - --providers.file.filename=/etc/traefik/dynamic/certs-traefik.yml
   
      # Create an entrypoint "http" listening on port 80
      - --entrypoints.http.address=:80
      # Create an entrypoint "https" listening on port 443
      - --entrypoints.https.address=:443

      # Enable the access log, with HTTP requests
      - --accesslog
      # Enable the Traefik log, for configurations and errors
      - --log.level=DEBUG
      - --log
      # Enable the Dashboard and API
      - --api
    networks:
      # Use the public network created to be shared between Traefik and
      # any other service that needs to be publicly available with HTTPS
      - jupyterhub_net

networks:
  # Use the previously created external network "jupyterhub_net", shared with other
  # services that need to be publicly available via this Traefik
  jupyterhub_net:
    external: true

Jupyterhub docker compose
version: "3"

services:
  jupyterhub:
    image: "jupyterhub-docker-swarm-custom:5.2.1"

    ports:
      - target: 8000
        published: 8000
        mode: host
    
    deploy:
      placement:
        constraints:
          # place hub on master node
          - node.labels.traefik.main-node == true
          - node.role == manager

      labels:
        - traefik.enable=true
        - traefik.constraint-label=traefik-public

        # create router rule HTTPS
        - traefik.http.routers.jupyterhub-https.rule=Host(`my_domain.com`) || Host(`www.my_domain.com`)
        - traefik.http.routers.jupyterhub-https.entrypoints=https
        - traefik.http.routers.jupyterhub-https.tls=true

        # create router rule HTTP redirect rule
        - traefik.http.middlewares.https-redirect.redirectscheme.scheme=https
        - traefik.http.middlewares.https-redirect.redirectscheme.permanent=true
        
        # traefik-http set up only to use the middleware to redirect to https
        - traefik.http.routers.jupyterhub-http.rule=Host(`my_domain.com`) || Host(`www.my_domain.com`)
        - traefik.http.routers.jupyterhub-http.entrypoints=http
        - traefik.http.routers.jupyterhub-http.middlewares=https-redirect


        # add exposed port for traefik to see (does not get it from docker swarm)
        - traefik.http.services.jupyterhub.loadbalancer.server.port=8000

        - traefik.docker.network=jupyterhub_net

    volumes:
      - jupyterhub_pv:/mnt/jupyterhub
      - /var/run/docker.sock:/var/run/docker.sock
      - jupyterhub:/srv/jupyterhub

    networks:
      - jupyterhub_net

    environment:
      DOCKER_NETWORK_NAME: jupyterhub_net

volumes:
  jupyterhub_pv:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /mnt/jupyterhub

  jupyterhub:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/jupyterhub

networks:
  jupyterhub_net:
    external: true

So I am curious: what could be causing this error?
How could I fix it?
Has anyone experienced a similar problem, and can you share whether it led to further problems?

Thank you in advance for any help or hints!

Sharing logs from the single-user server container might help as well. It’s possible this socket hang up is actually caused by the client closing the connection before the request completed, in which case it won’t result in any user-facing errors at all. This kind of thing can be caused by a page refresh with outstanding requests, for example. (Such errors shouldn’t be logged as a 503, so it might not be that, or it might be a bug in the logging and/or error handling of the proxy.)
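One way to check the client-closed hypothesis against the logs posted above is to look for proxy 503s whose path also shows up in Traefik's access log with status 499 ("Client Closed Request"). Below is a rough sketch using only the Python standard library. The regular expressions are assumptions based on the log lines in this thread, and the function names are made up for illustration. A match suggests the browser dropped the request, but does not prove it.

```python
import re
from urllib.parse import urlsplit

# configurable-http-proxy error lines, as they appear in the hub logs above
CHP_503 = re.compile(r"\[ConfigProxy\] error: 503 \S+ (?P<target>\S+) socket hang up")

# Request line and status code of a Traefik access-log entry
TRAEFIK_ACCESS = re.compile(r'"[A-Z]+ (?P<target>\S+) HTTP/[\d.]+" (?P<status>\d{3}) ')


def paths_from(lines, pattern, status=None):
    """Collect request paths (query string dropped) from lines matching pattern."""
    paths = set()
    for line in lines:
        match = pattern.search(line)
        if match is None:
            continue
        if status is not None and match["status"] != status:
            continue
        paths.add(urlsplit(match["target"]).path)
    return paths


def correlate(hub_lines, traefik_lines):
    """Map each path with a proxy 503 to True if Traefik logged a 499 for it."""
    client_closed = paths_from(traefik_lines, TRAEFIK_ACCESS, status="499")
    return {path: path in client_closed for path in paths_from(hub_lines, CHP_503)}


# Lines taken from the logs in this thread (container prefix removed)
hub_lines = [
    "08:30:18.863 [ConfigProxy] error: 503 GET /user/my_user/static/lab/7730.7e3a9fb140d2d55a51fc.js socket hang up",
    "08:30:18.866 [ConfigProxy] error: 503 GET /user/my_user/static/lab/2160.8e96aa5b6f6d451bf57d.js socket hang up",
]
traefik_lines = [
    'MY_IP - - [06/Dec/2024:08:30:18 +0000] "GET /user/my_user/static/lab/7730.7e3a9fb140d2d55a51fc.js?v=7e3a9fb140d2d55a51fc HTTP/2.0" 499 21 "-" "-" 227 "jupyterhub-https@swarm" "http://10.0.1.250:8000" 1ms',
]

for path, closed in sorted(correlate(hub_lines, traefik_lines).items()):
    print(path, "client closed (499)" if closed else "no 499 in this excerpt")
```

On the excerpt above, the 7730... chunk is flagged as client-closed. The 2160... chunk is not, because the pasted Traefik excerpt contains no access-log entry for it, so a longer excerpt would be needed to say anything about that request.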

Since you are already deploying traefik, you might consider also deploying jupyterhub-traefik-proxy to handle the proxying in traefik itself, to remove the default configurable-http-proxy from the mix.

Yeah, there are no user-facing errors, as you mentioned; the single-user notebook server starts as expected.

Thank you for the hint. I will deploy jupyterhub-traefik-proxy and monitor the deployment for a bit; if it happens again, I will also provide the single-user server logs.

I deployed jupyterhub-traefik-proxy, managed by JupyterHub, and the error no longer occurs (thanks for the suggestion!).

When the Hub manages the proxy, is there a way to increase the number of proxy instances (so that traffic can be load-balanced between them if more users start using JupyterHub), or would it be better to manage the proxies externally?

When the Hub manages the proxy, no: it can only manage a single process, so a single traefik replica. But you can have several if you manage the replicas externally with a deployment tool. Kubernetes and docker-compose make this pretty simple (replica count is just a number); I’m not sure what other tools there are for it. But to have multiple traefik replicas, you need to run a load balancer in front of them (e.g. another instance of traefik!) to distribute traffic across the replicas.
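For reference, here is a minimal sketch of what the Hub side of an externally managed setup could look like in jupyterhub_config.py. It assumes the file-provider class from jupyterhub-traefik-proxy, with option names as I recall them from the project's docs for 1.x and later, so please check them against your installed version. The file path, API URL, username and environment variable name are hypothetical placeholders. The small stand-in class is there only so the snippet can run outside JupyterHub.

```python
# jupyterhub_config.py (sketch, not a tested deployment)
import os


class _StandInConfig(dict):
    """Stand-in for JupyterHub's config object so this file runs on its own."""

    def __getattr__(self, name):
        return self.setdefault(name, _StandInConfig())

    def __setattr__(self, name, value):
        self[name] = value


# JupyterHub injects get_config() when it loads the real config file.
c = get_config() if "get_config" in globals() else _StandInConfig()

# Use jupyterhub-traefik-proxy with the file provider instead of
# the default configurable-http-proxy.
c.JupyterHub.proxy_class = "traefik_file"

# Do not let the Hub start traefik; the replicas are deployed separately
# (for example as their own swarm service).
c.TraefikFileProviderProxy.should_start = False

# Dynamic config file that the Hub writes routes to and traefik watches.
# Hypothetical path: it must be readable by the traefik replicas.
c.TraefikFileProviderProxy.dynamic_config_file = "/srv/jupyterhub/traefik/rules.toml"

# Traefik API endpoint the Hub uses to check that routes were loaded.
# Hypothetical service name, port and credentials.
c.TraefikFileProviderProxy.traefik_api_url = "http://traefik-hub-proxy:8099"
c.TraefikFileProviderProxy.traefik_api_username = "api_admin"
c.TraefikFileProviderProxy.traefik_api_password = os.environ.get("TRAEFIK_API_PASSWORD", "")
```

With the file provider, every traefik replica has to read the same dynamic config file (for example via a shared volume). The project also documents key-value-store backends, which may be a better fit across several swarm nodes.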

Ok thank you for the answer!

I will try that as well and see if I can get it to work.

Just to check that I got this right:
I currently have a traefik instance running (routing to JupyterHub). I would then need another deployment (docker-compose) that manages the JupyterHub traefik proxies, and I can use my already running traefik to load-balance between those proxies?

Yes, I believe that’s correct.