Slowness with SwarmSpawner

Hello, I can’t figure out the cause of some slowness with SwarmSpawner.

Launching a JupyterLab, via JupyterHub using SwarmSpawner, in a Docker Swarm cluster seems slow on some hosts: close to 50 s for the service status to reach running in the ‘start_object’ coroutine.
This value is confirmed by comparing the Created dates of the Docker network and of the container on the target host: in my case the network was created at 11:46:04 (GMT) and the container at 11:46:55 (GMT), about 51 s later.

docker network inspect jupyter_network |grep Created
        "Created": "2024-10-09T13:46:04.338874395+02:00",

docker container inspect fac949f6ac86  |grep Created
        "Created": "2024-10-09T11:46:55.912624491Z",

Fortunately the container does start, but it takes a long time.

Another test, run from the leader host against the same target host, via:

  • docker network create ...
  • docker service create --constraint="node.id==9bgzo3wogakoqwftespnzlbm7" --network test ...

shows that everything works fine in this case: the container is started immediately after the Docker network is created, which is what I expect.

docker network inspect test |grep Created
        "Created": "2024-10-09T14:01:21.52683491+02:00",
docker container inspect 289fc8e65a96  |grep Created
        "Created": "2024-10-09T12:01:21.915244364Z",

I have no idea how to check where the time is spent by SwarmSpawner. Which method should I extend to log activity?
Regards

Is this a copy and paste error? The network is created at 13:46:04, not 11:46:04.

You can see the source code for SwarmSpawner here.
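
If you want to instrument the spawner yourself, here is a minimal sketch of a timing wrapper in jupyterhub_config.py (the start_object method name comes from dockerspawner’s source; the wrapper class itself is my own, untested suggestion):

    import time

    from dockerspawner import SwarmSpawner

    class TimedSwarmSpawner(SwarmSpawner):
        # start_object is the coroutine where SwarmSpawner waits for the
        # swarm service task to reach the "running" state.
        async def start_object(self):
            t0 = time.perf_counter()
            result = await super().start_object()
            self.log.info("start_object took %.1f s", time.perf_counter() - t0)
            return result

    c.JupyterHub.spawner_class = TimedSwarmSpawner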

Can you turn on debug logging and show us your JupyterHub logs?
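
For example, in jupyterhub_config.py (both are standard JupyterHub/Spawner settings):

    c.JupyterHub.log_level = "DEBUG"  # verbose hub-side logging
    c.Spawner.debug = True            # pass --debug to the single-user server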

  • It was not a copy-and-paste error: 13:46:04 was expressed in GMT+2, unlike the other date,
    so I wrote both dates in GMT above.

  • Running docker service ps shows that most of the waiting time is spent in the Preparing state:

docker service ps jnp_notebook-bchambon
ID             NAME                      IMAGE                                                                  NODE           DESIRED STATE   CURRENT STATE              ERROR     PORTS
rxqaartiyoj2   jnp_notebook-bchambon.1   gitlab-registry.in2p3.fr/cc-in2p3-dev/jnp_docker/myjbook_rhel9:8.2.0   ccosvmse0027   Running         Preparing 50 seconds ago

docker service ps jnp_notebook-bchambon
ID             NAME                      IMAGE                                                                  NODE           DESIRED STATE   CURRENT STATE          ERROR     PORTS
rxqaartiyoj2   jnp_notebook-bchambon.1   gitlab-registry.in2p3.fr/cc-in2p3-dev/jnp_docker/myjbook_rhel9:8.2.0   ccosvmse0027   Running         Running 1 second ago

So I think it’s a Docker or Docker network problem in my cluster, NOT a SwarmSpawner one.

Here are some logs from JupyterHub; note the ~54 s gap between ‘Starting service’ at 15:25:11 and the server becoming ready at 15:26:06, matching the time spent in the Preparing state:

[I 2024-10-09 15:25:05.805 CustomJupyterHub base:937] User logged in: bchambon
[I 2024-10-09 15:25:05.809 CustomJupyterHub log:192] 302 GET /hub/oauth_callback?state=[secret]&session_state=[secret]&code=[secret] -> /hub/spawn (bchambon@10.0.0.2) 1191.63ms
[I 2024-10-09 15:25:05.963 CustomJupyterHub log:192] 200 GET /hub/spawn (bchambon@10.0.0.2) 55.36ms
[I 2024-10-09 15:25:10.646 CustomJupyterHub provider:660] Creating oauth client jupyterhub-user-bchambon
[I 2024-10-09 15:25:10.885 CustomJupyterHub dockerspawner:1033] Service 'jnp_notebook-bchambon' is gone
[I 2024-10-09 15:25:11.530 CustomJupyterHub log:192] 302 POST /hub/spawn?_xsrf=[secret] -> /hub/spawn-pending/bchambon?_xsrf=[secret] (bchambon@10.0.0.2) 1016.83ms
[I 2024-10-09 15:25:11.617 CustomJupyterHub pages:399] bchambon is pending spawn
[I 2024-10-09 15:25:11.618 CustomJupyterHub log:192] 200 GET /hub/spawn-pending/bchambon?_xsrf=[secret] (bchambon@10.0.0.2) 3.10ms
[I 2024-10-09 15:25:11.926 CustomJupyterHub dockerspawner:1311] Created service jnp_notebook-bchambon (id: qyrp9k3) from image gitlab-registry.in2p3.fr/cc-in2p3-dev/jnp_docker/myjbook_rhel9:8.2.0
[I 2024-10-09 15:25:11.926 CustomJupyterHub dockerspawner:1335] Starting service jnp_notebook-bchambon (id: qyrp9k3)
[W 2024-10-09 15:25:21.242 CustomJupyterHub base:1190] User bchambon is slow to start (timeout=10)
[I 2024-10-09 15:26:06.249 CustomJupyterHub base:1090] User bchambon took 55.721 seconds to start
[I 2024-10-09 15:26:06.249 CustomJupyterHub proxy:331] Adding user bchambon to proxy /user/bchambon/ => http://jnp_notebook-bchambon:8888
[I 2024-10-09 15:26:06.251 CustomJupyterHub users:776] Server bchambon is ready
[I 2024-10-09 15:26:06.252 CustomJupyterHub log:192] 200 GET /hub/api/users/bchambon/server/progress?_xsrf=[secret] (bchambon@10.0.0.2) 54496.08ms
[I 2024-10-09 15:26:06.476 CustomJupyterHub log:192] 302 GET /hub/spawn-pending/bchambon?_xsrf=[secret] -> /user/bchambon/ (bchambon@10.0.0.2) 90.72ms
[I 2024-10-09 15:26:06.781 CustomJupyterHub log:192] 302 GET /hub/api/oauth2/authorize?client_id=jupyterhub-user-bchambon&redirect_uri=%2Fuser%2Fbchambon%2Foauth_callback&response_type=code&state=[secret] -> /user/bchambon/oauth_callback?code=[secret]&state=[secret] (bchambon@10.0.0.2) 78.41ms
[I 2024-10-09 15:26:07.002 CustomJupyterHub log:192] 200 POST /hub/api/oauth2/token (bchambon@10.0.3.229) 146.09ms
[I 2024-10-09 15:26:07.024 CustomJupyterHub log:192] 200 GET /hub/api/user (bchambon@10.0.3.228) 19.58ms

Can you also post your single-user server logs (from the Docker containers where your JupyterLab is running)? Comparing the timestamps of the JupyterHub and JupyterLab logs can give us some insight.

Hello,
Back to this problem, for which I set up a workaround.
As a matter of fact, I discovered that the slowness occurred on a host when a JupyterLab was started there for the first time (i.e. when no other JupyterLab was running yet on that host), meaning that the Docker network ‘jupyter_network’ had to be created because it didn’t exist there yet.
So the slowness was definitely related to the creation of the Docker network ‘jupyter_network’.
As a workaround, I run a dummy Docker service attached to ‘jupyter_network’ on all hosts of the cluster, the objective being to force the creation of a permanent Docker network on every host at startup of the stack of services.
I mean something like:

 ...
  dummy_service:
    image: alpine:3.20.3
    command: ["sleep", "infinity"]
    networks:
      - jupyter_network
    deploy:
      mode: global
      placement:
        constraints: [ node.role != manager ]
 ...

Once this was in place, there was no more slowness.
As an end user, a JupyterLab is now spawned (via JupyterHub) and fully available in 6 to 8 s, which is quite acceptable (a cluster of 20 hosts, serving up to 100 simultaneous users).

In conclusion, it was definitely Docker slowness, absolutely NOT related to SwarmSpawner.

Regards,
Bernard
