Resuming user servers with SwarmSpawner on multiple nodes after a hub restart

I run JupyterHub with SwarmSpawner on multiple nodes. When I need to update the config, I have to restart JupyterHub. Users who have already started containers on other nodes keep their work running, but after the hub restarts they cannot reconnect and get a 503 error. How should I set the config, or what else can I do?

Deploy the hub, proxy, and db as separate services, and make sure the db is persistent. Then you can restart the hub and running servers will resume just fine. I had an issue where the db was saving the wrong port for the notebook servers, so I run a script to fix those entries before restarting. I should probably just update my hub install at some point and see if that helps.
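
For the hub side of this, here is a minimal sketch of the settings that usually go with a separately running proxy. `cleanup_servers`, `should_start`, and `api_url` are existing JupyterHub options; the helper name `configure_hub_for_restarts` and the URL `http://proxy:8001` (a service called `proxy` with its API on port 8001) are assumptions for illustration and depend on your deployment.

```python
from types import SimpleNamespace


def configure_hub_for_restarts(c, proxy_api_url="http://proxy:8001"):
    """Apply settings that let user servers outlive a hub restart.

    `c` is the config object available in jupyterhub_config.py.
    `proxy_api_url` is an assumed default; adjust it to your proxy service.
    """
    # Leave running single-user servers alone when the hub shuts down.
    c.JupyterHub.cleanup_servers = False
    # The proxy runs as its own service, so the hub must not start one.
    c.ConfigurableHTTPProxy.should_start = False
    # Where the hub reaches the proxy's REST API.
    c.ConfigurableHTTPProxy.api_url = proxy_api_url
    return c


if __name__ == "__main__":
    # Stand-in for JupyterHub's config object, only for trying this
    # outside of a real hub.
    demo = SimpleNamespace(
        JupyterHub=SimpleNamespace(),
        ConfigurableHTTPProxy=SimpleNamespace(),
    )
    configure_hub_for_restarts(demo)
    print(demo.JupyterHub.cleanup_servers, demo.ConfigurableHTTPProxy.api_url)
```

In `jupyterhub_config.py` you would call `configure_hub_for_restarts(c)` after `c = get_config()`. The hub and the proxy also need to share the same `CONFIGPROXY_AUTH_TOKEN`, for example through their environment.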

I run the hub using Docker. Can I run a separate proxy in another Docker container? And how do I keep the db separate? Thanks.

version: "3.4"

services:
  proxy:
    env_file: .env
    image: jupyterhub/configurable-http-proxy:4.5
    networks:
      - net
    ports:
      - mode: host
        target: 8000
        published: 8080
    command:
      - configurable-http-proxy
      - '--error-path'
      - '/usr/share/chp-errors'

    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role == manager

  hub:
    depends_on:
      - db
    env_file: .env
    environment:
      DOCKER_NETWORK_NAME: jupyterhub_net
      POSTGRES_HOST: db
      POSTGRES_DB: jupyterhub
      POSTGRES_USER: pguser
      POSTGRES_PASSWORD: password
      JUPYTERHUB_CRYPT_KEY: putyourkeyhere
    image: jupyterhub_image

    # mount the Docker socket so the hub can talk to the Docker daemon
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
      - "mountyourvolumeshere"
    networks:
      - net
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role == manager

  db:
    image: postgres:14.2
    volumes:
      - db_data:/var/lib/postgresql/data
    restart: always
    env_file: .env
    environment:
      POSTGRES_DB: jupyterhub
      PGDATA: /var/lib/postgresql/data
      POSTGRES_USER: pguser
      POSTGRES_PASSWORD: password
    networks:
      - net
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role == manager

volumes:
  db_data:

networks:
  net:
    driver: overlay

Thanks very much! Does db_data mean the jupyterhub.sqlite file?

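No. It’s a named volume that is mounted at /var/lib/postgresql/data in the db service, so the database files survive restarts. This setup uses PostgreSQL instead of SQLite, so there is no jupyterhub.sqlite file. SQLite isn’t meant for production.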
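
For the hub to actually use that database, its config has to point `db_url` at it. A minimal sketch, assuming the hub container sees the `POSTGRES_*` variables from the compose file above and that the database listens on the default port 5432; the helper name `build_db_url` is made up for illustration.

```python
import os
from urllib.parse import quote


def build_db_url(env=os.environ):
    """Build a SQLAlchemy-style PostgreSQL URL from POSTGRES_* variables."""
    # Percent-encode credentials so characters like '@' or '/' do not
    # break the URL.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    # "db" is the service name used in the compose file.
    host = env.get("POSTGRES_HOST", "db")
    name = env["POSTGRES_DB"]
    # Port 5432 is the PostgreSQL default (an assumption about your setup).
    return f"postgresql://{user}:{password}@{host}:5432/{name}"
```

In `jupyterhub_config.py` this would be used as `c.JupyterHub.db_url = build_db_url()`. The hub image also needs a PostgreSQL driver such as psycopg2 installed.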

Thanks! I'll try it now!