Docker health check fails intermittently, which causes services to be restarted (or to fail)

Hi,

The health check fails intermittently:

"Log": [
                    {
                        "Start": "2024-09-03T09:31:03.338282842+02:00",
                        "End": "2024-09-03T09:31:04.38001624+02:00",
                        "ExitCode": 1,
                        "Output": "Traceback (most recent call last):\n  File \"/etc/jupyter/docker_healthcheck.py\", line 27, in <module>\n    json_file = next(runtime_dir.glob(\"*server-*.json\"))\n                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nStopIteration\n"
                    },
                    {
                        "Start": "2024-09-03T09:31:07.381608267+02:00",
                        "End": "2024-09-03T09:31:08.477925226+02:00",
                        "ExitCode": -1,
                        "Output": "Health check exceeded timeout (1s): Traceback (most recent call last):\n  File \"/etc/jupyter/docker_healthcheck.py\", line 27, in <module>\n    json_file = next(runtime_dir.glob(\"*server-*.json\"))\n                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nStopIteration\n"
                    },
                    {
                        "Start": "2024-09-03T09:31:11.479360283+02:00",
                        "End": "2024-09-03T09:31:12.694599546+02:00",
                        "ExitCode": -1,
                        "Output": "Health check exceeded timeout (1s)"
                    }
                ]

We are not 100% sure why it sometimes fails, but we suspect a timing issue, since we change the user at login (and copy the home folder of jovyan to this new user).
Our suspicion is that the runtime folder (/home/$NB_USER/.local/jupyter/runtime/) is not populated in time.
Our current workaround is to disable the health check, but we are not sure about the side effects.
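
To illustrate the suspected race, here is a minimal sketch of how the file lookup from the traceback could tolerate a runtime folder that is not populated yet. It is not the actual docker_healthcheck.py; the function name `find_server_file` and the `wait_seconds` / `poll_interval` parameters are hypothetical, and the total wait has to stay below the 1s health check timeout shown in the log.

```python
import time
from pathlib import Path


def find_server_file(runtime_dir: Path, wait_seconds: float = 0.5, poll_interval: float = 0.1):
    """Return the first *server-*.json file in runtime_dir, or None if none shows up in time.

    Hypothetical helper: the name and both timing parameters are assumptions.
    """
    deadline = time.monotonic() + wait_seconds
    while True:
        # next() with a default returns None instead of raising StopIteration
        # when the glob matches nothing (the failure seen in the log).
        json_file = next(runtime_dir.glob("*server-*.json"), None)
        if json_file is not None:
            return json_file
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

A health check built on this could exit with a non-zero code and a short message when `None` is returned, rather than ending in an unhandled traceback. Whether the check should report "unhealthy" or "still starting" in that case is exactly what we are unsure about.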

So we have two questions:

  1. Is the health check compatible with changing the user (when this change might be slow)?
  2. Why is the health check needed in a swarm scenario with JupyterHub? We had the impression that the hub kills services it cannot reach after some timeout (we run the Docker Swarm spawner, which has some timeout parameters related to spawning a service).

Regards
Jonas

EDIT: I forgot to mention that we use the docker-stack notebooks as a base for our own custom notebook.