Docker_health check fails intermittently which causes services to be restarted (or fail)

jonakarl · September 3, 2024, 11:04am

Hi,

The health check fails intermittently:

"Log": [
                    {
                        "Start": "2024-09-03T09:31:03.338282842+02:00",
                        "End": "2024-09-03T09:31:04.38001624+02:00",
                        "ExitCode": 1,
                        "Output": "Traceback (most recent call last):\n  File \"/etc/jupyter/docker_healthcheck.py\", line 27, in <module>\n    json_file = next(runtime_dir.glob(\"*server-*.json\"))\n                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nStopIteration\n"
                    },
                    {
                        "Start": "2024-09-03T09:31:07.381608267+02:00",
                        "End": "2024-09-03T09:31:08.477925226+02:00",
                        "ExitCode": -1,
                        "Output": "Health check exceeded timeout (1s): Traceback (most recent call last):\n  File \"/etc/jupyter/docker_healthcheck.py\", line 27, in <module>\n    json_file = next(runtime_dir.glob(\"*server-*.json\"))\n                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nStopIteration\n"
                    },
                    {
                        "Start": "2024-09-03T09:31:11.479360283+02:00",
                        "End": "2024-09-03T09:31:12.694599546+02:00",
                        "ExitCode": -1,
                        "Output": "Health check exceeded timeout (1s)"
                    }
                ]

We are not 100% sure why it sometimes fails, but we suspect a timing issue, since we change the user at login (and copy the home folder of jovyan to this new user).
Our suspicion is that the runtime folder (/home/$NB_USER/.local/jupyter/runtime/) is not populated in time.
Our current workaround is to disable the health check, but we are not sure what side effects that has.
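
To illustrate the suspected failure mode: the traceback shows `next()` being called on an empty glob result, which raises a bare `StopIteration`. Below is a minimal sketch of a lookup that tolerates the file not being there yet. The function name `find_server_file` and its parameters are hypothetical and not part of the docker-stacks script; only the glob pattern is taken from the traceback.

```python
import time
from pathlib import Path
from typing import Optional


def find_server_file(
    runtime_dir: Path,
    wait_seconds: float = 0.0,
    poll_interval: float = 0.1,
) -> Optional[Path]:
    """Return the first *server-*.json file in runtime_dir, or None.

    Hypothetical helper: polls for up to wait_seconds instead of
    failing immediately when the runtime folder is still empty.
    """
    deadline = time.monotonic() + wait_seconds
    while True:
        # A default value for next() avoids the StopIteration from the log.
        json_file = next(runtime_dir.glob("*server-*.json"), None)
        if json_file is not None:
            return json_file
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

A health check built on this could exit non-zero with a clear message when `None` is returned. Any waiting inside the check would still have to fit within Docker's health check timeout (1s in the log above), so this only makes the failure easier to read; it does not make a slow user change pass.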

So we have two questions:

1. Is the health check compatible with changing the user (when this change might be slow)?
2. Why is the health check needed in a swarm scenario with JupyterHub? We run the Docker swarm spawner, and we had the impression that the hub kills services it cannot reach after some timeout (the spawner has some timeout parameters related to spawning a service).

Regards
Jonas

EDIT: I forgot to mention that we use the docker-stacks notebooks as a base for our own custom notebook.
