Hi, we’re using zero-to-jupyterhub-k8s 0.11.1 with jupyterhub 1.3.0, jupyterlab 3.1.7 and jupyter-server 1.10.2. Our singleuser-notebook pods are backed by object storage for the file system rather than persistent volumes. From time to time there is a hiccup with the object storage connection which breaks the server and we see an error like this in the logs:
Sep 1 09:41:43 jupyter-60fa26b0bdc80340c8a98b6a notebook WARNING WARNING 2021-09-01T14:41:43.234Z [SingleUserLabApp handlers:603] No such file or directory:
The pod status is Running, but when trying to exec into the pod we get an error like this:
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "63f044962ed1edbb5c60378873bf998d4a8dbc2465fd29f6c6f93c844332c34e": OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: chdir to cwd ("/home/jovyan") set in config.json failed: transport endpoint is not connected: unknown
Deleting the pod and having the user restart the notebook server resolves the issue. Ideally, with the JupyterLab interface, we direct users to the Hub Control Panel to stop and start their notebook server pod themselves. In this case, however, users were getting a "Directory not found" error when trying to load the File menu, which prevented them from getting to the Hub Control Panel.
What I’m wondering is whether there is a way to write some kind of custom liveness probe and package it into the notebook server pod so that the pod is killed if it can no longer work with the file system (s3fs). I looked through the jupyter-server docs and config options, but no supported hook stood out to me. I saw the extra_services option, but that looks more like adding API handler extensions to the server web app, which isn’t what we have in mind here. Are there better hook points for something like this, or are we better off just writing a script that runs on a cron within the notebook server image?
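To make the idea concrete, here is a rough sketch of the kind of check I have in mind. It is only an illustration: the function names (filesystem_ok, main), the 5 second timeout, and the default path of /home/jovyan are my own assumptions, not anything provided by jupyter-server or the chart. It lists the directory in a background thread so that a hung s3fs mount is reported as a failure instead of blocking the check forever.

```python
import os
import threading


def filesystem_ok(path, timeout=5.0):
    """Return True if `path` can be listed within `timeout` seconds.

    Hypothetical helper: a broken FUSE mount may either raise OSError
    (e.g. "transport endpoint is not connected") or hang, so the listing
    runs in a daemon thread and is abandoned if it takes too long.
    """
    result = {"ok": False}

    def probe():
        try:
            os.listdir(path)
            result["ok"] = True
        except OSError:
            # Covers a missing directory as well as a disconnected mount.
            result["ok"] = False

    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout)
    # A worker that is still alive means the listing did not return in time.
    return result["ok"] and not worker.is_alive()


def main(argv):
    """Return 0 if the target directory is healthy, 1 otherwise."""
    target = argv[1] if len(argv) > 1 else "/home/jovyan"  # assumed default
    return 0 if filesystem_ok(target) else 1
```

Saved as a script ending in sys.exit(main(sys.argv)), this could in principle be called from an exec-style liveness probe or from a cron job inside the image. One thing I’m unsure about is whether an exec probe would even start when the mount is broken, given the chdir error above, so that would need testing.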
Thanks for any help.