Reflector times out and eventually restarts Jupyter Server

Hello everyone, and thanks in advance for any insights.

We have a dockerized JupyterHub server in a VM that uses KubeSpawner to spawn pods in an EKS cluster. We have successfully set up the roles and role bindings, so the pods get scheduled and run normally.

We can see in the server logs that the reflector times out every 10 seconds:

2022-07-29T13:07:38.330892984Z [D 2022-07-29 13:07:38.330 JupyterHub reflector:362] events watcher timeout
2022-07-29T13:07:38.330932584Z [D 2022-07-29 13:07:38.330 JupyterHub reflector:281] Connecting events watcher
2022-07-29T13:07:38.387520875Z [D 2022-07-29 13:07:38.385 JupyterHub reflector:362] pods watcher timeout
2022-07-29T13:07:38.387822032Z [D 2022-07-29 13:07:38.385 JupyterHub reflector:281] Connecting pods watcher

After a while the reflector “crashes” and restarts the hub, leaving users with a false “Kernel Died” message after the pods reconnect to the server. Our logs show the following:

2022-07-29T13:09:54.854557349Z [D 2022-07-29 13:09:54.853 JupyterHub reflector:281] Connecting events watcher
2022-07-29T13:09:54.854593239Z 2022-07-29 13:09:54,854 ERROR:Unclosed client session
2022-07-29T13:09:54.854598264Z client_session: <aiohttp.client.ClientSession object at 0x7f8ff6aba0e0>
2022-07-29T13:09:54.860506657Z [E 2022-07-29 13:09:54.860 JupyterHub reflector:355] Error when watching resources, retrying in 6.4s
2022-07-29T13:09:54.860544778Z     Traceback (most recent call last):
2022-07-29T13:09:54.860548988Z       File "/opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py", line 285, in _watch_and_update
2022-07-29T13:09:54.860552049Z         resource_version = await self._list_and_update()
2022-07-29T13:09:54.860554721Z       File "/opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py", line 233, in _list_and_update
2022-07-29T13:09:54.860557606Z         for p in initial_resources["items"]
2022-07-29T13:09:54.860560414Z     KeyError: 'items'
2022-07-29T13:09:54.860562903Z
2022-07-29T13:09:54.952203843Z [D 2022-07-29 13:09:54.951 JupyterHub reflector:281] Connecting pods watcher
2022-07-29T13:09:54.953328808Z 2022-07-29 13:09:54,953 ERROR:Unclosed client session
2022-07-29T13:09:54.953354075Z client_session: <aiohttp.client.ClientSession object at 0x7f8ff6abb160>
2022-07-29T13:09:54.962749021Z [E 2022-07-29 13:09:54.962 JupyterHub reflector:355] Error when watching resources, retrying in 6.4s
2022-07-29T13:09:54.962790977Z     Traceback (most recent call last):
2022-07-29T13:09:54.962842493Z       File "/opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py", line 285, in _watch_and_update
2022-07-29T13:09:54.962849532Z         resource_version = await self._list_and_update()
2022-07-29T13:09:54.962853490Z       File "/opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py", line 233, in _list_and_update
2022-07-29T13:09:54.962857372Z         for p in initial_resources["items"]
2022-07-29T13:09:54.962860803Z     KeyError: 'items'

and:

2022-07-29T13:10:39.676851297Z [D 2022-07-29 13:10:39.676 JupyterHub reflector:281] Connecting events watcher
2022-07-29T13:10:39.677844198Z 2022-07-29 13:10:39,677 ERROR:Unclosed client session
2022-07-29T13:10:39.677868636Z client_session: <aiohttp.client.ClientSession object at 0x7f8ff6e38670>
2022-07-29T13:10:39.691073966Z [E 2022-07-29 13:10:39.690 JupyterHub reflector:351] Watching resources never recovered, giving up
2022-07-29T13:10:39.691112887Z     Traceback (most recent call last):
2022-07-29T13:10:39.691119126Z       File "/opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py", line 285, in _watch_and_update
2022-07-29T13:10:39.691123544Z         resource_version = await self._list_and_update()
2022-07-29T13:10:39.691127790Z       File "/opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py", line 233, in _list_and_update
2022-07-29T13:10:39.691132305Z         for p in initial_resources["items"]
2022-07-29T13:10:39.691136425Z     KeyError: 'items'
2022-07-29T13:10:39.691140487Z
2022-07-29T13:10:39.692239309Z [C 2022-07-29 13:10:39.691 JupyterHub spawner:2326] Events reflector failed, halting Hub.
2022-07-29T13:10:39.801848470Z 2022-07-29 13:10:39,801 ERROR:Task was destroyed but it is pending!
2022-07-29T13:10:39.801889829Z task: <Task pending name='Task-7' coro=<shared_client.<locals>.close_client_task() running at /opt/conda/lib/python3.10/site-packages/kubespawner/clients.py:58> wait_for=<Future pending cb=[Task.task_wakeup()]>>
2022-07-29T13:10:39.803006694Z Exception ignored in: <coroutine object shared_client.<locals>.close_client_task at 0x7f8ffcb33ae0>
2022-07-29T13:10:39.803030785Z RuntimeError: coroutine ignored GeneratorExit
2022-07-29T13:10:39.928388364Z 2022-07-29 13:10:39,928 ERROR:Task was destroyed but it is pending!
2022-07-29T13:10:39.928428590Z task: <Task pending name='Task-18' coro=<ResourceReflector._watch_and_update() running at /opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py:358> wait_for=<Future pending cb=[Task.task_wakeup()]>>
2022-07-29T13:10:39.929111919Z 2022-07-29 13:10:39,928 ERROR:Unclosed client session
2022-07-29T13:10:39.929154049Z client_session: <aiohttp.client.ClientSession object at 0x7f8ff6ababf0>
2022-07-29T13:10:39.931588579Z 2022-07-29 13:10:39,928 ERROR:Task exception was never retrieved
2022-07-29T13:10:39.931611915Z future: <Task finished name='Task-19' coro=<ResourceReflector._watch_and_update() done, defined at /opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py:241> exception=SystemExit(1)>
2022-07-29T13:10:39.931650706Z Traceback (most recent call last):
2022-07-29T13:10:39.931654982Z   File "/opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py", line 285, in _watch_and_update
2022-07-29T13:10:39.931658976Z     resource_version = await self._list_and_update()
2022-07-29T13:10:39.931662746Z   File "/opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py", line 233, in _list_and_update
2022-07-29T13:10:39.931666185Z     for p in initial_resources["items"]
2022-07-29T13:10:39.931669679Z KeyError: 'items'
2022-07-29T13:10:39.931673278Z
2022-07-29T13:10:39.931676808Z During handling of the above exception, another exception occurred:
2022-07-29T13:10:39.931680811Z
2022-07-29T13:10:39.931684454Z Traceback (most recent call last):
2022-07-29T13:10:39.931687724Z   File "/opt/conda/lib/python3.10/site-packages/jupyterhub/app.py", line 2999, in launch_instance
2022-07-29T13:10:39.931691323Z     loop.start()
2022-07-29T13:10:39.931694635Z   File "/opt/conda/lib/python3.10/site-packages/tornado/platform/asyncio.py", line 215, in start
2022-07-29T13:10:39.931698389Z     self.asyncio_loop.run_forever()
2022-07-29T13:10:39.931702285Z   File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 600, in run_forever
2022-07-29T13:10:39.931706403Z     self._run_once()
2022-07-29T13:10:39.931710360Z   File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1896, in _run_once
2022-07-29T13:10:39.931714354Z     handle._run()
2022-07-29T13:10:39.931718297Z   File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
2022-07-29T13:10:39.931722613Z     self._context.run(self._callback, *self._args)
2022-07-29T13:10:39.931726906Z   File "/opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py", line 353, in _watch_and_update
2022-07-29T13:10:39.931730895Z     self.on_failure()
2022-07-29T13:10:39.931734809Z   File "/opt/conda/lib/python3.10/site-packages/kubespawner/spawner.py", line 2330, in on_reflector_failure
2022-07-29T13:10:39.931739137Z     sys.exit(1)
2022-07-29T13:10:39.931742834Z SystemExit: 1
2022-07-29T13:10:39.933957901Z 2022-07-29 13:10:39,932 ERROR:Unclosed client session
2022-07-29T13:10:39.934005449Z client_session: <aiohttp.client.ClientSession object at 0x7f8ff6aba9b0>
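
The `KeyError: 'items'` in the tracebacks above suggests that the list call returned a JSON body without an `items` key. A minimal sketch of how one might inspect what came back instead is shown below. It is not kubespawner code: `summarize_list_response` is a hypothetical helper, and it assumes (without confirmation from these logs) that the API server may have answered with an error object such as a Kubernetes `Status`, or with a non-JSON body from something in between.

```python
import json


def summarize_list_response(body: str) -> str:
    """Describe a raw list response, or whatever was returned in its place."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # For example an HTML error page from a proxy or load balancer.
        return "not JSON: " + body[:80]
    if not isinstance(payload, dict):
        return "unexpected JSON type: " + type(payload).__name__
    if "items" in payload:
        items = payload["items"] or []
        return f"list with {len(items)} item(s)"
    # No "items" key: report the fields a Kubernetes Status object would carry.
    kind = payload.get("kind", "<no kind>")
    code = payload.get("code", "<no code>")
    message = payload.get("message", "<no message>")
    return f"{kind} (code {code}): {message}"
```

Feeding it the raw body of the failing list request (for pods or events in the `jupyterhub` namespace) would show whether the response is an authorization error, some other error object, or not JSON at all.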

We believe it’s a permissions error on the part of the user role inside K8s. This is our role:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: jupyterhub
  name: kubespawner-role
rules:
- apiGroups: [""]
  resources:
  - persistentvolumes
  - persistentvolumeclaims
  - pods
  - secrets
  - services
  - namespaces
  - events
  verbs:
  - get
  - watch
  - list
  - create
  - delete
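
As a sanity check of the rules themselves, a small sketch like the following could compare the role against what the reflectors appear to need. It assumes the pods and events reflectors only require `list` and `watch` on `pods` and `events`; `role_allows`, `missing_permissions`, and the `REQUIRED` table are hypothetical names, and the rules are copied by hand from the YAML above rather than read from the cluster.

```python
# Rules copied by hand from the Role above (core API group only).
ROLE_RULES = [
    {
        "apiGroups": [""],
        "resources": [
            "persistentvolumes", "persistentvolumeclaims", "pods",
            "secrets", "services", "namespaces", "events",
        ],
        "verbs": ["get", "watch", "list", "create", "delete"],
    }
]

# Assumption: what the pods and events reflectors need to list and watch.
REQUIRED = [
    ("pods", "list"), ("pods", "watch"),
    ("events", "list"), ("events", "watch"),
]


def role_allows(rules, resource, verb, api_group=""):
    """Return True if any rule grants `verb` on `resource` in `api_group`."""
    for rule in rules:
        groups = rule.get("apiGroups", [])
        resources = rule.get("resources", [])
        verbs = rule.get("verbs", [])
        if (
            (api_group in groups or "*" in groups)
            and (resource in resources or "*" in resources)
            and (verb in verbs or "*" in verbs)
        ):
            return True
    return False


def missing_permissions(rules, required=REQUIRED):
    """List the (resource, verb) pairs from `required` that no rule grants."""
    return [(r, v) for r, v in required if not role_allows(rules, r, v)]


if __name__ == "__main__":
    print(missing_permissions(ROLE_RULES))
```

For the role as written this returns an empty list, so the rules themselves seem to cover `list` and `watch`. The check says nothing about whether the role binding or the credentials the Hub uses are correct.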

We also tried with a ClusterRole but we see the same issues.

Any nudge in the right direction will be much appreciated.

Thank you!