Hello everyone, and thanks in advance for any insights.
We have a dockerized JupyterHub server running in a VM that uses KubeSpawner to spawn pods in an EKS cluster. We have successfully set up roles and role bindings, so the pods get scheduled and run normally.
In the server logs we can see that the reflectors' watchers time out and reconnect every 10 seconds:
2022-07-29T13:07:38.330892984Z [D 2022-07-29 13:07:38.330 JupyterHub reflector:362] events watcher timeout
2022-07-29T13:07:38.330932584Z [D 2022-07-29 13:07:38.330 JupyterHub reflector:281] Connecting events watcher
2022-07-29T13:07:38.387520875Z [D 2022-07-29 13:07:38.385 JupyterHub reflector:362] pods watcher timeout
2022-07-29T13:07:38.387822032Z [D 2022-07-29 13:07:38.385 JupyterHub reflector:281] Connecting pods watcher
After a while the reflector “crashes” and the Hub halts and restarts, leaving users with a false “Kernel died” message once their pods reconnect to the server. Our logs show the following:
2022-07-29T13:09:54.854557349Z [D 2022-07-29 13:09:54.853 JupyterHub reflector:281] Connecting events watcher
2022-07-29T13:09:54.854593239Z 2022-07-29 13:09:54,854 ERROR:Unclosed client session
2022-07-29T13:09:54.854598264Z client_session: <aiohttp.client.ClientSession object at 0x7f8ff6aba0e0>
2022-07-29T13:09:54.860506657Z [E 2022-07-29 13:09:54.860 JupyterHub reflector:355] Error when watching resources, retrying in 6.4s
2022-07-29T13:09:54.860544778Z Traceback (most recent call last):
2022-07-29T13:09:54.860548988Z File "/opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py", line 285, in _watch_and_update
2022-07-29T13:09:54.860552049Z resource_version = await self._list_and_update()
2022-07-29T13:09:54.860554721Z File "/opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py", line 233, in _list_and_update
2022-07-29T13:09:54.860557606Z for p in initial_resources["items"]
2022-07-29T13:09:54.860560414Z KeyError: 'items'
2022-07-29T13:09:54.860562903Z
2022-07-29T13:09:54.952203843Z [D 2022-07-29 13:09:54.951 JupyterHub reflector:281] Connecting pods watcher
2022-07-29T13:09:54.953328808Z 2022-07-29 13:09:54,953 ERROR:Unclosed client session
2022-07-29T13:09:54.953354075Z client_session: <aiohttp.client.ClientSession object at 0x7f8ff6abb160>
2022-07-29T13:09:54.962749021Z [E 2022-07-29 13:09:54.962 JupyterHub reflector:355] Error when watching resources, retrying in 6.4s
2022-07-29T13:09:54.962790977Z Traceback (most recent call last):
2022-07-29T13:09:54.962842493Z File "/opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py", line 285, in _watch_and_update
2022-07-29T13:09:54.962849532Z resource_version = await self._list_and_update()
2022-07-29T13:09:54.962853490Z File "/opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py", line 233, in _list_and_update
2022-07-29T13:09:54.962857372Z for p in initial_resources["items"]
2022-07-29T13:09:54.962860803Z KeyError: 'items'
and:
2022-07-29T13:10:39.676851297Z [D 2022-07-29 13:10:39.676 JupyterHub reflector:281] Connecting events watcher
2022-07-29T13:10:39.677844198Z 2022-07-29 13:10:39,677 ERROR:Unclosed client session
2022-07-29T13:10:39.677868636Z client_session: <aiohttp.client.ClientSession object at 0x7f8ff6e38670>
2022-07-29T13:10:39.691073966Z [E 2022-07-29 13:10:39.690 JupyterHub reflector:351] Watching resources never recovered, giving up
2022-07-29T13:10:39.691112887Z Traceback (most recent call last):
2022-07-29T13:10:39.691119126Z File "/opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py", line 285, in _watch_and_update
2022-07-29T13:10:39.691123544Z resource_version = await self._list_and_update()
2022-07-29T13:10:39.691127790Z File "/opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py", line 233, in _list_and_update
2022-07-29T13:10:39.691132305Z for p in initial_resources["items"]
2022-07-29T13:10:39.691136425Z KeyError: 'items'
2022-07-29T13:10:39.691140487Z
2022-07-29T13:10:39.692239309Z [C 2022-07-29 13:10:39.691 JupyterHub spawner:2326] Events reflector failed, halting Hub.
2022-07-29T13:10:39.801848470Z 2022-07-29 13:10:39,801 ERROR:Task was destroyed but it is pending!
2022-07-29T13:10:39.801889829Z task: <Task pending name='Task-7' coro=<shared_client.<locals>.close_client_task() running at /opt/conda/lib/python3.10/site-packages/kubespawner/clients.py:58> wait_for=<Future pending cb=[Task.task_wakeup()]>>
2022-07-29T13:10:39.803006694Z Exception ignored in: <coroutine object shared_client.<locals>.close_client_task at 0x7f8ffcb33ae0>
2022-07-29T13:10:39.803030785Z RuntimeError: coroutine ignored GeneratorExit
2022-07-29T13:10:39.928388364Z 2022-07-29 13:10:39,928 ERROR:Task was destroyed but it is pending!
2022-07-29T13:10:39.928428590Z task: <Task pending name='Task-18' coro=<ResourceReflector._watch_and_update() running at /opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py:358> wait_for=<Future pending cb=[Task.task_wakeup()]>>
2022-07-29T13:10:39.929111919Z 2022-07-29 13:10:39,928 ERROR:Unclosed client session
2022-07-29T13:10:39.929154049Z client_session: <aiohttp.client.ClientSession object at 0x7f8ff6ababf0>
2022-07-29T13:10:39.931588579Z 2022-07-29 13:10:39,928 ERROR:Task exception was never retrieved
2022-07-29T13:10:39.931611915Z future: <Task finished name='Task-19' coro=<ResourceReflector._watch_and_update() done, defined at /opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py:241> exception=SystemExit(1)>
2022-07-29T13:10:39.931650706Z Traceback (most recent call last):
2022-07-29T13:10:39.931654982Z File "/opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py", line 285, in _watch_and_update
2022-07-29T13:10:39.931658976Z resource_version = await self._list_and_update()
2022-07-29T13:10:39.931662746Z File "/opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py", line 233, in _list_and_update
2022-07-29T13:10:39.931666185Z for p in initial_resources["items"]
2022-07-29T13:10:39.931669679Z KeyError: 'items'
2022-07-29T13:10:39.931673278Z
2022-07-29T13:10:39.931676808Z During handling of the above exception, another exception occurred:
2022-07-29T13:10:39.931680811Z
2022-07-29T13:10:39.931684454Z Traceback (most recent call last):
2022-07-29T13:10:39.931687724Z File "/opt/conda/lib/python3.10/site-packages/jupyterhub/app.py", line 2999, in launch_instance
2022-07-29T13:10:39.931691323Z loop.start()
2022-07-29T13:10:39.931694635Z File "/opt/conda/lib/python3.10/site-packages/tornado/platform/asyncio.py", line 215, in start
2022-07-29T13:10:39.931698389Z self.asyncio_loop.run_forever()
2022-07-29T13:10:39.931702285Z File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 600, in run_forever
2022-07-29T13:10:39.931706403Z self._run_once()
2022-07-29T13:10:39.931710360Z File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1896, in _run_once
2022-07-29T13:10:39.931714354Z handle._run()
2022-07-29T13:10:39.931718297Z File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
2022-07-29T13:10:39.931722613Z self._context.run(self._callback, *self._args)
2022-07-29T13:10:39.931726906Z File "/opt/conda/lib/python3.10/site-packages/kubespawner/reflector.py", line 353, in _watch_and_update
2022-07-29T13:10:39.931730895Z self.on_failure()
2022-07-29T13:10:39.931734809Z File "/opt/conda/lib/python3.10/site-packages/kubespawner/spawner.py", line 2330, in on_reflector_failure
2022-07-29T13:10:39.931739137Z sys.exit(1)
2022-07-29T13:10:39.931742834Z SystemExit: 1
2022-07-29T13:10:39.933957901Z 2022-07-29 13:10:39,932 ERROR:Unclosed client session
2022-07-29T13:10:39.934005449Z client_session: <aiohttp.client.ClientSession object at 0x7f8ff6aba9b0>
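All three tracebacks end in `KeyError: 'items'` at `reflector.py` line 233, i.e. the list request the reflector makes before (re)starting a watch came back without an `items` key. One plausible reading, not confirmed by the logs: the API server answered with a Kubernetes `Status` object (for example `403 Forbidden` from RBAC, or `401 Unauthorized` if the Hub's credentials stopped being valid) instead of a `PodList`/`EventList`, and a `Status` has no `items`. The sketch below is a hypothetical helper (`items_from_list_response` and `ListFailed` are made-up names, not KubeSpawner APIs) showing how such a body could be detected so the real HTTP code and reason get logged instead of a bare `KeyError`:

```python
import json
from typing import Any


class ListFailed(Exception):
    """Hypothetical exception carrying the API server's Status details."""


def items_from_list_response(raw_body: bytes) -> list[dict[str, Any]]:
    """Return the "items" of a Kubernetes List response body.

    Assumption: a failed list call returns a Status object (kind: Status,
    status: Failure) instead of a List, so it has no "items" key and
    indexing it directly raises KeyError.
    """
    body = json.loads(raw_body)
    if body.get("kind") == "Status" and body.get("status") == "Failure":
        # Surface the HTTP code, reason and message instead of a bare KeyError.
        raise ListFailed(f'{body.get("code")} {body.get("reason")}: {body.get("message")}')
    if "items" not in body:
        raise ListFailed(f"response has no 'items': {body!r}")
    return body["items"]


# Illustrative bodies; the Forbidden message text is made up for the example.
pod_list = b'{"kind": "PodList", "apiVersion": "v1", "metadata": {"resourceVersion": "42"}, "items": [{"metadata": {"name": "jupyter-alice"}}]}'
forbidden = b'{"kind": "Status", "apiVersion": "v1", "status": "Failure", "reason": "Forbidden", "code": 403, "message": "events is forbidden"}'

print(len(items_from_list_response(pod_list)))  # 1
try:
    items_from_list_response(forbidden)
except ListFailed as err:
    print(err)  # 403 Forbidden: events is forbidden
```

If the raw response body of the failing list call can be captured (e.g. by temporarily raising the log level around it), its `code` and `reason` fields would tell a permissions problem (403) apart from an authentication problem (401).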
We believe it’s a permissions error in the role we created inside Kubernetes. This is our role:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: jupyterhub
  name: kubespawner-role
rules:
- apiGroups: [""]
  resources:
  - persistentvolumes
  - persistentvolumeclaims
  - pods
  - secrets
  - services
  - namespaces
  - events
  verbs:
  - get
  - watch
  - list
  - create
  - delete
```
We also tried a ClusterRole, but we see the same issues.
Any nudge in the right direction would be much appreciated.
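To sanity-check the permissions theory, the role's rules can be evaluated offline. This is a simplified model of RBAC matching (it ignores `resourceNames`, subresources and `nonResourceURLs`, and knows nothing about which identity the Hub actually authenticates as); `PolicyRule`, `role_allows` and `missing_permissions` are hypothetical helpers, and the list of needed (group, resource, verb) triples is an assumption based on the pods and events watchers in the logs:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    """Simplified RBAC rule: ignores resourceNames, subresources and nonResourceURLs."""
    api_groups: tuple[str, ...]
    resources: tuple[str, ...]
    verbs: tuple[str, ...]

    def allows(self, api_group: str, resource: str, verb: str) -> bool:
        # Each field must match explicitly or via the "*" wildcard.
        return (
            ("*" in self.api_groups or api_group in self.api_groups)
            and ("*" in self.resources or resource in self.resources)
            and ("*" in self.verbs or verb in self.verbs)
        )


def role_allows(rules: list[PolicyRule], api_group: str, resource: str, verb: str) -> bool:
    # RBAC is additive (there are no deny rules): any matching rule grants access.
    return any(rule.allows(api_group, resource, verb) for rule in rules)


def missing_permissions(rules: list[PolicyRule], needed: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Return the (api_group, resource, verb) triples that no rule grants."""
    return [triple for triple in needed if not role_allows(rules, *triple)]


# The rules of kubespawner-role above; "" is the core API group.
KUBESPAWNER_ROLE = [
    PolicyRule(
        api_groups=("",),
        resources=("persistentvolumes", "persistentvolumeclaims", "pods",
                   "secrets", "services", "namespaces", "events"),
        verbs=("get", "watch", "list", "create", "delete"),
    )
]

# Assumption: the pods and events watchers from the logs issue list + watch calls.
WATCHER_NEEDS = [("", "pods", "list"), ("", "pods", "watch"),
                 ("", "events", "list"), ("", "events", "watch")]

print(missing_permissions(KUBESPAWNER_ROLE, WATCHER_NEEDS))  # []
```

Under these assumptions the rules as written already cover listing and watching pods and events. A Role only grants access inside its own namespace, so if the list calls still fail, it may be worth confirming which service account or IAM identity the Hub uses from the VM, that the RoleBinding links that identity to `kubespawner-role`, and that KubeSpawner is querying the `jupyterhub` namespace.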
Thank you!