Pending hook-image-puller pods

jonasswe · April 4, 2023, 7:07am

We have recently run into a problem that keeps getting worse.
We run our own custom JupyterHub image (singleuser) in an Azure AKS Kubernetes cluster with autoscaling enabled. We deploy JupyterHub with helm, using a new image tag every time we build a new custom JupyterHub image.

The problem we have recently started to run into when we deploy with helm is that one or more hook-image-puller pods get stuck in the Pending state and are never scheduled. They should be scheduled, run, and then disappear.

kubectl describe pod:

The nodes don’t have any taints.

Should these pods trigger a scale up?

The k8s version is 1.24.6, and we are using helm version 3.11.1 and the z2jh jupyterhub helm chart 2.0.0.
“helm upgrade --install” is used for the deploy.

jonasswe · April 21, 2023, 3:18pm

I am bumping this post

manics · April 23, 2023, 2:14pm

Do you only have this problem when the limitation is Too many pods, or do you also have the same problem when you hit the CPU/memory limit for a node?

jonasswe · April 24, 2023, 12:48pm

@manics

Yes, this is the standard event message from “describe pod” on the pending pod. It looks as if the Kubernetes scheduler searches for pods to evict, can’t find one, and still doesn’t trigger a scale-up so that the pods can be scheduled on a new, fresh node?

Note:
There are other apps/services besides JupyterHub running on this node pool.

manics · April 24, 2023, 4:15pm

I meant can you reproduce this problem by hitting the memory/CPU limit on the node, but when the pod limit hasn’t been hit? This will help to narrow down whether it’s a general problem that occurs with all resources, or if it’s only a problem when you hit the number of pods limit.

Could you also try to reproduce this on a separate cluster with no other applications running, in case there’s some interaction with the non-Z2JH pods?
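One way to tell the two cases apart is to look at which reasons appear in the FailedScheduling event message on the pending pod. The following is only a minimal sketch: the function name is made up for illustration, and it assumes the scheduler reports the reasons with the phrases "Too many pods", "Insufficient cpu" and "Insufficient memory", which may differ between Kubernetes versions.

```python
def limiting_resources(event_message):
    """Return which node limits a FailedScheduling message mentions.

    Hypothetical helper: the phrases below are assumed to match what the
    scheduler prints; adjust them if your cluster words them differently.
    """
    phrases = {
        "pods": "too many pods",
        "cpu": "insufficient cpu",
        "memory": "insufficient memory",
    }
    lowered = event_message.lower()
    # Keep the order of the dict so the result is stable.
    return [name for name, phrase in phrases.items() if phrase in lowered]
```

If the result only ever contains "pods", the problem is specific to the per-node pod limit; if "cpu" or "memory" show up as well, it is more likely a general resource issue.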

jonasswe · April 25, 2023, 6:06am

@manics

That would be something to try out, but I am not sure this is a lack-of-resources problem. What is clear, though, is that the only node without a hook-image-puller pod scheduled is maxed out at 30 pods, which is the maximum in this node pool. It seems that this prevents the pod from being scheduled on that node, yet it doesn’t trigger a scale-up?

jonasswe · April 25, 2023, 7:42am

It is clearly a number-of-pods problem. I just deleted a pod from a non-JupyterHub deployment on the node that the hook-image-puller couldn’t be scheduled on, and voilà, it could now be scheduled. It seems the hook-image-puller daemonset relies on there always being free space for more pods on the nodes and doesn’t trigger a scale-up?
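For anyone who wants to check for the same condition before a deploy, here is a rough sketch that lists nodes which have already reached their pod limit. It assumes the two arguments are the parsed JSON from "kubectl get nodes -o json" and "kubectl get pods --all-namespaces -o json"; the function name is made up, and it assumes that pods in the Succeeded or Failed phase don’t count towards the limit.

```python
from collections import Counter


def nodes_at_pod_capacity(nodes, pods):
    """Return {node_name: (active_pods, allocatable_pods)} for full nodes.

    nodes / pods: dicts parsed from `kubectl get ... -o json` output
    (assumed input format; nothing here talks to the cluster itself).
    """
    # Count pods that are bound to a node and have not finished.
    counts = Counter(
        pod["spec"]["nodeName"]
        for pod in pods["items"]
        if pod["spec"].get("nodeName")
        and pod.get("status", {}).get("phase") not in ("Succeeded", "Failed")
    )
    full = {}
    for node in nodes["items"]:
        name = node["metadata"]["name"]
        # status.allocatable.pods is a string such as "30".
        limit = int(node["status"]["allocatable"]["pods"])
        if counts[name] >= limit:
            full[name] = (counts[name], limit)
    return full
```

Any node returned here has no room left for a hook-image-puller pod, so on a node pool like the one above (30 pods per node) the hook would be expected to hang until a pod on that node goes away.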
