User-scheduler pod getting OOMKilled

Hi there!
I did a fresh installation of JH via the latest Helm chart, and after a few days of experimenting I see that the user-scheduler pod is getting OOMKilled, which prevents my notebook from spawning (KubeSpawner). I have configured the user-scheduler pod with the default resources (50M memory request and 500M limit). It works if I increase them 4x, but per the documentation it should not require so many resources.
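For reference, this is roughly the kind of override I mean. The `scheduling.userScheduler.resources` key is part of the z2jh chart, but the numbers here are just illustrative (4x the defaults above):

```yaml
# Illustrative values.yaml override: 4x the default memory request/limit.
scheduling:
  userScheduler:
    resources:
      requests:
        memory: 200M
      limits:
        memory: 2G
```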
Also, in the pod's logs I see that it's listening to events across other namespaces as well, which I don't expect, since pods will be spawned in the same namespace.

Thanks for the help

What's the total resource usage across your cluster? Could it be that other pods are using too much memory, and some lower-priority pods are being killed?

My hub, proxy, and spawned pods are all under 500M memory and 30m CPU. My only concern is the user-scheduler pod processing events from other namespaces.

The user-scheduler pod runs the official k8s kube-scheduler binary, configured to schedule pods onto nodes; specifically, pods declared to be scheduled by the user-scheduler. Its purpose is to ensure pods get packed tightly onto nodes, which helps scale down nodes that end up unused.
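To picture it: the packing comes from a KubeSchedulerConfiguration that scores nodes higher the fuller they already are. A minimal sketch of such a config (not necessarily the chart's exact one) could look like this:

```yaml
# Minimal sketch of a KubeSchedulerConfiguration that prefers already-busy
# nodes ("bin packing"); the z2jh chart's actual config may differ.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: user-scheduler   # pods opt in via spec.schedulerName
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated     # score nodes higher the fuller they are
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
```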

Anyhow, kube-scheduler requires notable permissions because it inspects a lot of non-namespaced details, such as nodes and their available capacity, in order to schedule a pod. Due to that, the user-scheduler is granted k8s permissions beyond its own namespace.
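To illustrate what "beyond the namespace" means, here is the kind of cluster-scoped read access a kube-scheduler needs (an illustrative excerpt, not the chart's actual manifest, which grants more, e.g. bindings, events, and leases):

```yaml
# Illustrative ClusterRole excerpt; the real rules the chart grants are broader.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: user-scheduler-example   # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["nodes"]         # nodes are not namespaced
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods"]          # watched across all namespaces
    verbs: ["get", "list", "watch"]
```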

If you want to pack user pods without the user-scheduler: I know that in GCP, for example, you can make the k8s cluster's own pod scheduler pack pods tightly onto available nodes. In other words, if you use a GKE cluster, you could rely on that and disable the user-scheduler for the same effect.
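Disabling it is a one-line chart setting (`scheduling.userScheduler.enabled` is a real z2jh key); on GKE the tight packing itself would then come from the cluster autoscaler, which I believe can be switched to its optimize-utilization profile:

```yaml
# values.yaml: turn off the user-scheduler so user pods fall back to the
# cluster's default kube-scheduler.
scheduling:
  userScheduler:
    enabled: false
```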

Got it. Thanks for the detailed explanation @consideRatio
I think it's abnormal for such a small binary to get OOMKilled. Wondering if it really needs 2Gi of memory to work… since it dies with anything in the range from the default 64M up to 1G.

Absolutely, it's very unusual.

Your z2jh version would be relevant to know here for reference, and especially what version of kube-scheduler is used in the user-scheduler pod (it makes use of an official k8s kube-scheduler image!). There could be issues from mismatching the kube-scheduler's version with the k8s cluster's version, but we've not had reports about this so far, and at least one minor version of skew should be acceptable, I think.

Further, the amount of work kube-scheduler does depends on the number of nodes/pods in the cluster, I think, so if you are running a very large cluster I expect it to work harder. But requiring something like 2GB of memory seems unlikely, given that I recall it often managing with very low amounts of memory.

Were you able to solve this issue?
We are having the exact same problem: the user-scheduler is running out of memory, and increasing its assigned resources works for some time, until it doesn't and we need to raise them even further.