I’m trying to understand how the limits & guarantees are being utilized in my JupyterHub deployment.
Based on some research I have done (which of course may not be 100% correct…), I came to the conclusion that it is “better” to set the memory limit equal to the memory guarantee and not to set CPU limits or guarantees at all.
Based on the above, I have created 3 different resource profiles.
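For completeness, the profiles are defined roughly along these lines, as a `profileList` in the Zero to JupyterHub `values.yaml`. This is a sketch: only the 32G “High” figure is the one mentioned below, the other two sizes are illustrative.

```yaml
# Sketch: memory limit == guarantee, no CPU settings at all.
# Only the 32G "High" value matches the profile discussed below;
# the "Low"/"Medium" sizes are placeholders.
singleuser:
  profileList:
    - display_name: "Low"
      kubespawner_override:
        mem_guarantee: "8G"
        mem_limit: "8G"
    - display_name: "Medium"
      kubespawner_override:
        mem_guarantee: "16G"
        mem_limit: "16G"
    - display_name: "High"
      kubespawner_override:
        mem_guarantee: "32G"
        mem_limit: "32G"
```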
For example, if a user selects the “High” profile and exceeds its 32 GB memory limit, their pod/notebook should be killed… but nope! Pods are not getting killed even when the limits have been surpassed for hours! I attach some screenshots from Grafana.
These data, however, do not seem accurate. Investigating the pods under the hood with the commands below, I often observe that the actual consumption is normal:
```shell
cat /sys/fs/cgroup/cpu/cpuacct.usage             # CPU usage
cat /sys/fs/cgroup/memory/memory.usage_in_bytes  # Memory usage
```
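One caveat worth noting: those paths exist only on nodes using cgroup v1. Distributions that default to cgroup v2 expose `memory.current` instead. A small sketch that reads memory usage under either layout (falling back to 0 if no cgroup file is readable):

```shell
# Read the container's memory usage regardless of cgroup version.
mem_file=/sys/fs/cgroup/memory/memory.usage_in_bytes     # cgroup v1 path
[ -f "$mem_file" ] || mem_file=/sys/fs/cgroup/memory.current  # cgroup v2 path
mem_bytes=$(cat "$mem_file" 2>/dev/null || echo 0)
echo "memory usage: $((mem_bytes / 1024 / 1024)) MiB"
```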
Thus, I have come to the conclusion that the kube metrics are not showing the correct picture of the consumption, and to be honest, I cannot clearly tell on which side the problem resides. Is it a misconfiguration in JupyterHub? (I doubt it.) Is it something wrong in the Kubernetes configuration? (Very possible!) Maybe both? Something else?
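One possible source of such disagreement (an assumption on my side, not a confirmed diagnosis): dashboards built on cAdvisor/kube metrics usually plot the container *working set* (`container_memory_working_set_bytes`), which is the raw cgroup usage minus inactive page cache, so it and `memory.usage_in_bytes` measure different things. A toy illustration with made-up numbers:

```python
# Hypothetical values showing why `cat memory.usage_in_bytes` and a
# working-set-based dashboard can legitimately disagree.
# working_set = usage - inactive file cache; this is what cAdvisor
# exports and what the kubelet uses for eviction decisions.
GIB = 2**30
usage_in_bytes = 30 * GIB        # raw cgroup usage, includes page cache
total_inactive_file = 12 * GIB   # reclaimable file-backed pages
working_set = usage_in_bytes - total_inactive_file
print(working_set / GIB)  # -> 18.0
```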
Some details about the infrastructure: this is a kubeadm cluster of 5 nodes (1 master & 4 workers) operating as VMs on a Proxmox installation.