Understanding resource allocation

I’m trying to understand how the limits & guarantees are being utilized in my JupyterHub deployment.

Based on some research I have done (which of course may not be 100% correct…), I came to the conclusion that it is “better” to set the memory limit equal to the memory guarantee and not to set CPU limits & guarantees at all.

More:

Based on all of the above, I have created 3 different resource profiles (a sketch of the corresponding spawner configuration follows the list).

  • Low
    cpu_guarantee:
    cpu_limit:
    mem_guarantee: 4G
    mem_limit: 4G

  • Medium
    cpu_guarantee:
    cpu_limit:
    mem_guarantee: 16G
    mem_limit: 16G

  • High
    cpu_guarantee:
    cpu_limit:
    mem_guarantee: 32G
    mem_limit: 32G
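
For reference, a profile like “High” would typically be wired up as below — a minimal sketch, assuming the Zero to JupyterHub Helm chart’s singleuser.profileList with kubespawner_override is used; the display name and description are illustrative, and “Low”/“Medium” differ only in the memory values:

singleuser:
  profileList:
    - display_name: "High"
      description: "32 GB RAM, no CPU limit or guarantee"
      kubespawner_override:
        mem_guarantee: "32G"
        mem_limit: "32G"
        # cpu_limit and cpu_guarantee intentionally left unset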

For example, if someone selects the “High” profile and exceeds the 32 GB of memory, I would expect their pod/notebook to be killed… but nope!
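
To confirm that the kernel never actually OOM-killed these containers, the container status can be checked directly (a quick check; <podname> and <namespace> are placeholders):

kubectl describe pod <podname> -n <namespace> | grep -A3 "Last State"
# an OOM-killed container shows "Reason: OOMKilled" under its Last State

kubectl get pod <podname> -n <namespace> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# prints "OOMKilled" if the last termination was caused by exceeding the memory limit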

The pods are not getting killed even when the limits have been exceeded for hours! I quote some screenshots from Grafana.

These Grafana data do not seem to be “true”, though. By investigating the pods under the hood with the commands below, I often observe that the actual consumption is normal:

cat /sys/fs/cgroup/cpu/cpuacct.usage # cumulative CPU time in nanoseconds (cgroup v1)
cat /sys/fs/cgroup/memory/memory.usage_in_bytes # current memory usage in bytes (cgroup v1)
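
The limit the kernel is actually enforcing can be read from the same cgroup tree (a cgroup v1 sketch; if it matches the profile’s value, the limit has reached the container):

cat /sys/fs/cgroup/memory/memory.limit_in_bytes # hard memory limit enforced by the kernel; an effectively unlimited cgroup reports a very large number instead of ~32 GB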

Thus, I have come to the conclusion that the kube metrics are not showing the correct picture of the consumption, and to be honest, I cannot clearly tell on which side the problem resides. Is it some wrong configuration in JupyterHub? (I doubt it.) Is it something wrong in the Kubernetes configuration? (Very possible!) Maybe both? Something else?
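
One more cross-check that may help narrow down where the discrepancy lives (a sketch, assuming metrics-server is installed; <podname> and <namespace> are placeholders) is to compare the Grafana panels with what the Kubernetes metrics pipeline itself reports:

kubectl top pod <podname> -n <namespace>
# shows CPU/memory as aggregated by metrics-server from the kubelet/cAdvisor,
# i.e. from the same cgroup counters read above; if this matches the cgroup values
# but not Grafana, the problem likely sits on the Grafana/Prometheus side
# (e.g. which metric the dashboard queries) rather than in JupyterHub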

Some details about the infrastructure: this is a kubeadm cluster of 5 nodes (1 master & 4 workers), which run as VMs on a Proxmox installation.

Try running kubectl get pod <podname> -o yaml and verify that it shows the expected resource limits. If it does, then the problem lies beyond the control of JupyterHub/KubeSpawner.
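
For example (a sketch; <podname> is a placeholder), the resources block can be pulled out on its own:

kubectl get pod <podname> -o jsonpath='{.spec.containers[*].resources}'
# should print a limits/requests block whose memory values match the selected profile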