Auto-scaling based on CPU usage?

I’m very new to this, so I hope this question makes sense…

I’ve been working through the excellent “Zero to JupyterHub with Kubernetes” tutorial using Google Cloud Platform. My Hub is running and everything works nicely, but I’m struggling to achieve successful auto-scaling.

I’ve created an auto-scaling user node pool, as described in Step 7 of the tutorial here, and I’ve also modified config.yaml based on the recommendations here.

I’m not sure what I should expect from this, but so far I haven’t managed to trigger any upscaling of the cluster. My basic user node pool comprises one Google n1-standard-4 machine with 4 CPUs and 15 GB of RAM. If I log in to the hub and start some intensive processing, I can see from the Google Cloud dashboard that this machine is fully occupied (i.e. CPU usage at 99.9%). If another user logs in, I was hoping that the auto-scaler might launch another node, but it seems I’m misunderstanding something?

From what I can understand of the Kubernetes documentation, it sounds as though CPU usage might not actually trigger auto-scaling? Is there an alternative? For that matter, if a single user launches some intensive (“embarrassingly parallel”) processing, is it possible to upscale their node, or will each user always be limited by the machine type of the single node their pod is on?

As you can probably tell, I have a lot to learn about cloud computing! However, one very appealing aspect of JupyterHub for me is the potential to upscale beyond the capabilities of my laptop for short periods of intensive simulation. I would love to get this working if it’s possible!

Thank you :slight_smile:


The Kubernetes scheduler looks at the resources requested by a pod in its specification. It doesn’t look at how much CPU the pod actually uses.

Take a look at https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/ (and similar for RAM).

To find out what resources are assigned to your user pod, use kubectl describe pod <podname> --namespace=<namespace-you-deployed-your-hub-to>. It will list how much CPU and RAM is promised (requested) for your pod as well as its upper limits. The auto-scaler only triggers a scale-up when the promised CPU/RAM of all pods in a node pool exceeds what the pool’s nodes can provide. Also take a look at kubectl describe node <name-of-a-node-in-your-cluster>, which will tell you the node’s allocatable resources and the guarantees and limits of the pods scheduled on it.
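
For example (a minimal sketch; the namespace, pod and node names below are placeholders, not anything from your deployment), you can compare what a user pod requests with what a node can offer:

# Show a user pod's resource requests (guarantees) and limits
# ("jhub" and "jupyter-someuser" are placeholders; adjust to your deployment)
kubectl describe pod jupyter-someuser --namespace=jhub

# List the nodes, then check one node's capacity/allocatable resources and
# the "Allocated resources" summary of requests already placed on it
kubectl get nodes
kubectl describe node <node-name>

As a rough example, if each user pod requests 2 CPUs on a 4-CPU node, a third simultaneous user no longer fits and the cluster auto-scaler adds a node; with a tiny request like 0.05 CPU, many users fit on a single node no matter how busy it actually is.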

Thanks @betatim - that helps!

So, if I’ve understood correctly, the auto-scaler bases its decision about whether to upscale on the amount of memory/CPU that is guaranteed to an incoming user (not the limit value), compared to the remaining resources on the node?

I guess that makes sense, but in that case I might need to try a different approach. Looking at this link I see I can define different profiles that my users can choose from. Do you know if it’s possible to assign these profiles to different node pools on the Kubernetes cluster, please?

For example, maybe I could set up several user node pools with different machine types (e.g. “Standard”, “High CPU” and “High memory”) and users could choose which they want when they log in. By setting appropriate guarantees and limits in my config.yaml (as per the link), I could make sure that any users choosing “High CPU” or “High memory” were directed to an appropriate machine/node, whereas “Standard” users would be allocated to a shared node to reduce costs.

It looks as though this should be possible, but I must be missing some settings somewhere. I’ve repeated the code from Step 7 here to create a new node pool with extra resources (n1-highcpu-16 machines instead of n1-standard-4), but this isn’t being picked up by my JupyterHub: when I log in using the “High CPU” profile, I get a message saying 0/2 nodes are available; Insufficient cpu, even though I now have a third node pool with high-CPU resources.

I can’t see anything in config.yaml that points explicitly to my user node pool(s). Is this possible at the level of config.yaml, or do I need to make changes deeper down in the configuration?

Thanks again for your help so far!

What you want to do is possible. However, I don’t know off the top of my head how to do it (or where to point you). @consideRatio has done a lot of work on this, so maybe he can direct you somewhere quickly.

In principle you should be able to assign different guaranteed values to the different profiles a user can select, as you suggest. The scheduler would then automatically select the best node to run that pod on.

Thanks once again for your reply, @betatim!

I spent last night reading the docs in more detail, especially regarding taints and tolerations, and I feel like I’m making progress, but I still can’t quite get it to work. If @consideRatio or anyone else can point out what I’m doing wrong below, I’d be very grateful!

After creating my basic cluster, I create a node pool for my users, exactly as in the z2jh tutorial:

gcloud beta container node-pools create user-pool \
  --cluster niva-jhub \
  --zone europe-west1-b \
  --machine-type n1-standard-4 \
  --num-nodes 0 \
  --enable-autoscaling \
  --min-nodes 0 \
  --max-nodes 10 \
  --node-labels hub.jupyter.org/node-purpose=user \
  --node-taints hub.jupyter.org_dedicated=user:NoSchedule

Then, I’m creating a second pool of more powerful machines, using the same 'user' label as above, but a different taint:

gcloud beta container node-pools create user-pool-hi-cpu \
  --cluster niva-jhub \
  --zone europe-west1-b \
  --machine-type n1-highcpu-16 \
  --num-nodes 0 \
  --enable-autoscaling \
  --min-nodes 0 \
  --max-nodes 10 \
  --node-labels hub.jupyter.org/node-purpose=user \
  --node-taints hub.jupyter.org_dedicated=user-hi-cpu:NoSchedule

Then, in config.yaml, I’ve defined two user 'profiles':

singleuser:
  defaultUrl: "/lab"
  image:
    name: my_image
    tag: my_tag
  profileList:
    - display_name: "Standard (default)"
      description: |
        <4 CPUs, <12 GB RAM, no GPU. Resources allocated dynamically. Use this unless you have special requirements.
      default: True
      kubespawner_override:
        cpu_limit: 4
        cpu_guarantee: 0.05
        mem_limit: 12G
        mem_guarantee: 512M
        tolerations:
          - key: hub.jupyter.org_dedicated
            operator: Equal
            value: user
            effect: NoSchedule
        extra_resource_limits: {}
        start_timeout: 600
    - display_name: "High CPU"
      description: |
        Access to 16 CPUs, 12 GB RAM, no GPU. For CPU-heavy processing; expensive!
      kubespawner_override:
        cpu_limit: 16
        cpu_guarantee: 12
        mem_limit: 12G
        mem_guarantee: 8G
        tolerations:
          - key: hub.jupyter.org_dedicated
            operator: Equal
            value: user-hi-cpu
            effect: NoSchedule
        extra_resource_limits: {}
        start_timeout: 600

And I’ve also enabled scheduling etc., as in the tutorial:

prePuller:
  continuous:
    enabled: true

scheduling:
  userScheduler:
    enabled: true
  podPriority:
    enabled: true
  userPlaceholder:
    enabled: true
    replicas: 1
  userPods:
    nodeAffinity:
      # matchNodePurpose valid options:
      # - ignore
      # - prefer (the default)
      # - require
      matchNodePurpose: require

cull:
  enabled: true
  timeout: 3600
  every: 300

When I try to log in to my Hub, even using the “Standard” profile with its low resource guarantees, I see messages saying

0/2 nodes are available: 1 node(s) didn't match node selector, 1 node(s) had taints that the pod didn't tolerate

and

pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 node(s) had taints that the pod didn't tolerate

I’m obviously specifying my taints and/or tolerations incorrectly, or I’ve messed up the scheduling in some other way. I’ve tried lots of permutations with no success so far :frowning_face: Any tips from those who actually know what they’re doing would be much appreciated! Thanks :slightly_smiling_face:

Hi JES!

It seems like you have indeed managed to set some taints on the nodes, but that the tolerations for the pods are failing somehow. I’d really triple-check the actual tolerations set on the pod that is pending and failing to schedule.

Overall, in this situation I suggest inspecting the pod that fails to schedule and the nodes that are available, to get a better grasp of whether everything makes sense.

# look for the tolerations being set on the pod
# look for the nodeSelector and/or required nodeAffinity on the pod
kubectl get pod -n <my-jhub-namespace> <my-pod-name> -o yaml

# look for the taints on the nodes you have; does the pod tolerate this taint?
# look for the labels on the nodes you have, do they match nodeSelector/required nodeAffinity?
kubectl get node <my-node-name> -o yaml

PS: I wrote some related debugging tips recently here: https://github.com/pangeo-data/pangeo-cloud-federation/issues/264#issuecomment-489926874.


Hey btw JES, I got to thinking and I have a concrete suggestion:

  1. Let both the high-CPU and normal node pools have the hub.jupyter.org_dedicated=user:NoSchedule taint
  2. Don’t manually specify the toleration; it is added by default
  3. Label the normal user pool something to indicate it’s a normal pool
  4. Label the high-CPU user pool to indicate it is a high-CPU pool
  5. Override the node_selector of the different profiles to match the label you gave the normal and high-CPU node pools, depending on which kind of node pool you want to schedule on (see the sketch below).
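
A minimal sketch of step 5, assuming (purely for illustration) that you labelled the pools with something like node-size=standard and node-size=high-cpu via --node-labels:

singleuser:
  profileList:
    - display_name: "Standard (default)"
      default: True
      kubespawner_override:
        # hypothetical label; use whatever you set on the normal pool
        node_selector:
          node-size: standard
    - display_name: "High CPU"
      kubespawner_override:
        # hypothetical label; use whatever you set on the high-CPU pool
        node_selector:
          node-size: high-cpu

Because both pools carry the default user taint, the default toleration added by the chart is enough, and the node_selector alone decides which pool each profile’s pods land on (and therefore which pool the auto-scaler grows).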

Regarding autoscaling based on CPU etc.: this is not easy. To understand how hard it can be, first describe to yourself very concretely what behavior you want. Should you add a node if it is very heavily used for 1 second? Somewhat loaded for 1 minute? No matter which pod caused the load? Should you add pods based on a pod’s usage?

There are some tools to accomplish various things, and I’m uncertain whether any of them would be suitable here. In general, for this use case where we have pods that we cannot just throw away, it is very hard, and I would not advise pursuing it unless you are very, very motivated and willing to spend a lot of time.

Hi @consideRatio, Thank you very much for these replies - everything is working now :grinning: :tada: Actually, I’ve been Googling around this topic for a few days, and several of your previous posts have helped me enormously, so thanks for those too :+1:

I finally managed to get everything going by following the link in your first post above. I think my current solution may be unnecessarily complicated (the suggestion in your second post seems cleaner), but I think I’ll leave things as they are for now and revisit when I’ve gained more familiarity with Google Cloud and Kubernetes.

Regarding auto-scaling based on CPU usage, I take your point as to why this is difficult. I was pursuing this idea before I discovered the profileList option, which I think will be fine for my use case. Most of my users have low resource requirements, and those wanting more will almost always know about it before they start their session. With my current setup, I can make a range of options available and users can choose. If they decide partway through that they need more power, they can simply log out and then sign back in on a more powerful machine. I didn’t realise this was possible at first, which is why I was looking for some way of up-scaling to more powerful machines (not just more nodes) as resources become limited.

In case anyone else is having similar issues, below is an extract from my current configuration. However, please note the caveat that the solution from @consideRatio, above, is probably a better/simpler way to achieve the same thing.

Right, I’m off to play with my new Hub :grinning:


Setup overview
First, create the cluster and a default pool with the core label

gcloud container clusters create my-cluster-name \
  --machine-type=n1-standard-4 \
  --num-nodes=1 \
  --zone=europe-west1-b \
  --cluster-version=latest  \
  --node-labels=hub.jupyter.org/node-purpose=core

Then create the “standard” user pool, as per the tutorial

gcloud container node-pools create user-pool \
  --cluster=my-cluster-name \
  --machine-type=n1-standard-4 \
  --zone=europe-west1-b \
  --num-nodes=0 \
  --enable-autoscaling \
  --min-nodes=0 \
  --max-nodes=10 \
  --node-labels=hub.jupyter.org/node-purpose=user \
  --node-taints=hub.jupyter.org_dedicated=user:NoSchedule    

Next, create a pool of more powerful machines for users needing more CPUs. Note that I’ve included the default JupyterHub label and taint, plus a new label and taint defining this particular group of users (user-hi-cpu)

gcloud container node-pools create user-pool-hi-cpu \
  --cluster=my-cluster-name \
  --machine-type=n1-highcpu-16 \
  --zone=europe-west1-b \
  --num-nodes=0 \
  --enable-autoscaling \
  --min-nodes=0 \
  --max-nodes=10 \
  --node-labels=hub.jupyter.org/node-purpose=user,my-dedicated=user-hi-cpu \
  --node-taints=hub.jupyter.org_dedicated=user:NoSchedule,my-dedicated=user-hi-cpu:NoSchedule

Then, in config.yaml, I have the following

singleuser:
  # ... Other singleuser config.
  profileList:
    - display_name: "Standard (default)"
      description: |
        <4 CPUs, <12 GB RAM, no GPU. Resources allocated dynamically. Use this unless you have special requirements.
      default: True
      kubespawner_override:
        cpu_limit: 4
        cpu_guarantee: 0.05
        mem_limit: 15G
        mem_guarantee: 512M
        start_timeout: 900
    - display_name: "High CPU (new node)"
      description: |
        Access to 16 CPUs, 12 GB RAM, no GPU. For CPU-heavy processing
      kubespawner_override:
        cpu_limit: 16
        cpu_guarantee: 12
        mem_limit: 15G
        mem_guarantee: 8G
        start_timeout: 900
        tolerations:
          - effect: NoSchedule
            key: hub.jupyter.org_dedicated
            operator: Equal
            value: user
          - effect: NoSchedule
            key: my-dedicated
            operator: Equal
            value: user-hi-cpu

Finally, I also have this in config.yaml to ensure that “user” and “core” pods get allocated correctly

prePuller:
  continuous:
    enabled: true

scheduling:
  userScheduler:
    enabled: true
  podPriority:
    enabled: true
  userPlaceholder:
    enabled: true
    replicas: 1
  userPods:
    nodeAffinity:
      matchNodePurpose: require
  corePods:
    nodeAffinity:
      matchNodePurpose: require

cull:
  enabled: true
  timeout: 3600
  every: 300

Oh that is really motivating to hear and it makes me happy, thanks for your encouragement @JES!

I appreciate your warm enthusiasm and the way you’ve written down what you ended up doing; those kinds of posts tend to be really helpful to me!

Lots of appreciation /erik!

@consideRatio @JES Thanks for the useful discussion here - we recently got our own k8s z2jh deployment up and running with per-user resource provisioning that assigns jobs to specific resource groups, based on the ideas from this discussion. We ended up implementing the node_selector override as suggested, selecting on named node-group pools for a given profile. It’s worth remembering to include the label
k8s.io/cluster-autoscaler/node-template/label/nodegroup-name: "my-nodegroup-name"
in the node group definitions for your workers; otherwise the autoscaler won’t find it (if you use the nodegroup-name as the selector, anyway).
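
For example, a profile targeting such a node group might look roughly like this (a sketch only: the “Big compute” profile is made up, my-nodegroup-name is the placeholder value from above, and the node-template label is what lets the autoscaler know that nodes in the group will carry the nodegroup-name label even when the group is scaled to zero):

singleuser:
  profileList:
    - display_name: "Big compute"
      kubespawner_override:
        # must match the label the nodes in the worker node group actually get
        node_selector:
          nodegroup-name: my-nodegroup-name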