Question: how to change machine type of an existing JupyterHub on K8s

Hi,

I am currently running a JupyterHub on K8s following the amazing Z2JH documentation. The cluster is set up with a default-pool and a user-pool, both n1-standard-2 machine types. With a class of 80+ students, I have had to lower the per-user RAM guarantee to 64M so that all students can log in during lecture.
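For reference, lowering that guarantee is just a values change in the Z2JH config.yaml, roughly like this (the limit shown is only an illustrative placeholder, not my actual setting):

```yaml
# Z2JH helm chart values (config.yaml)
singleuser:
  memory:
    guarantee: 64M   # lowered so 80+ students can start servers at once
    limit: 1G        # placeholder value, not my actual limit
```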
I’ve tried resizing the cluster to increase the max number of nodes in the user-pool, but I hit the GCP quota limit (I am requesting a higher quota through the local sales team at the same time). So I am wondering whether migrating the user-pool to n1-highmem-4 is an alternative?
Is there any recommended resource to study regarding changing the machine type of an existing JupyterHub on K8s? I’ve tried https://cloud.google.com/kubernetes-engine/docs/tutorials/migrating-node-pool but did not succeed.
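In case it helps frame the question: what I imagine doing is creating a new n1-highmem-4 node pool and steering user pods onto it, something like the sketch below (the pool name user-pool-highmem and the use of singleuser.nodeSelector are just my assumptions, not something I have tested):

```yaml
# Sketch: pin user pods to a new node pool via the label GKE adds to every node in a pool.
singleuser:
  nodeSelector:
    cloud.google.com/gke-nodepool: user-pool-highmem   # hypothetical new pool name
```

After that, I suppose the old user-pool could be cordoned, drained, and deleted once the user pods have moved, which seems to be the spirit of the migrating-node-pool tutorial above.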

@yaojenkuo I know it’s 4 years later, but I’d love to understand what resources you found most valuable in figuring this out. I’m in a virtually identical situation and would love any tips you or others might have here.

1:8 CPU to RAM ratio:

Using highmem machines, with for example 4 CPU and 32 GB memory, is far more cost effective than a 4 CPU / 16 GB choice when many users on a z2jh deployment start pods on the same machines.

Specific machine choices:

On AWS, use r machines, for example r7i. R is highmem, 7 is a more recent CPU generation than 6, and i is Intel.
On GCP, use n4-highmem machines; highmem on GCP corresponds to r on AWS.
On Azure, use Standard E v5 machines, where E is like r/highmem and v5 is more recent than v3, etc.

Intel/AMD/Arm:

Using Arm or AMD machines instead of Intel-based CPUs is a choice as well and could be considered, but I lack experience trying them. I think everything in JupyterHub etc. should work fine, but user images need to explicitly be built for Arm.

I believe Arm may be more cost effective overall, but it may also reduce which Docker images work in user environments.

Users per node:

I recommend picking a node size that fits at least 10% of your max users, and at least a few users per node no matter what, maybe 8 for example. The more users per node, the better they can share CPU, which makes it more efficient.

It's reasonable to have up to around 100 users on a node, depending on what kind of CPU usage patterns they have. AWS nodes cannot always allow 100 or so user pods: smaller nodes only allow 27 or so pods, and somewhat larger ones about twice as many, etc.


It's a big topic, but overall: use nodes with 8 GB of RAM per CPU and fit many users per node.
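As a rough worked example (my numbers; the per-user guarantee and the system overhead are assumptions you should adjust for your own deployment):

```yaml
# Back-of-the-envelope capacity for one 4 CPU / 32 GB highmem node (1:8 ratio):
#   assume ~3-4 GB is reserved for the kubelet and system pods
#   -> roughly 28 GB allocatable for user pods
#   with a 512M guarantee per user: 28 GB / 0.5 GB ~= 55 users per node
singleuser:
  memory:
    guarantee: 512M   # example guarantee, not a recommendation
```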


Thanks @consideRatio this is very helpful guidance.

I’ve started with an n4-highmem-4 machine, and am getting strange errors from the spawner when it tries to attach the associated pvc for a user.

It appears to match the case in this stackoverflow response, based on linked GKE documentation:

Regional persistent disks are restricted from being used with memory-optimized machines or compute-optimized machines.

Consider using a non-regional persistent disk storage class if using a regional persistent disk is not a hard requirement. If using a regional persistent disk is a hard requirement, consider scheduling strategies such as taints and tolerations to ensure that the Pods that need regional PD are scheduled on a node pool that are not optimized machines.

Is it the case then that I must update the PVC type per the z2jh documentation?

Have you run into this? Any suggestions? Although it’s GKE specific, it seems like this should be part of the core documentation.

It appears this was a similar case (but perhaps not identical) where indeed the solution was to set up a custom storage class config:

And indeed the docs have some notes on GKE and defaults as well, with instructions on how to set that up:
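Roughly, my understanding of the shape of that setup is something like the following (a sketch from my reading of the GKE and z2jh docs, not a tested config; the class name and the pd-balanced type are my own choices):

```yaml
# A non-regional persistent disk StorageClass for user volumes (name is hypothetical).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: jupyterhub-user-pd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced            # non-regional; pd-standard would also be non-regional
volumeBindingMode: WaitForFirstConsumer
```

and then pointing the chart's dynamically provisioned user PVCs at it:

```yaml
singleuser:
  storage:
    dynamic:
      storageClass: jupyterhub-user-pd
```

If I understand correctly, this only affects newly created user PVCs; existing claims keep whatever storage class they were provisioned with.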

Ok, here’s the core issue from the Supported Disk Types for n4 documentation:

N4 does not support Persistent Disk or Local SSD. Read Move your workload from an existing VM to a new VM to migrate your Persistent Disk resources to a newer machine series.
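So if one wanted to stay on N4, my reading is that the user storage class would need to be Hyperdisk-based rather than PD-based, e.g. something like this (untested sketch; the name is hypothetical):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: jupyterhub-user-hyperdisk
provisioner: pd.csi.storage.gke.io   # the same CSI driver handles Hyperdisk types, as far as I can tell
parameters:
  type: hyperdisk-balanced
volumeBindingMode: WaitForFirstConsumer
```

plus migrating the existing user volumes, which is more than I wanted to take on here.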

tl;dr Switching to n2-highmem-4 (which supports pd-standard) worked well for me.