Cost of running a JupyterHub on Kubernetes

Over time many people have asked various versions of “how much should I expect to pay when running JupyterHub on Kubernetes?” Obviously, the answer is “It depends on your resource needs and usage patterns”; however, there are also common patterns that we’ve noticed from seeing JupyterHubs deployed in the wild.

In this thread can people share their stories for understanding hub costs? Things like…

  • what kind of hub infrastructure you are using on kubernetes
  • how are you running kubernetes
  • what kinds of users and usage patterns you have
  • what resources you provide to those users
  • how do you handle “human” resources (e.g. who manages the cluster)
  • any issues you’ve run into that affect costs

Hi Chris,

We’re just getting around to thinking about this now and I have a back-of-the-envelope calculation on what it would look like for one of the *.syzygy.ca hubs. I’d welcome corrections and feedback but here is what we were thinking…

If I translate our existing user limits into guarantees and limits for Kubernetes, then something like this should still provide a roughly equivalent user experience:

  • Guarantee: (5000/100000 CPU shares, 512M)
  • Limit: (100000/100000 CPU shares, 2GB)
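As a rough sketch of how those two bullets might translate into spawner settings: the snippet below assumes 100000 CPU shares correspond to one full core (inferred from the per-node arithmetic in this post, not stated outright). The dictionary keys mirror the KubeSpawner option names `cpu_guarantee`, `cpu_limit`, `mem_guarantee` and `mem_limit`; the helper `shares_to_cores` and the dictionary itself are hypothetical and only for illustration.

```python
# Assumption: 100000 CPU shares == one full core.
CPU_SHARES_PER_CORE = 100_000


def shares_to_cores(shares: int) -> float:
    """Convert CPU shares into a fractional core count."""
    return shares / CPU_SHARES_PER_CORE


# Keys mirror KubeSpawner option names; this dict is illustrative only,
# not a drop-in jupyterhub_config.py.
spawner_settings = {
    "cpu_guarantee": shares_to_cores(5_000),    # 0.05 of a core
    "cpu_limit": shares_to_cores(100_000),      # 1 full core
    "mem_guarantee": "512M",
    "mem_limit": "2G",
}

for name, value in spawner_settings.items():
    print(f"{name} = {value}")
```

In a real deployment these values would go on `c.KubeSpawner` (or the equivalent chart values), which is left out here so the snippet runs on its own.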

Inside Kubernetes, we probably want separate node pools for the system stuff and the user pods. For the system pool I’m going to pick something like 3x(2core, 8GB). For the user pool, something like N*(4core, 30GB). You can get a rough idea of the number of users per node from the guarantees and limits (maximum and minimum).

The maximum case would be for users with lightish workloads and would be 4*100000/5000 = 80 users (by CPU), and 30 / 0.5 = 60 users (by memory). The lowest of these would win so that would be 60 users per node. So 10 nodes would give us 600 users, 20 would give us 1200 etc.

The minimum case is set by the memory which would be 30GB/2GB ~ 15 users in the worst possible case (CPU is handled a little differently).
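The maximum and minimum cases above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up, 100000 CPU shares are assumed to equal one core, and memory is counted in units where 1 GB = 1024 MB so that 512M is exactly half a GB, as in the arithmetic above.

```python
# Assumption: 100000 CPU shares == one full core.
SHARES_PER_CORE = 100_000


def max_users_per_node(node_cores, node_mem_mb, cpu_guarantee_shares, mem_guarantee_mb):
    """Most users that fit if everyone stays at their guarantee.

    Whichever resource runs out first (CPU or memory) sets the cap.
    """
    by_cpu = node_cores * SHARES_PER_CORE // cpu_guarantee_shares
    by_mem = node_mem_mb // mem_guarantee_mb
    return min(by_cpu, by_mem)


def min_users_per_node(node_mem_mb, mem_limit_mb):
    """Fewest users per node if everyone uses their full memory limit.

    CPU is left out on purpose, since it is handled differently from memory.
    """
    return node_mem_mb // mem_limit_mb


NODE_CORES = 4
NODE_MEM_MB = 30 * 1024  # the (4core, 30GB) user node

best = max_users_per_node(NODE_CORES, NODE_MEM_MB, 5_000, 512)
worst = min_users_per_node(NODE_MEM_MB, 2 * 1024)
print(best, worst)  # 60 15
```

Swapping in a different node shape or different guarantees shows quickly which resource is the bottleneck.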

Empirically, we’re closer to the maximum case, but let’s say 50 active users per (4core, 30GB) node. The idea is then to build an autoscaling group of these nodes and let it grow to satisfy the needed maximum simultaneous user count. Taking 1000 concurrent users as an example (~3*current), that would be 20 nodes. Without autoscaling, just leaving that running 24x7 would be something like $4300/month (https://cloud.google.com/products/calculator/#id=fe57b857-8721-436b-8353-b5cc2484fe6e), with the vast majority of that being for the (auto-scaled) user nodes.

Yuvi has a lot of information about how the autoscaling behaves in practice, but if we idealized it as 12 nodes for 1/3 of the day and 3 nodes for 2/3 of the day, we get something like (2/3 * 800 + 1/3 * 4300) ~ $2000 USD/Month.
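For anyone who wants to plug in their own numbers, here is a minimal sketch of the node count and the time-weighted cost. The function names are hypothetical, and the two monthly figures ($4300 at peak, $800 off-peak) are taken directly from the estimate above rather than recomputed from node prices.

```python
import math


def nodes_needed(concurrent_users: int, users_per_node: int) -> int:
    """Number of user nodes required for a given concurrent user count."""
    return math.ceil(concurrent_users / users_per_node)


def blended_monthly_cost(peak_cost: float, offpeak_cost: float, peak_fraction: float) -> float:
    """Time-weighted monthly cost for an idealized autoscaling schedule."""
    return peak_fraction * peak_cost + (1 - peak_fraction) * offpeak_cost


nodes = nodes_needed(1000, 50)
estimate = blended_monthly_cost(peak_cost=4300, offpeak_cost=800, peak_fraction=1 / 3)

print(nodes)            # 20
print(round(estimate))  # 1967, i.e. roughly $2000/month
```

Real autoscaling will not follow such a clean one-third / two-thirds split, so this is only useful for getting the order of magnitude.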

There’s a little overhead on top of that for storage and other things, but I think that’s the right ballpark.
From there the next step would be to compare the userbase for somewhere which would need ~1000 simultaneous users (e.g. UBC) to the target institution. You could probably get a reasonable estimate by just comparing ugrad populations to do that.


AWS using EKS here:

This was actually really good timing. I just was working through an exercise on costing for our current clusters.
Currently we are utilizing a combination of Amazon’s c4 and r4 instance types for our classroom deployments, and r5 instances for our research deployments. I used the instance costs to create some projections that have worked out pretty close to the exact costs.

On-Demand

  • CPU ~ $33 per CPU
  • RAM ~ $1.6 per GB
  • Storage = $30 per 100GB (standard)
  • Storage = $2.50 per 100GB (Standard IA)

Reserved 1 year no-upfront

  • CPU ~ $22 per CPU
  • RAM ~ $1 per GB
  • Storage = $30 per 100GB (standard)
  • Storage = $2.50 per 100GB (Standard IA)

Example of how this cost would break down:

Using on-demand pricing, for 5 researchers:

  • 1-2 CPU each pod (guarantee - limit)
  • 2-4 GB RAM per pod (guarantee - limit)
  • 20 GB Storage per pod

Resource                 Calculation   Cost
1-2 CPU each pod         $33 * 10      $330
2-4 GB RAM per pod       $1.6 * 20     $32
20 GB Storage per pod    $6 * 5        $30
Total:                                 $392
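The breakdown above can be sketched as a small Python helper. This assumes the rates are per CPU, per GB of RAM and per 100GB of standard storage, and that sizing uses the per-pod limits (2 CPU, 4 GB), which is what the $33 * 10 and $1.6 * 20 lines imply. The names `ON_DEMAND`, `RESERVED` and `group_cost` are made up for illustration.

```python
# Rates from this post; the per-unit interpretation is an assumption.
ON_DEMAND = {"cpu": 33.0, "ram_gb": 1.6, "storage_per_100gb": 30.0}
RESERVED = {"cpu": 22.0, "ram_gb": 1.0, "storage_per_100gb": 30.0}


def group_cost(rates, users, cpu_per_user, ram_gb_per_user, storage_gb_per_user):
    """Cost of a group of users, sized at their per-pod limits."""
    cpu = rates["cpu"] * users * cpu_per_user
    ram = rates["ram_gb"] * users * ram_gb_per_user
    storage = rates["storage_per_100gb"] * users * storage_gb_per_user / 100
    return {"cpu": cpu, "ram": ram, "storage": storage, "total": cpu + ram + storage}


on_demand = group_cost(ON_DEMAND, users=5, cpu_per_user=2,
                       ram_gb_per_user=4, storage_gb_per_user=20)
reserved = group_cost(RESERVED, users=5, cpu_per_user=2,
                      ram_gb_per_user=4, storage_gb_per_user=20)

for label, cost in (("on-demand", on_demand), ("reserved", reserved)):
    print(label, {name: round(value, 2) for name, value in cost.items()})
```

Running it with the reserved rates gives a feel for how much the 1 year no-upfront pricing would change the same example.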

This would be accomplished with one c4.2xlarge and one c4.large instance.

There are two other costs that aren’t quite as easy to factor in with AWS: EKS and ELB. Looking historically, the EKS service is at about $150 a month for our deployment. Take that with a grain of salt, as they are supposed to be slashing the cost of EKS soon™. We have run a self-hosted Kubernetes cluster, and the costs were about on par with EKS. Considering that self-hosting requires a bastion and a Kubernetes master that are barely utilized, having the service hosted removes a layer of administration, which is well worth it.
ELB costs aren’t straightforward either. Depending on the usage and controller, you could be looking at anything between $20 and $100. Ours has typically been on the higher end. We’ve opted for the classic load balancer, but the cost difference from the application load balancer is marginal.
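To see how these fixed costs stack on top of the workload, here is a rough sketch that adds them to the $392 on-demand example. It assumes the ELB range is also a monthly figure (that is my reading, the range above doesn't state a period), and the names are hypothetical.

```python
WORKLOAD_MONTHLY = 392.0     # on-demand example for 5 researchers
EKS_MONTHLY = 150.0          # approximate EKS cost for our deployment
ELB_RANGE = (20.0, 100.0)    # assumption: dollars per month


def total_range(workload, eks, elb_range):
    """Low and high estimates once EKS and ELB are added to the workload."""
    elb_low, elb_high = elb_range
    base = workload + eks
    return base + elb_low, base + elb_high


low, high = total_range(WORKLOAD_MONTHLY, EKS_MONTHLY, ELB_RANGE)
print(f"${low:.0f} - ${high:.0f}")  # $562 - $642
```

For a small research group the fixed EKS and ELB costs are a noticeable share of the total, which is less true for larger classroom deployments.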

There are a couple of other ways that costs can be lowered in AWS, such as using:

  • spot instances
  • all upfront reserved instances
  • savings plans
  • EFS lifecycle policies

From my limited time managing these clusters, it seems like the cost can really change depending on the use. Classroom usage cost is usually on par with research usage cost: the classroom will have 10x the number of active pods (if not more) compared with a research group, while the research group will require its pods to be running for longer periods of time.

Not sure how many are hosting using AWS, but I hope it helps.
