Rescheduling single-user pods

peterrmah · November 3, 2020, 5:50pm

I have a deployment of JupyterHub on AWS EKS with multiple nodes.

An issue that my team has been running into is that after a class is over, our nodes begin vacating as users log out or the culler removes inactive pods. Approximately an hour after the class is over, most single-user pods have been shut down one way or another. However, there are always a few users keeping their pods active hours after the class has finished. These remaining users’ pods are generally spread sparsely across cluster nodes, often with only one or two active single-user pods remaining on each node. This scenario has been preventing the autoscaler from reducing the node count.

In the interest of minimizing server costs, we are looking for a solution to reschedule the sparsely distributed single-user pods onto a single node.
@yuvipanda, are you aware of any existing solution that may solve our need? If not, do you have any suggestions on how we might achieve this while minimizing interruptions to users during the rescheduling process?

Peter

yuvipanda · December 11, 2020, 7:10am

There are two things to try here.

1. Set maxAge on the culler. This will cull pods that have been running for a long time, regardless of whether they are currently in use. If the users start their servers again, they’ll hopefully go to a better-suited node, especially if you are using the custom user scheduler.
2. Do internal culling on the notebook pod itself, with notebook config customizations. See https://github.com/jupyterhub/mybinder.org-deploy/blob/16fae275d5e4bfdcd0a5ad3c8adb9e08941fc3e9/mybinder/values.yaml#L4 for some possible options.
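
As a rough illustration of both suggestions, here is a minimal Python sketch. It assumes the culler is run as the `jupyterhub_idle_culler` module with its `--timeout` and `--max-age` flags (in the Helm chart the corresponding setting is `cull.maxAge`), and that the single-user server is the classic notebook server, whose `NotebookApp.shutdown_no_activity_timeout` and `MappingKernelManager.cull_*` options control internal culling. The helper names and all timeout values are hypothetical and would need tuning for a real class schedule.

```python
import json
import sys


def idle_culler_service(timeout_s: int, max_age_s: int) -> dict:
    """Build a JupyterHub service entry that runs jupyterhub-idle-culler.

    --timeout culls servers idle for longer than timeout_s seconds;
    --max-age culls servers older than max_age_s seconds, active or not.
    The API permissions the service needs are omitted from this sketch.
    """
    return {
        "name": "jupyterhub-idle-culler-service",
        "command": [
            sys.executable,
            "-m",
            "jupyterhub_idle_culler",
            f"--timeout={timeout_s}",
            f"--max-age={max_age_s}",
        ],
    }


def notebook_culling_config(kernel_idle_s: int, server_idle_s: int) -> dict:
    """Build notebook-server settings for culling inside the pod.

    The dict mirrors the layout of a jupyter_notebook_config.json file:
    class name -> option name -> value.
    """
    return {
        "MappingKernelManager": {
            # Shut down kernels that have been idle this long (seconds).
            "cull_idle_timeout": kernel_idle_s,
            # How often to check for idle kernels (seconds); example value.
            "cull_interval": 60,
            # Also cull kernels that still have a browser tab connected.
            "cull_connected": True,
        },
        "NotebookApp": {
            # Stop the server itself once it has had no kernels or
            # activity for this long, which lets the pod exit.
            "shutdown_no_activity_timeout": server_idle_s,
        },
    }


if __name__ == "__main__":
    # Hypothetical values: 1 h idle timeout, 6 h maximum age.
    print(json.dumps(idle_culler_service(3600, 6 * 3600), indent=2))
    # Hypothetical values: 30 min kernel idle, 1 h server idle.
    print(json.dumps(notebook_culling_config(1800, 3600), indent=2))
```

The first dict could be appended to `c.JupyterHub.services` in a hub config file, and the second written out as the notebook server's JSON config inside the user image. On deployments that run Jupyter Server instead of the classic notebook, the equivalent shutdown option lives on `ServerApp`.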

Hope this helps!
