Rescheduling single-user pods

I have a deployment of JupyterHub on AWS EKS with multiple nodes.

An issue my team keeps running into is that after a class ends, our nodes begin emptying out as users log out or the culler removes inactive pods. About an hour after the class is over, most single-user pods have been shut down one way or another. However, a few users always keep their pods active for hours after the class has finished. These remaining pods are generally spread sparsely across the cluster nodes, often with only one or two active single-user pods left on each node. This has been preventing the autoscaler from reducing the node count.

In the interest of minimizing server costs, we are looking for a way to reschedule the sparsely distributed single-user pods onto a single node.
@yuvipanda are you aware of any existing solution that would meet this need? If not, do you have any suggestions for how we might achieve this while minimizing interruptions to users during the rescheduling process?

Peter


There are two things to try here.

  1. Set maxAge on the culler. This will cull pods that have been running for a long time, regardless of whether they are currently in use. If the users start their servers again, the new pods will hopefully land on a better-suited node, especially if you are using the custom user scheduler.
  2. Do internal culling on the notebook pod itself, with notebook config customizations. See https://github.com/jupyterhub/mybinder.org-deploy/blob/16fae275d5e4bfdcd0a5ad3c8adb9e08941fc3e9/mybinder/values.yaml#L4 for some possible options.
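To make the difference between idle culling and maxAge concrete, here is a rough Python sketch of the decision the culler makes for each server. It is a simplified model and not the culler's actual code: the `Server` class, the `should_cull` function, and the example numbers are all made up for illustration, and I'm assuming a max age of 0 means "disabled".

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical record of one single-user server (illustration only)."""
    user: str
    age_seconds: int   # time since the server was started
    idle_seconds: int  # time since the last recorded activity


def should_cull(server: Server, timeout: int, max_age: int = 0) -> bool:
    """Simplified model of the culler's decision.

    A server is culled when it has been idle for at least `timeout`,
    or when it is at least `max_age` old, even if it is still in use.
    Assumption: max_age == 0 disables the age check.
    """
    if max_age and server.age_seconds >= max_age:
        return True
    return server.idle_seconds >= timeout


servers = [
    Server("active-long-runner", age_seconds=5 * 3600, idle_seconds=30),
    Server("idle-user", age_seconds=2 * 3600, idle_seconds=90 * 60),
    Server("fresh-user", age_seconds=20 * 60, idle_seconds=10),
]

# One hour idle timeout, four hour max age.
culled = [s.user for s in servers if should_cull(s, timeout=3600, max_age=4 * 3600)]
print(culled)  # ['active-long-runner', 'idle-user']
```

Note that "active-long-runner" is removed even though it was active 30 seconds ago. That is the trade-off with maxAge: it frees up the sparsely used nodes, but it will interrupt whoever is still working, so pick a value comfortably longer than a class session.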

Hope this helps!
