Background:
For the past few weeks we have been running into scaling issues on a couple of 5-15 node, single-zone, autoscaling GKE clusters running JupyterHub v0.8.0. In both cases autoscaling is triggered when expected but occasionally fails with ZONE_RESOURCE_POOL_EXHAUSTED, which I gather means the zone we are running in has no compute capacity left for us. Of course, this happens mid-day, when the cluster is most acutely needed. It’s relatively straightforward to get new nodes in a different zone, but the persistent storage volumes are locked to the current zone as far as I can tell.
Questions:
- has anyone else run into this?
- is there a way to have user volumes float across a region through configuration in gcloud or JupyterHub’s helm config?
- is there a sane way to take what we have and make it multi-zonal without tearing everything down?