Hi,
We have been having issues with our Hub crashing when 40+ students log in at once. I am aware that I should migrate deployments to use regional clusters for HA. Since that requires destroying my current cluster, I am wondering if there are other practices that could increase reliability.
For instance, I noticed that the resources allocated by default to the hub pod are:
resources:
  requests:
    cpu: 200m
    memory: 512Mi
Is that enough memory? Would increasing the memory be wise? Can we have more replicas of the hub pod?
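If raising the memory is the right move, I assume the override in my config.yaml would look roughly like the sketch below. The 1Gi and 2Gi values are placeholders I picked for illustration, not tested numbers, and I am assuming the chart reads the hub pod's resources from hub.resources:

```yaml
# config.yaml (Helm values passed to the JupyterHub chart)
# Sketch only: the memory values are illustrative guesses, not measured needs.
hub:
  resources:
    requests:
      cpu: 200m        # unchanged from the default shown above
      memory: 1Gi      # placeholder: double the default 512Mi request
    limits:
      memory: 2Gi      # placeholder: cap so the hub cannot starve the core node
```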
In case it is helpful, my cluster setup is:
# create core pool
gcloud beta --project=$PROJECT_NAME container clusters create $CLUSTER_NAME \
  --machine-type=n1-highmem-4 \
  --num-nodes=1 \
  --enable-autoscaling \
  --enable-autorepair \
  --min-nodes=1 \
  --max-nodes=4 \
  --cluster-version latest \
  --node-labels hub.jupyter.org/node-purpose=core
# create a user pool
gcloud beta --project=$PROJECT_NAME container node-pools create user-pool \
  --cluster=$CLUSTER_NAME \
  --machine-type=n1-highmem-8 \
  --num-nodes=1 \
  --enable-autoscaling \
  --enable-autorepair \
  --min-nodes=1 \
  --max-nodes=10 \
  --node-labels hub.jupyter.org/node-purpose=user \
  --node-taints hub.jupyter.org/dedicated=user:NoSchedule
My HUB_VERSION=v0.9-dcde99a
Any advice is highly appreciated. Thanks!