We are planning to run JupyterHub in a multi-zonal (three availability zones) Google Kubernetes Engine (GKE) cluster on a private Virtual Private Cloud (VPC). JupyterHub and JupyterLab will NOT have external IP addresses. Access to JupyterHub and JupyterLab from web clients will be through an Ingress Controller (Istio ILB Gateway).
We are doing this for scalability and high availability purposes.
We are planning to deploy the configurable HTTP proxy, the Hub, and the users’ JupyterLab containers in all three availability zones.
We are planning to deploy a single, shared Cloud SQL for PostgreSQL instance to store user session state. The hubs running in all three availability zones will read from and write to this shared Cloud SQL instance.
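To make the database part concrete, here is a minimal sketch of how we expect each hub to be pointed at the shared instance through JupyterHub's `db_url` setting. The helper `build_db_url`, the environment variable names (`HUB_DB_USER`, `HUB_DB_PASSWORD`, `HUB_DB_HOST`) and the database name `jupyterhub` are placeholders we made up for illustration, not part of any existing setup.

```python
from urllib.parse import quote_plus


def build_db_url(user, password, host, port=5432, database="jupyterhub"):
    """Build a SQLAlchemy-style PostgreSQL URL for the shared Cloud SQL instance.

    The user and password are percent-encoded so that special characters
    (for example '@' or ':') do not break the URL.
    """
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Intended use inside jupyterhub_config.py (hypothetical variable names;
# `os` would need to be imported there):
#
#   c.JupyterHub.db_url = build_db_url(
#       os.environ["HUB_DB_USER"],
#       os.environ["HUB_DB_PASSWORD"],
#       os.environ["HUB_DB_HOST"],  # private IP of the Cloud SQL instance
#   )
```

Every hub replica in every zone would get the same URL, which is exactly the part we are unsure about.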
Questions:
- Will this deployment strategy work?
- What are the pitfalls?
- What changes to this strategy are needed to achieve at least some degree of high availability and scalability?
- Should the Ingress Controller use sticky or non-sticky sessions? In a perfect world, a non-sticky approach would be the way to go. Pragmatically, though, it might make more sense to keep each user in the availability zone where they authenticated to the hub. Please advise.