Posting how I solved this in case others run into something similar.
I set up a JupyterHub with Kubernetes on Azure and had been using it with a small team of 3-4 for a year. Then I ran a workshop to test it with more people. It worked great during the workshop. After the workshop, I crashed my server (ran out of RAM). No problem. That happens often and I normally just restart. This time, though, I got a volume / node affinity error and my pod was stuck in Pending. Some other people could still launch their pods, but I could not.
It turns out it was a mismatch between the zone my user PVC (really, the Azure disk behind it) was in and the zone of the node. As the cluster scaled up during the workshop, new nodes were created in westus2-1, westus2-2, and westus2-3 because I hadn't restricted my node pool to a zone when setting it up; I had only set the region (westus2). As the cluster auto-scaled back down, it just so happened that the 'last node standing' was in westus2-2. My user PVC is in westus2-1, and a zonal Azure disk can only be attached to a node in its own zone, so my pod could not be scheduled.
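For reference, the error shows up in the events of the stuck pod. Describing it (the jupyter-<username> pod naming below is the zero-to-jupyterhub default) typically shows a FailedScheduling warning mentioning a volume node affinity conflict:
kubectl describe pod jupyter-<username> -n dhub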
List the persistent volumes (PVs) that back the user PVCs. (PVs are cluster-scoped, so the -n flag isn't strictly needed, but it does no harm.)
kubectl get pv -n dhub
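If you have many PVs, a one-liner like this prints each PV with its claim and zone in one shot. It assumes every PV is a zonal Azure disk volume with a single node-affinity term; PVs without node affinity will make it error out:
kubectl get pv -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.claimRef.name}{"\t"}{.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]}{"\n"}{end}'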
Describe the PV for the user whose pod is not starting. The pvc-… bit is the name of the PV that backs that user's claim (dynamically provisioned PVs are named after the PVC's UID).
kubectl describe pv pvc-b2e50a00-df23-4513-b7ae-17f6cxxxxxx -n dhub
In the describe output, under Node Affinity, I see this
Term 0: topology.disk.csi.azure.com/zone in [westus2-1]
Next, take a look at the zone labels on the nodes. (dhub is my namespace in the commands above; nodes are cluster-scoped, so strictly speaking the -n flag isn't needed on the next command either.)
kubectl get nodes --show-labels -n dhub
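To see just the zones, you can also pull the zone labels out as columns. The standard label is topology.kubernetes.io/zone; the Azure disk CSI driver sets topology.disk.csi.azure.com/zone as well:
kubectl get nodes -L topology.kubernetes.io/zone -L topology.disk.csi.azure.com/zone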
The labels show that there is only one node left, and it is in a different zone (westus2-2) than the user PV (westus2-1). How did that happen?
I went to the Azure dashboard and looked at my node pool settings. I saw that I had not checked the box to restrict the pool to a specific zone like westus2-1. So during the workshop, nodes were being created in different zones within the same region.
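You can also check this from the Azure CLI instead of the dashboard (placeholder names below); an empty or null result means the node pool is not pinned to any zone:
az aks nodepool show --resource-group <resource-group> --cluster-name <cluster-name> --name <nodepool-name> --query availabilityZones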
Fortunately, the people who joined the workshop did not need to keep anything, so their accounts and PVCs could simply be deleted. Since the 3-4 people who had been on the hub all along had PVCs in westus2-1, I created a new node pool specification on Azure with the zone 1 box checked, and then deleted all the new workshop participants.
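For anyone doing this from the command line instead of the dashboard, a rough equivalent is to add a node pool pinned to zone 1 and then remove each workshop user's storage (placeholder names; adjust the VM size and autoscaler bounds to your setup, and remove the users themselves from the JupyterHub admin page). Zero-to-jupyterhub names user PVCs claim-<username> by default:
az aks nodepool add --resource-group <resource-group> --cluster-name <cluster-name> --name usersz1 --zones 1 --node-vm-size Standard_D4s_v3 --enable-cluster-autoscaler --min-count 1 --max-count 10
kubectl delete pvc claim-<username> -n dhub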
These posts helped: