Does anyone else suffer from k8s nodes going bad? We find that about one node per week becomes faulty - either k8s is aware of the problem, or pods’ networking just seizes up or pods won’t start properly. It is really disruptive when the node is full of Jupyter pods - we have to drain it, which interrupts every user on the node, and they have to restart on another node. Is it just us with a rubbish k8s cluster, or do others find this too?
It makes me wonder whether k8s is stable enough for running stateful apps like Jupyter, which can’t have multiple replicas.