Hey folks, my team is using the JupyterHub Helm chart (v0.11.1) with the userScheduler enabled, and we’re noticing spawn timeouts caused by pods having to wait for large images to finish pulling. This only seems to happen when two notebooks are spawned in rapid succession and the first of those spawns triggers a scale-out event. It looks like the first spawn evicts a placeholder pod, which forces a scale-out so that the evicted placeholder can be scheduled onto a new node. Before that placeholder can pull the image onto the new node, the second spawn takes place and the second pod evicts the same placeholder pod, so it ends up scheduled onto the new node, which hasn’t finished pulling the singleuser server image.
Instead, I would expect the second pod to evict some “older” (i.e., less-recently-evicted) placeholder pod so that it is scheduled on a node which has long since pulled the singleuser image. Has anyone else run into this problem before? Is it fixed in a newer version of the chart?
Hi! This is a bit unclear: if your cluster is scaling out, then that means all nodes are occupied or close to being fully occupied, so where would the spare capacity come from?
We’re running on Google’s Kubernetes Engine, so the underlying node pool scales up when we run out of capacity. A new node is provisioned for the evicted placeholder pod.
Yes, but if the node has only just been provisioned, the image-puller may not have pulled the singleuser image yet. It sounds like you’re expecting there to be another node that’s ready with the image already pre-pulled. I don’t understand why you’re expecting this.
@manics Yeah, it’s confusing. Let me try to clarify:
At idle, we have 8-ish placeholder pods running on 8-ish nodes. A user spawns a notebook pod, causing one of the placeholder pods (let’s say placeholder-4) to be evicted. The scheduler is now trying to schedule placeholder-4, but there’s no capacity so the autoscaler begins to provision a new node for placeholder-4.
Now, another user spawns a notebook pod. I expect that it would evict any placeholder other than placeholder-4 so that it gets scheduled onto a node which already has the image pulled; however, at least in some cases, it seems to prefer evicting placeholder-4 once again, which schedules the singleuser notebook pod on the new node that hasn’t yet finished pulling the singleuser notebook image.
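To make the scenario concrete, here’s a toy Python model of the cluster state right after the first spawn. It is only a sketch under my own assumptions: the `Node` class, the node names, and the `warm_victims`/`cold_victims` helpers are hypothetical and are not how the real scheduler decides anything. It just shows which placeholders I’d consider “safe” to evict (their node already has the image) versus the one that isn’t.

```python
# Toy model of the race described above; NOT the real kube-scheduler logic.
# All names here (Node, node-new, warm_victims, cold_victims) are hypothetical.
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    image_ready: bool  # True once the singleuser image has been pulled


def warm_victims(placements: dict) -> list:
    """Placeholders whose node already has the image, sorted by name."""
    return sorted(p for p, n in placements.items() if n.image_ready)


def cold_victims(placements: dict) -> list:
    """Placeholders sitting on a node that is still pulling the image."""
    return sorted(p for p, n in placements.items() if not n.image_ready)


# Assumed state: 8 placeholders on 8 warm nodes...
placements = {f"placeholder-{i}": Node(f"node-{i}", True) for i in range(8)}
# ...then placeholder-4 was evicted and rescheduled onto a freshly provisioned node.
placements["placeholder-4"] = Node("node-new", False)

if __name__ == "__main__":
    print("would like the second spawn to evict one of:", warm_victims(placements))
    print("but it sometimes evicts:", cold_victims(placements))
```

In this model, the behavior I’m hoping for is that the second spawn always picks from the first list.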
I don’t yet know if this happens every time this scenario runs or if it’s relatively rare (maybe the scheduler just picks a node at random and we’re only seeing this behavior about once out of every 8 scenarios?). Still working to pin this down.
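As a back-of-the-envelope check on the “once out of every 8” guess, here’s a small sketch. It assumes, purely for illustration, that the placeholder to evict is chosen uniformly at random and that exactly one placeholder sits on the cold node; I don’t know that the scheduler actually behaves this way. The function names are made up for this example.

```python
# Sketch under an ASSUMPTION: the evicted placeholder is picked uniformly at
# random, and exactly one placeholder is on the node still pulling the image.
import random
from fractions import Fraction


def cold_node_probability(num_placeholders: int) -> Fraction:
    """Exact chance that a uniformly random victim is the one on the cold node."""
    if num_placeholders < 1:
        raise ValueError("need at least one placeholder")
    return Fraction(1, num_placeholders)


def simulate(num_placeholders: int = 8, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability, with a fixed seed."""
    rng = random.Random(seed)
    cold_index = 0  # which placeholder is on the cold node (arbitrary choice)
    hits = sum(rng.randrange(num_placeholders) == cold_index for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print("exact:", cold_node_probability(8))
    print("simulated:", simulate())
```

If the failure rate we measure is well above 1/8, that would suggest the choice isn’t random and something is steering the second pod toward the new node.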