The short answer is: it depends. There are several different clusters that serve traffic for mybinder.org and the exact setup regarding “docker registry things” depends on the cluster.
First some things which are common to how the clusters are configured:
- every cluster has a docker registry which is used to store built images
- when a person launches a binder we check whether that registry contains an up-to-date image and, if it does, use it instead of building the image again
- each compute node in each of the clusters has a cache of docker image layers that were recently used. This means sometimes we don’t need to pull the layers from the registry.
- each cluster has its own public IP
- all layers used to build an image are pushed to the cluster’s docker registry (for example, if the Dockerfile in your repository just contains FROM someorg/somereposprebuiltimage:sometag, that layer should end up in the registry of the cluster)
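To make the "reuse the image if the registry already has it, otherwise build and push" step concrete, here is a minimal Python sketch. The in-memory `registry` dict, the `image_name` naming scheme and the `build_image` placeholder are all hypothetical stand-ins. The real service talks to an actual registry and runs real builds.

```python
from __future__ import annotations

import hashlib

# Hypothetical in-memory stand-in for a cluster's docker registry:
# image name -> list of layer digests stored for that image.
registry: dict[str, list[str]] = {}


def image_name(repo: str, ref: str) -> str:
    """Hypothetical naming scheme: one image per (repository, commit) pair."""
    digest = hashlib.sha256(f"{repo}@{ref}".encode()).hexdigest()[:12]
    return f"binder-{digest}"


def build_image(repo: str, ref: str) -> list[str]:
    """Placeholder for a real build; returns made-up layer digests."""
    return ["base-layer", f"repo-layer-{repo}-{ref}"]


def launch(repo: str, ref: str) -> tuple[str, bool]:
    """Return (image name, whether a build was needed)."""
    name = image_name(repo, ref)
    if name in registry:
        return name, False  # an up-to-date image exists: reuse it
    # No image yet: build it and push every layer, base layers included.
    registry[name] = build_image(repo, ref)
    return name, True
```

Launching the same repository at the same commit twice builds only once. A new commit gets a new image name and triggers a new build.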
The thing that is configured differently on clusters is where the cluster’s docker registry is hosted. Some have a dedicated registry (for example hosted on Google Container Registry) and some use docker hub.
Pulls from this registry are, IIRC, not made with credentials, at least in the docker hub case. This means that the pulls from the third point above (when a node doesn’t have all the layers cached) count against the rate limit. This can potentially happen on every launch, but the most likely case is that several launches of the same repository all get scheduled onto the same node on the same cluster, in which case there would only be one pull from the registry.
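A rough model of why several launches on the same node cost only one pull: the sketch below counts image pulls, assuming that a node pulls only when it is missing at least one layer of the image, that one pull fetches all missing layers, and that cached layers are never evicted (real nodes do garbage-collect old images). The node and image names are made up.

```python
from collections import defaultdict


def count_image_pulls(launches, image_layers):
    """Count registry pulls for a sequence of (node, image) launches.

    Assumes a node pulls only if it lacks at least one layer of the image,
    that one pull fetches all missing layers, and that nothing is evicted.
    """
    cache = defaultdict(set)  # node -> layer digests already on that node
    pulls = 0
    for node, image in launches:
        missing = set(image_layers[image]) - cache[node]
        if missing:
            pulls += 1
            cache[node] |= missing
    return pulls


layers = {"repo-v1": ["base", "env", "code"]}
same_node = [("node-a", "repo-v1")] * 3
spread = [("node-a", "repo-v1"), ("node-b", "repo-v1"), ("node-c", "repo-v1")]
print(count_image_pulls(same_node, layers))  # 1
print(count_image_pulls(spread, layers))  # 3
```

The same three launches cost one pull when they share a node and three when they are spread over three cold nodes.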
Pulls of layers required to build an image in the first place are probably also not authenticated (but I’d have to check). This means that each time we have to build a new image for a repository there is a chance that we need to pull a layer from docker hub. Why docker hub? Because most base layers come from public images which are hosted there. We try hard to schedule builds of the same repository onto the same node in the same cluster to maximise the chances of being able to reuse image layers created during other/previous builds. So in a typical case N builds should not lead to N pulls from docker hub.
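One standard way to send builds of the same repository to the same node without any shared state is rendezvous (highest random weight) hashing. The sketch below illustrates that technique; it is not necessarily how the mybinder.org clusters actually schedule builds, and the node names are hypothetical.

```python
import hashlib


def rendezvous_pick(nodes, key):
    """Rendezvous (highest random weight) hashing: pick the node whose
    hash of (node, key) is largest. Deterministic for a given node list."""

    def score(node):
        digest = hashlib.sha256(f"{node}|{key}".encode()).digest()
        return int.from_bytes(digest, "big")

    return max(nodes, key=score)


nodes = ["node-a", "node-b", "node-c"]
print(rendezvous_pick(nodes, "someorg/somerepo"))
```

Each node's score depends only on the node name and the repository, so every caller agrees on the choice. Removing a node that was not chosen does not change the result, so builds of a repository keep landing where its layers are already cached.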
In summary: clusters that host their own private image registry might be affected by the rate limit at image build time if the build requires a pull of a layer hosted on docker hub. (You can minimise the chance of this happening by using popular base images, which are likely already on the build nodes because everyone else uses them. FROM buildpack-deps:bionic is what repo2docker uses.)
Clusters that use docker hub as their “internal registry” will use up quota for launches as well as builds. Several launches of the same version of a repo within a short period of time probably only use “one amount” of quota because they get scheduled on the same/a few nodes.
So the final answer to your question of “how many pulls per launch?” is: I have no idea, could be as few as zero but could also be “a few”, most likely somewhere in between :-/
We (people running clusters for mybinder.org) should check if we can increase the use of credentials when pulling images from docker hub.
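On the Kubernetes side, one generic way to make pulls authenticated is a Secret of type kubernetes.io/dockerconfigjson that pods reference through imagePullSecrets. The sketch below only builds that Secret manifest as a Python dict. The secret name, namespace and credentials are placeholders, and whether the mybinder.org deployments would wire it in this way is an assumption. It would also only cover pulls the kubelet does when starting pods, not pulls the docker daemon does during builds, which would need their own login.

```python
import base64
import json

DOCKER_HUB = "https://index.docker.io/v1/"


def pull_secret_manifest(name, namespace, username, token, registry=DOCKER_HUB):
    """Build a Kubernetes Secret of type kubernetes.io/dockerconfigjson.

    name, namespace, username and token are placeholders supplied by the caller.
    """
    auth = base64.b64encode(f"{username}:{token}".encode()).decode()
    config = {"auths": {registry: {"username": username, "password": token, "auth": auth}}}
    encoded = base64.b64encode(json.dumps(config).encode()).decode()
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {".dockerconfigjson": encoded},
    }


# Pods then opt in with: spec.imagePullSecrets: [{"name": "dockerhub-pull"}]
manifest = pull_secret_manifest("dockerhub-pull", "binder", "someuser", "sometoken")
print(json.dumps(manifest, indent=2))
```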
A way to level up your forum skills (like the number of links allowed per post) is to participate more in the forum. For example, the “Introduce yourself!” thread is a place where you can introduce yourself. That gets you “points” and puts a virtual face to your name.