Long story short: try to make your image as small as possible. Beyond that, there isn’t much you can control when it comes to startup time. Below is some explanation of how things work, to show why image size is the thing I’d focus on.
There are a few reasons why launching a pre-built image can be faster or slower. You only get the 10-15s experience if the image you are requesting is already present on a node in our cluster that is also available to launch it right away. This is the best case.
The next best case is that a node is available to launch your image but doesn’t have a copy of it. Then it needs to fetch that image first. How much time that takes depends on which cluster you ended up on and, more importantly, on how large the image is (this is the factor you can control to some extent).
The next best case is that a new node needs to be booted and then needs to fetch your image. We try to boot nodes ahead of time, but sometimes demand scales up so fast that we can’t stay ahead. Booting a node takes ~5-10 min. I think it is very rare for an average user to run into this situation.
The next best case is that your image isn’t in the registry and needs to be built.
The next best case is that mybinder.org is down and you will have to wait until it is fixed.
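To make the first three cases concrete, here is a rough back-of-the-envelope sketch. The launch and node boot times are the upper ends of the ranges mentioned above; the pull throughput of 50 MB/s is purely an assumption for illustration (it varies by cluster), and the names `Case` and `estimate_wait_s` are made up for this example. Build time and downtime are left out because I can't put a number on them.

```python
from enum import Enum


class Case(Enum):
    IMAGE_ON_NODE = 1        # best case: image already on an available node
    NODE_READY_NO_IMAGE = 2  # node available, image has to be fetched first
    NEW_NODE_NEEDED = 3      # node has to boot, then fetch the image


LAUNCH_S = 15           # upper end of the "10-15s" best case
NODE_BOOT_S = 10 * 60   # upper end of "~5-10 min" to boot a node


def estimate_wait_s(case, image_gb, pull_mb_per_s=50.0):
    """Rough wait estimate in seconds; pull_mb_per_s is an assumed value."""
    pull_s = image_gb * 1000 / pull_mb_per_s
    if case is Case.IMAGE_ON_NODE:
        return LAUNCH_S
    if case is Case.NODE_READY_NO_IMAGE:
        return pull_s + LAUNCH_S
    if case is Case.NEW_NODE_NEEDED:
        return NODE_BOOT_S + pull_s + LAUNCH_S
    raise ValueError(f"unknown case: {case}")


if __name__ == "__main__":
    for size_gb in (1.0, 5.0):
        for case in Case:
            print(size_gb, case.name, estimate_wait_s(case, size_gb))
```

The only input to this estimate that a repo owner controls is `image_gb`, which is why image size is the thing to focus on.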
My guess is that the difference in times you see is because one launch was assigned to a node that needed to fetch the image (minutes of wait time) and another to a node that already had the image (tens of seconds of wait time).
Unfortunately there isn’t much we can do. Popular images tend to be present on many nodes and stay in the cache of those nodes. Not so popular images get evicted from the cache. This is probably the best way to set up a cache (recently used stuff remains in the cache, least recently used stuff is evicted). Unfortunately, from the point of view of a repo owner, most images are unpopular because the repo owner is (to be a little dramatic) the only user of that image.
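A toy model of that "least recently used is evicted" behaviour, as a sketch only. This is not how the nodes actually manage their image cache; in particular, counting capacity in number of images rather than disk space is a simplification, and `NodeImageCache` is a hypothetical name.

```python
from collections import OrderedDict


class NodeImageCache:
    """Toy least-recently-used cache of images on a single node."""

    def __init__(self, capacity):
        self.capacity = capacity      # assumed: max number of images kept
        self._images = OrderedDict()  # oldest use first, newest use last

    def launch(self, image):
        """Return True if the image was cached, False if it had to be pulled."""
        if image in self._images:
            self._images.move_to_end(image)  # mark as most recently used
            return True
        self._images[image] = True
        if len(self._images) > self.capacity:
            self._images.popitem(last=False)  # evict least recently used
        return False


if __name__ == "__main__":
    cache = NodeImageCache(capacity=2)
    for image in ["popular", "mine", "popular", "other", "mine"]:
        print(image, "hit" if cache.launch(image) else "pull")
```

In this run the second launch of "mine" is a pull again: by the time you come back, other launches have pushed your image out of the cache.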
Please don’t try to artificially make your repo popular (some might think of scripts to launch it or such). If doing that becomes a popular pastime, we will probably just ban the offending repos and then have to spend time figuring out automatic defences against this, instead of spending time on actual features.
The thing I am most excited about in terms of improving this situation is being able to start a container without having to transfer the whole image first. There is some amazing work happening on this in the docker/container community, for example containerd/stargz-snapshotter (a fast docker image distribution plugin for containerd, based on CRFS/stargz). However, as far as I know we are still some ways away from being able to deploy that for mybinder.org, both in terms of it being ready enough and in terms of having expertise in the team on how to do this. If you or someone you know has tried this or keeps a close eye on it … please let us know.