Something up with mybinder.org cache

When I try to build this repo (alex_binders/matplotlib-binder on GitLab) on mybinder.org, it recognizes that it should already have an image for it, takes a very long time, fails, retries several times, and finally gives up. When I try the exact same repo at https://notebooks.gesis.org/, it works (albeit with extremely slow page loads).

A. Presumably BinderHub should respond in some other way rather than retrying forever when it can't get the image.
B. If there is a way to check for images that don't actually work (the cache says they exist, but they don't), it would be nice if such images could be evicted from the cache (a rough sketch of this idea follows below).
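To make B concrete, here is a minimal sketch of how a stale cache entry could be detected. This is only an illustration and says nothing about how BinderHub is actually implemented: a HEAD request against the registry's manifest endpoint (standard Docker Registry v2 API) tells you whether the image a cache entry points at still exists. The registry URL, repository, tag, and token handling below are placeholders.

# Hedged sketch (not BinderHub's actual code): verify that an image a cache
# entry points at still exists in the registry, using the Docker Registry v2
# HTTP API. Registry URL, repository, tag, and token are illustrative only.
import requests

def image_exists(registry_url, repository, tag, token=None):
    """Return True if the registry still has a manifest for repository:tag."""
    headers = {
        # Accept both schema-2 and OCI manifests so the HEAD request works
        # regardless of how the image was pushed.
        "Accept": "application/vnd.docker.distribution.manifest.v2+json, "
                  "application/vnd.oci.image.manifest.v1+json",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    resp = requests.head(
        f"{registry_url}/v2/{repository}/manifests/{tag}",
        headers=headers,
        timeout=10,
    )
    # 200 -> manifest present; 404 -> the cached "image exists" record is stale
    return resp.status_code == 200

# If this returns False, the corresponding cache entry could be evicted so the
# next request triggers a rebuild instead of a doomed launch.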

Do you know which backend member of the federation the error occurred on, and what the exact error message was?
It’s working at the moment

This is the implementation of the redirector:

It redirects a user based on the initial request, but after that it doesn’t track the request all the way through the build and launch processes.
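In other words, the behaviour is roughly like the sketch below (a simplified illustration with a made-up member list and weights, not the real federation-redirect code): one choice is made up front, the user is redirected, and nothing follows the request onto the chosen member.

# Hedged illustration of the behaviour described above, not the actual
# mybinder.org federation-redirect code. A member is chosen once, at the
# time of the initial request, and the user is simply redirected there;
# nothing tracks the subsequent build/launch on that member.
import random

# Hypothetical weights for the federation members (made-up numbers).
MEMBERS = {
    "https://gke.mybinder.org": 70,
    "https://ovh.mybinder.org": 20,
    "https://notebooks.gesis.org/binder": 10,
}

def choose_member(members=MEMBERS):
    """Pick a member with probability proportional to its weight."""
    hosts = list(members)
    weights = [members[h] for h in hosts]
    return random.choices(hosts, weights=weights, k=1)[0]

def redirect_url(request_path):
    """Build the redirect target for an incoming launch request."""
    return choose_member() + request_path

# e.g. redirect_url("/v2/git/<url-encoded repo>/HEAD") returns a URL on one
# member; if that member's build then hangs, the redirector never sees it.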

Sadly, no. I gave up after going directly to GESIS, where it worked (albeit extremely slowly).

How would I tell which backend member I was on? Would a copy of the build log have that info?

P.S. This wasn't the first time it had failed like this; it had done so several times over a few weeks. If you tell me how to find this, I'll check which backend is failing should it happen again.

If you can copy the build logs when it fails, that will help. I did find a stuck build from your repo on our GKE cluster, so I deleted that. It seemed to build and launch promptly on OVH. It may be caused by one or more unhealthy nodes, which can lead to slow launch times from disk pressure or other issues.
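For anyone with cluster access, something like the following could surface that kind of node problem (an illustrative sketch using the official kubernetes Python client; it is not part of mybinder.org's tooling, and the example node name in the final comment is just taken from the log below).

# Hedged sketch: list nodes reporting pressure conditions or NotReady,
# assuming a working kubeconfig for the affected cluster.
from kubernetes import client, config

def unhealthy_nodes():
    """Return (node, condition, status) for nodes that look unhealthy."""
    config.load_kube_config()          # use the current kubeconfig context
    v1 = client.CoreV1Api()
    problems = []
    for node in v1.list_node().items:
        for cond in node.status.conditions or []:
            bad = (
                (cond.type == "Ready" and cond.status != "True")
                or (cond.type in ("DiskPressure", "MemoryPressure", "PIDPressure")
                    and cond.status == "True")
            )
            if bad:
                problems.append((node.metadata.name, cond.type, cond.status))
    return problems

# e.g. [('user-202211a-node-9b20ab', 'DiskPressure', 'True')] would explain
# slow image pulls and launch timeouts on that node.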

Here you go.

Found built image, launching...
Launching server...
Server requested
2023-03-24T17:23:39Z [Normal] Successfully assigned ovh2/jupyter-alex-5fbinders-2dmatplotlib-2dbinder-2dwitktrjz to user-202211a-node-9b20ab
2023-03-24T17:23:40Z [Normal] Container image "jupyterhub/mybinder.org-tc-init:2020.12.4-0.dev.git.4289.h140cef52" already present on machine
2023-03-24T17:23:40Z [Normal] Created container tc-init
2023-03-24T17:23:40Z [Normal] Started container tc-init
2023-03-24T17:23:41Z [Normal] Pulling image "2lmrrh8f.gra7.container-registry.ovh.net/mybinder-builds/r2d-g5b5b759https-3a-2f-2fgitlab-2eflux-2eutah-2eedu-2falex-5fbinders-2fmatplotlib-2dbinder-5c6152:2786061c55f7c1fba835bf0ed97a9ddd09ffd40b"
Spawn failed: Timeout
Launch attempt 1 failed, retrying...
Server requested
2023-03-24T17:33:14Z [Normal] Successfully assigned ovh2/jupyter-alex-5fbinders-2dmatplotlib-2dbinder-2dm3ih50tc to user-202211a-node-9b20ab
2023-03-24T17:33:15Z [Normal] Container image "jupyterhub/mybinder.org-tc-init:2020.12.4-0.dev.git.4289.h140cef52" already present on machine
2023-03-24T17:33:15Z [Normal] Created container tc-init
2023-03-24T17:33:15Z [Normal] Started container tc-init
2023-03-24T17:33:16Z [Normal] Pulling image "2lmrrh8f.gra7.container-registry.ovh.net/mybinder-builds/r2d-g5b5b759https-3a-2f-2fgitlab-2eflux-2eutah-2eedu-2falex-5fbinders-2fmatplotlib-2dbinder-5c6152:2786061c55f7c1fba835bf0ed97a9ddd09ffd40b"

In case you're wondering: yes, it's getting stuck at the end of that log, and based on previous experience I expect it will fail again.

P.S. I tried ovh.mybinder.org, gesis.mybinder.org, and gke.mybinder.org. Each of them worked, though ovh and gke took a minute or so; gesis was quick enough that I didn't notice a delay.