EKS Binder fails to build repo

Hi,

I’m trying to build this repository as part of a BinderHub I deployed on EKS: https://github.com/ploomber/ploomber

The repo works fine on https://mybinder.org/

I’m getting this log output:

Waiting for build to start...
Built image, launching...
Launching server...
Launch attempt 1 failed, retrying...
Launch attempt 2 failed, retrying...
Launch attempt 3 failed, retrying...
Failed to launch image ploomber/binder-dev-binder-2dexamples-2dconda-8677da:034931911e853252322f2309f1246a4f1076fd7d

And when logging into one of the binder instances (binder/binder-7b64cd5b8f-8bxhf), this is what I’m seeing (I ran this on a different repo first):

[E 220810 21:04:26 builder:693] Retrying launch of https://github.com/binder-examples/conda after error (duration=0s, attempt=3): HTTPError()    
[I 220810 21:04:42 launcher:197] Creating user binder-examples-conda-mzp7vgbd for image ploomber/binder-dev-binder-2dexamples-2dconda-8677da:034931911e853252322f2309f12 
[I 220810 21:04:42 launcher:257] Starting server for user binder-examples-conda-mzp7vgbd with image ploomber/binder-dev-binder-2dexamples-2dconda-8677da:034931911e85325 
[E 220810 21:04:43 launcher:337] Error starting server for user binder-examples-conda-mzp7vgbd: HTTP 404: Not Found                                                      
     b'{"status": 404, "message": "Not Found"}'                                                                                                                           
[W 220810 21:04:43 web:1787] 500 GET /build/gh/binder-examples/conda/HEAD (135.84.167.61): Failed to launch image ploomber/binder-dev-binder-2dexamples-2dconda-8677da:0 
[I 220810 21:04:43 log:135] 200 GET /build/gh/binder-examples/conda/HEAD (anonymous@135.84.167.61) 30638.73ms  

Any ideas what’s going on?

(I saw this thread, but I couldn’t see where the docker pull happens, and I’m using a Docker Hub user rather than a cloud registry.)

Can you share the logs from JupyterHub in the same time period? That may shed some more light.
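
If it helps, something like this should pull them out (a sketch, assuming the chart’s default component=hub label and a namespace called binder):

kubectl logs -n binder -l component=hub --tail=200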

Yes, this is what I’m seeing in JupyterHub (I just re-initiated the binder call with the repo):

I’m also seeing 2 user-schedulers: one has an OK status and one keeps trying to acquire a lock.


Is the image definitely pushed to docker hub, and is it a public/private repo?

I have it in the config.yaml (I’m including all of it):

config:
  BinderHub:
    use_registry: true
    image_prefix: ploomber/binder-dev
    hub_url: SOME_HUB
    cors_allow_origin: '*'

jupyterhub:
  hub:
    config:
      BinderSpawner:
        cors_allow_origin: '*'
  cull:
    enabled: true
    # maxAge is 5 hours: 5 * 3600 = 18000
    maxAge: 18000
  ingress:
    enabled: true
    hosts:
      - SOME_HUB
    annotations:
      kubernetes.io/ingress.class: nginx
      kubernetes.io/tls-acme: "true"
      cert-manager.io/issuer: letsencrypt-production
      https: "nginx"
    tls:
      - secretName: SOME_SECRET
        hosts:
          - SOME_HOST

ingress:
  enabled: true
  hosts:
    - SOME_BINDER
  annotations:
    kubernetes.io/ingress.class: nginx
    kubernetes.io/tls-acme: "true"
    cert-manager.io/issuer: letsencrypt-production
    https: "nginx"
  tls:
    - secretName: SOME_SECRET
      hosts:
        - SOME_HOST

I was also able to push to this public repository via docker push ploomber/binder-dev:test

The thing that looks different from the .org one is that Docker doesn’t show any logs (like building or pulling the image). Are those usually present in the hub’s logs?
I’ve been using the latest version via the Helm chart, 0.2.0-n978.h2eb8b07.
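
For reference, the installed chart version can be confirmed with helm (assuming the release lives in a namespace called binder):

helm list -n binder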

[quote="Ido_Michael, post:1, topic:15287"]

Failed to launch image ploomber/binder-dev-binder-2dexamples-2dconda-8677da:034931911e853252322f2309f1246a4f1076fd7d

[/quote]

This image isn’t visible on Docker Hub, which suggests the build process didn’t push the image to the registry. Can you try rebuilding an image and follow the logs of the build pod to see what’s going wrong?

It should create this repo automatically, right?

Which one is the build pod? The binder one?

Yes, when you push to Docker Hub the repo is automatically created.

The build pod is an ephemeral pod that runs repo2docker to build and push the image. If you monitor the list of pods you should see it appear when you trigger a build of a new repo, then disappear after the build completes or fails. Try tailing the logs.
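
Something like this should catch it before it disappears (a sketch, assuming the chart’s defaults: a binder namespace and the component=binderhub-build label on build pods):

kubectl get pods -n binder -l component=binderhub-build -w
kubectl logs -n binder -f <build-pod-name>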

Yeah, it looks like it has some issue even starting; it shows up for a second and then disappears. I also tried with a different Docker account, but that’s not it. Is there maybe a way to change the state of the pod through the YAML so I can view the logs?

You can use a utility such as stern to automatically view logs when the pod starts. For example

stern -n <namespace> '(binder|build)-'

will tail the logs of pods with names binder-* or build-*, including newly created or replaced pods.

Can you also enable debug logging e.g.

config:
  Application:
    log_level: DEBUG

and show us the BinderHub logs from startup until the build fails?
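
If you deployed with the Helm chart, re-deploying with the updated config.yaml should apply it (a sketch, assuming a release named binder from the jupyterhub/binderhub chart):

helm upgrade binder jupyterhub/binderhub --version=0.2.0-n978.h2eb8b07 -f config.yaml -n binder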

Awesome, it looks like the pod isn’t found?
Maybe it’s the repo2docker version?
I see it’s quay.io/jupyterhub/repo2docker:2022.02.0, which is 6 months old.

I got some more logs:

binder-dcbb4bff5-k8bv9 binder [I 220812 18:56:05 build:447] Started build build-ploomber-2dploomber-3873b0-5b5-22
binder-dcbb4bff5-k8bv9 binder [I 220812 18:56:05 build:449] Watching build pod build-ploomber-2dploomber-3873b0-5b5-22
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:06 log:135] 200 GET /versions (anonymous@10.0.100.140) 1.07ms
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:07 rest:219] response body: {"kind":"Pod","apiVersion":"v1","metadata":{"name":"build-ploomber-2dploomber-3873b0-5b5-22","namespace":"binder","uid":"eb9bea63-82f9-4f3b-ac79-d7f66e6d104e","resourceVersion":"25403525","creationTimestamp":"2022-08-12T18:56:05Z","deletionTimestamp":"2022-08-12T18:56:07Z","deletionGracePeriodSeconds":0,"labels":{"component":"binderhub-build","name":"build-ploomber-2dploomber-3873b0-5b5-22"},"annotations":{"binder-repo":"https://github.com/ploomber/ploomber","kubernetes.io/psp":"eks.privileged"},"managedFields":[{"manager":"Swagger-Codegen","operation":"Update","apiVersion":"v1","time":"2022-08-12T18:56:05Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:binder-repo":{}},"f:labels":{".":{},"f:component":{},"f:name":{}}},"f:spec":{"f:affinity":{".":{},"f:podAntiAffinity":{".":{},"f:preferredDuringSchedulingIgnoredDuringExecution":{}}},"f:containers":{"k:{\"name\":\"builder\"}":{".":{},"f:args":{},"f:env":{".":{},"k:{\"name\":\"GIT_CREDENTIAL_ENV\"}":{".":{},"f:name":{},"f:value":{}}},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:resources":{".":{},"f:limits":{".":{},"f:memory":{}},"f:requests":{".":{},"f:memory":{}}},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{},"f:volumeMounts":{".":{},"k:{\"mountPath\":\"/root/.docker\"}":{".":{},"f:mountPath":{},"f:name":{}},"k:{\"mountPath\":\"/var/run/docker.sock\"}":{".":{},"f:mountPath":{},"f:name":{}}}}},"f:dnsPolicy":{},"f:enableServiceLinks":{},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{},"f:terminationGracePeriodSeconds":{},"f:tolerations":{},"f:volumes":{".":{},"k:{\"name\":\"docker-config\"}":{".":{},"f:name":{},"f:secret":{".":{},"f:defaultMode":{},"f:secretName":{}}},"k:{\"name\":\"docker-socket\"}":{".":{},"f:hostPath":{".":{},"f:path":{},"f:type":{}},"f:name":{}}}}}},{"manager":"kubelet","operation":"Update","apiVersion":"v1","time":"2022-08-12T18:56:07Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{"f:conditions":{"k:{\"type\":\"ContainersReady\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:message":{},"f:reason":{},"f:status":{},"f:type":{}},"k:{\"type\":\"Initialized\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:status":{},"f:type":{}},"k:{\"type\":\"Ready\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:message":{},"f:reason":{},"f:status":{},"f:type":{}}},"f:containerStatuses":{},"f:hostIP":{},"f:phase":{},"f:podIP":{},"f:podIPs":{".":{},"k:{\"ip\":\"10.0.221.107\"}":{".":{},"f:ip":{}}},"f:startTime":{}}}}]},"spec":{"volumes":[{"name":"docker-socket","hostPath":{"path":"/var/run/docker.sock","type":"Socket"}},{"name":"docker-config","secret":{"secretName":"binder-build-docker-config","defaultMode":420}},{"name":"default-token-xf4kx","secret":{"secretName":"default-token-xf4kx","defaultMode":420}}],"containers":[{"name":"builder","image":"quay.io/jupyterhub/repo2docker:2022.02.0","args":["jupyter-repo2docker","--ref=5b5d33bd1cff26c9e9c99a39b62081d9a9159afa","--image=ploomber/binder-devploomber-2dploomber-3873b0:5b5d33bd1cff26c9e9c99a39b62081d9a9159afa","--no-clean","--no-run","--json-logs","--user-name=jovyan","--user-id=1000","--push","https://github.com/ploomber/ploomber"],"env":[{"name":"GIT_CREDENTIAL_ENV","value":"username=SOME_TOKEN\\npassword=x-oauth-basic"}],"resources":{"limits":{"memory":"0"},"requests":{"memory":"0"}},"volumeMounts":[{"name":"docker-socket","mountPath":"/var/run/docker.sock"},{"name":"docker-config","mountPath":"/root/.do
cker"},{"name":"default-token-xf4kx","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Never","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","serviceAccountName":"default","serviceAccount":"default","nodeName":"ip-10-0-100-130.ec2.internal","securityContext":{},"affinity":{"podAntiAffinity":{"preferredDuringSchedulingIgnoredDuringExecution":[{"weight":100,"podAffinityTerm":{"labelSelector":{"matchLabels":{"component":"binderhub-build"}},"topologyKey":"kubernetes.io/hostname"}}]}},"schedulerName":"default-scheduler","tolerations":[{"key":"hub.jupyter.org/dedicated","operator":"Equal","value":"user","effect":"NoSchedule"},{"key":"hub.jupyter.org_dedicated","operator":"Equal","value":"user","effect":"NoSchedule"},{"key":"node.kubernetes.io/not-ready","operator":"Exists","effect":"NoExecute","tolerationSeconds":300},{"key":"node.kubernetes.io/unreachable","operator":"Exists","effect":"NoExecute","tolerationSeconds":300}],"priority":0,"enableServiceLinks":true,"preemptionPolicy":"PreemptLowerPriority"},"status":{"phase":"Failed","conditions":[{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2022-08-12T18:56:05Z"},{"type":"Ready","status":"False","lastProbeTime":null,"lastTransitionTime":"2022-08-12T18:56:05Z","reason":"ContainersNotReady","message":"containers with unready status: [builder]"},{"type":"ContainersReady","status":"False","lastProbeTime":null,"lastTransitionTime":"2022-08-12T18:56:05Z","reason":"ContainersNotReady","message":"containers with unready status: [builder]"},{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2022-08-12T18:56:05Z"}],"hostIP":"10.0.198.170","podIP":"10.0.221.107","podIPs":[{"ip":"10.0.221.107"}],"startTime":"2022-08-12T18:56:05Z","containerStatuses":[{"name":"builder","state":{"terminated":{"exitCode":1,"reason":"Error","startedAt":"2022-08-12T18:56:06Z","finishedAt":"2022-08-12T18:56:06Z","containerID":"docker://14c157d8ed745a401ba0b68302e1475a019227edbc37951000ffe88283b2afd5"}},"lastState":{},"ready":false,"restartCount":0,"image":"quay.io/jupyterhub/repo2docker:2022.02.0","imageID":"docker-pullable://quay.io/jupyterhub/repo2docker@sha256:c8b592a1012ea88db342c9303adc5f04dda682293392814af9e26460738465c2","containerID":"docker://14c157d8ed745a401ba0b68302e1475a019227edbc37951000ffe88283b2afd5","started":false}],"qosClass":"BestEffort"}}
binder-dcbb4bff5-k8bv9 binder     
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:07 rest:219] response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"build-ploomber-2dploomber-3873b0-5b5-22\" not found","reason":"NotFound","details":{"name":"build-ploomber-2dploomber-3873b0-5b5-22","kind":"pods"},"code":404}
binder-dcbb4bff5-k8bv9 binder     
binder-dcbb4bff5-k8bv9 binder [I 220812 18:56:07 launcher:197] Creating user ploomber-ploomber-zu7n0z96 for image ploomber/binder-devploomber-2dploomber-3873b0:5b5d33bd1cff26c9e9c99a39b62081d9a9159afa
binder-dcbb4bff5-k8bv9 binder [I 220812 18:56:07 launcher:257] Starting server for user ploomber-ploomber-zu7n0z96 with image ploomber/binder-devploomber-2dploomber-3873b0:5b5d33bd1cff26c9e9c99a39b62081d9a9159afa
binder-dcbb4bff5-k8bv9 binder [E 220812 18:56:07 launcher:337] Error starting server for user ploomber-ploomber-zu7n0z96: HTTP 404: Not Found
binder-dcbb4bff5-k8bv9 binder     b'{"status": 404, "message": "Not Found"}'
binder-dcbb4bff5-k8bv9 binder [E 220812 18:56:07 builder:693] Retrying launch of https://github.com/ploomber/ploomber after error (duration=0s, attempt=1): HTTPError()
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:08 log:135] 200 GET /versions (anonymous@10.0.198.170) 0.94ms
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:11 log:135] 200 GET /versions (anonymous@10.0.198.170) 0.89ms
binder-dcbb4bff5-k8bv9 binder [I 220812 18:56:11 launcher:197] Creating user ploomber-ploomber-pvtwq2w9 for image ploomber/binder-devploomber-2dploomber-3873b0:5b5d33bd1cff26c9e9c99a39b62081d9a9159afa
binder-dcbb4bff5-k8bv9 binder [I 220812 18:56:11 launcher:257] Starting server for user ploomber-ploomber-pvtwq2w9 with image ploomber/binder-devploomber-2dploomber-3873b0:5b5d33bd1cff26c9e9c99a39b62081d9a9159afa
binder-dcbb4bff5-k8bv9 binder [E 220812 18:56:11 launcher:337] Error starting server for user ploomber-ploomber-pvtwq2w9: HTTP 404: Not Found
binder-dcbb4bff5-k8bv9 binder     b'{"status": 404, "message": "Not Found"}'
binder-dcbb4bff5-k8bv9 binder [E 220812 18:56:11 builder:693] Retrying launch of https://github.com/ploomber/ploomber after error (duration=0s, attempt=2): HTTPError()
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:13 log:135] 200 GET /versions (anonymous@10.0.198.170) 0.96ms
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:16 log:135] 200 GET /versions (anonymous@10.0.198.170) 0.88ms
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:18 log:135] 200 GET /versions (anonymous@10.0.198.170) 0.95ms
binder-dcbb4bff5-k8bv9 binder [I 220812 18:56:19 launcher:197] Creating user ploomber-ploomber-hw8an6oi for image ploomber/binder-devploomber-2dploomber-3873b0:5b5d33bd1cff26c9e9c99a39b62081d9a9159afa
binder-dcbb4bff5-k8bv9 binder [I 220812 18:56:19 launcher:257] Starting server for user ploomber-ploomber-hw8an6oi with image ploomber/binder-devploomber-2dploomber-3873b0:5b5d33bd1cff26c9e9c99a39b62081d9a9159afa
binder-dcbb4bff5-k8bv9 binder [E 220812 18:56:19 launcher:337] Error starting server for user ploomber-ploomber-hw8an6oi: HTTP 404: Not Found
binder-dcbb4bff5-k8bv9 binder     b'{"status": 404, "message": "Not Found"}'
binder-dcbb4bff5-k8bv9 binder [E 220812 18:56:19 builder:693] Retrying launch of https://github.com/ploomber/ploomber after error (duration=0s, attempt=3): HTTPError()
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:21 log:135] 200 GET /versions (anonymous@10.0.198.170) 0.87ms
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:23 log:135] 200 GET /versions (anonymous@10.0.198.170) 0.96ms
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:26 log:135] 200 GET /versions (anonymous@10.0.198.170) 0.88ms
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:28 log:135] 200 GET /versions (anonymous@10.0.198.170) 0.91ms
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:31 log:135] 200 GET /versions (anonymous@10.0.198.170) 0.88ms
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:33 log:135] 200 GET /versions (anonymous@10.0.198.170) 1.05ms
binder-dcbb4bff5-k8bv9 binder [I 220812 18:56:35 launcher:197] Creating user ploomber-ploomber-sz9m9fns for image ploomber/binder-devploomber-2dploomber-3873b0:5b5d33bd1cff26c9e9c99a39b62081d9a9159afa
binder-dcbb4bff5-k8bv9 binder [I 220812 18:56:35 launcher:257] Starting server for user ploomber-ploomber-sz9m9fns with image ploomber/binder-devploomber-2dploomber-3873b0:5b5d33bd1cff26c9e9c99a39b62081d9a9159afa
binder-dcbb4bff5-k8bv9 binder [E 220812 18:56:35 launcher:337] Error starting server for user ploomber-ploomber-sz9m9fns: HTTP 404: Not Found
binder-dcbb4bff5-k8bv9 binder     b'{"status": 404, "message": "Not Found"}'
binder-dcbb4bff5-k8bv9 binder [W 220812 18:56:35 web:1787] 500 GET /build/gh/ploomber/ploomber/HEAD (135.84.167.61): Failed to launch image ploomber/binder-devploomber-2dploomber-3873b0:5b5d33bd1cff26c9e9c99a39b62081d9a9159afa
binder-dcbb4bff5-k8bv9 binder [I 220812 18:56:35 log:135] 200 GET /build/gh/ploomber/ploomber/HEAD (anonymous@135.84.167.61) 30694.16ms
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:36 log:135] 200 GET /versions (anonymous@10.0.198.170) 0.87ms
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:37 rest:219] response body: {"kind":"PodList","apiVersion":"v1","metadata":{"resourceVersion":"25403626"},"items":[]}
binder-dcbb4bff5-k8bv9 binder     
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:37 build:224] 0 build pods
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:37 build:272] Build phase summary: {}
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:38 log:135] 200 GET /versions (anonymous@10.0.198.170) 0.91ms
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:41 log:135] 200 GET /versions (anonymous@10.0.198.170) 0.89ms
binder-dcbb4bff5-k8bv9 binder [D 220812 18:56:43 log:135] 200 GET /versions (anonymous@10.0.198.170) 0.91ms

I’ve uninstalled the binder and reinstalled it without the cert-manager and nginx I had.
It does seem like a Docker issue.

repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
That makes sense: the repo isn’t there (I thought repo2docker was supposed to create it and push an image).

Waiting for build to start...
Built image, launching...
Launching server...
Server requested
2022-08-15T13:14:35.039269Z [Normal] Successfully assigned binder/jupyter-binder-2dexamples-2dconda-2d1myzbhrg to ip-10-0-128-160.ec2.internal
2022-08-15T13:14:35Z [Normal] Container image "jupyterhub/k8s-network-tools:1.2.0" already present on machine
2022-08-15T13:14:35Z [Normal] Created container block-cloud-metadata
2022-08-15T13:14:35Z [Normal] Started container block-cloud-metadata
2022-08-15T13:14:36Z [Normal] Pulling image "idomic/binder-dev-binder-2dexamples-2dconda-8677da:034931911e853252322f2309f1246a4f1076fd7d"
2022-08-15T13:14:36Z [Warning] Failed to pull image "idomic/binder-dev-binder-2dexamples-2dconda-8677da:034931911e853252322f2309f1246a4f1076fd7d": rpc error: code = Unknown desc = Error response from daemon: pull access denied for idomic/binder-dev-binder-2dexamples-2dconda-8677da, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
2022-08-15T13:14:36Z [Warning] Error: ErrImagePull
2022-08-15T13:14:37Z [Normal] Back-off pulling image "idomic/binder-dev-binder-2dexamples-2dconda-8677da:034931911e853252322f2309f1246a4f1076fd7d"
2022-08-15T13:14:37Z [Warning] Error: ImagePullBackOff

I also saw that the pods are hitting ImagePull errors, but the image was never built, so obviously those pulls will fail.

The repo2docker version shouldn’t matter; at worst you’d get an error whilst installing a dependency.

I’m pretty sure this means the build isn’t starting at all, since you should see some progress messages in between these two messages.
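
Since the builder pushes with credentials mounted from the binder-build-docker-config secret (visible in the pod spec below), it may also be worth confirming they look right (a sketch, assuming the binder namespace and that the secret key is config.json, matching the /root/.docker mount):

kubectl get secret binder-build-docker-config -n binder -o jsonpath='{.data.config\.json}' | base64 -d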

Taking the remainder of this log message and reformatting it as JSON for readability:

{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "build-ploomber-2dploomber-3873b0-5b5-22",
    "namespace": "binder",
    "uid": "eb9bea63-82f9-4f3b-ac79-d7f66e6d104e",
    "resourceVersion": "25403525",
    "creationTimestamp": "2022-08-12T18:56:05Z",
    "deletionTimestamp": "2022-08-12T18:56:07Z",
    "deletionGracePeriodSeconds": 0,
    "labels": {
      "component": "binderhub-build",
      "name": "build-ploomber-2dploomber-3873b0-5b5-22"
    },
    "annotations": {
      "binder-repo": "https://github.com/ploomber/ploomber",
      "kubernetes.io/psp": "eks.privileged"
    },
    "managedFields": [
      {
        "manager": "Swagger-Codegen",
        "operation": "Update",
        "apiVersion": "v1",
        "time": "2022-08-12T18:56:05Z",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:metadata": {
            "f:annotations": {
              ".": {},
              "f:binder-repo": {}
            },
            "f:labels": {
              ".": {},
              "f:component": {},
              "f:name": {}
            }
          },
          "f:spec": {
            "f:affinity": {
              ".": {},
              "f:podAntiAffinity": {
                ".": {},
                "f:preferredDuringSchedulingIgnoredDuringExecution": {}
              }
            },
            "f:containers": {
              "k:{\"name\":\"builder\"}": {
                ".": {},
                "f:args": {},
                "f:env": {
                  ".": {},
                  "k:{\"name\":\"GIT_CREDENTIAL_ENV\"}": {
                    ".": {},
                    "f:name": {},
                    "f:value": {}
                  }
                },
                "f:image": {},
                "f:imagePullPolicy": {},
                "f:name": {},
                "f:resources": {
                  ".": {},
                  "f:limits": {
                    ".": {},
                    "f:memory": {}
                  },
                  "f:requests": {
                    ".": {},
                    "f:memory": {}
                  }
                },
                "f:terminationMessagePath": {},
                "f:terminationMessagePolicy": {},
                "f:volumeMounts": {
                  ".": {},
                  "k:{\"mountPath\":\"/root/.docker\"}": {
                    ".": {},
                    "f:mountPath": {},
                    "f:name": {}
                  },
                  "k:{\"mountPath\":\"/var/run/docker.sock\"}": {
                    ".": {},
                    "f:mountPath": {},
                    "f:name": {}
                  }
                }
              }
            },
            "f:dnsPolicy": {},
            "f:enableServiceLinks": {},
            "f:restartPolicy": {},
            "f:schedulerName": {},
            "f:securityContext": {},
            "f:terminationGracePeriodSeconds": {},
            "f:tolerations": {},
            "f:volumes": {
              ".": {},
              "k:{\"name\":\"docker-config\"}": {
                ".": {},
                "f:name": {},
                "f:secret": {
                  ".": {},
                  "f:defaultMode": {},
                  "f:secretName": {}
                }
              },
              "k:{\"name\":\"docker-socket\"}": {
                ".": {},
                "f:hostPath": {
                  ".": {},
                  "f:path": {},
                  "f:type": {}
                },
                "f:name": {}
              }
            }
          }
        }
      },
      {
        "manager": "kubelet",
        "operation": "Update",
        "apiVersion": "v1",
        "time": "2022-08-12T18:56:07Z",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:status": {
            "f:conditions": {
              "k:{\"type\":\"ContainersReady\"}": {
                ".": {},
                "f:lastProbeTime": {},
                "f:lastTransitionTime": {},
                "f:message": {},
                "f:reason": {},
                "f:status": {},
                "f:type": {}
              },
              "k:{\"type\":\"Initialized\"}": {
                ".": {},
                "f:lastProbeTime": {},
                "f:lastTransitionTime": {},
                "f:status": {},
                "f:type": {}
              },
              "k:{\"type\":\"Ready\"}": {
                ".": {},
                "f:lastProbeTime": {},
                "f:lastTransitionTime": {},
                "f:message": {},
                "f:reason": {},
                "f:status": {},
                "f:type": {}
              }
            },
            "f:containerStatuses": {},
            "f:hostIP": {},
            "f:phase": {},
            "f:podIP": {},
            "f:podIPs": {
              ".": {},
              "k:{\"ip\":\"10.0.221.107\"}": {
                ".": {},
                "f:ip": {}
              }
            },
            "f:startTime": {}
          }
        }
      }
    ]
  },
  "spec": {
    "volumes": [
      {
        "name": "docker-socket",
        "hostPath": {
          "path": "/var/run/docker.sock",
          "type": "Socket"
        }
      },
      {
        "name": "docker-config",
        "secret": {
          "secretName": "binder-build-docker-config",
          "defaultMode": 420
        }
      },
      {
        "name": "default-token-xf4kx",
        "secret": {
          "secretName": "default-token-xf4kx",
          "defaultMode": 420
        }
      }
    ],
    "containers": [
      {
        "name": "builder",
        "image": "quay.io/jupyterhub/repo2docker:2022.02.0",
        "args": [
          "jupyter-repo2docker",
          "--ref=5b5d33bd1cff26c9e9c99a39b62081d9a9159afa",
          "--image=ploomber/binder-devploomber-2dploomber-3873b0:5b5d33bd1cff26c9e9c99a39b62081d9a9159afa",
          "--no-clean",
          "--no-run",
          "--json-logs",
          "--user-name=jovyan",
          "--user-id=1000",
          "--push",
          "https://github.com/ploomber/ploomber"
        ],
        "env": [
          {
            "name": "GIT_CREDENTIAL_ENV",
            "value": "username=SOME_TOKEN\\npassword=x-oauth-basic"
          }
        ],
        "resources": {
          "limits": {
            "memory": "0"
          },
          "requests": {
            "memory": "0"
          }
        },
        "volumeMounts": [
          {
            "name": "docker-socket",
            "mountPath": "/var/run/docker.sock"
          },
          {
            "name": "docker-config",
            "mountPath": "/root/.docker"
          },
          {
            "name": "default-token-xf4kx",
            "readOnly": true,
            "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount"
          }
        ],
        "terminationMessagePath": "/dev/termination-log",
        "terminationMessagePolicy": "File",
        "imagePullPolicy": "IfNotPresent"
      }
    ],
    "restartPolicy": "Never",
    "terminationGracePeriodSeconds": 30,
    "dnsPolicy": "ClusterFirst",
    "serviceAccountName": "default",
    "serviceAccount": "default",
    "nodeName": "ip-10-0-100-130.ec2.internal",
    "securityContext": {},
    "affinity": {
      "podAntiAffinity": {
        "preferredDuringSchedulingIgnoredDuringExecution": [
          {
            "weight": 100,
            "podAffinityTerm": {
              "labelSelector": {
                "matchLabels": {
                  "component": "binderhub-build"
                }
              },
              "topologyKey": "kubernetes.io/hostname"
            }
          }
        ]
      }
    },
    "schedulerName": "default-scheduler",
    "tolerations": [
      {
        "key": "hub.jupyter.org/dedicated",
        "operator": "Equal",
        "value": "user",
        "effect": "NoSchedule"
      },
      {
        "key": "hub.jupyter.org_dedicated",
        "operator": "Equal",
        "value": "user",
        "effect": "NoSchedule"
      },
      {
        "key": "node.kubernetes.io/not-ready",
        "operator": "Exists",
        "effect": "NoExecute",
        "tolerationSeconds": 300
      },
      {
        "key": "node.kubernetes.io/unreachable",
        "operator": "Exists",
        "effect": "NoExecute",
        "tolerationSeconds": 300
      }
    ],
    "priority": 0,
    "enableServiceLinks": true,
    "preemptionPolicy": "PreemptLowerPriority"
  },
  "status": {
    "phase": "Failed",
    "conditions": [
      {
        "type": "Initialized",
        "status": "True",
        "lastProbeTime": null,
        "lastTransitionTime": "2022-08-12T18:56:05Z"
      },
      {
        "type": "Ready",
        "status": "False",
        "lastProbeTime": null,
        "lastTransitionTime": "2022-08-12T18:56:05Z",
        "reason": "ContainersNotReady",
        "message": "containers with unready status: [builder]"
      },
      {
        "type": "ContainersReady",
        "status": "False",
        "lastProbeTime": null,
        "lastTransitionTime": "2022-08-12T18:56:05Z",
        "reason": "ContainersNotReady",
        "message": "containers with unready status: [builder]"
      },
      {
        "type": "PodScheduled",
        "status": "True",
        "lastProbeTime": null,
        "lastTransitionTime": "2022-08-12T18:56:05Z"
      }
    ],
    "hostIP": "10.0.198.170",
    "podIP": "10.0.221.107",
    "podIPs": [
      {
        "ip": "10.0.221.107"
      }
    ],
    "startTime": "2022-08-12T18:56:05Z",
    "containerStatuses": [
      {
        "name": "builder",
        "state": {
          "terminated": {
            "exitCode": 1,
            "reason": "Error",
            "startedAt": "2022-08-12T18:56:06Z",
            "finishedAt": "2022-08-12T18:56:06Z",
            "containerID": "docker://14c157d8ed745a401ba0b68302e1475a019227edbc37951000ffe88283b2afd5"
          }
        },
        "lastState": {},
        "ready": false,
        "restartCount": 0,
        "image": "quay.io/jupyterhub/repo2docker:2022.02.0",
        "imageID": "docker-pullable://quay.io/jupyterhub/repo2docker@sha256:c8b592a1012ea88db342c9303adc5f04dda682293392814af9e26460738465c2",
        "containerID": "docker://14c157d8ed745a401ba0b68302e1475a019227edbc37951000ffe88283b2afd5",
        "started": false
      }
    ],
    "qosClass": "BestEffort"
  }
}

You can see the pod has status "phase": "Failed" and "exitCode": 1.

This should’ve been detected by BinderHub but obviously wasn’t. I don’t know if Check for low-level errors when submitting builds by manics · Pull Request #1517 · jupyterhub/binderhub · GitHub would fix the error detection, or if you’ve hit some other problem.

So try going through the k8s events, or describing the build pod whilst it’s being created, to look for clues.
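
For example (assuming the binder namespace):

kubectl get events -n binder --sort-by=.lastTimestamp
kubectl describe pod -n binder -l component=binderhub-build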

Could you also tell us how you provisioned and configured your EKS cluster?

I’m adding here the stack I used to deploy it; it uses a base VPC with Kubernetes 1.20 and 1 instance.

from aws_cdk import core as cdk
from aws_cdk import aws_ec2 as ec2
from aws_cdk import aws_eks as eks
from aws_cdk import aws_iam as iam


class EksBinderStack(cdk.Stack):  # (illustrative stack name)
    def __init__(self, scope: cdk.Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        # create the VPC for the cluster
        vpc = ec2.Vpc(self, 'EKS-CDK-VPC', cidr='10.0.0.0/16', nat_gateways=1)

        # create an admin role
        eks_admin_role = iam.Role(self,
                                  'EKSAdminRole',
                                  assumed_by=iam.AccountPrincipal(
                                      account_id=self.account)
                                  )
        # create the cluster
        cluster = eks.Cluster(self, 'cluster',
                              masters_role=eks_admin_role,
                              vpc=vpc,
                              default_capacity=0,
                              version=eks.KubernetesVersion.V1_20,
                              output_cluster_name=True
                              )

        # node group where the binder/user pods run
        cluster.add_nodegroup_capacity(
            "binder-node-group",
            instance_types=[ec2.InstanceType('a1.2xlarge')],
            min_size=1,
            max_size=3
        )
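
For completeness, the stack goes up with the usual CDK workflow; with output_cluster_name=True, CDK prints the exact kubeconfig command as a stack output, but it is roughly (cluster name and region are placeholders):

cdk deploy
aws eks update-kubeconfig --name <cluster-name> --region <region>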

How can I apply this patch to my cluster? (I was deploying from the guide via the Helm chart.)

Update:

I’ve torn down the whole cluster and changed the Kubernetes version to KubernetesVersion.V1_21.

After reinstalling the hub, it gets past the point where Docker couldn’t fetch the image, but now the image build is failing on some Ubuntu packages:

Waiting for build to start...
Picked Git content provider.
Cloning into '/tmp/repo2docker0k2ko4cx'...
HEAD is now at 0349319 Merge pull request #13 from manics/add-binder-badge
Using CondaBuildPack builder
Building conda environment for python=
Step 1/51 : FROM buildpack-deps:bionic
 ---> 256bc5b8157de...
Step 2/51 : ENV DEBIAN_FRONTEND=noninteractive
 ---> Running in 8d705335afe7
Removing intermediate container 8d705335afe7
 ---> 4a1873c1a445
Step 3/51 : RUN apt-get -qq update &&     apt-get -qq install --yes --no-install-recommends locales > /dev/null &&     apt-get -qq purge &&     apt-get -qq clean &&     rm -rf /var/lib/apt/lists/*
 ---> Running in ed22ebff83aa
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/bionic/InRelease  Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/bionic-updates/InRelease  Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/bionic-backports/InRelease  Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://security.ubuntu.com/ubuntu/dists/bionic-security/InRelease  Temporary failure resolving 'security.ubuntu.com'
W: Some index files failed to download. They have been ignored, or old ones used instead.
E: Package 'locales' has no installation candidate
Removing intermediate container ed22ebff83aa
The command '/bin/sh -c apt-get -qq update &&     apt-get -qq install --yes --no-install-recommends locales > /dev/null &&     apt-get -qq purge &&     apt-get -qq clean &&     rm -rf /var/lib/apt/lists/*' returned a non-zero code: 100
Built image, launching...
Failed to connect to event stream

Might be related to this
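
A quick way to check whether containers on that node can resolve DNS at all (run on the node itself; busybox is just a convenient throwaway image):

docker run --rm busybox nslookup archive.ubuntu.com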

OK, I was able to deploy successfully. Here’s what to do if anyone else hits this issue with AWS.
The error in BinderHub also wasn’t clear, but I saw there’s already an open issue about this.

The AWS Kubernetes version wasn’t working well with binder. I was deploying with CDK and had to use eks.KubernetesVersion.V1_21 (instead of V1_20).
I also had to make sure that Docker-in-Docker is enabled. That required a launch template (the AWS docs on how to do it with CDK are broken); here’s what I used eventually:

# rewrite the node's Docker daemon config so docker-in-docker builds work:
# put containers back on the default docker0 bridge and disable live-restore
user_data = ec2.UserData.for_linux()
user_data.add_commands(
    "cp /etc/docker/daemon.json /etc/docker/daemon_backup.json")
user_data.add_commands(
    "echo -e '.bridge=\"docker0\" | .\"live-restore\"=false' > /etc/docker/jq_script")
user_data.add_commands(
    "jq -f /etc/docker/jq_script /etc/docker/daemon_backup.json | tee "
    "/etc/docker/daemon.json")
user_data.add_commands("systemctl restart docker")

# wrap the script in multipart user data for the launch template
multipart_user_data = ec2.MultipartUserData()
multipart_user_data.add_user_data_part(user_data,
                                       ec2.MultipartBody.SHELL_SCRIPT,
                                       True)

ec2_lt_data = ec2.CfnLaunchTemplate.LaunchTemplateDataProperty(
    user_data=cdk.Fn.base64(multipart_user_data.render()),
    instance_type=some_instance_type
)
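
After a node comes up, the change can be sanity-checked on the node itself (live-restore should now report false, and daemon.json should contain the bridge setting):

docker info --format '{{.LiveRestoreEnabled}}'
cat /etc/docker/daemon.json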

I also noticed that if you’ve already deployed something with the launch template and need to update or upgrade it, that throws an error, so every time that happened I had to delete the resource and recreate it.

Thanks for all of the support @manics. I think once we figured out that the binder was bootstrapping incorrectly on top of the cluster, things went smoothly, kind of.
