Autohttps pod unable to obtain Let’s Encrypt certificate

I’ve been attempting to set up a new JupyterHub cluster on GCP using the Z2JH guide, and the autohttps pod has been unable to obtain a certificate from Let’s Encrypt to allow for an https connection. I had this working about 2 years ago with an older version of JupyterHub (and of the Z2JH guide), but have had no luck recreating the magic on more recent versions. I know the domain name (datahub.ncssm.edu) has had time to resolve to the IP address: it’s been 72 hours, and I’ve verified that I can reach the server via the domain name using http.

I’ve configured the .yaml file to include:

proxy:
  https:
    enabled: true
    hosts:
      - datahub.ncssm.edu
    letsencrypt:
      contactEmail: <removed>@ncssm.edu

It successfully creates the autohttps pod, but when I run kubectl logs <podname> I see the following message:

Defaulted container "traefik" out of: traefik, secret-sync, load-acme (init)
time="2022-07-29T18:28:17Z" level=info msg="Configuration loaded from file: /etc/traefik/traefik.yaml"
time="2022-07-29T18:28:17Z" level=warning msg="Traefik Pilot is deprecated and will be removed soon. Please check our Blog for migration instructions later this year."
time="2022-07-29T18:28:17Z" level=warning msg="No domain found in rule PathPrefix(`/`), the TLS options applied for this router will depend on the SNI of each request" entryPointName=https routerName=default@file
time="2022-07-29T18:28:36Z" level=error msg="Unable to obtain ACME certificate for domains \"datahub.ncssm.edu\" : unable to generate a certificate for the domains [datahub.ncssm.edu]: error: one or more domains had a problem:\n[datahub.ncssm.edu] acme: error: 400 :: urn:ietf:params:acme:error:connection :: 34.86.203.110: Fetching http://datahub.ncssm.edu/.well-known/acme-challenge/ZmJjOqz5lSX7A3W6xMJLlZzI9qxaIFb1VXkGDjRyxV8: Timeout during connect (likely firewall problem)\n" providerName=default.acme

It seems like it’s not able to reach the server to complete the challenge, is that right? If so, any ideas on how to correct this issue?

If helpful, I’m using helm chart version 1.1.3-n743.h730f014f

See “autohttps pod unable to obtain certificate (GKE)” (Issue #2818 in jupyterhub/zero-to-jupyterhub-k8s on GitHub) and try this workaround:

proxy:
  traefik:
    extraInitContainers:
      # This startup delay can help the k8s container network interface setup
      # network routing of traffic to the pod, which is essential for the ACME
      # challenge submitted by Traefik on startup to acquire a TLS certificate.
      #
      # Sleeping for 7 seconds has been consistently sufficient to avoid issues
      # in GKE when this workaround was explored initially for GKE.
      #
      - name: startup-delay
        image: busybox:stable
        command: ["sh", "-c", "sleep 10"]
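
For anyone assembling the full config, here is a sketch of how the workaround might sit alongside the https settings from the original post. It assumes both live in the same values file, in which case they need to be merged under a single proxy: key rather than listed as two separate proxy: blocks. The host and email are the ones quoted above; substitute your own.

```yaml
proxy:
  https:
    enabled: true
    hosts:
      - datahub.ncssm.edu
    letsencrypt:
      contactEmail: <removed>@ncssm.edu
  traefik:
    extraInitContainers:
      # Delays Traefik's startup so network routing to the pod is in place
      # before the ACME challenge is attempted.
      - name: startup-delay
        image: busybox:stable
        command: ["sh", "-c", "sleep 10"]
```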

Thanks @consideRatio. Following up to confirm that this workaround fixes the issue, in case anyone else runs into it in the future.

Thanks for the update. I always run into this problem with new hubs, and my solution was always to kill the autohttps-xxxxxxxx-xxx pod. When it restarted automatically, it always worked. ConsideRatio’s solution is much more elegant; it could probably be the default setup for traefik.
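
For reference, the kill-and-restart approach could look roughly like the following. The namespace value is an assumption (use whichever namespace the hub was installed into), and the pod name is a placeholder to be replaced with the name reported by the first command.

```shell
# Assumption: the hub was installed into a namespace called "jhub".
NAMESPACE=jhub

# List the pods to find the full name of the autohttps pod.
kubectl get pods --namespace "$NAMESPACE"

# Placeholder: replace with the actual pod name from the listing above.
POD=autohttps-xxxxxxxx-xxx

# Delete the pod; it is recreated automatically, and Traefik
# attempts the ACME challenge again when the new pod starts.
kubectl delete pod "$POD" --namespace "$NAMESPACE"
```

Once the replacement pod is running, kubectl logs on the new pod name should show whether the certificate was obtained this time.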