traefik/Let's Encrypt Issues

Hi all, I am pretty desperate to figure this out, and I really do not know what I am doing wrong or how to debug/resolve the issue.

I followed the steps here to set up Kubernetes on Google Cloud, then followed the steps to set up Helm, and finally JupyterHub.

I went with version 1.1.3 of the JupyterHub Helm chart and used the following Helm config:

scheduling:
  userScheduler:
    enabled: true
  podPriority:
    enabled: true
  userPlaceholder:
    enabled: true
    replicas: 4
  userPods:
    nodeAffinity:
      matchNodePurpose: require

cull:
  enabled: true
  timeout: 3600
  every: 300

singleuser:
  cpu:
    limit: 1
    guarantee: 0.4
  memory:
    limit: 1G
    guarantee: 1G

I was able to access JupyterHub at the raw IP address, set up an A record with my host provider, and then successfully access my hub via my domain name. Things go wrong when I try to set up automatic HTTPS.

I added the following to my config:

proxy:
  https:
    enabled: true
    hosts:
      - <redacted>
    letsencrypt:
      contactEmail: <redacted>

where the hostname listed is the bare domain name (no http://, https://, www, or any other prefix). Then I ran the following in the gcloud shell to upgrade:

helm upgrade --cleanup-on-fail --install my-helm-namespace jupyterhub/jupyterhub --namespace my-k8s-namespace --create-namespace --version=1.1.3 --values config.yaml

Then I checked the logs of the traefik container in the autohttps pod:

kubectl logs -f autohttps-6b64696744-lgrcc traefik
time="2021-09-17T00:45:31Z" level=info msg="Configuration loaded from file: /etc/traefik/traefik.yaml"
time="2021-09-17T00:45:31Z" level=warning msg="No domain found in rule PathPrefix(`/`), the TLS options applied for this router will depend on the hostSNI of each request" entryPointName=https routerName=default@file
time="2021-09-17T00:45:32Z" level=warning msg="No domain found in rule PathPrefix(`/`), the TLS options applied for this router will depend on the hostSNI of each request" entryPointName=https routerName=default@file
time="2021-09-17T00:47:18Z" level=error msg="Unable to obtain ACME certificate for domains \"<redacted>\" : unable to generate a certificate for the domains [<redacted>]: error: one or more domains had a problem:\n[<redacted>] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Fetching http://<redacted>/.well-known/acme-challenge/UGdkMpJR6ye_pxBlbH7YIPTcSzHc2SlgNaqS4gWHwQU: Timeout after connect (your server may be slow or overloaded)\n" providerName=default.acme

I read somewhere that traefik may be starting too early and that killing the autohttps pod could help, but I still receive an error (a different one this time):

kubectl logs -f autohttps-6b64696744-ql5dp traefik
time="2021-09-17T01:24:30Z" level=info msg="Configuration loaded from file: /etc/traefik/traefik.yaml"
time="2021-09-17T01:24:30Z" level=warning msg="No domain found in rule PathPrefix(`/`), the TLS options applied for this router will depend on the hostSNI of each request" entryPointName=https routerName=default@file
time="2021-09-17T01:24:38Z" level=error msg="Unable to obtain ACME certificate for domains \"<redacted>\" : unable to generate a certificate for the domains [<redacted>]: error: one or more domains had a problem:\n[<redacted>] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Fetching http://<redacted>/.well-known/acme-challenge/RTSKJIYButiki8qyq8hD45U1o3RRRaZvxAMTWRibuJo: Connection refused\n" providerName=default.acme

I also read elsewhere that something may be wrong with how the ports are configured. Could that be it?

Here is the output of kubectl get pod and kubectl get service:

NAME                              READY   STATUS    RESTARTS   AGE
autohttps-6b64696744-xctql        2/2     Running   0          39m
hub-6dcf88f799-9b7wx              1/1     Running   0          87m
proxy-75d76cb74d-nwlwq            1/1     Running   0          118m
user-placeholder-0                0/1     Pending   0          120m
user-placeholder-1                0/1     Pending   0          120m
user-placeholder-2                0/1     Pending   0          120m
user-placeholder-3                0/1     Pending   0          120m
user-scheduler-8688fbc697-4sjqq   1/1     Running   0          110m
user-scheduler-8688fbc697-9q8nh   1/1     Running   0          110m

NAME           TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                      AGE
hub            ClusterIP      10.95.254.116   <none>         8081/TCP                     121m
proxy-api      ClusterIP      10.95.240.110   <none>         8001/TCP                     121m
proxy-http     ClusterIP      10.95.254.50    <none>         8000/TCP                     88m
proxy-public   LoadBalancer   10.95.251.37    <redacted>   443:30856/TCP,80:31409/TCP   121m
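Both errors come from Let's Encrypt's HTTP-01 check, which fetches `http://<domain>/.well-known/acme-challenge/<token>` over plain HTTP on port 80, as the log lines above show. So the ports question boils down to two things: does the domain's A record resolve to the `EXTERNAL-IP` of `proxy-public`, and does anything answer on port 80 there? Below is a minimal bash sketch of that check. It assumes `dig`, `curl` and `kubectl` are available and uses the namespace `my-k8s-namespace` from the `helm upgrade` command; the function names and the `connectivity-test` path segment are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking what Let's Encrypt's HTTP-01 challenge needs.

# Succeeds if the load balancer IP ($2) appears as a whole line in the DNS answers ($1).
ips_match() {
  local dns_ips="$1" lb_ip="$2"
  [ -n "$lb_ip" ] && printf '%s\n' "$dns_ips" | grep -qxF -- "$lb_ip"
}

# Compares the domain's A record(s) with the proxy-public EXTERNAL-IP, then probes port 80.
check_http01_path() {
  local domain="$1" namespace="${2:-my-k8s-namespace}"
  local dns_ips lb_ip
  dns_ips="$(dig +short A "$domain")"
  lb_ip="$(kubectl get service proxy-public --namespace "$namespace" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
  echo "A record(s): ${dns_ips:-<none>}"
  echo "proxy-public EXTERNAL-IP: ${lb_ip:-<none>}"
  if ! ips_match "$dns_ips" "$lb_ip"; then
    echo "DNS does not point at the load balancer" >&2
    return 1
  fi
  # %{http_code} prints 000 when no HTTP response arrives (timeout or refused).
  curl -sS -o /dev/null --max-time 10 -w 'HTTP status on port 80: %{http_code}\n' \
    "http://${domain}/.well-known/acme-challenge/connectivity-test"
}
```

Usage: `check_http01_path <your-domain>`. A `000` status means curl got no HTTP response at all (timeout or refused), which matches the errors Let's Encrypt reported; any real status code means something is answering on port 80.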

If anything needs to be un-redacted (I am paranoid and redacted a lot), please let me know. I am brand new to all of this and feel totally lost. Any help is tremendously appreciated.

This was the solution

This is still an issue as of February 24, 2022, with JupyterHub Helm chart version 1.3 on Google Cloud with Google Kubernetes Engine (GKE). The same approach solves it.

In short, identify your autohttps pod and delete it.
A new one will spawn, and your SSL certificate should now be valid.
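The workaround above can be scripted. The following is a minimal bash sketch, assuming the namespace `my-k8s-namespace` from the `helm upgrade` command and a Deployment named `autohttps` (inferred from the pod names, e.g. `autohttps-6b64696744-xctql`); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper that automates the workaround: delete the autohttps pod,
# wait for its replacement, then follow Traefik's log to watch the ACME request.

# Prints pod/autohttps-... names in the given namespace (fails if there are none).
autohttps_pods() {
  kubectl get pods --namespace "$1" -o name | grep '^pod/autohttps-'
}

restart_autohttps() {
  local namespace="${1:-my-k8s-namespace}" pods
  pods="$(autohttps_pods "$namespace")" || {
    echo "no autohttps pod found in $namespace" >&2
    return 1
  }
  # $pods is left unquoted on purpose so each pod name becomes its own argument.
  # kubectl delete waits for the old pod to terminate; the Deployment recreates it.
  kubectl delete --namespace "$namespace" $pods
  pods="$(autohttps_pods "$namespace")" || return 1
  kubectl wait --namespace "$namespace" --for=condition=Ready --timeout=120s $pods
  kubectl logs --follow --namespace "$namespace" deployment/autohttps --container traefik
}
```

Usage: `restart_autohttps` (or `restart_autohttps <namespace>`). Matching on the `autohttps-` name prefix, as seen in the `kubectl get pod` output, avoids guessing which labels the chart puts on the pod.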