Hi all, I am pretty desperate to figure this out, and I really do not know what I am doing wrong or how to debug/resolve the issue.
I followed the steps here to set up Kubernetes on Google Cloud, and subsequently followed the steps to set up Helm, and finally JupyterHub.
I went with version 1.1.3 of the JupyterHub chart and used the following Helm config:
scheduling:
  userScheduler:
    enabled: true
  podPriority:
    enabled: true
  userPlaceholder:
    enabled: true
    replicas: 4
  userPods:
    nodeAffinity:
      matchNodePurpose: require
cull:
  enabled: true
  timeout: 3600
  every: 300
singleuser:
  cpu:
    limit: 1
    guarantee: 0.4
  memory:
    limit: 1G
    guarantee: 1G
I was able to access JupyterHub at the raw IP address, and I was able to set up an A record with my host provider (and then successfully access my hub with my domain name!). Where things go wrong is when I try to set up automatic HTTPS.
I added the following to my config:
proxy:
  https:
    enabled: true
    hosts:
      - <redacted>
    letsencrypt:
      contactEmail: <redacted>
where the hostname listed is the bare name (no http/https/www or anything prefixed). Then I ran the following in the gcloud shell to update:
helm upgrade --cleanup-on-fail --install my-helm-namespace jupyterhub/jupyterhub --namespace my-k8s-namespace --create-namespace --version=1.1.3 --values config.yaml
Then, when I check the logs for the autohttps-* pod:
kubectl logs -f autohttps-6b64696744-lgrcc traefik
time="2021-09-17T00:45:31Z" level=info msg="Configuration loaded from file: /etc/traefik/traefik.yaml"
time="2021-09-17T00:45:31Z" level=warning msg="No domain found in rule PathPrefix(`/`), the TLS options applied for this router will depend on the hostSNI of each request" entryPointName=https routerName=default@file
time="2021-09-17T00:45:32Z" level=warning msg="No domain found in rule PathPrefix(`/`), the TLS options applied for this router will depend on the hostSNI of each request" entryPointName=https routerName=default@file
time="2021-09-17T00:47:18Z" level=error msg="Unable to obtain ACME certificate for domains \"<redacted>\" : unable to generate a certificate for the domains [<redacted>]: error: one or more domains had a problem:\n[<redacted>] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Fetching http://<redacted>/.well-known/acme-challenge/UGdkMpJR6ye_pxBlbH7YIPTcSzHc2SlgNaqS4gWHwQU: Timeout after connect (your server may be slow or overloaded)\n" providerName=default.acme
I read somewhere that Traefik may be starting too early, and that deleting the autohttps-* pod can help. I tried that, but I still receive an error (though a different one):
kubectl logs -f autohttps-6b64696744-ql5dp traefik
time="2021-09-17T01:24:30Z" level=info msg="Configuration loaded from file: /etc/traefik/traefik.yaml"
time="2021-09-17T01:24:30Z" level=warning msg="No domain found in rule PathPrefix(`/`), the TLS options applied for this router will depend on the hostSNI of each request" entryPointName=https routerName=default@file
time="2021-09-17T01:24:38Z" level=error msg="Unable to obtain ACME certificate for domains \"<redacted>\" : unable to generate a certificate for the domains [<redacted>]: error: one or more domains had a problem:\n[<redacted>] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Fetching http://<redacted>/.well-known/acme-challenge/RTSKJIYButiki8qyq8hD45U1o3RRRaZvxAMTWRibuJo: Connection refused\n" providerName=default.acme
I also read elsewhere that it may be something wrong with how ports are configured?
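In case it helps narrow this down, the two failures above ("Timeout after connect" and "Connection refused") could be told apart from outside the cluster with a plain TCP probe against ports 80 and 443. The following is only a sketch under assumptions: `probe_port` is a hypothetical helper name, and the hostname in the usage comment is a placeholder rather than my real domain. It only checks whether a TCP connection can be opened; it does not perform the ACME challenge itself.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 5.0) -> str:
    """Try one TCP connection and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Something answered, but nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout (e.g. dropped packets).
        return "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar problems.
        return f"error: {exc}"


# Example usage (placeholder hostname, replace with the real one):
#   for port in (80, 443):
#       print(port, probe_port("hub.example.org", port))
```

If port 80 reports "refused" or "timeout" from an outside machine, Let's Encrypt would presumably hit the same problem when it fetches the HTTP challenge URL.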
Here is the output from kubectl get pod and kubectl get service:
NAME                              READY   STATUS    RESTARTS   AGE
autohttps-6b64696744-xctql        2/2     Running   0          39m
hub-6dcf88f799-9b7wx              1/1     Running   0          87m
proxy-75d76cb74d-nwlwq            1/1     Running   0          118m
user-placeholder-0                0/1     Pending   0          120m
user-placeholder-1                0/1     Pending   0          120m
user-placeholder-2                0/1     Pending   0          120m
user-placeholder-3                0/1     Pending   0          120m
user-scheduler-8688fbc697-4sjqq   1/1     Running   0          110m
user-scheduler-8688fbc697-9q8nh   1/1     Running   0          110m
NAME           TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
hub            ClusterIP      10.95.254.116   <none>        8081/TCP                     121m
proxy-api      ClusterIP      10.95.240.110   <none>        8001/TCP                     121m
proxy-http     ClusterIP      10.95.254.50    <none>        8000/TCP                     88m
proxy-public   LoadBalancer   10.95.251.37    <redacted>    443:30856/TCP,80:31409/TCP   121m
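As far as I understand the service output, each PORT(S) entry reads port:nodePort/protocol, so proxy-public appears to expose 443 and 80 externally and map them to node ports 30856 and 31409. A small sketch of how that column could be split apart is below. `PortMapping` and `parse_ports` are hypothetical names I made up for illustration, and the parser assumes the entries always follow that port[:nodePort]/protocol shape.

```python
from typing import List, NamedTuple, Optional


class PortMapping(NamedTuple):
    port: int                  # port exposed by the service
    node_port: Optional[int]   # node port, if the entry lists one
    protocol: str              # e.g. "TCP"


def parse_ports(column: str) -> List[PortMapping]:
    """Split a PORT(S) value such as '443:30856/TCP,80:31409/TCP'."""
    mappings = []
    for entry in column.split(","):
        # 'ports' is everything before '/', 'protocol' everything after.
        ports, _, protocol = entry.strip().partition("/")
        # A ':' separates the service port from the node port, if present.
        port, _, node_port = ports.partition(":")
        mappings.append(
            PortMapping(
                port=int(port),
                node_port=int(node_port) if node_port else None,
                protocol=protocol,
            )
        )
    return mappings


# Example usage with the proxy-public row:
#   parse_ports("443:30856/TCP,80:31409/TCP")
```

Read this way, port 80 does seem to be listed on the LoadBalancer, so a missing port entry is probably not the cause, though I may be misreading it.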
If anything needs to be un-redacted (I am paranoid and redacted a lot of things), please let me know. I am brand new to everything here and feel totally lost. Any help is tremendously appreciated.