Letsencrypt autohttps Failure in GKE Deployment

I have been wrestling with this problem for several months and am hoping someone here might be able to help me solve it.

I have not been able to get letsencrypt to work with my GKE Z2JH deployments, no matter what I tweak. I think part of the issue is that I’m very unfamiliar with DNS and SSL certificates, but I believe the problem might lie in our Google Domain.

DNS Information:
This was all set up before I joined the group, so it is the part I am least familiar with. I suspect something is going wrong here, but I’m unsure if it’s our setup or something with the Google Domains service. I know this is way outside the scope of this discussion board, but it might be helpful for figuring out where the problem is. If anyone else is using domains.google.com without issues I’d love to chat.

We set up a domain under domains.google.com - I’ll refer to it as our-jupyter.org

Under it we set up a registered host: course.our-jupyter.org

After that we set up dynamic DNS synthetic records. We use domain forwarding:

our-jupyter.org.org, www.columbiajupyter2.org → http://(the IP of the registered hostname)

Our dynamic DNS is set with Dynamic DNS for class.our-jupyter.org to refer to A records.

Finally, we create an A record for each hub and point it to that hub’s LoadBalancer IP.

JupyterHub Config.yaml
Here is our proxy snippet from our config.yaml for jupyterhub:

proxy:
  secretToken: "(our-secret-token)"
  https:
    enabled: true
    type: letsencrypt
    letsencrypt:
      contactEmail: mw2931@columbia.edu
    hosts:
      - test-class1.our-jupyter-site.org
  service:
    type: LoadBalancer
    loadBalancerIP: (the proxy external IP)

The Issue
When I don’t have letsencrypt on, it connects to the LoadBalancer IP just fine, but as soon as I try to get letsencrypt working, the site becomes unreachable. The ssllabs test reports “Assessment failed: Failed to communicate with the secure server.”

By manually adding certs it works fine, so I’m thinking there’s some disconnect between letsencrypt and the site. Originally I thought this was a traefik error but trying other versions doesn’t change the behavior.

I really appreciate the help. I have tried changing and experimenting with so many things, to no avail, that I just don’t know what else could be wrong at this point. If anyone is using Google Domains without any problems I’d be really grateful to chat!

Best,
Michael Weisner

I’m a bit confused by your DNS setup. Would you mind clarifying a few things?

By “domain forwarding” do you mean an A record, a CNAME, HTTP redirect, or something else?

Could you explain the two sets of A records, and why dynamic DNS is involved? I’d have thought you’d have just one fixed A record for each hub, pointing to the LoadBalancer IP.

Hi Manics,

There aren’t two sets of records. I was trying to keep it generic by using “our-jupyter.org” rather than our specific registration (columbiajupyter2.org), but I made a typo and missed the second instance with the www. It’s really just columbiajupyter2.org and www.columbiajupyter2.org redirecting to the IP of the registered hostname.

As for the domain forwarding, this is something I inherited and I confess I don’t know the reasoning behind it. It’s listed under our Google domain Synthetic Records, and appears to include several A records, AAAA records, and a CNAME record.

After that, the Dynamic DNS was already set up. The way I thought it worked was that the dynamic DNS was set as a generic placeholder for our individual hubs, i.e. the class.our-jupyter.org in which we replace the “class” section with the name of the course for the JupyterHub we’re setting up. We then set up the individual A records that way, but perhaps there’s no reason for the dynamic DNS?

In JupyterHub I just reference the A record that points to the LoadBalancer, e.g. “class-1.our-jupyter.org”. However, this address fails when we try to use letsencrypt and only works when we manually set the SSL certs inside the config.yaml file. I just can’t figure out why it fails when we try to do it automatically, as the only feedback I get from SSL Labs is that it can’t communicate with the secure server - which doesn’t make sense to me, since it communicates fine when I set the certs manually and change nothing else.

Hopefully that makes a bit more sense, sorry about that! Thanks for your help!
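
For anyone debugging a similar "can't communicate with the secure server" result, a local TLS probe can sometimes give more detail than the SSL Labs summary. The following is only a rough sketch using Python's standard library; the function names `probe_tls` and `issuer_organization` are made up for this example and are not part of Z2JH.

```python
import socket
import ssl


def issuer_organization(cert):
    """Pull the issuer's organizationName out of a getpeercert() dict."""
    # The issuer is a tuple of RDNs, each RDN a tuple of (key, value) pairs.
    for rdn in cert.get("issuer", ()):
        for key, value in rdn:
            if key == "organizationName":
                return value
    return "unknown"


def probe_tls(hostname, port=443, timeout=5.0):
    """Attempt a verified TLS handshake and summarize the outcome as text."""
    context = ssl.create_default_context()  # verifies chain and hostname
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=hostname) as tls:
                cert = tls.getpeercert()
    except OSError as exc:
        # Covers refused connections, timeouts, and ssl.SSLError subclasses
        # such as certificate verification failures.
        return f"failed: {exc.__class__.__name__}: {exc}"
    return f"ok: issued by {issuer_organization(cert)}"
```

Calling `probe_tls("class-1.our-jupyter.org")` would return a single line of text; the exception class in a failure message (for example a refused connection versus a certificate verification error) may hint at whether the proxy is reachable at all or is serving a certificate that clients do not trust.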

Well, this is embarrassing, but for whatever reason I just tore everything down and tried again and now, for the first time in the year that I’ve been trying to do this, the autohttps worked.

I think the problem was actually just that I tried to enforce the https rules too early, before the A record had properly taken effect.

My apologies for wasting everyone’s time!
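
Since the apparent cause was enabling https before the A record had taken effect, one possible safeguard is to wait until the hostname resolves to the LoadBalancer IP before turning letsencrypt on. Below is a minimal sketch of that idea, assuming Python is available on the machine doing the deployment. The function names and the `resolve` and `sleep` parameters are hypothetical helpers for this example. Note that a successful local lookup does not guarantee that the Let's Encrypt servers see the same record yet.

```python
import socket
import time


def a_record_points_to(hostname, expected_ip, resolve=socket.gethostbyname_ex):
    """Return True if hostname currently resolves to expected_ip (IPv4)."""
    try:
        _name, _aliases, addresses = resolve(hostname)
    except OSError:
        # socket.gaierror (name not found yet) is a subclass of OSError.
        return False
    return expected_ip in addresses


def wait_for_a_record(hostname, expected_ip, attempts=30, delay=10.0,
                      resolve=socket.gethostbyname_ex, sleep=time.sleep):
    """Poll DNS until the A record matches, or give up after `attempts` tries."""
    for attempt in range(attempts):
        if a_record_points_to(hostname, expected_ip, resolve):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # pause before the next lookup
    return False
```

For example, `wait_for_a_record("class-1.our-jupyter.org", "203.0.113.10")` (the IP here is a documentation placeholder, not a real LoadBalancer address) would return True once the record is visible locally, at which point enabling `proxy.https` and upgrading the release is less likely to hit a failed certificate request.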