Autohttps doesn't start

Hi all,

I am following the instructions for setting up HTTPS for JupyterHub. I already bought a domain, created a CNAME record (the instructions mention an A record, but based on this issue it looks like a CNAME can be used in AWS), and pointed it to my external IP. After that, I can connect to my JupyterHub using the domain, but still over HTTP.

I then changed the proxy configuration in the helm chart to this:

proxy:
  https:
    enabled: true
    hosts:
      - <my-domain>
    letsencrypt:
      contactEmail: <myemail>

This created a new autohttps pod that gets stuck at Init:0/1.
I read in other topics that the solution could be to delay the start of Traefik, but that still doesn't work. In the same thread I read that deleting the pod helped, but in my case it doesn't either.
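
For reference, the delay I attempted was roughly along these lines. I'm not sure this matches what the original thread suggested exactly; the proxy.traefik.extraInitContainers key and the sleep length are just my reading of how such a delay can be expressed in the chart:

proxy:
  traefik:
    extraInitContainers:
      # Hypothetical delay: give DNS and the rest of the chart a head start
      # before Traefik's main containers come up
      - name: startup-delay
        image: busybox:1.36
        command: ["sh", "-c", "sleep 30"]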

When I run kubectl describe, this is the output:

Name:             autohttps-764f9d5b44-cvvf7
Namespace:        default
Priority:         0
Service Account:  autohttps
Node:             i-0005b46cc367a9f22/172.20.58.123
Start Time:       Wed, 12 Jun 2024 01:15:55 +0000
Labels:           app=jupyterhub
                  component=autohttps
                  hub.jupyter.org/network-access-proxy-http=true
                  pod-template-hash=764f9d5b44
                  release=jupyterhub
Annotations:      checksum/static-config: eaf9940443dcbb831724385f5c6f760fb89aecc6b26dc4ce9adda17e0c8f2049
                  kubernetes.io/limit-ranger:
                    LimitRanger plugin set: cpu request for container traefik; cpu request for container secret-sync; cpu request for init container load-acme
Status:           Pending
IP:               100.96.2.1
IPs:
  IP:           100.96.2.1
Controlled By:  ReplicaSet/autohttps-764f9d5b44
Init Containers:
  load-acme:
    Container ID:  containerd://471d16cef822452af81bf667baab71998f01efb2f1e84df72996f14e0bfaaf9a
    Image:         quay.io/jupyterhub/k8s-secret-sync:3.3.7
    Image ID:      quay.io/jupyterhub/k8s-secret-sync@sha256:cd53bef49a271d88628211e463c345250d6dc81da018e6c3d359b8176b7eebd0
    Port:          <none>
    Host Port:     <none>
    Args:
      load
      proxy-public-tls-acme
      acme.json
      /etc/acme/acme.json
    State:          Running
      Started:      Wed, 12 Jun 2024 01:53:23 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 12 Jun 2024 01:43:40 +0000
      Finished:     Wed, 12 Jun 2024 01:52:40 +0000
    Ready:          False
    Restart Count:  4
    Requests:
      cpu:  100m
    Environment:
      PYTHONUNBUFFERED:  True
    Mounts:
      /etc/acme from certificates (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mbdzs (ro)
Containers:
  traefik:
    Container ID:   
    Image:          traefik:v2.11.0
    Image ID:       
    Ports:          8080/TCP, 8443/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        100m
    Environment:  <none>
    Mounts:
      /etc/acme from certificates (rw)
      /etc/traefik from traefik-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mbdzs (ro)
  secret-sync:
    Container ID:  
    Image:         quay.io/jupyterhub/k8s-secret-sync:3.3.7
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Args:
      watch-save
      --label=app=jupyterhub
      --label=release=jupyterhub
      --label=chart=jupyterhub-3.3.7
      --label=heritage=secret-sync
      proxy-public-tls-acme
      acme.json
      /etc/acme/acme.json
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:  100m
    Environment:
      PYTHONUNBUFFERED:  True
    Mounts:
      /etc/acme from certificates (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mbdzs (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 False 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  certificates:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  traefik-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      autohttps
    Optional:  false
  kube-api-access-mbdzs:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 hub.jupyter.org/dedicated=core:NoSchedule
                             hub.jupyter.org_dedicated=core:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  46m                  default-scheduler  Successfully assigned default/autohttps-764f9d5b44-cvvf7 to i-0005b46cc367a9f22
  Normal   Pulling    46m                  kubelet            Pulling image "quay.io/jupyterhub/k8s-secret-sync:3.3.7"
  Normal   Pulled     45m                  kubelet            Successfully pulled image "quay.io/jupyterhub/k8s-secret-sync:3.3.7" in 4.128s (4.128s including waiting)
  Warning  BackOff    8m51s (x6 over 27m)  kubelet            Back-off restarting failed container load-acme in pod autohttps-764f9d5b44-cvvf7_default(ac7791ac-0ee7-484b-803a-4738c97f10be)
  Normal   Created    8m36s (x5 over 45m)  kubelet            Created container load-acme
  Normal   Started    8m36s (x5 over 45m)  kubelet            Started container load-acme
  Normal   Pulled     8m36s (x4 over 36m)  kubelet            Container image "quay.io/jupyterhub/k8s-secret-sync:3.3.7" already present on machine

Any idea what I am doing wrong?

Thanks in advance!

This is the key issue, but why?

Do “kubectl logs --previous -c load-acme (pod name of autohttps pod)” and report back what the logs say

Thanks for the reply @consideRatio :slight_smile:

This is what the logs say:

2024-06-12 11:14:49,209 INFO /usr/local/bin/acme-secret-sync.py load proxy-public-tls-acme acme.json /etc/acme/acme.json
2024-06-12 11:17:03,179 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fe1e5d775d0>, 'Connection to 100.64.0.1 timed out. (connect timeout=None)')': /api/v1/namespaces/default/secrets/proxy-public-tls-acme
2024-06-12 11:19:18,343 WARNING Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fe1e5d77c50>, 'Connection to 100.64.0.1 timed out. (connect timeout=None)')': /api/v1/namespaces/default/secrets/proxy-public-tls-acme
2024-06-12 11:21:33,511 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fe1e5d88290>, 'Connection to 100.64.0.1 timed out. (connect timeout=None)')': /api/v1/namespaces/default/secrets/proxy-public-tls-acme
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 198, in _new_conn
    sock = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Operation timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 491, in _make_request
    raise new_e
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 616, in connect
    self.sock = sock = self._new_conn()
                       ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 207, in _new_conn
    raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7fe1e5d88890>, 'Connection to 100.64.0.1 timed out. (connect timeout=None)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/acme-secret-sync.py", line 183, in <module>
    main()
  File "/usr/local/bin/acme-secret-sync.py", line 143, in main
    value = get_secret_value(args.namespace, args.secret_name, args.key)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/bin/acme-secret-sync.py", line 87, in get_secret_value
    secret = v1.read_namespaced_secret(namespace=namespace, name=secret_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py", line 25013, in read_namespaced_secret
    return self.read_namespaced_secret_with_http_info(name, namespace, **kwargs)  # noqa: E501
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py", line 25100, in read_namespaced_secret_with_http_info
    return self.api_client.call_api(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
                    ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/rest.py", line 244, in GET
    return self.request("GET", url,
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/rest.py", line 217, in request
    r = self.pool_manager.request(method, url,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/_request_methods.py", line 136, in request
    return self.request_encode_url(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/_request_methods.py", line 183, in request_encode_url
    return self.urlopen(method, url, **extra_kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/poolmanager.py", line 444, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 877, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 877, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 877, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='100.64.0.1', port=443): Max retries exceeded with url: /api/v1/namespaces/default/secrets/proxy-public-tls-acme (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fe1e5d88890>, 'Connection to 100.64.0.1 timed out. (connect timeout=None)'))

In case it helps, I'll give some context: I set up JupyterHub following the z2jh steps, and I am hosting it on an AWS EC2 instance. When I attempted to spin up a server for a new user, the server either wasn't starting at all or got stuck at some point during startup, and after some research and debugging I found out that I had to add this:

hub:
  networkPolicy:
    egress:
      - ports:
          - port: 6443
          - port: 443

I guess that shouldn't affect the main issue, but I just wanted to add it in case it causes trouble.

This is the whole config.yaml file that I am using with Helm. Some blocks are commented out because they will eventually be included, and there is also some config that allows JupyterHub to be embedded in an iframe. It's a bit messy, but everything is still under development:

singleuser:
  image:
    name: aicorecompany/custom-jupyterhub # Custom Docker for experiments
    tag: 0.0.12
proxy:
  https:
    enabled: true
    hosts:
      - <mydomain>
    letsencrypt:
      contactEmail: <myemail>
hub:
  config:
    Application:
      log_level: "DEBUG"
    # JupyterHub:
    #   authenticator_class: ltiauthenticator.lti13.auth.LTI13Authenticator
    # LTI13Authenticator:
    #   username_key: "email"
    #   issuer: <issuer>
    #   authorize_url: <auth_url>
    #   client_id: 
    #     - <client_id>
    #   jwks_endpoint: <jwks_endpoint>


  networkPolicy:
    egress:
      - ports:
          - port: 6443
          - port: 443
  extraConfig:
    csp: |
      c.JupyterHub.tornado_settings = {
        'headers': {
          'Content-Security-Policy': "frame-ancestors http://localhost:3000",
        }
      }
    customSpawner-1: |
      from tornado.web import RequestHandler

      class CSPHandler(RequestHandler):
          def set_default_headers(self):
              self.set_header('Content-Security-Policy', "frame-ancestors http://localhost:3000")

      c.Spawner.args = ['--NotebookApp.tornado_settings={"headers":{"Content-Security-Policy":"frame-ancestors \'self\' http://localhost:3000"}}']

Any advice would be very helpful! Thanks again, and I will happily provide any more information if needed.

EDIT: I tried the same thing without the custom Docker image, and the error persists.

Are you using AWS EKS, or did you install K8s yourself?

I installed K8s myself following the steps here: Kubernetes on Amazon Web Services (AWS) — Zero to JupyterHub with Kubernetes documentation

I decided to add the key and the certificate manually, and it works now.
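
In case the shape of the config helps, the relevant part now looks roughly like this (key and cert contents omitted; the type: manual fields follow my understanding of the chart's manual HTTPS option, so double-check the names against the z2jh docs):

proxy:
  https:
    enabled: true
    type: manual
    manual:
      # private key obtained from the certificate provider
      key: |
        -----BEGIN PRIVATE KEY-----
        ...
        -----END PRIVATE KEY-----
      # certificate (plus intermediate chain, if any)
      cert: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----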

A couple of things that may be helpful for someone out there:

  • If you are using AWS and purchase a domain there, AWS Certificate Manager (ACM) won't be very useful if you decide to add the key and the certificate manually, as ACM doesn't expose the private key and certificate (which you need to copy and paste). In my case I bought the domain and then purchased a certificate through ZeroSSL (I know there are probably better options, but I needed a quick test).

  • The z2jh docs state that you should use an A record, and you might see that it doesn't work with the external IP given by kube-proxy. There are some solutions out there saying that you have to create a CNAME record instead, but what I found works better is using an A record with an alias that points to the load balancer your Kubernetes cluster is using (a sketch of such a record is below this list). That way you don't need to specify the subdomain that a CNAME record requires, and the certificate you use won't trigger any security warning.
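
To illustrate the alias idea, the record expressed as a CloudFormation sketch would look something like the following. I actually created mine in the Route 53 console, and the names and hosted zone IDs below are placeholders, so treat this as an illustration only:

Resources:
  JupyterHubAliasRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneId: <route53-hosted-zone-id>   # hosted zone of <my-domain>
      Name: <my-domain>.
      Type: A
      AliasTarget:
        # DNS name and canonical hosted zone ID of the load balancer
        # that fronts the proxy-public service
        DNSName: <load-balancer-dns-name>
        HostedZoneId: <load-balancer-hosted-zone-id>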

I will keep this open in case someone finds a solution for the Let's Encrypt issue, so it's easier to keep the certificate up to date.

Thanks a lot for your time @consideRatio and @manics :+1: