I used the configuration below to expose the JupyterHub instance through an Ingress and to provide TLS encryption, with a basic certificate for the base URL and a wildcard certificate for the user subdomains.
Without user subdomains enabled, this works fine.
After logging in, the user is directed to “https://base-url.example.com/user/testuser/lab?”.
When I enable the commented-out configuration, however, my JupyterHub instance starts to misbehave.
After login the user is directed to “https://testuser.base-url.example.com/hub/user/testuser/lab” and is shown a “500: Internal server error redirect loop detected.”
The hub pod logs show several times in succession:
[timestamp] 302 GET /hub/user/testuser/labs? -> https://testuser.base-url.example.com/user/testuser/lab?
[timestamp] 302 GET /user/testuser/lab? -> /hub/user/testuser/labs?
until finally it shows:
[timestamp] 500 GET /hub/user/testuser/lab? : Redirect loop detected.
and closes the connection.
I am guessing that there is an error in the redirect, which should normally strip:
Are you able to get JupyterHub working with subdomains without the Helm chart? That way you have full control over jupyterhub_config.py and can modify it iteratively, which will help identify whether KubeSpawner handles subdomains.
If KubeSpawner doesn’t work, then are you able to get JupyterHub working with subdomains but without Kubernetes, i.e. using a different spawner?
Actually, I use subdomains in a z2jh deployment and it works fine! I don’t remember if there was something to fix to get it working, but I could check.
I didn’t need many changes to the config at all for things to work: just hub.config.JupyterHub.subdomain_host, plus configuring my Ingress and certificate acquisition to handle the wildcard subdomain.
In your config, I would try using a user image known to work, for example quay.io/jupyter/minimal-notebook:latest, without adjusting the cmd for the image either.
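For reference, here is a minimal sketch of what that could look like in the Helm values, assuming the same parent-chart layout as my config below (z2jh installed as a dependency named jupyterhub) and a placeholder domain:

jupyterhub:
  hub:
    config:
      JupyterHub:
        # enable per-user subdomains; the value is the external host of the hub
        subdomain_host: https://staging.example.com
  singleuser:
    image:
      # a stock image known to work, with no cmd override
      name: quay.io/jupyter/minimal-notebook
      tag: latest

The ingress and cert-manager related parts of my values look like this: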
jupyterhub:
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - staging.example.com
      - "*.staging.example.com"
    tls:
      - hosts:
          - staging.example.com
          - "*.staging.example.com"
        secretName: hub-tls
    annotations:
      nginx.ingress.kubernetes.io/proxy-body-size: 256m
      # i use let's encrypt to acquire certificates
      cert-manager.io/cluster-issuer: letsencrypt

cert-manager:
  serviceAccount:
    annotations:
      # gcp workload identity way of getting a service-account
      # with permissions to fiddle with DNS settings, as required for cert-manager to handle
      # DNS-01 challenges, as required for the *.staging.example.com wildcard certificate
      iam.gke.io/gcp-service-account: my-gcp-sa-name@my-project-name.iam.gserviceaccount.com
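The cert-manager.io/cluster-issuer annotation lets cert-manager’s ingress-shim create the certificate for the Ingress TLS hosts automatically. If you would rather manage it explicitly, a roughly equivalent Certificate resource could look like this (a sketch; the namespace and domains are placeholders matching the values above):

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: hub-tls
  # assumption: the namespace the chart is deployed into
  namespace: jupyterhub
spec:
  # same secret the Ingress tls section references
  secretName: hub-tls
  issuerRef:
    name: letsencrypt
    kind: ClusterIssuer
  dnsNames:
    - staging.example.com
    # the wildcard entry is what requires the DNS-01 solver below
    - "*.staging.example.com"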
My setup with cert-manager required this:
# this is a helm template, so must be edited before `kubectl apply -f ...` etc
---
kind: ClusterIssuer
apiVersion: cert-manager.io/v1
metadata:
  name: letsencrypt
  labels:
    {{- include "infra.labels" . | nindent 4 }}
spec:
  acme:
    email: {{ .Values.clusterIssuer.email | required "clusterIssuer.email is required" }}
    server: https://acme-v02.api.letsencrypt.org/directory
    profile: tlsserver
    privateKeySecretRef:
      name: letsencrypt
    solvers:
      - http01:
          ingress:
            class: nginx
      - dns01:
          cloudDNS:
            project: {{ .Values.clusterIssuer.dns01.project | required "clusterIssuer.dns01.project is required" }}
            # hostedZoneName makes us not need to provide project wide
            # permissions to list DNS zones etc, but instead just permissions to
            # act on this specific zone.
            hostedZoneName: {{ .Values.clusterIssuer.dns01.hostedZoneName | required "clusterIssuer.dns01.hostedZoneName is required" }}
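The .Values.clusterIssuer.* references above come from the chart’s own values; a sketch of the corresponding values.yaml entries, with placeholder values:

clusterIssuer:
  # contact address Let's Encrypt will use for expiry notices (placeholder)
  email: admin@example.com
  dns01:
    # GCP project that owns the Cloud DNS managed zone (placeholder)
    project: my-project-name
    # name of the managed zone containing staging.example.com (placeholder)
    hostedZoneName: staging-example-com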
Oooooh, the proxy pod! I suspect it may need to be restarted manually.
Shut down your user servers, configure the subdomain host, run the Helm upgrade, and check whether the proxy pod was restarted. If it wasn’t, restart it manually (and if you did, also restart the hub pod afterwards), then start a user server.