BinderHub HTTPS

Hi,

I just did a basic BinderHub deployment in GCP, and now I want to get to the next level. For that I need some help wrapping my head around the documentation on this:
https://binderhub.readthedocs.io/en/latest/https.html

I got past the deprecated syntax, since I am using Helm v3. BUT then it simply derails for me. (I have not run a helm upgrade yet because I don’t understand the setup, and I don’t like to deploy something that I don’t understand.)
We agree that we have TWO IPs in play by default when deploying a BinderHub setup: one for the BinderHub page and one for the JupyterHub. That also makes sense in the config.yaml, where we route between those two.

But as I read the latest documentation on HTTPS and cert-manager, I simply rely on ONE IP with two different A records. How does that resolve the internal routing from the BinderHub page to the JupyterHub page? Is something missing in the guide, or am I simply a total noob when it comes to such a setup (which I am)? Please enlighten me! I REALLY want to understand! :slight_smile:

So far thanks for a super cool open source project!

Where did you get the idea of having two IPs? Most BinderHub deployments have one public IP and two hostnames (one for the BinderHub and one for the JupyterHub). The ingress controller (for example nginx) decides, based on the Host: header the browser sends, which of the two the browser wants to talk to.
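
To make the Host: header routing concrete, here is a minimal sketch of a Kubernetes Ingress with two host rules behind a single ingress controller. The hostnames binder.example.org and hub.example.org and the resource name are made-up placeholders. The Service names binder and proxy-public, port 80, and the ingress class name nginx are assumptions about a default deployment, so check them with kubectl get svc before relying on them. In practice you would probably not write this by hand, because the Helm charts generate the Ingress objects from your config.yaml.

```yaml
# Sketch only: host-based routing on one ingress IP.
# Hostnames and the metadata name are placeholders; Service names,
# ports and the ingress class are assumptions about a default install.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hubs-by-hostname            # hypothetical name
spec:
  ingressClassName: nginx           # assumes an IngressClass called "nginx"
  rules:
    - host: binder.example.org      # requests with this Host: header ...
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: binder        # ... go to the BinderHub Service
                port:
                  number: 80
    - host: hub.example.org         # requests with this Host: header ...
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: proxy-public  # ... go to the JupyterHub proxy Service
                port:
                  number: 80
```

Both hostnames resolve to the same public IP of the ingress controller. The host entries are what tell the controller which internal Service a request belongs to.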

For example https://gke.mybinder.org/ is where you talk to our BinderHub and https://hub.gke.mybinder.org/hub/api is where you reach the JupyterHub. Both hostnames point to the same IP (more precisely, hub.gke2.mybinder.org is a CNAME record pointing to gke2.mybinder.org).

cert-manager is what the Kubernetes community seems to have settled on as the tool for sorting out Let’s Encrypt certificates. Unfortunately it is pretty “full on Kubernetes” in the way it “thinks” and works. Something that helped me a lot to get to grips with it and understand how it “ticks” is to follow the cert-manager guide to the letter and deploy some very simple single-pod apps that request a hostname. That way you can get a bit of practice before adding the complexities of a BinderHub deploy on top.


Yes, our HTTPS guide could do with some tender love and care. Like all docs, it is ~constantly out of date :-/

Thanks for clearing up a bit of my confusion :slight_smile:

I got the idea about two IPs from the fact that I can enter the BinderHub IP over http and get the BinderHub page, and I can enter the JupyterHub IP and e.g. get the admin page. And I thought those two should know about each other. Or at least the BinderHub page should know where the JupyterHub service is running.
Like from the initial docs:
"
Copy the IP address under EXTERNAL-IP. This is the IP of your JupyterHub. Now, add the following lines to config.yaml file:

config:
  BinderHub:
    hub_url: http://
Next, upgrade the helm chart to deploy this change:
"
How does BinderHub know “where to go” if it only gets a direction from an A record pointing at the same IP? My JupyterHub is running behind that external IP. If I replace it with an arbitrary A record for the nginx ingress, I fail to see how that will work. If it is a totally idiotic question I am really sorry! :-/

There are also these initial docs (again sorely out of date!) that may help

Thanks for the link! I will have a look :slight_smile:

I get the mechanism about certificates, but how do we ensure that changing the config.yaml from:

config:
  BinderHub:
    hub_url: http://"EXTERNAL IP FOR JUPYTERHUB SERVICE"

which comes from:

X@cloudshell:~/binderhub (Y)$ kubectl --namespace=Z get svc proxy-public
NAME           TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
proxy-public   LoadBalancer   INTERNAL     EXTERNAL      80:30261/TCP   7m57s

to

config:
  BinderHub:
    hub_url: https://"JUPYTERHUB A RECORD FOR NGINX INGRESS"

Where is the “link” made, so that I can be sure it actually ends up in the right place? Because this still seems arbitrary to me?!

A quick reply now, more detailed one later (possibly much later).

I realised that it has been too long since I last read and followed our setup guide. As a result I can’t play the steps through in my head and need to find time to follow it myself.

I think if you strictly follow the guide you might actually have two IPs when you are at step 3, “Set up BinderHub” (BinderHub 0.1.0 documentation). However, you have no hostnames set up yet, so the two hubs know how to talk to each other only via the two IPs. You are also not yet using nginx-ingress as an ingress controller.

However, when you get to “Secure with HTTPS” (BinderHub 0.1.0 documentation) you will set up nginx and change the Service type for both hubs to ClusterIP. At that point the services representing the two hubs won’t have a public IP any more. Only the ingress service will.
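
For orientation, the relevant part of config.yaml after that change might look roughly like the sketch below. It is loosely based on the HTTPS guide, with the cert-manager annotations and tls entries left out for brevity. <binderhub-hostname> and <jupyterhub-hostname> are placeholders for your two DNS names, and the exact keys may differ between chart versions, so treat it as an illustration rather than a drop-in config.

```yaml
# config.yaml sketch (annotations and tls sections omitted).
# <binderhub-hostname> and <jupyterhub-hostname> are placeholders.
config:
  BinderHub:
    hub_url: https://<jupyterhub-hostname>   # hostname, no longer an IP

service:
  type: ClusterIP          # BinderHub Service: no public IP any more

ingress:                   # top level: Ingress for the BinderHub page
  enabled: true
  hosts:
    - <binderhub-hostname>

jupyterhub:
  proxy:
    service:
      type: ClusterIP      # proxy-public Service: no public IP any more
  ingress:                 # nested: Ingress for the JupyterHub
    enabled: true
    hosts:
      - <jupyterhub-hostname>
```

The hosts entries are where each DNS name gets tied to a hub: the top-level ingress block belongs to the BinderHub page and the one nested under jupyterhub belongs to the JupyterHub. Both A records then point at the public IP of the ingress controller.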

Like I said I haven’t followed our setup guide to the letter (compared to more or less freestyling it) for “a long long time”. So I will post an update when I find time to do that.

Maybe someone else who has more recently followed the guide can chime in and help clear up these points. I think getting things set up and working is complex enough that being super precise with words and instructions is worth it. Otherwise you quickly end up in a super weird situation that is hard to recover from.

So when I do this on Azure (which I detail in the first comment on the thread I sent previously):

  • I set up a DNS Zone
  • Add A records that point to the external IP of the nginx-ingress LoadBalancer

Is this the “link” you mean? Because I think this is something that you do on whatever platform hosts your domain name, and the instructions on how to do it will differ from provider to provider.

Is this helpful? :grimacing:

A follow-up would be VERY much appreciated. I tried to follow the HTTPS guide through and ended up crashing my Kubernetes cluster - like it exploded - or something pretty close :slight_smile: Anyway, it was not that happy about that “ClusterIP” setting :sob:

So I am pretty sure I ended up in that “super weird situation” :smiley: But I closed the cluster completely and started over. But now I am just stranded in the situation of complete failure:

Found built image, launching…
Launching server…
Launch attempt 1 failed, retrying…
Launch attempt 2 failed, retrying…
Launch attempt 3 failed, retrying…
Internal Server Error

In Zero to BinderHub there is nothing about DNS zones, as I recall :sob:

Yesterday I actually got it working with GitHub authentication and access to private GitHub. But after taking it to the HTTPS level it simply crashed and burned. So much for just once challenging my principle of fully understanding before deploying! Crap :expressionless:

Is it not step 2 in “Secure with HTTPS” (BinderHub 0.1.0 documentation)? “Getting a DNS Zone on Azure” is equivalent to buying a domain from a registrar. Because my hub is hosted by the Turing Institute, it lives under turing.ac.uk and I “just” had to register with my IT department.

I have also burned many, many hubs before getting to where I am :joy:

Oh yes, for this part. But not for the initial stuff. I also have a domain from Google Domains (which is what is called a registrar, I guess).

I also got through this. And then I started to wonder. Now I have one IP “to rule them all”.

Before, I had an A record for the proxy-public service (effectively being the JupyterHub), hence:

config:
  BinderHub:
    hub_url: http://"EXTERNAL IP FOR JUPYTERHUB SERVICE"

And then one A record for BinderHub (also an external IP), as a result of the initial basic config.yaml.

Now the config is changed (among other things) to this ClusterIP, which was not accepted during the upgrade. I got an error code, which I did not copy because everything became non-responsive.

But as I both understood and still understand it, I should just make two A records, which point at the same ingress IP, correct? It seems counterintuitive coming from the initial state. But that doesn’t make it impossible, of course :slight_smile:

Yes, that is correct. I think (may be wrong) this is because the load balancer will now check which hostname each request is addressed to and route it to the appropriate internal IP of either the proxy or binder service, so the A records still need to be distinguishable from one another. (A high probability that I just made that up though :sweat_smile: )

Yes sure, but HOW does it know the internal IPs from the records? I have created both a binder.domain.com and a hub.domain.com. That much I understood :slight_smile:

AH! You provide the hostnames and Kubernetes annotations in the config, e.g. hub23-deploy/values.yaml at d111d44d39b95c99de6ea906a53b2038f5b55eec · alan-turing-institute/hub23-deploy · GitHub. I guess nginx-ingress knows how to read these.

Okay yes,

So this changes the type from LoadBalancer as I see it now when running
kubectl --namespace=binderhubplayground get svc binder
and makes it ClusterIP type, which then explains how I end up at the BinderHub page. As I read the YAML file, I should then have an ingress part in both the jupyterhub part and the binder part. From “Secure with HTTPS” (BinderHub 0.1.0 documentation) it seems like one part is sort of outside a context? Is that right, or should it be moved up as a part of the BinderHub context?

Yes, that chart I linked to is dependent on the BinderHub chart so it has an extra top level key binderhub. If you’re using the chart directly then you won’t have the top level key.
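
As a rough illustration of that nesting difference, this is how the two ingress blocks would sit in a wrapper chart that lists BinderHub as a dependency. This is a sketch, not a copy of the linked values.yaml; the hostnames are placeholders and the annotations and tls sections are omitted.

```yaml
# Wrapper chart (BinderHub as a dependency): Helm passes everything
# under the dependency's name, here "binderhub", to the sub-chart.
binderhub:
  ingress:                       # Ingress for the BinderHub page
    enabled: true
    hosts:
      - <binderhub-hostname>     # placeholder
  jupyterhub:
    ingress:                     # Ingress for the JupyterHub
      enabled: true
      hosts:
        - <jupyterhub-hostname>  # placeholder
```

If you install the BinderHub chart directly, drop the binderhub: line and dedent everything by one level. The BinderHub ingress block then sits at the top level of config.yaml, which is why it can look like it is outside any context, while the JupyterHub one stays nested under jupyterhub.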

Hey!
How did you make it work with GitHub? I can build the image, and I’m trying to push it to my repository (after giving Kubernetes the deploy access token) but I always get access denied at the end. I think I’m missing some configuration details but I followed the documentation in Zero to BinderHub about using a custom registry…
Thanks!