Connecting BinderHub to existing JupyterHub (installed with DaskGateway)

Hi!
We already have a JupyterHub deployed (including dask-gateway) from the daskhub chart. We tried connecting BinderHub to this JupyterHub as described in the BinderHub documentation ("3. Set up BinderHub").

With these changes, Binder builds an image, pushes it to the repository, and tries to launch it, but fails with the following log entries in the Binder UI:

Found built image, launching...
Launching server...
Launch attempt 1 failed, retrying...
Launch attempt 2 failed, retrying...
Launch attempt 3 failed, retrying...
Failed to create temporary user for <redacted>/binderhub/binder-dev-binder-2dexamples-2dconda-8677da:f00a783146e9c6a2ed9726f01fc09fbfbad2f89e

I’m wondering whether this is the right way to do this, or whether there is another way to achieve it. Please help.

Below is the values file used for BinderHub installation:

config:
  BinderHub:
    hub_url: https://jhub**
    image_prefix: hub**/binderhub/binder-dev-
    use_registry: true
  DockerRegistry:
    token_url: https://hub****
    url: https://hub**
extraPodSpec:
  dnsPolicy: ClusterFirstWithHostNet
  hostNetwork: true
imageBuilderType: dind
ingress:
  annotations:
  enabled: true
  hosts:
  - binder****
  tls:
  - hosts:
    - binder****
    secretName: binder**
registry:
  password: <Redacted>
  url: <Redacted>
  username: custom
service:
  type: ClusterIP

Is that your full BinderHub configuration? JupyterHub is deployed by default unless you disable it in your BinderHub config. If you haven’t, you’ll now have two independent JupyterHubs running, so you might as well use the second one to test your BinderHub before connecting it to your first hub.

Can you show us your hub configuration too?

If it’s a standard configuration with authenticated users you can’t just connect BinderHub to it directly:

@manics To disable the JupyterHub deployment in BinderHub, we added the parameters below to the config, but we still see the hub pod in the running state. Is this the right way to disable JupyterHub? Please review.

config:
  JupyterHub:
    enabled: false   

And yes, we have already tested JupyterHub with BinderHub, and it was successful.
Below is the complete values.yaml used for the binder chart:

config:
  BinderHub:
    hub_url: https://jhub**
    image_prefix: hub**/binderhub/binder-dev-
    use_registry: true
  DockerRegistry:
    token_url: https://hub****
    url: https://hub**   
extraPodSpec:
  dnsPolicy: ClusterFirstWithHostNet
  hostNetwork: true
imageBuilderType: dind
ingress:
  annotations:
  enabled: true
  hosts:
  - binder****
  tls:
  - hosts:
    - binder****
    secretName: binder**
jupyterhub:
  debug:
    enabled: true
  hub:
    config:
      Authenticator:
        admin_users:
        - <Redacted>
      BinderSpawner:
        auth_enabled: true
      GitHubOAuthenticator:
        allowed_organizations:
        - <Redacted>
        client_id: <Redacted>
        client_secret: <Redacted>
        oauth_callback_url: https://<Redacted>/hub/oauth_callback
        scope:
        - read:org
      JupyterHub:
        authenticator_class: github
      proxy.token: <Redacted>
    services:
      binder:
        apiToken: <Redacted>
  ingress:
    annotations:
      nginx.ingress.kubernetes.io/rewrite-target: /
      nginx.ingress.kubernetes.io/secure-backends: "true"
      nginx.ingress.kubernetes.io/ssl-redirect: "true"
      nginx.ingress.kubernetes.io/websocket-services: proxy-public
      nginx.org/websocket-services: proxy-public
    enabled: true
    hosts:
    - jhub-**
    ingressClassName: nginx
    tls:
    - hosts:
      - jhub-**
      secretName: <Redacted>
  proxy:
    secretToken: <Redacted>
    service:
      type: ClusterIP
  singleuser:
    cmd: jupyterhub-singleuser
    extraEnv:
      DASK_DISTRIBUTED__DASHBOARD__LINK: $(JUPYTERHUB_SERVICE_PREFIX)proxy/{{port}}/status
      DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE: $(JUPYTER_IMAGE_SPEC)
    image:
      name: pangeo/pangeo-notebook
      tag: 2023.05.18
    profileList:
    - default: true
      description: Start a container with the chosen specifications on a node of this
        type
      display_name: Pangeo Notebook
      kubespawner_override:
        cpu_limit: null
        mem_limit: null
      profile_options:
        requests:
          choices:
            mem_1:
              default: true
              display_name: ~1 GB, ~0.125 CPU
              kubespawner_override:
                cpu_guarantee: 0.013
                mem_guarantee: 0.904G
          display_name: Container Selection
      slug: pangeo
    storage:
      extraVolumeMounts:
      - mountPath: /home/<Redacted>
        name: nfs-volume
        readOnly: true
      extraVolumes:
      - name: nfs-volume
        nfs:
          path: /mnt/<Redacted>
          server: <Redacted>  
registry:
  password: <Redacted>
  url: <Redacted>
  username: custom
service:
  type: ClusterIP

Our current goal is to set up BinderHub and link it to our existing JupyterHub.
We installed that JupyterHub instance using the daskhub chart, and it uses dask-gateway.

Let us know if you need any additional information.

@manics We tried enabling authentication with the values below in the Binder chart, as suggested, but hit the error shown below after entering login credentials:

jupyterhub:
  cull:
    # don't cull authenticated users
    users: False
  hub:
    redirectToServer: false
    config:
      BinderSpawner:
        auth_enabled: true
      # specify the desired authenticator
      JupyterHub:
        authenticator_class: github
      # use config of your authenticator here
      # use the docs at https://zero-to-jupyterhub.readthedocs.io/en/stable/authentication.html
      # to get more info about different config options
      Authenticator: 
        admin_users:
          <Redacted>
    services:
      binder:
        oauth_client_id: service-binderhub
        oauth_no_confirm: true
        oauth_redirect_uri: "https://<Redacted>/oauth_callback"
    loadRoles:
      user:
        scopes:
          - self
          - "access:services"

  singleuser:
    # to make notebook servers aware of hub
    cmd: jupyterhub-singleuser

400 : Bad Request

Invalid client_id parameter value.

Please suggest.

You’ll need to duplicate some of that config in your actual JupyterHub deployment. For example, BinderHub needs to know some of the JupyterHub config to make API requests, but since the BinderHub chart isn’t running JupyterHub, you’ll need to add some of the equivalent config into your existing JupyterHub.
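
For context, here is a rough sketch of the kind of request BinderHub makes against the hub's REST API when it creates a temporary user, which is the step that fails in the log above. This is my understanding rather than BinderHub's actual code: the hub URL, token, and username are placeholders, and the helper name `build_create_user_request` is made up for illustration. The request is only built, not sent.

```python
import urllib.request


def build_create_user_request(hub_url, api_token, username):
    """Build (but do not send) a POST that asks JupyterHub to create a user."""
    # The Hub REST API is served under <hub_url>/hub/api
    api_base = hub_url.rstrip("/") + "/hub/api"
    return urllib.request.Request(
        url=f"{api_base}/users/{username}",
        method="POST",
        # The hub only accepts tokens it knows about, e.g. a service's API token
        headers={"Authorization": f"token {api_token}"},
    )


# Placeholder values, not taken from the configuration above
req = build_create_user_request(
    "https://jhub.example.org/", "PLACEHOLDER_TOKEN", "binder-demo-user"
)
print(req.get_method(), req.full_url)
```

If the hub at `hub_url` does not know the token, or the token's role lacks the needed scopes, a request like this is rejected, which would be consistent with the "Failed to create temporary user" message.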

@manics Can you share any help article with detailed instructions on the changes that need to be added to the JupyterHub chart?

@manics please share the documentation, if any

You’ll need to merge some of the bundled JupyterHub config into your own JupyterHub config to allow BinderHub to launch images on your JupyterHub:

As mentioned earlier, BinderHub needs to know how to connect to JupyterHub, so you’ll need to get the relevant values from your JupyterHub and substitute them into the relevant BinderHub Helm chart values.

Finally you’ll need to create an API token in your JupyterHub. You’ll either need to figure out how to create a k8s secret that matches the required path, or perhaps modify the Helm chart?

You’ll have to create this token yourself in JupyterHub:
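
One way to produce such a token is to generate a random hex string, in the same spirit as `openssl rand -hex 32`. A minimal sketch follows; the helper name `generate_api_token` is hypothetical. Where the value then has to go (for example a `binder` service entry in the hub config plus whatever secret BinderHub reads) depends on your chart versions, so treat that part as an assumption to verify.

```python
import secrets


def generate_api_token(num_bytes=32):
    """Return a random hex string usable as a shared API token."""
    # token_hex(n) returns 2 * n hexadecimal characters
    return secrets.token_hex(num_bytes)


token = generate_api_token()
print(len(token))
```

The same value has to be known to both sides: JupyterHub, as the token of the service, and BinderHub, as the token it sends with its API requests.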

There may be a few other things to configure.

@manics As suggested, we added https://github.com/jupyterhub/binderhub/blob/c85cac4b56ae14389bfbd066016aaca6dfb6a41a/helm-chart/binderhub/values.yaml#L78-L211 to our values and upgraded the existing JupyterHub chart.

When browsing to the JupyterHub URL, we see a "Service Unavailable" message, and the hub pod restarts continuously with the following logs:

Loading /usr/local/etc/jupyterhub/secret/values.yaml
No config at /usr/local/etc/jupyterhub/existing-secret/values.yaml
Loading extra config: 0-binderspawnermixin
Loading extra config: 00-add-dask-gateway-values
Setting DASK_GATEWAY__ADDRESS http://proxy-public/services/dask-gateway
Adding dask-gateway service URL
Loading extra config: 00-binder
[I 2024-02-06 17:54:43.743 JupyterHub app:2775] Running JupyterHub version 3.0.0
[I 2024-02-06 17:54:43.743 JupyterHub app:2805] Using Authenticator: oauthenticator.github.GitHubOAuthenticator-15.1.0
[I 2024-02-06 17:54:43.743 JupyterHub app:2805] Using Spawner: builtins.BinderSpawner
[I 2024-02-06 17:54:43.743 JupyterHub app:2805] Using Proxy: jupyterhub.proxy.ConfigurableHTTPProxy-3.0.0
[I 2024-02-06 17:54:43.798 JupyterHub app:1934] Not using allowed_users. Any authenticated user will be allowed.
[I 2024-02-06 17:54:43.825 JupyterHub provider:653] Updating oauth client service-dask-gateway
[I 2024-02-06 17:54:43.854 JupyterHub app:2145] Found unexisting services binder in role definition binder
[E 2024-02-06 17:54:43.854 JupyterHub app:3297]
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/app.py", line 3294, in launch_instance_async
        await self.initialize(argv)
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/app.py", line 2826, in initialize
        await self.init_role_assignment()
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/app.py", line 2157, in init_role_assignment
        raise ValueError(
    ValueError: services binder defined in config role definition binder but not present in database

Below is the values.yaml for further reference:

dask-gateway:
  enabled: true
  gateway:
    auth:
      type: jupyterhub
    backend:
      scheduler:
        cores:
          limit: 1
          request: 0.01
        extraPodConfig: null
        memory:
          limit: 1G
          request: 128M
      worker:
        extraContainerConfig:
          securityContext:
            runAsGroup: 1000
            runAsUser: 1000
        extraPodConfig:
          securityContext:
            fsGroup: 1000
    extraConfig:
      idle: |
        # timeout after 30 minutes of inactivity
        c.KubeClusterConfig.idle_timeout = 1800
      optionHandler: |
        from dask_gateway_server.options import Options, Integer, Float, String, Mapping
        import string

        # Escape a string to be dns-safe in the same way that KubeSpawner does it.
        # Reference https://github.com/jupyterhub/kubespawner/blob/616f72c4aee26c3d2127c6af6086ec50d6cda383/kubespawner/spawner.py#L1828-L1835
        # Adapted from https://github.com/minrk/escapism to avoid installing the package
        # in the dask-gateway api pod which would have been problematic.
        def escape_string_label_safe(to_escape):
            safe_chars = set(string.ascii_lowercase + string.digits)
            escape_char = "-"
            chars = []
            for c in to_escape:
                if c in safe_chars:
                    chars.append(c)
                else:
                    # escape one character
                    buf = []
                    # UTF-8 uses 1 to 4 bytes per character, depending on the Unicode symbol
                    # so we need to transform each byte to its hex value
                    for byte in c.encode("utf8"):
                        buf.append(escape_char)
                        # %X is the hex value of the byte
                        buf.append('%X' % byte)
                    escaped_hex_char = "".join(buf)
                    chars.append(escaped_hex_char)
            return u''.join(chars)

        def cluster_options(user):
            safe_username = escape_string_label_safe(user.name)
            def option_handler(options):
                if ":" not in options.image:
                    raise ValueError("When specifying an image you must also provide a tag")
                scheduler_extra_pod_annotations = {
                    "hub.jupyter.org/username": safe_username,
                    "prometheus.io/scrape": "true",
                    "prometheus.io/port": "8787",
                }
                extra_labels = {
                    "hub.jupyter.org/username": safe_username,
                }
                return {
                    "worker_cores_limit": options.worker_cores,
                    "worker_cores": options.worker_cores,
                    "worker_memory": "%fG" % options.worker_memory,
                    "image": options.image,
                    "scheduler_extra_pod_annotations": scheduler_extra_pod_annotations,
                    "scheduler_extra_pod_labels": extra_labels,
                    "worker_extra_pod_labels": extra_labels,
                    "environment": options.environment,
                }
            return Options(
                Integer("worker_cores", 4, min=1, label="Worker Cores"),
                Float("worker_memory", 8, min=1, label="Worker Memory (GiB)"),
                # The default image is set via DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE env variable
                String("image", label="Image"),
                Mapping("environment", {}, label="Environment Variables"),
                handler=option_handler,
            )
        c.Backend.cluster_options = cluster_options
    prefix: /services/dask-gateway
  traefik:
    service:
      type: ClusterIP
jupyterhub:
  hub:
    config:
      Authenticator:
        admin_users:
        <Redacted>
      GitHubOAuthenticator:
        allowed_organizations:
        - <Redacted>
        client_id: <Redacted>
        client_secret: <Redacted>
        oauth_callback_url: https://<Redacted>/hub/oauth_callback
        scope:
        - read:org
      JupyterHub:
        authenticator_class: github
    loadRoles:
      binder:
        services:
          - binder
        scopes:
          - servers
          # we don't need admin:users if auth is not enabled!
          - "admin:users"
    extraConfig:
      0-binderspawnermixin: |
        """
        Helpers for creating BinderSpawners

        FIXME:
        This file is defined in binderhub/binderspawner_mixin.py
        and is copied to helm-chart/binderhub/values.yaml
        by ci/check_embedded_chart_code.py

        The BinderHub repo is just used as the distribution mechanism for this spawner,
        BinderHub itself doesn't require this code.

        Longer term options include:
        - Move BinderSpawnerMixin to a separate Python package and include it in the Z2JH Hub
          image
        - Override the Z2JH hub with a custom image built in this repository
        - Duplicate the code here and in binderhub/binderspawner_mixin.py
        """
        from tornado import web
        from traitlets import Bool, Unicode
        from traitlets.config import Configurable


        class BinderSpawnerMixin(Configurable):
            """
            Mixin to convert a JupyterHub container spawner to a BinderHub spawner

            Container spawner must support the following properties that will be set
            via spawn options:
            - image: Container image to launch
            - token: JupyterHub API token
            """

            def __init__(self, *args, **kwargs):
                # Is this right? Is it possible to having multiple inheritance with both
                # classes using traitlets?
                # https://stackoverflow.com/questions/9575409/calling-parent-class-init-with-multiple-inheritance-whats-the-right-way
                # https://github.com/ipython/traitlets/pull/175
                super().__init__(*args, **kwargs)

            auth_enabled = Bool(
                False,
                help="""
                Enable authenticated binderhub setup.

                Requires `jupyterhub-singleuser` to be available inside the repositories
                being built.
                """,
                config=True,
            )

            cors_allow_origin = Unicode(
                "",
                help="""
                Origins that can access the spawned notebooks.

                Sets the Access-Control-Allow-Origin header in the spawned
                notebooks. Set to '*' to allow any origin to access spawned
                notebook servers.

                See also BinderHub.cors_allow_origin in binderhub config
                for controlling CORS policy for the BinderHub API endpoint.
                """,
                config=True,
            )

            def get_args(self):
                if self.auth_enabled:
                    args = super().get_args()
                else:
                    args = [
                        "--ip=0.0.0.0",
                        f"--port={self.port}",
                        f"--NotebookApp.base_url={self.server.base_url}",
                        f"--NotebookApp.token={self.user_options['token']}",
                        "--NotebookApp.trust_xheaders=True",
                    ]
                    if self.default_url:
                        args.append(f"--NotebookApp.default_url={self.default_url}")

                    if self.cors_allow_origin:
                        args.append("--NotebookApp.allow_origin=" + self.cors_allow_origin)
                    # allow_origin=* doesn't properly allow cross-origin requests to single files
                    # see https://github.com/jupyter/notebook/pull/5898
                    if self.cors_allow_origin == "*":
                        args.append("--NotebookApp.allow_origin_pat=.*")
                    args += self.args
                    # ServerApp compatibility: duplicate NotebookApp args
                    for arg in list(args):
                        if arg.startswith("--NotebookApp."):
                            args.append(arg.replace("--NotebookApp.", "--ServerApp."))
                return args

            def start(self):
                if not self.auth_enabled:
                    if "token" not in self.user_options:
                        raise web.HTTPError(400, "token required")
                    if "image" not in self.user_options:
                        raise web.HTTPError(400, "image required")
                if "image" in self.user_options:
                    self.image = self.user_options["image"]
                return super().start()

            def get_env(self):
                env = super().get_env()
                if "repo_url" in self.user_options:
                    env["BINDER_REPO_URL"] = self.user_options["repo_url"]
                for key in (
                    "binder_ref_url",
                    "binder_launch_host",
                    "binder_persistent_request",
                    "binder_request",
                ):
                    if key in self.user_options:
                        env[key.upper()] = self.user_options[key]
                return env

      00-binder: |
        # image & token are set via spawn options
        from kubespawner import KubeSpawner

        class BinderSpawner(BinderSpawnerMixin, KubeSpawner):
            pass

        c.JupyterHub.spawner_class = BinderSpawner        
  ingress:
    annotations:
      cert-manager.io/cluster-issuer: incommon
      nginx.ingress.kubernetes.io/proxy-body-size: 600m
      nginx.ingress.kubernetes.io/proxy-read-timeout: "1800"
      nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"
      nginx.ingress.kubernetes.io/rewrite-target: /
      nginx.ingress.kubernetes.io/secure-backends: "true"
      nginx.ingress.kubernetes.io/ssl-redirect: "true"
      nginx.ingress.kubernetes.io/websocket-services: proxy-public
      nginx.org/client-max-body-size: 10m
      nginx.org/websocket-services: proxy-public
    enabled: true
    hosts:
    - <Redacted>
    ingressClassName: nginx
    tls:
    - hosts:
      - <Redacted>
      secretName: https-auto-incommon
  proxy:
    secretToken: <Redacted>
    service:
      type: ClusterIP
  singleuser:
    extraEnv:
      DASK_DISTRIBUTED__DASHBOARD__LINK: $(JUPYTERHUB_SERVICE_PREFIX)proxy/{port}/status
      DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE: $(JUPYTER_IMAGE_SPEC)
    image:
      name: pangeo/pangeo-notebook
      tag: 2023.05.18
    profileList:
    - default: true
      description: Start a container with the chosen specifications on a node of this
        type
      display_name: Pangeo Notebook
      kubespawner_override:
        cpu_limit: null
        mem_limit: null
      profile_options:
        requests:
          choices:
            mem_1:
              default: true
              display_name: ~1 GB, ~0.125 CPU
              kubespawner_override:
                cpu_guarantee: 0.013
                mem_guarantee: 0.904G
          display_name: Container Selection
      slug: pangeo
    - <Redacted>
    storage:
      extraVolumeMounts:
      - mountPath: /test/campaign
        name: campaign
        readOnly: true
      extraVolumes:
      - name: campaign
        nfs:
          path: /gpfs/<Redacted>
          server: <Redacted>

Please suggest.

It looks like you’re still missing the definition of the binder service in your JupyterHub config:
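
To illustrate the mismatch behind the `ValueError` in the hub log, here is a small sketch that checks whether every service referenced under `loadRoles` is also defined under `services`. It works on plain dicts shaped like the `jupyterhub.hub` section of the values file; the function name and the placeholder token are made up, and this is not how JupyterHub itself implements the check.

```python
def find_undefined_role_services(hub_values):
    """Return {role_name: [service, ...]} for services a role references
    but that are not defined under `services`."""
    defined = set(hub_values.get("services", {}))
    missing = {}
    for role_name, role in hub_values.get("loadRoles", {}).items():
        undefined = [s for s in role.get("services", []) if s not in defined]
        if undefined:
            missing[role_name] = undefined
    return missing


# Shape of the values posted above: a `binder` role but no `services` section
posted = {
    "loadRoles": {
        "binder": {"services": ["binder"], "scopes": ["servers", "admin:users"]},
    },
}
# Same values with a `binder` service added (the token is a placeholder)
fixed = dict(posted, services={"binder": {"apiToken": "PLACEHOLDER_TOKEN"}})

print(find_undefined_role_services(posted))  # {'binder': ['binder']}
print(find_undefined_role_services(fixed))   # {}
```

With the values as posted, the `binder` role points at a service the hub has never been told about, which matches "services binder defined in config role definition binder but not present in database".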