Trouble starting a Binder after inactivity (While my Hub gently sleeps...)

#1

Hi,

I’m running into a slightly annoying behaviour with my BinderHub: it seems to “go to sleep” after long periods of inactivity. This manifests as “Internal Server Error” or a failure to launch the server on the Binder page, and the repo/user pod not showing up when I run kubectl get pods.

The only way I’ve found to rectify this is to scale the hub (effectively a restart).
kubectl scale deployment hub --replicas=0 (wait for the hub pod to terminate)
kubectl scale deployment hub --replicas=1
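
As a lighter-weight alternative to the scale-down/scale-up dance, newer kubectl versions (1.15+) can restart a deployment in one step. This is just a sketch, assuming the hub deployment lives in the current/default namespace (add -n <namespace> if yours doesn’t):

```shell
# Restart the hub deployment in place; Kubernetes re-creates the pod
# with the same spec, so there is no need to scale to zero and back.
kubectl rollout restart deployment/hub

# Block until the replacement hub pod is up and ready.
kubectl rollout status deployment/hub
```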

This is a very manual fix and makes me look silly when I want to demonstrate the BinderHub to other people :disappointed: Can anyone suggest what might be happening?

Thanks!

Config is below:
config:
  BinderHub:
    # connect image registry and set image prefix
    use_registry: true
    image_prefix: <redacted>

    # jupyterhub IP address
    hub_url: http://<redacted>

    # enable authentication
    auth_enabled: true

    # allow customisation of web pages
    template_path: /etc/binderhub/custom/templates
    extra_static_path: /etc/binderhub/custom/static
    extra_static_url_prefix: /extra_static/
    template_variables:
      EXTRA_STATIC_URL_PREFIX: "/extra_static/"

jupyterhub:
  cull:
    # cull every 11 minutes so it is out of phase
    # with the proxy check-routes interval of five minutes
    every: 660
    timeout: 600
    # maxAge is 6 hours: 6 * 3600 = 21600
    maxAge: 21600

  hub:
    services:
      binder:
        oauth_redirect_uri: "http://<redacted>/oauth_callback"
        oauth_client_id: "binder-oauth-client-test"
    extraConfig:
      hub_extra: |
        c.JupyterHub.redirect_to_server = False

      binder: |
        from kubespawner import KubeSpawner

        class BinderSpawner(KubeSpawner):
            def start(self):
                if 'image' in self.user_options:
                    # binder service sets the image spec via user options
                    self.image = self.user_options['image']
                return super().start()

        c.JupyterHub.spawner_class = BinderSpawner

  singleuser:
    # to make notebook servers aware of hub
    cmd: jupyterhub-singleuser
    # limit CPUs and RAM
    memory:
      limit: 1G
      guarantee: 1G
    cpu:
      limit: .5
      guarantee: .5

  auth:
    type: github
    admin:
      access: true
      users:
        - sgibson91
    github:
      clientId: "<redacted>"
      clientSecret: "<redacted>"
      callbackUrl: "http://<redacted>/hub/oauth_callback"

  # pre-pull images onto pods so that users aren't left waiting as long
  prePuller:
    continuous:
      enabled: true

  scheduling:
    # schedule 3 dummy pods to reduce user wait time
    podPriority:
      enabled: true
    userPlaceholder:
      # use three dummy user pods as placeholders
      replicas: 3

    # enable user scheduler to facilitate the efficient scheduling of user pods
    userScheduler:
      enabled: true

    # match core pods to nodes with label 'hub.jupyter.org/node-purpose=core'
    corePods:
      nodeAffinity:
        # matchNodePurpose valid options:
        # - ignore
        # - prefer (the default)
        # - require
        matchNodePurpose: require

# allow customisation of web pages
initContainers:
  - name: git-clone-templates
    image: alpine/git
    args:
      - clone
      - --single-branch
      - --branch=master
      - --depth=1
      - --
      - https://github.com/alan-turing-institute/hub23-custom-files
      - /etc/binderhub/custom
    securityContext:
      runAsUser: 0
    volumeMounts:
      - name: custom-templates
        mountPath: /etc/binderhub/custom
extraVolumes:
  - name: custom-templates
    emptyDir: {}
extraVolumeMounts:
  - name: custom-templates
    mountPath: /etc/binderhub/custom

# enable rbac
rbac:
  enabled: true
#2

Running kubectl logs on the hub pod, I see:

[W 2019-05-07 13:40:09.249 JupyterHub base:900] User sgibson91 is slow to start (timeout=0)

and also:

[W 2019-05-07 13:40:14.133 JupyterHub _version:56] jupyterhub version 1.0.0b2 != jupyterhub-singleuser version 0.9.4. This could cause failure to authenticate and result in redirect loops!

Helm chart version is 0.2.0-7b2c4f8, will try upgrading.

#3

Upgraded to chart version 0.2.0-10ac4d8. This seemed to spin the user pod up on the Kubernetes cluster, but the Binder page didn’t respond and no Build Logs were available. Still seeing the same messages as the previous comment in the hub logs. Going to try one commit-hash behind.

#4

Tried helm chart version 0.2.0-5536a0f. This launched a repo pretty quickly. Still seeing the following in the hub logs:

[W 2019-05-07 14:10:39.182 JupyterHub _version:56] jupyterhub version 1.0.0b2 != jupyterhub-singleuser version 0.9.4. This could cause failure to authenticate and result in redirect loops!

Will wait an hour or so and see if the Hub is still as keen to launch repos.

#5

I think in this case it is “harmless” and means we need to update repo2docker here to use the latest version of JupyterHub.

#6

Thanks @betatim

My JHub seems to “slow down” whenever I try to increase my GitHub API rate limit with a Personal Access Token; if I remove the token, pods spin up much more quickly.

My secret.yaml file is set out as follows, and the PAT was created with no permissions/scopes.

jupyterhub:
  hub:
    services:
      binder:
        apiToken: "<apiToken>"
  proxy:
    secretToken: "<secretToken>"

config:
  GitHubRepoProvider:
    access_token: <accessToken>

registry:
  username: <docker-id>
  password: <password>
#7

Previous comment is irrelevant; the problem is persisting :disappointed:

#8

So I think I’ve solved this issue, or at least improved it. Once again I was bitten by the indentation goblin: the rbac key should have been under jupyterhub, not out at the top level by itself. This has definitely improved the spin-up time for pods, though to be honest it’s still not as fast as I’d like. The Binder page almost always gives an Internal Server Error on the first try, which I assume is a lag from the proxy telling the JupyterHub about Binder. A couple of refreshes of the launch page usually wakes things up and normal service resumes.
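
For anyone hitting the same thing, the corrected nesting is a sketch like this, based on the fix described above (only the relevant keys shown; the rest of the config is unchanged):

```yaml
jupyterhub:
  # rbac must sit under the jupyterhub key, not at the top level of the
  # values file, or the Helm chart silently ignores it
  rbac:
    enabled: true
```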