Binder failed to launch: User already has a server running

Hi,

I have authentication running on my BinderHub, and my question is what to do about the following message. It seems to occur when a user launches Binder instances in quick succession, but I feel the user should be able to leave the Binder homepage open after they’ve authenticated and keep relaunching the same Binder, or a different Binder repo, as they choose. At the moment, they have to either wait for the culler to kill their user pod before they can open another Binder instance, or stop their server from the Control Panel in the Notebook environment (depending on the setting of cull.users; I’ve tried both True and False). Any advice?

Found built image, launching...
Launching server...
Launch attempt 1 failed, retrying...
Launch attempt 2 failed, retrying...
Launch attempt 3 failed, retrying...
User <username> already has a running server.

Config below:

config:
  BinderHub:
    use_registry: true
    image_prefix: <image-prefix>-
    hub_url: http://<jupyter-ip>
    auth_enabled: true

jupyterhub:
  cull:
    users: false
    every: 660
    timeout: 600
    maxAge: 21600

  hub:
    services:
      binder:
        oauth_redirect_uri: "http://<binder-ip>/oauth_callback"
        oauth_client_id: "binder-oauth-client-test"
    extraConfig:
      hub_extra: |
        c.JupyterHub.redirect_to_server = False

      binder: |
        from kubespawner import KubeSpawner

        class BinderSpawner(KubeSpawner):
            def start(self):
                if 'image' in self.user_options:
                    # binder service sets the image spec via user options
                    self.image = self.user_options['image']
                return super().start()
        c.JupyterHub.spawner_class = BinderSpawner

  singleuser:
    cmd: jupyterhub-singleuser

  auth:
    type: github
    github:
      clientId: "<redacted>"
      clientSecret: "<redacted>"
      callbackUrl: "http://<jupyter-ip>/hub/oauth_callback"

I’ve never run a server with auth, so wild speculation ahead: I think what you diagnosed is exactly right and probably the code doesn’t contain any “shutdown and then launch new repo” logic. I think it would make sense to add that.

Or investigate “named servers” in combination with auth: use the “name of the repo” as the name of the server and let people start lots of servers in parallel. That would be similar to how an anonymous BinderHub works at the moment.
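
To make the "name of the repo as the name of the server" idea concrete, here is a minimal sketch of how a server name could be derived from a repo URL. This is only an illustration under assumptions: `server_name_for_repo` is a hypothetical helper, not something BinderHub or JupyterHub provides, and the slug-plus-hash scheme is just one way to keep names short, unique per repo, and safe to reuse in pod names.

```python
import hashlib
import re


def server_name_for_repo(repo_url: str, max_slug_len: int = 40) -> str:
    """Hypothetical helper: derive a short, stable server name from a repo URL."""
    # Lower-case the URL and collapse anything that is not a-z or 0-9 into '-'.
    slug = re.sub(r"[^a-z0-9]+", "-", repo_url.lower()).strip("-")
    # Truncate, then strip again in case the cut left a trailing '-'.
    slug = slug[:max_slug_len].strip("-")
    # A short hash of the full URL keeps truncated names distinct.
    digest = hashlib.sha256(repo_url.encode("utf-8")).hexdigest()[:8]
    return f"{slug}-{digest}" if slug else digest


# Example with a placeholder repo URL (an assumption for illustration only).
print(server_name_for_repo("https://github.com/example-org/example-repo"))
```

The same repo always maps to the same name, so relaunching it would address the same named server, while different repos get different names and could run in parallel.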

This is how I ended up with cull.maxAge set so low: I regularly launch the same repo when changing config.yaml to check everything is still working. But sometimes I think the Hub gets confused if a pod with the same repo URL is still running, so I want to clear out those pods as quickly as possible while I’m testing things. And kubectl delete pod feels a bit renegade.

“Named servers” definitely sounds like the route I want to take for the time being, as there’s no persistent volume claim associated with the user (yet). Auth is only to determine who can have access to the Binder launch page right now. (This of course may all change!)

Delete first, ask questions later. Not sure this is the official guidance for mybinder.org ops, but it’s close to it :wink:

Easier to ask forgiveness than permission!

Are there any docs on named servers or shall I dive into the source code?

I’ve reassigned this topic to JupyterHub, as I probably need to track down where JH names user pods in the Helm chart.

@sgibson91 not sure if you’ve found a solution to this, but there is a pod_name_template configuration variable you can set for KubeSpawner. See e.g. https://github.com/jupyterhub/kubespawner/blob/46a4b109c5e657a4c3d5bfa8ea4731ec6564ea13/kubespawner/spawner.py#L263 and https://github.com/jupyterhub/kubespawner/blob/46a4b109c5e657a4c3d5bfa8ea4731ec6564ea13/kubespawner/spawner.py#L142.

Also, note that if you use kubectl delete you still have to either wait for JH (and the proxy) to clean up their internal state (I think by default this takes 5 minutes but I could be wrong) or restart JH.

Thank you @rokroskar, I will look into this!

Yes, I’ve now made myself admin on the JH so I can stop servers there. Still very manual but seems to trigger the JH and proxy to update their state more consistently.

Hi @rokroskar, I’m just getting back to this as I got distracted squashing other bugs. Could you provide some advice on how to begin implementing this, please? Under which key in config.yaml do I set pod_name_template, and how do I use the repo/image name instead of the user name? Many thanks!

Oops, clearly this dropped off my radar - sorry!

In the values file, you can do something like this:

jupyterhub:
  hub:
    extraConfig:
      myConfig: |
        c.KubeSpawner.pod_name_template = 'my-pod-template-{username}'

If you wanted to support a custom naming scheme with other template variables, you could just subclass the KubeSpawner and override the _expand_user_properties method. You then need to register your spawner with JupyterHub - see these docs for more info.

Thanks @rokroskar, I’ll have a play with this soon :slight_smile:

So I’m reviving this topic now that I’ve had some time to play :joy:

I’ve added the following to my values config:

jupyterhub:
    hub:
      extraConfig:
        myConfig: |
          c.KubeSpawner.pod_name_template = 'jupyter-{username}{servername}'

(Copying this line)

But I can’t launch Binders now, and the Hub logs show the following error:

[E 2019-10-23 11:58:25.946 JupyterHub user:626] Unhandled error starting sgibson91's server: (422)
    Reason: Unprocessable Entity
    HTTP response headers: HTTPHeaderDict({'Audit-Id': '9e6d5f1c-576b-4fdc-bacc-604bb09cc4ad', 'Content-Type': 'application/json', 'Date': 'Wed, 23 Oct 2019 11:58:25 GMT', 'Content-Length': '880'})
    HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Pod \"my-pod-template-\" is invalid: metadata.name: Invalid value: \"my-pod-template-\": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')","reason":"Invalid","details":{"name":"my-pod-template-","kind":"Pod","causes":[{"reason":"FieldValueInvalid","message":"Invalid value: \"my-pod-template-\": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')","field":"metadata.name"}]},"code":422}

I assume I need to pull username and servername from somewhere to fill in the template, but where?
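
For what it's worth, the 422 above is Kubernetes rejecting a pod name that ends in a hyphen, which is what a template produces when its last placeholder expands to an empty string. Below is a minimal sketch for checking candidate names offline. The regex is copied from the error message; `is_valid_pod_name` and `expand` are hypothetical helpers, and `expand` uses plain `str.format` as a stand-in, not KubeSpawner's own expansion logic (which, I believe, also escapes user names).

```python
import re

# DNS-1123 subdomain regex, as quoted in the Kubernetes error message above.
DNS_1123_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)


def is_valid_pod_name(name: str) -> bool:
    """Return True if `name` is a DNS-1123 subdomain of at most 253 characters."""
    return len(name) <= 253 and DNS_1123_SUBDOMAIN.fullmatch(name) is not None


def expand(template: str, username: str, servername: str = "") -> str:
    """Hypothetical stand-in for template expansion, using plain str.format."""
    return template.format(username=username, servername=servername)


for template, user in [
    ("my-pod-template-{username}", ""),               # empty value: trailing '-'
    ("jupyter-{username}{servername}", "sgibson91"),  # fine with an empty servername
]:
    name = expand(template, user)
    print(repr(name), "valid" if is_valid_pod_name(name) else "INVALID")
```

Running a template through a check like this before deploying makes it easier to see which placeholder is coming back empty.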

Ack, I think this is my answer: https://binderhub.readthedocs.io/en/latest/authentication.html#authentication-with-named-servers

Thank you to whoever has already done this work and contributed back!

This happens to me every so often:

I successfully launch the server, then close that tab, forget, and then try to launch the server again, which results in:

Found built image, launching...
Launching server...
Launch attempt 1 failed, retrying...
Launch attempt 2 failed, retrying...
Launch attempt 3 failed, retrying...
User rsignell-usgs already has a running server.

I then just have to remember that the URL of my already-running server looks like this:

https://hub.aws-uswest2-binder.pangeo.io/user/rsignell-usgs/lab

Would it be possible to spit that out in the logs?
Something like:

User rsignell-usgs already has a running server at https://hub.aws-uswest2-binder.pangeo.io/user/rsignell-usgs/lab

I believe that should be quite doable, yes
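
A minimal sketch of what such a message could look like, assuming the `<hub_url>/user/<username>/lab` layout seen in the example above. `already_running_message` and its parameters are hypothetical and not part of BinderHub; a real implementation would need to get the server URL from the Hub instead of rebuilding it by hand.

```python
from urllib.parse import quote


def already_running_message(username: str, hub_url: str, default_path: str = "lab") -> str:
    """Hypothetical helper: build a launch-failure message that includes the server URL."""
    base = hub_url.rstrip("/")
    # Percent-encode the username so unusual characters do not break the URL.
    server_url = f"{base}/user/{quote(username, safe='')}/{default_path.lstrip('/')}"
    return f"User {username} already has a running server at {server_url}"


print(already_running_message("rsignell-usgs", "https://hub.aws-uswest2-binder.pangeo.io"))
```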