Passing the JupyterHub profile in the URL

Hello,
I have a use case where Jupyter is being used for interactive coding lessons, as with W3Schools, MDN, DataCamp, and many others. The advantage of Jupyter is that it offers a much richer set of environments.
To do this, we’re loading JupyterHub in an iframe on the tutorial page (or opening Jupyter in a new tab) and activating it there. We’re using a subclass of TmpAuthenticator for the authentication, which we’ll be happy to contribute back.
To make this frictionless, we’re passing in the relevant profile in the URL. This works very well, but if the user already has a session going with a non-expired cookie, our authentication and spawning code gets bypassed and the user is directed to their existing notebook server. We’re addressing this right now through the culler, by setting aggressive idle timeouts and setting the culler to invalidate the user. This works well, but it’s dependent on timing.
What we’d like is (when a different profile is selected in the URL) to stop the current server and start the selected server, else redirect to the current server. Something like this:

profile = get_desired_profile_from_url()
if the user has an existing session:
  if profile == profile of existing session:
    redirect to existing session # this is what happens in all cases now
  else:
    kill the old session
    spin up the server with profile and direct the user to that
else:
  spin up the server with profile and direct the user to that

Implementing this is no problem; most of it we’ve already got, and “kill the old session” has a REST call at least, and ideally a Python call. Our problem is that when there’s a cookie, none of the Hub code we’ve written gets called.
We tried setting c.Authenticator.refresh_pre_spawn = True and implementing authenticator.refresh_user, but that code just doesn’t seem to be getting called. Is there any way we can get into the authentication/spawn flow when there’s a valid cookie and an existing Notebook server?

And, incidentally, if anyone else is thinking of a use case like this and would like our code (modified to be reusable), we’re happy to contribute it. We usually use a BSD license, but we’re happy to use the community standard.

Some questions about your goals:

  • Should the previous server always stop, or should the new session be started while the other is still running?
  • Does it make sense for there to be more than one instance of a given profile?
  • Do you always want to start a new server, or is connecting to an existing server okay, as long as it’s the right profile?
  • Is there a limit to how many servers a given user should run at a time?

It may be that you should be operating a JupyterHub service that your site talks to directly, which can query the JupyterHub API and make more granular, specific decisions. It could do things like create temporary users as appropriate, ensure the right server is running, etc. This is how BinderHub works, for example. BinderHub actually uses NullAuthenticator, disabling user auth with JupyterHub entirely, and instead issues tokens to users created via the API. TmpAuthenticator can work, but it has the issue that once logged in, the browser does represent a JupyterHub user, so you need to deal with the case where they are already logged in as a user with one or more running servers.

One possible solution, if I’ve understood your goals, would be to use named servers, so that each environment maps onto a unique server name (this can actually be the profile name, if two sessions running in the same server instance is okay as long as the profile is right). Then it won’t conflict with other servers owned by the same user with a different environment, and both can be running concurrently.
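As a rough sketch of the named-server idea, assuming the profile name is used to derive the server name: the helpers `server_name_for_profile` and `spawn_request` below are hypothetical and only build the request, they do not send it. The route `POST /hub/api/users/{name}/servers/{server_name}` is the Hub REST API call for starting a named server (named servers have to be enabled with `c.JupyterHub.allow_named_servers = True`). Sending `{"profile": ...}` as the body assumes a spawner that reads the profile from `user_options`, as KubeSpawner does with a profile list.

```python
import re
from typing import Any, Dict, Tuple
from urllib.parse import quote


def server_name_for_profile(profile: str) -> str:
    """Derive a URL-safe server name from a profile name (hypothetical helper)."""
    # Lowercase, then collapse every run of other characters into one dash.
    name = re.sub(r"[^a-z0-9]+", "-", profile.lower()).strip("-")
    if not name:
        raise ValueError(f"cannot derive a server name from {profile!r}")
    return name


def spawn_request(username: str, profile: str) -> Tuple[str, str, Dict[str, Any]]:
    """Build (method, path, JSON body) for starting the named server of a profile."""
    server_name = server_name_for_profile(profile)
    path = f"/hub/api/users/{quote(username, safe='')}/servers/{server_name}"
    # Assumption: the spawner picks the profile out of user_options["profile"].
    return "POST", path, {"profile": profile}
```

Because the server name is a pure function of the profile, two tabs on the same lesson resolve to the same server, while a different lesson gets its own.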

@minrk
First, many thanks for the thoughtful and detailed questions and suggestions.
I think the best way to look at this is to treat the Jupyter servers as advanced versions of the code boxes on instructional web pages. I can imagine a student having two different lessons open at the same time, talking to different servers. However, two different windows on the same lesson should be talking to the same server. And of course different users should be talking to different servers.
So:

  1. The same user with two different windows on the same lesson (a “lesson” is a standard JupyterHub profile, and I’m using auth_state and pre_spawn_hook to pick the right one from the URL) should be using a single server.
  2. The same user looking at two different lessons in separate tabs should be talking to two different servers.
  3. I can’t imagine any case where a single user should have more than two lessons open at a time, and I think even two would be rare. Thinking about my own usage of the code boxes on instructional pages today, I very rarely have more than one active at a time, and never more than two, and these lessons (or so my colleagues tell me) are far more intense and immersive than checking out how a particular API works on W3Schools.
  4. No more than one user should be talking to any server, since the lessons and code exercises are stateful.

So it seems one way to do this would be to build a JupyterHub service like BinderHub which would take a (signed) request for a lesson (aka z2jh profile) and a username and:

  1. If there’s a server username-profile running, simply redirect the user to that
  2. If there isn’t, spin one up and then redirect the user to that
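The two steps above could be sketched as a small decision function inside such a service. This is a minimal sketch, not BinderHub's implementation: `plan_launch` is a hypothetical name, and it assumes the caller has already fetched the user model from `GET /hub/api/users/:name`, with a `servers` dict keyed by server name whose entries carry `ready`, `pending`, `url` and `progress_url`.

```python
from typing import Any, Dict, Tuple
from urllib.parse import quote


def plan_launch(user_model: Dict[str, Any], server_name: str) -> Tuple[str, str]:
    """Return (action, target) for a user's request to open one named server."""
    servers = user_model.get("servers") or {}
    server = servers.get(server_name)

    if server and server.get("ready"):
        # Step 1: the server is already running, so send the browser there.
        return "redirect", server["url"]
    if server and server.get("pending"):
        # A spawn or stop is in flight; the caller should wait and re-check.
        return "wait", server.get("progress_url", "")

    # Step 2: nothing running yet. The caller POSTs to this path, waits for
    # the server to become ready, then redirects the user to it.
    username = quote(user_model["name"], safe="")
    return "spawn", f"/hub/api/users/{username}/servers/{server_name}"
```

Keeping the decision separate from the HTTP calls makes it easy to test against canned user models before wiring it to a real Hub.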

One thing we want (obviously) is to prevent third parties from accessing a running server, but now that I think of it, storing username/profile in the JH DB with a UUID as the URL will work fine, since a third party would have to guess the UUID and these things will be short-lived. We’re culling after a few minutes idle, and in any case after four hours.

Thanks! I will check out BinderHub. Are there any other JH services you can think of I should look at?

That sounds like a good summary! I think one point missing is what exactly to do if the user has a server running with the wrong profile. The main choices would be:

  1. spin up the right profile at a different server name, or
  2. stop the previous one first (enforce only one at a time), or
  3. error if some concurrency limit is exceeded

All of these are possible: make a GET /hub/api/users/:name request and decide what to do from there.
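To make the three choices concrete, here is a hedged sketch that turns the user model into a list of actions. The function name, the policy names `"separate"` and `"replace"`, and the `max_servers` parameter are all hypothetical; it assumes each lesson has its own server name and that the `servers` dict from `GET /hub/api/users/:name` has `ready` and `pending` fields per server. A `"stop"` action would correspond to `DELETE /hub/api/users/{name}/servers/{server_name}` and a `"start"` action to a `POST` on the same path.

```python
from typing import Any, Dict, List, Tuple

Action = Tuple[str, str]  # (verb, server name or error message)


def resolve_conflict(
    user_model: Dict[str, Any],
    wanted: str,
    policy: str = "separate",
    max_servers: int = 2,
) -> List[Action]:
    """Decide what to do when the user asks for the server named `wanted`."""
    servers = user_model.get("servers") or {}
    # Servers that are running or in the middle of starting/stopping.
    active = sorted(
        name for name, s in servers.items() if s.get("ready") or s.get("pending")
    )

    if wanted in active:
        return [("redirect", wanted)]
    if policy == "replace":
        # Choice 2: enforce one at a time by stopping everything else first.
        return [("stop", name) for name in active] + [("start", wanted)]
    if policy == "separate":
        # Choice 3: refuse once the concurrency limit is reached.
        if len(active) >= max_servers:
            return [("error", f"limit of {max_servers} concurrent servers reached")]
        # Choice 1: start the right profile alongside the others.
        return [("start", wanted)]
    raise ValueError(f"unknown policy {policy!r}")
```

With the requirements described earlier in the thread (at most two lessons at a time), `policy="separate"` with `max_servers=2` would be the closest match.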

I don’t think so. Most of the relevant logic in BinderHub will be in launching and the BinderSpawner, where we process launch arguments.