Make JupyterHub authentication pluggable

Current state

During July’s JupyterHub/BinderHub meeting, among other things, we discussed the growing interest in using BinderHub with user-specific private repos.

To some extent, BinderHub can already do that, but it requires a central user that can access the private repositories. This is only achievable if one controls the target provider, for example a custom GitLab instance or, in the case of GitHub, an organisation, again with a dedicated user to access the repositories. The security issues of a single account being able to access everything are pretty clear.

BinderHub can make use of JupyterHub’s authentication system when one wants to use BinderHub in a private fashion. In the context of private repos, JupyterHub’s authentication could be used to get the credentials required to pull private repo content and build images from it. However, besides the need to implement the credential retrieval part, the authentication system can only handle one provider, which makes it of limited use for supporting multiple providers in BinderHub.

As we can see here, this is not just a question of minor tweaks to the current code base.

Where we would like to go

To put it simply: create an authentication mechanism that allows credential retrieval and that can be configured to use multiple sources in a single installation.

The way the current authentication works does not scale to the requirements above. Therefore @manics suggested refactoring the authentication system to make it easier to use in standard JupyterHub installations as well as for the purposes of BinderHub. Part of this work will be to make the authentication pluggable so that it becomes possible to:

  1. Use more than one authentication provider at the same time
  2. Implement new providers in an easier fashion
  3. Request credentials for the logged in user to access private repos appropriately

This last point is of more interest to BinderHub than JupyterHub itself at this time.

JupyterHub is not the first system that needs that kind of support and inspiration could be taken from projects like django-allauth.

There’s also python-social-auth which is more generic and might fit the current needs without having to re-implement all the flows.

For the record, there are already some ideas that can be found in this jupyterhub/oauthenticator issue.

In any case, this is not a small task, as it touches a core element with security implications. @sgibson91 therefore proposed organising this work in a more coordinated fashion so that we can better prepare the related subtasks with the support of @yuvipanda, @minrk and @betatim.

The goal of this post is to gather ideas and suggestions about this topic in order to prepare a JEP and lay out a plan for the implementation.


As discussed in oauthenticator, I believe a MultiAuthenticator can work today with the JupyterHub APIs, following the example of wrapspawner, and satisfy all of these requirements. It would be simplest if they were all OAuth providers, and I think that’s a sensible requirement, but not strictly necessary.

Essentially, a MultiAuthenticator would be a proxy implementation for a collection of any number of Authenticator classes. There is already a proof of concept that works today. The main things it would need to do:

  1. custom login html to offer multiple links instead of just one
  2. record which sub-authenticator class is chosen in auth state
  3. wrap returned usernames to avoid collisions (could be fancy, and allow users to pick and map on successful login, or could be simple and apply a provider prefix/suffix.)
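
To make the three points above concrete, here is a minimal, self-contained sketch. It does not use JupyterHub at all: `MultiAuthSketch`, the `fake_*` providers, and the `/hub/login/<provider>` link layout are all hypothetical names made up for illustration. The returned dict with `name` and `auth_state` keys mirrors the shape an Authenticator's `authenticate` may return, and the simple `provider:username` prefix is just one of the collision-avoidance options mentioned.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

# Hypothetical stand-in for a sub-authenticator: an async callable that
# returns a username on success or None on failure.
AuthFunc = Callable[[dict], Awaitable[Optional[str]]]


class MultiAuthSketch:
    """Illustrative proxy over several authenticators (not JupyterHub code)."""

    def __init__(self, providers: Dict[str, AuthFunc]):
        self.providers = providers

    def login_links(self, base_url: str = "/hub/login") -> Dict[str, str]:
        # Point 1: offer one login link per provider instead of a single link.
        # The URL layout here is an assumption for illustration only.
        return {name: f"{base_url}/{name}" for name in self.providers}

    async def authenticate(self, provider: str, data: dict) -> Optional[dict]:
        # Delegate to the chosen sub-authenticator.
        username = await self.providers[provider](data)
        if username is None:
            return None
        return {
            # Point 3: prefix the username with the provider to avoid collisions.
            "name": f"{provider}:{username}",
            # Point 2: record which sub-authenticator was chosen in auth state.
            "auth_state": {"provider": provider},
        }


# Hypothetical providers that just read a field from the submitted data.
async def fake_github(data: dict) -> Optional[str]:
    return data.get("login")


async def fake_gitlab(data: dict) -> Optional[str]:
    return data.get("username")


multi = MultiAuthSketch({"github": fake_github, "gitlab": fake_gitlab})
result = asyncio.run(multi.authenticate("github", {"login": "alice"}))
```

With this sketch, an `alice` on the GitHub stand-in and an `alice` on the GitLab stand-in map to different names, and the recorded provider could later be used to decide which credentials to request.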

However, for the most part, OAuth providers are consistent: each is defined by a collection of URLs, so they can be expressed as relatively brief configurations of GenericOAuthenticator (a few URLs and a scope list).

It may well make sense to promote the ‘collection of authenticators’ concept to a core JupyterHub feature (i.e. multiple Authenticators instead of one), but I think it’s best to implement it as MultiAuthenticator first. I don’t believe there are any current impediments to this, but the way to start is to try building it and see if/where that’s true. I think it’s appropriate to do this in the OAuthenticator package.

Looking at python-social-auth and django-allauth, they aren’t substantially different from OAuthenticator, but some folks have gone through all the tedious work to define lots of providers, which is great. I think it would be super useful to investigate consuming the python-social-auth API to generate authenticators so we don’t need to duplicate everything in OAuthenticator. If it goes well, maybe it should even be a dependency and we can reduce a lot of the redundant definitions we have.

So I see these main tasks to explore:

  1. implement MultiAuthenticator prototype in OAuthenticator, and test it out
  2. implement SocialAuthenticator that takes a name from python-social-auth and adapts it to the Authenticator API. The biggest impediment seems to be that social-auth is all sync APIs, so we’d need to wrap it in @run_in_executor to put blocking requests in a background thread.
  3. investigate python-social-auth to see if there are any other common APIs we should be making more consistent across our OAuthenticators.
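
For task 2, the general pattern of pushing a blocking call into a background thread can be sketched with the standard library alone. This is only an illustration of the idea: `blocking_fetch_user` is a hypothetical stand-in for a synchronous social-auth call, and the sketch uses asyncio's `loop.run_in_executor` directly rather than any particular decorator.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_fetch_user(token: str) -> dict:
    """Hypothetical stand-in for a synchronous (blocking) provider call."""
    time.sleep(0.05)  # simulate a blocking HTTP request
    return {"username": f"user-for-{token}"}


async def fetch_user(token: str, executor: ThreadPoolExecutor) -> dict:
    # Run the blocking call in a worker thread so the event loop stays free.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, blocking_fetch_user, token)


async def main() -> list:
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Both blocking calls proceed concurrently in separate threads;
        # gather returns results in the order the awaitables were passed.
        return await asyncio.gather(
            fetch_user("a", executor),
            fetch_user("b", executor),
        )


users = asyncio.run(main())
```

The trade-off is that each in-flight blocking request occupies a thread, so the pool size bounds how many logins can be processed at once.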

If intrepid folks were really ambitious, this could start as a new package meant to be a successor to OAuthenticator, rather than being developed within the original package. I think either approach is fine and has trade-offs: a new package means some repeated work, but minimal backward-compatibility concerns.
