A question about BinderHub authentication and privacy

Hello all!

I’m trying to understand what level of privacy BinderHub/JupyterHub/JupyterServer provide when working with private repos.

I saw that BinderHub supports authentication and that users can be restricted to launching one instance at a time. But I also saw this section of the docs, which says “any user can build any private repository that BinderHub has access to.”

Do I understand correctly that enabling authentication for BinderHub does not actually keep private repos private among the Hub’s users once BinderHub launches them? Instead, authentication just restricts the number of servers a user can run (and stops non-authenticated users from launching instances)?

My hope was to limit access to each notebook to one user, plus a few administrators who can access all notebooks.

Enabling authentication for BinderHub allows you to control who can access your Hub, e.g. through GitHub organisations. However, allowing access to private repos is a separate process since logging into the Hub and then starting a Binder instance are separate events. I think we are waiting on this PR to have finer control over private repos.

Basically, someone (an admin) has to provide an access token to BinderHub that allows it to authenticate to the repo provider in order to clone “as BinderHub” (but it’s really cloning as the person who created the token). Hence, if I provided my access token to BinderHub, anyone who could log into my Hub could clone my private repos, regardless of whether they had access through the repo provider’s interface. The above PR is trying to dynamically forward a logged-in user’s credentials to BinderHub so that it clones “as them” instead.
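To make the shared-token setup concrete, here is a minimal sketch of what it looks like in a binderhub_config.py, assuming GitHub as the repo provider. The token value is a placeholder, and this is a fragment rather than a complete configuration (in a Helm deployment the same settings would go under the chart's config section).

```python
# binderhub_config.py (fragment, not a complete configuration)
# `get_config()` is provided by traitlets when BinderHub loads this file;
# it is not defined if you run the file on its own.
c = get_config()  # noqa: F821

# Require users to log in through JupyterHub before they can launch anything.
c.BinderHub.auth_enabled = True

# A single token created by an admin. Every build clones with this token,
# so any logged-in user can build any repo that this token can read.
c.GitHubRepoProvider.access_token = "<placeholder: token created by an admin>"
```

Note that nothing here ties the token to the user who is logged in, which is why login and repo access end up being separate questions.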

Thanks Sarah, this info is really helpful! I had not seen this PR; its use case is similar to the one my organization has. Knowing there is a PR in progress definitely influences the path we’ll take for using Binder inside our publication data curation application.

BTW, I would be eager to push this PR forward. For now I do not understand why the handler’s current_user does not contain an auth_info key and whether this is by design (for security reasons?).

I wonder if the root cause is that the handler’s current_user data is retrieved (indirectly) through the /auth/authorizations/token API, which never seems to include auth_state info (from a shallow look at the code), whereas calling the /users/foo API does, at least when the requester is an admin (which is the case for the binderhub app)?

But that’s just a guess really, and I’d be interested in any insight 🙂
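One way to test this guess would be to request both models with the same service token and compare them. The sketch below assumes the Hub's REST API lives under /hub/api, that the two endpoints are authorizations/token/<token> and users/<name>, and that the token is sent in a "token" Authorization header; the function names and arguments are made up for illustration. Keep in mind that auth_state is only stored at all when enable_auth_state is switched on for the Hub's authenticator.

```python
import json
import urllib.request


def hub_api_url(hub_url: str, path: str) -> str:
    """Join the Hub base URL and a REST API path (assumes the /hub/api prefix)."""
    return hub_url.rstrip("/") + "/hub/api/" + path.lstrip("/")


def fetch_model(hub_url: str, path: str, api_token: str) -> dict:
    """GET a JSON model from the Hub REST API, authenticating with a token."""
    request = urllib.request.Request(
        hub_api_url(hub_url, path),
        headers={"Authorization": f"token {api_token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def has_auth_state(model: dict) -> bool:
    """True only if the model carries a non-empty auth_state entry."""
    return bool(model.get("auth_state"))


def compare_models(hub_url: str, service_token: str, user_token: str) -> dict:
    """Fetch the same user via both endpoints and report which has auth_state."""
    by_token = fetch_model(hub_url, f"authorizations/token/{user_token}", service_token)
    by_name = fetch_model(hub_url, f"users/{by_token['name']}", service_token)
    return {
        "authorizations/token": has_auth_state(by_token),
        "users/<name>": has_auth_state(by_name),
    }
```

If compare_models reports auth_state only for the users/<name> lookup, that would support the idea that the token-based lookup is what drops it.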
