Plans on bringing RTC to JupyterHub

Hi,

Are there already plans to bring the new real-time collaboration (RTC) feature from JupyterLab to JupyterHub?

6 Likes

Hi all,

Just to add +1 here. I’m excited to see RTC will be available in JupyterLab 3.1 and have been wondering how to enable it in our JupyterHub deployment on GKE.

I saw enabling RTC requires JupyterLab being launched with the --collaborative option, and spent some time trying to understand how this might be configured in JupyterHub, but I got a bit lost. Any advice much appreciated :slight_smile:

Thanks,
Alistair

1 Like

It’s definitely on the wish list! Unfortunately it’s not as easy as it might first seem. You can configure JupyterHub to launch JupyterLab with c.Spawner.cmd = ['jupyter-labhub'] and pass additional arguments with c.Spawner.args = ['--collaborative'], but the problem is: how do you safely share a link?
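For reference, the two settings mentioned above would sit in jupyterhub_config.py roughly like this. This is a config fragment rather than a standalone script: get_config() is only available when JupyterHub loads the file.

```python
# jupyterhub_config.py (fragment)
c = get_config()  # noqa: F821 -- provided by JupyterHub when it loads this file

# Launch JupyterLab as the single-user server...
c.Spawner.cmd = ['jupyter-labhub']
# ...and pass the flag that turns on real-time collaboration.
c.Spawner.args = ['--collaborative']
```

This only enables RTC inside each user's own server; it does nothing about who is allowed to reach that server.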

In https://jupyterhub.example.org/user/user-1/lab/tree/foo.ipynb only user-1 can access anything under https://jupyterhub.example.org/user/user-1/, and this is enforced by JupyterHub, so we’d need a way to provide RTC access to one server without giving someone access to everyone’s servers.

Thinking out loud, maybe it’d be doable with a third-party hub extension/service that handles just RTC, and a JupyterLab plugin that communicates with this service?

1 Like

With the current implementation you get out of the box: you don’t. It’s all working on MyBinder, but everybody’s anonymous, and you have to trust everybody (and the MyBinder federation), but an in-house Binder is probably going to Just Work today, if you’ve already figured out the other abstractions.

As mentioned over here, some classes of solutions to connecting the UI of n spawned servers:

  • a matched pair of…
    • a Hub service that implements/wraps “the other end” of…
    • a labextension that provides IDocumentProviderFactory pointing at something other than /lab/api/yjs installed in every spawned server
  • a matched pair of…
    • some backing store which can store the stuff, and can be found/connected to/authenticated by the environment variables/request headers available to…
    • a serverextension that replaces the as-shipped YjsEchoWebSocket on /lab/api/yjs installed in every spawned server

While the latter option is attractive as it might not require any custom JS/TS/WebPack, the user experience might fall short of what people would expect.

My gut feel is doing this without also offering out-of-document, searchable, in-Lab chat is probably not going to feel very good… as mentioned over there, being able to back yjs with an embeddable XMPP web client and known-good XMPP server (e.g. the MMORPG/telecon-grade ejabberd… your NIC will give up before it does) would be a very strong play. Oh, and look, if done right, you’d get self-hosted video chat, too, with jupyterlab-videochat (once the rooms there are also extensible, a WIP) given a self-hosted Jitsi, which could then (weirdly) power your own private virtual world.

Even given those, this is not going to magically solve the rest of the problems that sysadmins will have to deal with at scale, e.g. role-based, fine-grained permissions at the folder/document/cell/line level, integration with DVCS/CI, logging/auditing, data spill remediation. Still, having a demonstration of a way this could work is important for getting to those next steps.

2 Likes

With the access:servers scope in JupyterHub 2.0, sharing a link to your server can work, as the deployment can choose to grant collections of users access to each other’s servers, e.g. access:servers!group=students to grant access to student servers. There isn’t yet a mechanism for users to grant access to their own servers, so it has to be done at the admin-level, but if you know you want e.g. group-level sharing, that can work. Without teaching the collaboration about JupyterHub auth, this means that you are granting other folks full permissions to act ‘as you’ with the running server, but that’s okay in limited circumstances, and enough to get off the ground for early adopters to try things out.
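As a minimal sketch of the group-level sharing described above: the role below grants the access:servers!group=students scope to members of another group. The helper name group_access_role, the role name, and the "instructors" group are assumptions made up for illustration; only the scope string comes from the post.

```python
def group_access_role(target_group, granted_to_groups):
    """Build a role dict that lets some groups access the servers of `target_group`.

    The role name and description are illustrative, not a JupyterHub convention.
    """
    return {
        "name": f"access-{target_group}-servers",
        "description": f"Access servers of users in group {target_group}",
        "scopes": [f"access:servers!group={target_group}"],
        "groups": list(granted_to_groups),
    }


# Hypothetical: members of "instructors" may open servers owned by "students".
load_roles = [group_access_role("students", ["instructors"])]

# In jupyterhub_config.py this would then be assigned with:
#   c.JupyterHub.load_roles = load_roles
```

Anyone granted this role can act as the server owner inside those servers, so it only suits deployments where that level of trust is acceptable.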

2 Likes

Thanks @minrk for the pointers! I’ve put together a quick example:

1 Like

At the moment roles are defined in the config file at startup, and can’t be changed whilst JupyterHub is running. If support for changing roles at runtime were added, then a separate Hub service running with admin scope could take care of adding an appropriate role for only the requesting user.

Without that you can still get quite close to giving users control of who else can access their server if you know the list of users in advance. You can create a set of groups in advance, whose membership can be modified at runtime:

List of users defined at startup:

allowed_users = [
  'user-1',
  'user-2',
  'user-3',
]

Iterate through list of users…

load_groups = {}
load_roles = []
for user in allowed_users:

… create a group rtc-access-{user} that will store the list of other users who will have access to this user’s server, initially empty:

    access_group_name = f'rtc-access-{user}'
    load_groups[access_group_name] = []

… create a role that allows access to this user’s server (access:servers!user={user}), and assign that role to the above group:

    load_roles.append({
        'name': access_group_name,
        'description': f'RTC access to {user}',
        'scopes': [f'access:servers!user={user}'],
        'groups': [access_group_name],
    })

… create a role that allows this user to manage the rtc-access-{user} group (groups!group={access_group_name})

    manage_name = f'rtc-manage-{user}'
    load_roles.append({
        'name': manage_name,
        'description': f'Manage users in group {access_group_name}',
        'scopes': [f'groups!group={access_group_name}'],
        'users': [user],
    })

… and finally, after the loop, register the users, groups and roles with JupyterHub:

c.Authenticator.allowed_users = set(allowed_users)
c.JupyterHub.load_groups = load_groups
c.JupyterHub.load_roles = load_roles

This should allow a user to add other users to the rtc-access-{user} group through the POST /groups/rtc-access-{user}/users/ API endpoint using their own token.
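A minimal sketch of that API call, using only the standard library. The Hub URL and token are placeholders, and the helper name build_add_users_request is hypothetical. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_add_users_request(hub_api_url, owner, new_users, token):
    """Build a POST request adding `new_users` to the rtc-access-{owner} group."""
    group = f"rtc-access-{owner}"
    url = f"{hub_api_url.rstrip('/')}/groups/{group}/users"
    # The groups endpoint takes a JSON body listing the usernames to add.
    body = json.dumps({"users": list(new_users)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values: user-1 lets user-2 into their server.
req = build_add_users_request(
    "https://jupyterhub.example.org/hub/api",
    "user-1",
    ["user-2"],
    "PLACEHOLDER-TOKEN",
)
# To actually send it: urllib.request.urlopen(req)
```

The token has to belong to user-1 (or to someone else holding the matching groups!group=rtc-access-user-1 scope), otherwise the Hub should reject the request.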

1 Like

Hi @manics, thanks for creating the blueprint here. I tried to expand on it to support JupyterHubs where the list of users isn’t known in advance (e.g. OAuthenticator login with your institutional email) and created:

  • a wrapper script that runs jupyterhub as an asyncio task and restarts it when new users are detected
  • a jupyterhub-config.py that creates a sharing group and role for each user
  • a JupyterLab extension that lets users edit their sharing group via API calls

This is how it looks in practice: JupyterHub_RTC_collaboration

Code is available here: GitHub - ktaletsk/jupyterhub-rtc-config-wrapper: Example JupyterHub deployment with auto-generating server sharing permissions

5 Likes

This is neat, and should be enough for a start. Do I understand correctly though that this gives other users full server access, including abilities to share/unshare/stop, etc?

JupyterHub’s RBAC can only control whether or not someone has access to your singleuser server. For more granular control of what someone can do inside the singleuser server, the server itself needs its own permissions system. There’s an ongoing discussion here:

2 Likes

Quickly reporting here about how I decided to support RTC on my group’s jupyterhub (small instance, trusted users).

  1. I create a shared folder (volume really) accessible by all users (same mounted volume).
  2. I create a shared hub user and override the roles as follows:
     c.JupyterHub.load_roles = [
         {
             "name": "user",
             "description": "Allow users to access the shared server in addition to default perms",
             "scopes": ["self", "access:servers!user=shared"],
         }
     ]
    
  3. I provide the users the following instructions.

    Collaboration workflow

    1. Find someone with whom you want to edit a file
    2. Copy the file and all dependencies into a subfolder of the shared folder in your home.
    3. Go to a URL https://<hub_url>/user/shared/workspaces/<your_team_name> together with your friend.
    4. Edit collaboratively
    5. Once done, copy the files back from your shared folder.

    IMPORTANT: DON’T USE SHARED FOLDER AS THE PRIMARY STORAGE LOCATION!!! THIS IS A BAD IDEA AND IT WILL HURT YOU


Seems to work well except for a weird bug: after restarting the hub the users can’t start their own servers with the following error


Deleting the user and creating the user again removes the error.

4 Likes

Wonderful, thanks for testing! I’ll see if I can track down the permission subset issue.

2 Likes

A warning for those stumbling on this topic: we’ve disabled RTC in our user containers because of data loss; see the issues below. We plan to re-enable it once the issues are resolved.

2 Likes

These issues should be fixed once we save from the back-end, see this PR:

3 Likes

Yep, already subscribed to the notifications! I just thought that a warning here is appropriate since the issues are nondeterministic and it took us some time to even realize what is broken.

4 Likes

It’s been some time since I last checked the status of RTC, and since it’s spread over multiple repos it’s hard to track. What is the current state of RTC? What is the best way to track it? Is it reliable (can be used without data loss) in the versions that are released now or is there a milestone where it is expected to happen?

1 Like

There was a lot of work on it, and user reports confirm that it works much better in the alpha version of JupyterLab 4.0 (see “Real-time collaborative editing causes notebook duplication - race condition”, Issue #12996 in jupyterlab/jupyterlab on GitHub). Many of these changes broke public APIs and were not backported to 3.x, so the milestone to track is 4.0. 4.0 is pretty far along in its alpha cycle, so I would suggest giving it a try to help flag any issues which might have been missed.

4 Likes

I believe that the relevant RTC fixes were backported to jupyterlab 3.6. I’ve given the rc0 a shot, and it seems to work well at a glance. It’s nice that the user identities are read from the hub.

I also have a couple of questions:

  • Is there a way to make permissions dynamic or more granular now?
  • RTC doesn’t seem to work with regular text documents, e.g. .md: the cursors don’t show up and every time someone else edits, they jump to the start. Is this a known issue?
  • I remember seeing that because there’s a backend yjs representation of the document, the outputs produced while jupyterlab is closed should still show up in the notebook. That, however, doesn’t seem to work. Is this supposed to be the case?

I’m actually working on a demo of that right now, but the custom scopes in JupyterHub 3 plus the Authorizer API in jupyter-server 2 enable this (the granular part, at least).

The gist is:

  1. JupyterHub IdentityProvider that grabs the currently authorized scopes
  2. define the custom scopes for jupyter-server in jupyterhub config
  3. JupyterHub Authorizer that resolves authorization based on the custom scopes
  4. via roles, assign custom scopes to users as appropriate
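To make steps 2 to 4 a bit more concrete, here is a minimal sketch of the scope-matching logic in plain Python. It assumes custom scopes named custom:jupyter_server:<action>:<resource> with * as a resource wildcard; that naming, the helper is_authorized_by_scopes, and the role shown are illustrative assumptions, not the finished implementation. In a real deployment the check would live in a jupyter-server Authorizer subclass, inside its is_authorized(handler, user, action, resource) method, with the granted scopes supplied by the IdentityProvider.

```python
SCOPE_PREFIX = "custom:jupyter_server"
ACTIONS = ("read", "write", "execute")

# Step 2: custom scope definitions, in the shape expected by
# c.JupyterHub.custom_scopes (scope name -> {"description": ...}).
custom_scopes = {
    f"{SCOPE_PREFIX}:{action}:*": {
        "description": f"{action} access to all jupyter-server resources"
    }
    for action in ACTIONS
}


def is_authorized_by_scopes(granted_scopes, action, resource):
    """Step 3: decide whether the granted scopes allow `action` on `resource`."""
    granted = set(granted_scopes)
    exact = f"{SCOPE_PREFIX}:{action}:{resource}"
    wildcard = f"{SCOPE_PREFIX}:{action}:*"
    return exact in granted or wildcard in granted


# Step 4: a hypothetical read-only role for a group of collaborators.
read_only_role = {
    "name": "rtc-read-only",
    "scopes": ["access:servers!user=user-1", f"{SCOPE_PREFIX}:read:*"],
    "groups": ["rtc-access-user-1"],
}

can_read = is_authorized_by_scopes(read_only_role["scopes"], "read", "contents")
can_write = is_authorized_by_scopes(read_only_role["scopes"], "write", "contents")
```

With this role a collaborator can open and view content on user-1's server, while write and execute requests are refused by the Authorizer.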

I’m tantalizingly close to having the base implementations of these APIs (not the custom scopes part, but basic Authorizer+IdentityProvider) for JupyterHub in https://github.com/jupyterhub/jupyterhub/pull/3888. A lightweight demo of read-only is available here, but I’ll have a finished one in that PR when it’s ready.

I think it’s a valid and open question how much of this (knowledge of specific jupyter-server scopes) belongs in JupyterHub vs extensions/custom deployments. It’s an awful lot more work to say “Write and enable a jupyter-server Authorizer and define all the custom scopes” than to say “grant X read-only access to Y”. I think with the level of maturity of jupyter-server 2, it’s not in scope to be baked into JupyterHub yet (how do we handle new/updating scopes?), but it will be when we have a better understanding of how folks want to use it.

For the ‘dynamic’ bit, we’ll need user-defined roles, so that assignments can change over time. The PR “Adding manage_roles feature” by vladfreeze (Pull Request #4050 in jupyterhub/jupyterhub) is a big step in that direction, but we still need to be able to reconcile roles from config with roles defined/assigned at runtime. That’s essentially my next big thing to tackle once I’ve landed the server extension.

ydoc makes this feasible, but it doesn’t mean it’s implemented. The mapping of message->output still happens in the (web) client. The Python server would need to take over the mapping of message to document transaction (essentially implement the output part of the notebook+kernel model itself). It has all the information now - the open document, access to all the messages, but implementing the logic there is still a substantial task.
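To give a feel for what “mapping of message to output” involves, here is a rough sketch of the logic a server-side implementation would have to own. It works on a plain list of nbformat-style output dicts rather than a real shared ydoc, and the function name apply_iopub_message is hypothetical; a real implementation would also need to handle things like clear_output’s wait flag and display updates.

```python
def apply_iopub_message(outputs, msg):
    """Apply one kernel IOPub message to a cell's list of outputs, in place."""
    msg_type = msg["header"]["msg_type"]
    content = msg["content"]

    if msg_type == "clear_output":
        # Simplification: ignores content["wait"].
        outputs.clear()
    elif msg_type == "stream":
        last = outputs[-1] if outputs else None
        if last and last["output_type"] == "stream" and last["name"] == content["name"]:
            # Consecutive writes to the same stream are merged into one output.
            last["text"] += content["text"]
        else:
            outputs.append(
                {"output_type": "stream", "name": content["name"], "text": content["text"]}
            )
    elif msg_type in ("display_data", "execute_result"):
        output = {
            "output_type": msg_type,
            "data": content["data"],
            "metadata": content.get("metadata", {}),
        }
        if msg_type == "execute_result":
            output["execution_count"] = content["execution_count"]
        outputs.append(output)
    elif msg_type == "error":
        outputs.append(
            {
                "output_type": "error",
                "ename": content["ename"],
                "evalue": content["evalue"],
                "traceback": content["traceback"],
            }
        )
    # Other message types (status, execute_input, ...) don't produce outputs.
    return outputs


cell_outputs = []
for text in ("hello ", "world\n"):
    apply_iopub_message(
        cell_outputs,
        {"header": {"msg_type": "stream"}, "content": {"name": "stdout", "text": text}},
    )
apply_iopub_message(
    cell_outputs,
    {
        "header": {"msg_type": "execute_result"},
        "content": {"execution_count": 1, "data": {"text/plain": "42"}, "metadata": {}},
    },
)
```

On the server, each of these list mutations would instead be a transaction on the shared document, so that outputs arrive in the notebook even when no browser is connected.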

4 Likes

For anyone following along, there’s another bug in RTC: