I’m new to this and just trying to figure out how to set up JupyterHub with per-user subdomains and suchlike.
Two important questions: are we able to restrict notebooks in single-user instances to authenticated users? If so, can we limit the type of access?
In certain environments, the JupyterHub use case would require either no sharing of notebooks at all, or the ability for a user to share any notebook with specific users only. Beyond simple yes/no access, I’d expect four levels of access to be possible:
Read: A user can read the notebook (with cached output from a prior run by a privileged user?)
Read and Run: A user can read and execute the notebook
Read-Write: A user can read and edit the notebook, but not execute it
Read-Write-Run: A user can edit and execute the notebook, a dangerous and highly-trusted permission
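To make the four levels concrete, here is a minimal sketch that models them as combinable permission bits with a per-notebook access list. Nothing here is a Jupyter or JupyterHub API: `NotebookAccess`, `LEVELS`, `acl` and `allowed` are hypothetical names chosen for illustration only.

```python
from enum import Flag, auto


class NotebookAccess(Flag):
    """Hypothetical permission bits for a shared notebook (not a Jupyter API)."""
    NONE = 0
    READ = auto()
    WRITE = auto()
    RUN = auto()


# The four levels described above, expressed as combinations of bits.
LEVELS = {
    "Read": NotebookAccess.READ,
    "Read and Run": NotebookAccess.READ | NotebookAccess.RUN,
    "Read-Write": NotebookAccess.READ | NotebookAccess.WRITE,
    "Read-Write-Run": NotebookAccess.READ | NotebookAccess.WRITE | NotebookAccess.RUN,
}

# Hypothetical per-notebook ACL: notebook path -> {username: granted access}.
acl = {
    "trade-secrets.ipynb": {
        "alice": LEVELS["Read-Write-Run"],
        "bob": LEVELS["Read"],
    },
}


def allowed(user, notebook, wanted):
    """Return True if `user` holds every bit in `wanted` for `notebook`."""
    granted = acl.get(notebook, {}).get(user, NotebookAccess.NONE)
    return wanted in granted
```

For example, `allowed("bob", "trade-secrets.ipynb", NotebookAccess.RUN)` is False while a READ check for bob is True, and users missing from the list get nothing by default. Using flags rather than four fixed strings keeps "Read-Write without Run" expressible as its own combination.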
This would be useful for notes regarding highly controlled information such as trade secrets. In the most stringent environment, two separate JupyterHub installations would be created, each with a sticky banner at the top indicating in large, friendly letters which instance the user is in (“NO PROPRIETARY INFORMATION” vs “PROPRIETARY INFORMATION, DO NOT SHARE”) and a link to the opposite instance for fast switching. Those considerations are on the administrator’s side, and JupyterHub is already fully capable of that kind of labeling.
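For the banner itself, a minimal config sketch, assuming a recent JupyterHub: setting the `announcement` template variable in jupyterhub_config.py shows a notice on the Hub's own pages. As far as I know this only covers Hub-rendered pages (home, login, spawn and so on), not the single-user JupyterLab or notebook interface, so a banner inside the notebook UI would need separate customization. The text is the example wording from above; the other instance would use its own label.

```python
# jupyterhub_config.py for the "proprietary" instance (sketch).
c = get_config()  # noqa: F821 -- injected by JupyterHub's config loader

# Shown on Hub-rendered pages only, not inside the single-user server UI.
c.JupyterHub.template_vars = {
    "announcement": "PROPRIETARY INFORMATION, DO NOT SHARE",
}
```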
JupyterHub is attractive in these situations because users tend to take notes, whether in Notepad, in a word processor, on a network share, on cloud services such as Microsoft 365, or in some cases even about confidential information in their own private Google Docs account, which they keep access to after leaving the company! In 15 years of information security experience, I’ve seen it all, including notes taken in Gmail and saved as drafts. This is behavioral, and a centralized notebook application would be an excellent solution, hence my interest in what kind of access control is currently available.
Note that this is all-or-nothing for a given singleuser server; you can’t share a server as read-only.
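For reference, a minimal sketch of what that all-or-nothing sharing looks like with JupyterHub's role-based access control (JupyterHub 2.0 or later assumed). The `access:servers` scope, filtered to one user, lets another user open that user's server with full rights; there is no scope that grants read-only access to the notebooks on it. The role name and usernames are hypothetical.

```python
# jupyterhub_config.py (sketch, JupyterHub 2.0+ RBAC).
c = get_config()  # noqa: F821 -- injected by JupyterHub's config loader

c.JupyterHub.load_roles = [
    {
        # Hypothetical role: bob gets full access to alice's singleuser server.
        "name": "bob-can-use-alice-server",
        "scopes": ["access:servers!user=alice"],
        "users": ["bob"],
    },
]
```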
For finer-grained control you’ll probably need new permissions at the singleuser server level. There’s some discussion in
Short term, using operating system permissions (with extended attributes if necessary) is probably your best option for controlling read/write access. Preventing execution isn’t possible this way: anyone with read access can always copy the notebook to get a writable version that they can execute and save.
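As a minimal sketch of the operating-system approach, assuming a POSIX filesystem: the owner keeps read/write, the file's group gets read-only, and everyone else gets nothing (mode 0o640). Assigning the notebook to the right project group (e.g. with chgrp, or finer-grained ACLs via setfacl) is assumed to happen separately and needs the appropriate privileges. The function name is hypothetical. As noted, read access still lets the group copy the file.

```python
import os
import stat
import tempfile


def share_read_only_with_group(path):
    """Owner may read/write, group may only read, others get nothing (0o640)."""
    mode = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP
    os.chmod(path, mode)
    # Return the permission bits actually applied, for checking.
    return stat.S_IMODE(os.stat(path).st_mode)


# Demo on a throwaway file standing in for a notebook.
with tempfile.TemporaryDirectory() as tmp:
    nb = os.path.join(tmp, "project-x.ipynb")
    with open(nb, "w") as f:
        f.write("{}")
    print(oct(share_read_only_with_group(nb)))
```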
If your use-case is to present information have a look at some of the dashboarding or presentation frameworks for notebooks, e.g. Voila and others that I can’t think of right now…
Interesting. Yeah, I was thinking more along the lines of users wanting to share notebooks, but not with everyone. Think about when you have legal or policy obligations to restrict information on a need-to-know basis, and people in the company are sharing notes on shared projects. Person A, who works on projects X and Y, should be able to share notebook X with person B (project X) and notebook Y with person C (project Y), without all three of them being able to access both notebooks X and Y.
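The need-to-know rule in that scenario can be sketched as a group-membership check: each notebook belongs to one project, and only that project's members may open it. This is a hypothetical model, not a JupyterHub feature; in practice the membership might come from OS groups, LDAP groups or JupyterHub groups.

```python
# Hypothetical project membership (in practice: OS, LDAP or JupyterHub groups).
PROJECT_MEMBERS = {
    "X": {"A", "B"},
    "Y": {"A", "C"},
}

# Each notebook is tagged with the single project it belongs to.
NOTEBOOK_PROJECT = {
    "notebook-x.ipynb": "X",
    "notebook-y.ipynb": "Y",
}


def can_access(user, notebook):
    """Need-to-know rule: a user may open a notebook only if on its project."""
    project = NOTEBOOK_PROJECT.get(notebook)
    return project is not None and user in PROJECT_MEMBERS.get(project, set())


def who_can_access(notebook):
    """All users who pass the need-to-know check, sorted for stable output."""
    everyone = set().union(*PROJECT_MEMBERS.values())
    return sorted(u for u in everyone if can_access(u, notebook))
```

Here A can open both notebooks, B only notebook X and C only notebook Y, which is the outcome the scenario asks for. Untagged notebooks are denied to everyone by default.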
As for running a notebook, someone can copy it to their own single-user server, which may be in a Docker container via DockerSpawner and may have access to different data (e.g. a notebook can import a separate Python script that contains credentials instead of having them entered into the notebook). Denying a user execution access to every notebook would protect against arbitrary execution, but as soon as you give them execution access to any notebook in your environment, they effectively have access to your entire environment anyway.
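For the DockerSpawner setup mentioned above, a minimal config sketch: each user gets their own container and their own named volume, so a copied notebook only sees whatever is mounted into that user's container. The mount path is an assumption (the usual home of the Jupyter Docker Stacks images). As the post says, this limits which data a copy can reach, not what the user can execute inside their own container.

```python
# jupyterhub_config.py (sketch): one container and one named volume per user.
c = get_config()  # noqa: F821 -- injected by JupyterHub's config loader

c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"

# Assumed path inside the image; adjust to the image you actually use.
notebook_dir = "/home/jovyan/work"
c.DockerSpawner.notebook_dir = notebook_dir

# {username} is expanded by DockerSpawner, giving each user a separate volume.
c.DockerSpawner.volumes = {"jupyterhub-user-{username}": notebook_dir}
```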
I’ll have a look at the other thread. This isn’t necessarily about real-time collaboration (RTC), but when two users have write access to the same notebook it most certainly is.
Standard file system permissions with users and groups already handles some of this, so I think it’d be worthwhile to write down the additional Jupyter requirements.
There’s an abstraction around the “filesystem”: https://jupyter-server.readthedocs.io/en/latest/developers/contents.html
The default is to use the local filesystem, but you can implement a remote ContentsManager with custom permissions. For example, if you used an S3 object store, you could give each singleuser server a set of API credentials tied to the user, and set fine-grained access permissions on each object.
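To illustrate the S3 idea, a minimal sketch that builds an AWS IAM policy restricting one user's credentials to their own key prefix. The bucket name and the `notebooks/<username>/` layout are hypothetical, and how the remote ContentsManager picks up these credentials is not shown. Sharing a notebook with one colleague would mean adding a statement that grants `s3:GetObject` on that single object to the colleague's credentials.

```python
import json


def per_user_s3_policy(bucket, username):
    """Build an IAM policy limiting one user's credentials to their own prefix."""
    prefix = f"notebooks/{username}/"  # hypothetical key layout
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read, write and delete objects only under this user's prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                # Allow listing the bucket, but only for keys under that prefix.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
        ],
    }


print(json.dumps(per_user_s3_policy("example-notebooks", "alice"), indent=2))
```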
This still doesn’t prevent someone downloading a notebook (or viewing and copying the raw JSON), and sending it to someone else. For that you’ll probably need another layer e.g. a virtual desktop interface that prevents downloading to the local machine… but a user can still take a screenshot.