What if we set it up so that any time a user requests a ‘workspace’ (aka ‘server’), the following happens:
Two containers get spun up, ideally as two separate pods (rather than two containers in one pod):
- Pod A: Lightweight, Linux desktop environment with limited tools and a modern browser, wrapped in Guacamole.
- Pod B: The usual single user server with a choice of interfaces running on customisable compute, etc.
- Pod A and Pod B can potentially be scheduled to run on different node pools with different resources, etc.
Pod B is only accessible from Pod A - enforced through a network policy (e.g. via Calico).
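A minimal sketch of what that NetworkPolicy could look like, assuming hypothetical labels `app: workspace-desktop` on Pod A and `app: workspace-server` on Pod B, and a `workspaces` namespace (all names here are placeholders, not z2jh defaults):

```yaml
# Allow ingress to Pod B (single-user server) only from Pod A (Guacamole desktop).
# Requires a CNI that enforces NetworkPolicy, e.g. Calico.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: pod-b-ingress-from-pod-a-only
  namespace: workspaces
spec:
  podSelector:
    matchLabels:
      app: workspace-server   # Pod B
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: workspace-desktop   # Pod A
      ports:
        - protocol: TCP
          port: 8888   # default JupyterLab/singleuser port
```

Because the default-deny comes from the policy selecting Pod B, all other pods (and the ingress controller) are cut off from it; only traffic routed through the Guacamole desktop reaches JupyterLab.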
Persistent storage is disabled on Pod A, so even if the user downloads data from the JupyterLab served by Pod B, it lands in Pod A, which is entirely ephemeral and is lost when the user shuts down the server to access a different workspace. Pod A (Guacamole) prevents removal of data from the desktop environment.
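For Pod A, "no persistent storage" could be as simple as mounting only an `emptyDir` for the user's home, which is deleted along with the pod. A sketch, with a hypothetical desktop image name:

```yaml
# Pod A: ephemeral Guacamole desktop - no PVC, only pod-lifetime scratch space.
apiVersion: v1
kind: Pod
metadata:
  name: workspace-desktop
  namespace: workspaces
  labels:
    app: workspace-desktop
spec:
  containers:
    - name: desktop
      image: example.com/guacamole-desktop:latest   # hypothetical image
      volumeMounts:
        - name: home
          mountPath: /home/user
  volumes:
    - name: home
      emptyDir:
        sizeLimit: 1Gi   # scratch only; contents vanish when the pod is deleted
```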
Pod B gets its persistent storage through a PV/PVC - in our case, Azure File storage.
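On AKS that PVC could use the built-in Azure Files CSI storage class, roughly like this (claim name and size are placeholders):

```yaml
# PVC backing Pod B's home directory via Azure Files.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-server-home
  namespace: workspaces
spec:
  accessModes:
    - ReadWriteMany          # Azure Files supports RWX
  storageClassName: azurefile-csi   # built-in class on AKS
  resources:
    requests:
      storage: 10Gi
```

The key point is that this claim is mounted only into Pod B, so persisted data never crosses into the ephemeral desktop pod.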
This prevents moving data from one workspace to another. Let’s discuss when we talk on Wednesday.
Let’s drag one more person to the party
@yuvipanda - I have been looking at this → [Request for Implementations] Disabling downloads from a JupyterHub in the context of this → Secure data environment for NHS health and social care data - policy guidelines - GOV.UK.
It would be great to get your thoughts. NHS England has budgeted £100m for Secure Data Environments, and given that a major chunk of data science happens within the Jupyter universe, it would be good to discover what is possible and what has already been done. I love z2jh and would love to see what others are doing in this space.