Banning libraries in a JupyterHub environment to increase security

joseberlines · February 15, 2023, 9:24am

We are running a JupyterHub on a server for several thousand people, with k8s and a whole infrastructure behind it that handles authentication for access to sensitive data.
Is it possible to create a kind of “blacklist” of libraries that is checked whenever a user tries to install, in their user space, libraries that we think might cause damage or security breaches?

MridulS · February 15, 2023, 11:50am

How are users installing the packages? Are they using PyPI/conda? One way to fix this would be to block all access to PyPI and anaconda.org and run an internal mirror to distribute packages. This way users would have access to only “approved” packages.

They could still get the packages if they really wanted to (for example, directly from GitHub), assuming they still have an internet connection.

bollwyvl · February 15, 2023, 2:22pm

Yerp: the only safe computer is one not connected to the internet.

In k8s, individual pods should only be able to connect to exactly what the custodian of the infrastructure/data chooses, and a fully virtualized/containerized/buzzworded deployment should make this possible, and explicit. A policy of “no data from anywhere except the following domains” is a much better place to be starting from, but of course can’t prevent someone from hand-jamming packages in by upload once given interactive tools.
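As a rough illustration of the "only the following destinations" idea, here is a minimal sketch that builds a Kubernetes NetworkPolicy manifest restricting egress from user pods. Note that plain NetworkPolicy matches IP blocks, not domain names, so this assumes the approved services (e.g. an internal package mirror) live in a known CIDR. The policy name, the `jhub` namespace, the `component: singleuser-server` label, and the `10.0.42.0/24` range are all assumptions for illustration; check what your own deployment actually uses.

```python
import json


def egress_allowlist_policy(name, namespace, pod_labels, allowed_cidrs, port=443):
    """Build a NetworkPolicy manifest allowing egress only to the given CIDRs."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods this policy applies to (label is an assumption).
            "podSelector": {"matchLabels": pod_labels},
            # Declaring Egress means any egress not listed below is denied.
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [{"ipBlock": {"cidr": cidr}} for cidr in allowed_cidrs],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


# Hypothetical values: adjust to your cluster.
policy = egress_allowlist_policy(
    name="singleuser-egress-allowlist",
    namespace="jhub",
    pod_labels={"component": "singleuser-server"},
    allowed_cidrs=["10.0.42.0/24"],
)

# kubectl accepts JSON manifests, e.g. `kubectl apply -f policy.json`.
print(json.dumps(policy, indent=2))
```

A real policy would probably also need rules for DNS and for reaching the hub itself, and it only takes effect if the cluster's network plugin enforces NetworkPolicy.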

Inside the container: locking down environments is certainly possible, but a fully 700 file system, owned by some other system user than the user themselves, can’t actually be used for interactive computing, and can’t fully exclude data exfiltration.

On the client: requiring access to the hub be gated behind a VPN, with lots of logging and monitoring, etc. is a place to start, but again, users, by definition, use these tools from a computer where they can presumably run programs, hand-type malicious code, etc.

If arbitrary packages are needed, running a package proxy is the right play. Presumably this deployment is operating at the scale where one could handle running an enterprise tool (I’m not going to shill any of them here; “enterprise package management” has enough SEO that you’d find the top couple of contenders). This would allow moving this concern to the perimeter, and these tools offer caching mirrors, block/allowlists and continuous scanning (given subscriptions). Even failing that, a plain-old-proxy would potentially do the job, but would take a lot more work.
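For what it's worth, if a proxy isn't in place yet, an after-the-fact audit is easy to sketch. The snippet below lists the distributions installed in the current Python environment and compares them against a blocklist, normalizing names the way PyPI does (PEP 503). The blocklist entries are made-up placeholders, and this only detects packages that were installed normally: it won't stop an install, and it won't notice code that was copied in by hand.

```python
import re
from importlib.metadata import distributions

# Hypothetical blocklist entries, for illustration only.
BLOCKLIST = {"example-bad-package", "another_bad.pkg"}


def normalize(name):
    """Normalize a project name as in PEP 503 (case, '-', '_', '.' are equivalent)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_blocked(installed, blocklist):
    """Return the sorted, normalized names that appear in both collections."""
    blocked = {normalize(n) for n in blocklist}
    return sorted({normalize(n) for n in installed} & blocked)


def installed_names():
    """Names of all distributions visible to this interpreter."""
    names = []
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            names.append(name)
    return names


if __name__ == "__main__":
    hits = find_blocked(installed_names(), BLOCKLIST)
    if hits:
        print("Blocked packages found:", ", ".join(hits))
    else:
        print("No blocked packages found.")
```

Something like this could run periodically in each user environment and report to your logging stack, but treat it as monitoring rather than enforcement.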
