Hardening a JupyterHub deployment


#1

Hi!

I need to deploy a JupyterHub instance that will have to support a peak of 1k concurrent users. It will only be used for a few days.

Is there a way to:

  • stop users from using import?
  • stop users from accessing magic commands?
  • set a limit on the amount of RAM or CPU a single user can consume?

Also, if you have any recommendations on the resources this would require, I would really appreciate it!

PS I’m planning to deploy on GCP.


#2

I don’t think you can block import or magic commands. Even if you could there’s nothing to stop a user writing the equivalent Python code.
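To illustrate that point, here is a small standard-library sketch of plain-Python equivalents for two common IPython magics: the `!` shell escape and `%timeit`. Neither uses any magic syntax, so blocking magics would not stop them. The child command is `sys.executable` only so the sketch runs anywhere Python does; any other program could be launched the same way.

```python
import subprocess
import sys
import timeit

# `!<command>` in IPython runs a shell command; subprocess does the same
# without any magic syntax.
shell_out = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,
    text=True,
    check=True,
).stdout
print(shell_out, end="")

# `%timeit sum(range(1000))` has a standard-library equivalent too.
elapsed = timeit.timeit("sum(range(1000))", number=1000)
print(f"1000 runs took {elapsed:.4f} s")
```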

Since you’re deploying on GCP, you can run Zero to JupyterHub on Kubernetes (GKE) and set per-user CPU and memory limits: https://zero-to-jupyterhub.readthedocs.io/en/latest/user-resources.html
You could also restrict outbound network access: https://zero-to-jupyterhub.readthedocs.io/en/latest/security.html#kubernetes-network-policies
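As a rough sketch of both suggestions, and of the sizing question in the original post, the script below builds Helm values for the Zero to JupyterHub chart and estimates a node count for 1k concurrent users. All per-user numbers and the node capacity are hypothetical placeholders to tune with a load test. The `singleuser.memory`/`singleuser.cpu` keys follow the user-resources page linked above; `singleuser.networkPolicy.egressAllowRules` is assumed to exist in your chart version (2.x or later), so check the configuration reference before relying on it. The estimate covers user pods only, not the hub or proxy.

```python
import json
import math

# Hypothetical per-user numbers; tune them with a realistic load test.
PEAK_USERS = 1000            # peak concurrency from the question
MEM_GUARANTEE_MIB = 1024     # memory reserved for each user pod
MEM_LIMIT_MIB = 2048         # hard cap; user processes may be OOM-killed above it
CPU_GUARANTEE_MILLI = 50     # 0.05 CPU reserved per user
CPU_LIMIT = 1.0              # at most one core per user (throttled, not killed)

# Helm values for the Zero to JupyterHub chart. egressAllowRules is an
# assumption about the chart version; verify it in the config reference.
values = {
    "singleuser": {
        "memory": {"limit": f"{MEM_LIMIT_MIB}M", "guarantee": f"{MEM_GUARANTEE_MIB}M"},
        "cpu": {"limit": CPU_LIMIT, "guarantee": CPU_GUARANTEE_MILLI / 1000},
        "networkPolicy": {
            "enabled": True,
            # Intended to block egress from user pods to the public internet.
            "egressAllowRules": {"nonPrivateIPs": False},
        },
    },
}

# JSON is valid YAML, so this output can be saved as config.yaml for helm.
print(json.dumps(values, indent=2))

# Rough node count for a hypothetical node type with this much capacity
# left for user pods after system overhead.
NODE_ALLOC_MEM_MIB = 60 * 1024
NODE_ALLOC_CPU_MILLI = 7500
MAX_PODS_PER_NODE = 110      # assumed per-node pod cap

# Guarantees (not limits) decide how many pods the scheduler packs per node.
users_per_node = min(
    NODE_ALLOC_MEM_MIB // MEM_GUARANTEE_MIB,
    NODE_ALLOC_CPU_MILLI // CPU_GUARANTEE_MILLI,
    MAX_PODS_PER_NODE,
)
nodes_needed = math.ceil(PEAK_USERS / users_per_node)
print(f"{users_per_node} users per node -> {nodes_needed} nodes at peak")
```

With these placeholder numbers memory is the binding constraint (60 users per node), giving 17 nodes at peak; lowering the memory guarantee packs more users per node at the cost of more contention.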


#3

The Jupyter kernel is what’s responsible for interpreting code, so writing a custom kernel that extends IPython would be the way to modify what code users can run. However, I would strongly recommend against going that route. It’s extremely difficult to prove that Python code can’t do what you don’t want it to, especially since it can call out to shell commands, make syscalls with ctypes, define new modules in memory, etc.

Instead, assume that users can execute totally arbitrary code in the container and protect yourself against that at the deployment level. Kubernetes, Docker, systemd, etc. all have resource-limiting and sandboxing features that are going to be much more robust than attempting to specify a safe subset of a language as general as Python.

As @manics pointed out, limiting outbound network access (it sounds like blocking all outbound traffic is what you want) is usually the most important way to protect against serious abuse. The second step is the standard CPU/RAM limits supported by the deployment tools above, which protect you from both mischief and mistakes.
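To make the point about restricting Python concrete, here is a minimal sketch of a deliberately naive, hypothetical filter (`naive_filter` is invented for illustration, not a Jupyter or IPython API) that rejects cells mentioning `import` or starting with a magic prefix, plus a cell that gets past it and still loads `os`.

```python
# Hypothetical, deliberately naive "sandbox": reject any cell whose source
# mentions `import` or starts with a magic/shell prefix.
def naive_filter(source: str) -> bool:
    """Return True if the cell would be allowed to run."""
    banned_prefixes = ("%", "!")
    return "import" not in source and not source.lstrip().startswith(banned_prefixes)


# This cell never contains the word "import", yet it looks up the built-in
# __import__ function by a name assembled at runtime and loads `os` with it.
sneaky_cell = 'mod = eval("__imp" + "ort__")("o" + "s")'

allowed = naive_filter(sneaky_cell)   # True: the filter lets it through
namespace = {}
if allowed:
    exec(sneaky_cell, namespace)
print(namespace["mod"].__name__)      # os
```

Inside IPython, `get_ipython().run_line_magic(...)` similarly sidesteps a check on the `%` prefix. Each such hole can be patched, but there is always another one, which is why limits enforced by the container runtime are the safer layer.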