I need to preprocess code executed by the user in the Jupyter Notebook to make sure that it doesn’t contain any malicious code. How do I approach that?
I’m wondering if there is any “code execution hook” available for extensions, so I can intercept code execution on the Jupyter Server before it is sent to the kernel. Or maybe there is some other approach to achieve this.
Others may disagree with me, but fundamentally: with remote-code-execution-as-a-service, you can’t substantively restrict what folks can do with it beyond changing what the underlying server and kernel process themselves can do. There are just too many ways to circumvent any countermeasures.
Some things you can do:
- run the server/kernel process in a more isolated fashion
  - e.g. chroots, docker, VMs, another computer, or all of the above
  - these can still be escaped by a dedicated attacker, but if it must be…
- run everything against a read-only file system
  - this makes many useful things fail
- run everything in the client’s browser, e.g. JupyterLite
- expose only some features behind, e.g., a REST API
  - this carries lots of its own issues
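The container option from the list above can be sketched with standard Docker flags. This is a hypothetical starting point, not a complete hardening recipe: `my-jupyter-image` is a placeholder image name, and the limits are arbitrary.

```shell
# Hypothetical sketch: run a Jupyter server in a locked-down container.
# "my-jupyter-image" is a placeholder; the flags are standard Docker options:
#   --read-only               read-only root filesystem (many things will fail)
#   --tmpfs /tmp              writable scratch space confined to /tmp
#   --network none            no network access from inside the container
#   --cap-drop ALL            drop all Linux capabilities
#   --security-opt ...        forbid privilege escalation via setuid binaries
#   --pids-limit / --memory   resource limits against fork/memory bombs
docker run --rm \
  --read-only \
  --tmpfs /tmp \
  --network none \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  --pids-limit 256 \
  --memory 1g \
  my-jupyter-image
```

As noted above, a determined attacker may still escape a single layer like this, which is why stacking layers (container inside a VM on a separate machine) is worth considering.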
So if we want to process user code before execution, we basically have two options:
- update Jupyter Server to process the code before sending it to the kernel
- OR update each kernel that we are using to preprocess the code before executing it
Is my understanding correct?
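For the kernel-side option with IPython-based kernels, one existing place to hook in is IPython’s input-transformer API, which runs over every cell before it is compiled. A rough sketch follows; the substring blocklist is purely illustrative (and, as the reply below argues, trivially bypassed), so treat this as a preprocessing hook demo, not a security boundary.

```python
# Sketch: preprocess each cell inside an IPython-based kernel before it runs,
# using IPython's input-transformer hook (IPython >= 7). The substring
# blocklist is illustrative only -- it is NOT a real security boundary.
try:
    from IPython import get_ipython  # available inside a running kernel
except ImportError:                  # lets the screening logic run standalone
    def get_ipython():
        return None

BLOCKED_SUBSTRINGS = ("os.system", "subprocess")  # illustrative only

def screen_cell(lines):
    """Input transformer: receives and returns the cell as a list of lines."""
    source = "".join(lines)
    for needle in BLOCKED_SUBSTRINGS:
        if needle in source:
            raise ValueError(f"blocked pattern in cell: {needle!r}")
    return lines

ip = get_ipython()
if ip is not None:
    # Registered transformers run on every cell before execution.
    ip.input_transformers_cleanup.append(screen_cell)
```

The same screening function could instead live in a Jupyter Server extension that inspects `execute_request` messages before they reach the kernel, but the kernel-side hook above is the smaller change.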
A product that ships such countermeasures will be more frustrating for non-malicious users, and it won’t actually offer better protection against the other kind of user. With only the static-analysis approaches one can code up in a feasible amount of time, even a lazy attacker will find a way around them in a couple of minutes with automated tools.
I’m suggesting that if a product allows folks to use kernels in production, it needs a defense-in-depth strategy that isolates untrusted arbitrary code execution.