I want to provide jupyterhub users with some proprietary modules in jupyter notebooks. However, one can use
import inspect to view the source code. How can I prevent that?
I have seen in Quantopian notebooks that
import inspect is forbidden. How is blocking an import statement like this implemented?
Also, what’s the best way to provide the modules without having an explicit directory in the notebook server of a user? I’d imagine something to do with adding source code to the docker image. I’d appreciate if someone could point me in the right direction.
What you’re looking for is called “obfuscation”.
PyArmor only mentions scripts, not modules: https://pypi.org/project/pyarmor/
PyMinifier also talks about executables: https://liftoff.github.io/pyminifier/
There’s an SO question that goes into your direction:
But you should be aware that there’s no way to stop a determined person from decompiling your Python code, no matter how much you obfuscate it:
Thanks, @rolweber. After going through these links, I better understand how code protection works. I don’t think code obfuscation is worth doing for my case.
I understand that the source code of the modules and/or models I provide in a jupyterhub notebook can be accessed. I just wanted to limit it’s exposure to the APIs.
Just hit this conundrum, I presume the answer still holds?
My situation is a custom python package that I don’t really want to give away the secret sauce
As long as python is still python, there just isn’t going to be a good way to obfuscate it. Everything in the SO posts @rolweber posted above still applies:
The only semi-functional option is to rewrite the sensitive parts of your python sources as a C extension.
Better though would be to really question if you need the obfuscation in the first place
Given the “arbitrary code execution is the whole point” nature of Jupyter and Python, this is very hard to do robustly. So it depends a bit on whether you want to make it hard/inconvenient for who you know your users to be, vs truly not technically feasible. If it really is important to you, especially in the context of JupyterHub, I would recommend separating it entirely:
Run your proprietary code somewhere else and talk to it via an API (rest, rpc, mounted socket, whatever). Then your ‘client’ can be accessible, but it’s not super interesting anymore. Data transfer/serialization can be an issue, depending on what your code does, but then the boundaries are at least well understood.