Hi, I’m working on a solution for incrementally executing AI-generated code in as-sandboxed-as-reasonable environments. I’m new to developing with the Jupyter suite of components, so I would appreciate any pointers in the right direction.
The solution is similar to the Advanced Data Analysis feature in ChatGPT, where users upload their documents and the AI generates and executes code to answer the user's question. High-level requirements are:
- CPU and memory are limited (governed) and isolated from the main application
- Code executed in the kernel can only access the files provided by the user - it cannot access other users’ files.
- Users are non-technical and are interfacing with a chat-based web application (not a notebook), so code and files are submitted in a headless incremental fashion
- This is in an enterprise setting, so executing arbitrary code client-side is not welcome
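To make the "headless incremental" point concrete: my backend would drive the kernel directly over the standard Jupyter messaging protocol (an `execute_request` per chat turn on the kernel's `/api/kernels/{id}/channels` websocket), rather than through a notebook UI. A rough sketch of the message my service would build and send each turn (field values like `username` are just placeholders for my app):

```python
import uuid
from datetime import datetime, timezone

def execute_request(code):
    """Build a Jupyter-messaging-protocol execute_request, suitable for
    sending over a kernel's /api/kernels/{id}/channels websocket."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "msg_type": "execute_request",
            "username": "chat-app",            # arbitrary identifier for my service
            "session": uuid.uuid4().hex,
            "date": datetime.now(timezone.utc).isoformat(),
            "version": "5.3",                  # messaging protocol version
        },
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,              # no interactive stdin in a chat app
        },
        "channel": "shell",
    }

# Each chat turn sends another execute_request to the same kernel, so
# state (loaded dataframes, variables) persists between turns:
msg = execute_request("import pandas as pd\ndf = pd.read_csv('upload.csv')")
```

The point is that the kernel is long-lived per conversation, with results streamed back from the iopub channel to the chat UI.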
Enterprise Gateway on Kubernetes seems like it checks a lot of boxes, since the kernels run in their own containers and volume mounts can be customized to segregate files on a per-kernel basis. I fired it up to test it out, but out of the box I find the kernel is not very contained: for example, I can execute code in the kernel to list its environment variables, use that information to connect back to the EG REST API, and then discover and connect to other users' kernels. This unfortunately doesn't pass the "as-sandboxed-as-reasonable" test.
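For illustration, the kind of escape I mean is roughly the following, run as user-submitted code inside the kernel container. The env-var name patterns and the gateway URL are illustrative, not exact (the specific variables EG injects depend on the deployment), but the `/api/kernels` endpoint is the standard Jupyter REST API:

```python
import os

def leaked_gateway_env(environ):
    """Pick out environment variables that reveal how to reach the
    gateway. The name patterns here are illustrative; the exact
    variables depend on the EG deployment."""
    return {k: v for k, v in environ.items()
            if "GATEWAY" in k or k.startswith("EG_") or k.startswith("KERNEL_")}

leaked = leaked_gateway_env(os.environ)
print(leaked)

# With a service address recovered from the variables above, the
# standard Jupyter REST API enumerates every running kernel
# (hypothetical in-cluster URL):
#
#   import requests
#   resp = requests.get("http://enterprise-gateway:8888/api/kernels")
#   print(resp.json())   # kernel ids for *other* users' sessions
```

Nothing in the default setup seems to stop the kernel pod from reaching the gateway service over the cluster network, which is the part I'd like to lock down.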
In my research I see there is a wide range of configuration options for EG, as well as related offerings like gateway_provisioners and Kernel Gateway. I'm a little lost in assessing all the options and determining which tool is best positioned for this use case. And maybe I'm totally missing something. Any pointers or help thinking through this would be appreciated. Thank you!