Is there any way to allow (sensitive) data to be displayed on screen in Jupyter, but not saved to the .ipynb (on checkpoint or save)? If there is no such way at present, what is the best/easiest way to implement this feature? Pointers are greatly appreciated!
Here is a use case for discussion purposes: say there is sensitive data in some data source (e.g. PAI data like zipcode stored in a database), and there is a special API that a user can call to retrieve the PAI data. After retrieving the PAI data in memory, the user may inspect the data and do some aggregation. We would like the user to be able to see the data, but not to save it to the .ipynb notebook.
The current notebook behavior is that the data displayed on screen is also saved in the .ipynb (in the outputs section of a cell in the .ipynb JSON file). Thus PAI data will be stored in a file, which is what we want to avoid.
For simplicity, we can assume this not-allowed-to-save-to-file behavior is at a cell level (i.e. can be controlled by some %%magic to be written). I can further assume that this cell can be cell-tagged to identify that this is a special cell.
I have taken a look at %%capture, which works at a cell level. It seems to be pretty close, but I couldn’t figure out how to use it to do what I need.
I am wondering if there is an extension hook for saving the .ipynb: upon checkpoint/save, Jupyter could call this hook to remove the outputs section of any cell tagged with a certain cell tag?
Thanks for the tips. Since all of the notebook code and output (except the PAI output) need to be written out faithfully, I probably cannot use a quota to deny the saving/checkpointing of the .ipynb, right?
The ContentsManager looks interesting. I am not familiar with how to work with it though. I’ll need to read through the doc in greater depth… One thing that is not clear to me is where the updated code would sit. Is it like defining a new class in a ts file parallel to manager.ts in jupyterlab/packages/services/src/manager.ts? Or could this be done as a lab extension? Thanks very much!
Oh… By that, you mean both the ContentsManager of JupyterLab (ts code) and the backend Jupyter Server need to be updated to support this function? That would be bad news for me, as I have a hard enough time dealing with either one of them…
BTW, the first line of your comment is right on, but in our specific case, we do provision a specific JupyterLab for the user, and this cannot be overridden in our env (but the user can create their own conda env to use in conjunction with this specific jlab). What we have in mind is to create a specific cell template (such that the tagging can be done) for the user to copy and use; or some scaffolding code to do something similar.
The ideal packaging for us is not to come up with our own patch of JupyterLab, but simply an extension (or additional code somewhere) we can install on top of the standard jupyterlab from conda-forge…
If we can keep the cell tag as a guide/marker, it looks to me to be real easy to write a Python program that simply parses the .ipynb and removes the outputs section of the cells with a matching cell tag. This amounts to a post-processing step after the generation/saving of the .ipynb. To folks like me with very limited working experience with the jlab source code, if there is an invocation hook at this level (i.e. at the file level), that would be very attractive…
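A minimal sketch of that post-processing step, using only the standard library and the nbformat v4 JSON layout (tags live in `cell["metadata"]["tags"]`). The tag name `no-save-output` and the function names are made up for illustration; pick whatever your cell template uses.

```python
import json

# Hypothetical tag name; not something Jupyter defines.
SENSITIVE_TAG = "no-save-output"


def strip_tagged_outputs(nb, tag=SENSITIVE_TAG):
    """Clear outputs of code cells carrying `tag`; return how many were cleared."""
    cleared = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown/raw cells have no outputs
        if tag in cell.get("metadata", {}).get("tags", []):
            cell["outputs"] = []
            cell["execution_count"] = None
            cleared += 1
    return cleared


def strip_file(path, tag=SENSITIVE_TAG):
    """Rewrite the .ipynb at `path` in place with tagged outputs removed."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    cleared = strip_tagged_outputs(nb, tag)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")
    return cleared
```

Note that with a separate post-processing step the sensitive output still reaches the disk for a short time between the save and the cleanup, and checkpoint copies would need the same treatment.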
I think overriding the backend (jupyter-server) ContentsManager.save(model, path) in a server extension (written in Python) would do exactly this, except that it would be automatic instead of needing a separate Python program. In your custom ContentsManager you’d take the model object, modify it to remove the sensitive data, then write it to path. In theory anyway.
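A rough, untested sketch of what such an override could look like. It assumes the save model for a notebook is a dict with `type == "notebook"` and the notebook JSON under `content`, and that `FileContentsManager` is the manager in use; the tag name and class names are hypothetical. The stripping logic is kept in a mixin so it can be tried out without a running server.

```python
# Hypothetical tag name; not something Jupyter defines.
SENSITIVE_TAG = "no-save-output"


def strip_tagged_outputs(nb, tag=SENSITIVE_TAG):
    """Clear outputs of code cells in notebook dict `nb` that carry `tag`."""
    for cell in nb.get("cells", []):
        tags = cell.get("metadata", {}).get("tags", [])
        if cell.get("cell_type") == "code" and tag in tags:
            cell["outputs"] = []
            cell["execution_count"] = None


class StripTaggedOutputsMixin:
    """Scrub tagged outputs from the model, then defer to the real save()."""

    def save(self, model, path=""):
        # Directory and plain-file models are passed through untouched.
        if model.get("type") == "notebook" and isinstance(model.get("content"), dict):
            strip_tagged_outputs(model["content"])
        return super().save(model, path)


try:
    from jupyter_server.services.contents.filemanager import FileContentsManager
except ImportError:  # jupyter_server not installed where this is read
    FileContentsManager = None

if FileContentsManager is not None:

    class StrippingContentsManager(StripTaggedOutputsMixin, FileContentsManager):
        """File-based contents manager that never writes tagged outputs."""
```

To try it, the class would be selected in the server config with something like `c.ServerApp.contents_manager_class = "mypackage.StrippingContentsManager"`, where `mypackage` is a placeholder for wherever the class ends up. As far as I understand, the frontend keeps its own in-memory copy of the outputs, so they should stay visible on screen until the notebook is reloaded from disk.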
Cool! Python is more up my alley than js/ts. So this is a great pointer, and it points me to the right place to start experimenting. Thanks again!
Just to elaborate on the details of manics’ solution here: one can override the pre-save and post-save handlers in jupyter_notebook_config.py: c.FileContentsManager.pre_save_hook and c.FileContentsManager.post_save_hook. For the purposes of this issue, the pre-save hook would be the more appropriate one, since it runs before anything is written to disk. (The JupyterLab docs have details on both hooks.) Thanks again to manics!
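For anyone landing here later, a sketch of what the pre-save hook could look like in the config file. It follows the pattern of the output-scrubbing example in the docs, narrowed to tagged cells; the tag name `no-save-output` and the function name are made up. The `try`/`except` around `get_config()` is only there so the file can also be imported outside Jupyter for testing.

```python
# jupyter_notebook_config.py (or jupyter_server_config.py)

# Hypothetical tag name; not something Jupyter defines.
SENSITIVE_TAG = "no-save-output"


def strip_tagged_outputs_pre_save(model, **kwargs):
    """Drop outputs of tagged code cells just before the notebook is written."""
    # Only notebooks carry cell outputs; skip plain files and directories.
    if model.get("type") != "notebook":
        return
    content = model.get("content") or {}
    # Only handle the nbformat v4 layout, where tags sit in cell metadata.
    if content.get("nbformat") != 4:
        return
    for cell in content.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        if SENSITIVE_TAG in cell.get("metadata", {}).get("tags", []):
            cell["outputs"] = []
            cell["execution_count"] = None


try:
    c = get_config()  # injected by Jupyter when it loads a config file
except NameError:
    c = None  # imported outside Jupyter, e.g. in a unit test

if c is not None:
    c.FileContentsManager.pre_save_hook = strip_tagged_outputs_pre_save
```

Untagged cells are left alone, so the rest of the notebook is still written out faithfully.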