How to allow sensitive data to be displayed on screen but not saved to file?

hikemtshasta · November 10, 2021, 8:52pm

Is there any way to allow (sensitive) data to be displayed in Jupyter on screen, but not saved to the .ipynb (when checkpoint or save)? If there is no such way at present, what is the best/easier way to implement this function? Pointers are greatly appreciated!

Here is a use case for discussion purposes: say there are sensitive data in some data source (e.g. PAI data like zipcode stored in a database), and there is a special API that a user can call to retrieve the PAI data. After retrieving the PAI data in memory, the user may inspect the data and do some aggregation. We would like the user to be able to see the data, but not to save it to the .ipynb notebook.

The current notebook behavior is that the data displayed on screen is also saved in the .ipynb (in the outputs section of a cell in the .ipynb JSON file). Thus PAI data will be stored in the file, which is what we want to avoid.

For simplicity, we can assume this not-allowed-to-save-to-file behavior is at a cell level (i.e. can be controlled by some %%magic to be written). I can further assume that this cell can be cell-tagged to identify that this is a special cell.

I have taken a look at %%capture, which works at a cell level. It seems to be pretty close, but I couldn’t figure out how to use it to do what I need.

I am wondering if there is an extension hook for saving the .ipynb: upon checkpoint/saving, jupyter can call this hook to remove the output section of a cell tagged with certain cell-tag?

Any ideas are appreciated. Thanks in advance!

manics · November 10, 2021, 11:22pm

I found a related idea in:

You could try overriding the ContentsManager which handles saving/loading notebooks:
https://jupyter-server.readthedocs.io/en/latest/developers/contents.html#writing-a-custom-contentsmanager

bollwyvl · November 10, 2021, 11:48pm

Of course all of this goes out the window if the end user can decide what packages/extensions to use!

The ContentsManager on both the backend and frontend could be patched to clobber the offending data.
Inside a kernel, one could have, say, a context manager that patches output/*/metadata, e.g. with some_metadata_thing(pai=True):
- setting cell metadata is rather harder from the kernel itself

hikemtshasta · November 11, 2021, 7:24am

Thanks for the tips. Since all of the notebook code and output (except the PAI output) needed to be written out faithfully, I probably cannot use quota to deny the saving/checkpointing of the .ipynb, right?

The ContentsManager looks interesting. I am not familiar with how to work with it though. I’ll need to read through the doc in greater depth… One thing that is not clear to me is where the updated code would sit. Is it like defining a new class in a ts file parallel to manager.ts in jupyterlab/packages/services/src/manager.ts? Or could this be done as a lab extension? Thanks very much!

hikemtshasta · November 11, 2021, 7:44am

Oh… By that, you mean both the ContentsManager of JupyterLab (ts code) and the backend Jupyter Server need to be updated to support this function? That would be bad news for me, as I would have a hard enough time dealing with either one of them…

BTW, the first line of your comment is right on, but in our specific case we do provision a specific JupyterLab for the user, and this cannot be overridden in our env (though the user can create their own conda env to use in conjunction with this specific jlab). What we have in mind is to create a specific cell template (such that parameter tagging can be done) for the user to copy and use, or some scaffolding code that does something similar.

The ideal packaging for us is not to come up with our own patch of JupyterLab, but simply an extension (or additional code somewhere) that we can install on top of the standard JupyterLab from conda-forge…

If we can keep the cell tag as a guide/marker, it looks to me to be really easy to write a Python program that simply parses the .ipynb and removes the outputs section of the particular cells with a matching cell tag. This amounts to a post-processing step after the generation/saving of the .ipynb. To folks like me with very limited working experience with the jlab source code, if there is an invocation hook at this level (i.e. at the file level), that would be very attractive…

Thanks very much again for your comments.

manics · November 11, 2021, 9:21am

I think overriding the backend (jupyter-server) ContentsManager.save(model, path) in a server extension (written in Python) would do exactly this, except that it would be automatic instead of needing a separate Python program. In your custom ContentsManager you’d take the model object, modify it to remove the sensitive data, then write it to path. In theory, anyway.
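
A minimal sketch of that idea, not a tested server extension: the mixin below overrides save(model, path), removes outputs from tagged code cells, and hands the cleaned model to whatever contents manager it is combined with. The tag name sensitive-output and the names without_tagged_outputs, ScrubOutputsMixin and ScrubbingContentsManager are hypothetical. The commented lines at the end assume jupyter_server's FileContentsManager and the ServerApp.contents_manager_class setting.

```python
import copy

SENSITIVE_TAG = "sensitive-output"  # hypothetical tag name


def without_tagged_outputs(model, tag=SENSITIVE_TAG):
    """Return a copy of a contents model with outputs removed from tagged code cells."""
    if model.get("type") != "notebook" or not model.get("content"):
        return model  # files, directories, or models without content pass through
    cleaned = copy.deepcopy(model)
    for cell in cleaned["content"].get("cells", []):
        tags = cell.get("metadata", {}).get("tags", [])
        if cell.get("cell_type") == "code" and tag in tags:
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


class ScrubOutputsMixin:
    """Mix into a contents manager class so that save() never sees tagged outputs."""

    def save(self, model, path=""):
        return super().save(without_tagged_outputs(model), path)


# In the server config file one would then combine the mixin with a real manager
# (assumes jupyter_server is installed):
#
#   from jupyter_server.services.contents.filemanager import FileContentsManager
#
#   class ScrubbingContentsManager(ScrubOutputsMixin, FileContentsManager):
#       pass
#
#   c.ServerApp.contents_manager_class = ScrubbingContentsManager
```

Only the copy that is written to disk changes, so the outputs already rendered in the browser should stay visible until the notebook is reloaded from the file.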

hikemtshasta · November 11, 2021, 8:25pm

Cool! I am more in the alley of Python than js/ts. So this is a great pointer, and it points me to the right place to start experimenting. Thanks again!

hikemtshasta · December 2, 2021, 12:05am

Just to elaborate on the details of manics’ solution: one can override the pre-save and post-save handlers in jupyter_notebook_config.py via c.FileContentsManager.pre_save_hook and c.FileContentsManager.post_save_hook. For the purposes of this issue, the pre-save hook is the more appropriate one. (The JupyterLab docs have details on both hooks.) Thanks again to manics!
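
For anyone landing here later, a pre-save hook for this use case might look like the sketch below. It is adapted from the scrub-output example in the Jupyter docs, narrowed to tagged cells. The tag name sensitive-output and the function name are assumptions for illustration. The hook receives the notebook model before it is written and modifies it in place.

```python
SENSITIVE_TAG = "sensitive-output"  # hypothetical tag name


def scrub_tagged_output_pre_save(model, **kwargs):
    """Pre-save hook: drop outputs of code cells tagged SENSITIVE_TAG before writing."""
    # Only act on notebooks; plain files and directories are left alone.
    if model.get("type") != "notebook":
        return
    content = model.get("content") or {}
    # Only handle nbformat v4, where tags live in cell["metadata"]["tags"].
    if content.get("nbformat") != 4:
        return
    for cell in content.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        if SENSITIVE_TAG in cell.get("metadata", {}).get("tags", []):
            cell["outputs"] = []
            cell["execution_count"] = None


# In the config file, where `c` is the config object provided by Jupyter:
#
#   c.FileContentsManager.pre_save_hook = scrub_tagged_output_pre_save
```

On setups that run on jupyter_server, the config file may be jupyter_server_config.py rather than jupyter_notebook_config.py, so check which one your deployment actually loads.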
