Hello,
I am looking for recommendations or best practices for safely exporting arbitrary user-submitted notebooks to HTML. I will be accepting user-uploaded notebooks and displaying rendered versions in my web application.
(I have previously opened an associated issue in the nbconvert repository, but have so far not received any response. I think any relevant information could also be helpful to include in nbconvert’s documentation.)
I have seen that there is a `sanitize-html` / `should_sanitize_html` option when using the HTML exporter. My understanding from looking at the code is that cells are run through the `clean_html` filter. Some questions:
- What are the safety implications if just using default settings and not using `should_sanitize_html`?
- How should I understand this filter’s level of safety in a broader context?
- This doesn’t appear customizable (in an obvious way—I guess some of these allow lists could be monkeypatched?). Should this not be customized? From trying to use it, it seems like paragraph and header tags are not allowed, which seems to break fairly basic markdown formatting in notebooks.
- Are there other basic vulnerabilities to watch out for that using the sanitize option doesn’t address?
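To make the allow-list concern concrete, here is a toy sketch of the behavior I am describing. It uses only the Python standard library and is not nbconvert's `clean_html` or its real allow list; `BASE_ALLOWED_TAGS`, `AllowListSanitizer`, and `sanitize` are hypothetical names I made up for illustration, and it should not be treated as a safe sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow list for illustration only; NOT nbconvert's actual list.
BASE_ALLOWED_TAGS = frozenset({"a", "b", "code", "em", "i", "li", "ol", "strong", "ul"})


class AllowListSanitizer(HTMLParser):
    """Toy sanitizer: keeps allowed tags (attributes dropped), escapes all others."""

    def __init__(self, allowed_tags):
        super().__init__(convert_charrefs=True)
        self.allowed_tags = allowed_tags
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed_tags:
            # Re-emit the bare tag so no attribute (href, onclick, ...) survives.
            self.parts.append(f"<{tag}>")
        else:
            # Disallowed tags are shown as literal text instead of being rendered.
            self.parts.append(escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        if tag in self.allowed_tags:
            self.parts.append(f"</{tag}>")
        else:
            self.parts.append(escape(f"</{tag}>"))

    def handle_data(self, data):
        self.parts.append(escape(data))


def sanitize(html, allowed_tags=BASE_ALLOWED_TAGS):
    """Return html with every tag outside allowed_tags escaped."""
    parser = AllowListSanitizer(allowed_tags)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


rendered = "<h1>Title</h1><p>Hello <strong>world</strong></p><script>alert(1)</script>"

# With the base list, <h1> and <p> are escaped along with <script>.
strict = sanitize(rendered)

# Extending the list keeps basic markdown structure while <script> stays escaped.
relaxed = sanitize(rendered, BASE_ALLOWED_TAGS | {"p", "h1"})
```

In this sketch the strict result shows the heading and paragraph tags as literal text, which is the kind of breakage I see with markdown cells, while the relaxed result keeps them. What I cannot judge is whether extending the real allow list in a similar way is considered safe.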
One notable model for rendering user-uploaded notebooks is GitHub. I understand that GitHub does some kind of cleaning or places restrictions on the rendering, but I haven’t been able to find details or code about what that actually is. If anyone knows of a reference about this, that would also be very helpful.
Thank you in advance!