Notebook-as-PDF, save notebooks as PDFs

Notebook-as-pdf is a new Jupyter extension to save your notebooks as PDFs. It combines three ideas: no page breaks (who needs pages anyway?), rendering with Chromium (bye-bye LaTeX!), and attaching the original notebook to the PDF (hello reproducibility!). Try it on mybinder.org or look at the source code.

The created PDF will have as few pages as possible, in many cases only one. This is useful if you are exporting your notebook to a PDF for sharing with others who will view it on a screen.

There is an example notebook in the repo which renders to a PDF that looks like this:

To make it easier to reproduce the contents of the PDF at a later date, the original notebook is attached to the PDF. Not all PDF viewers know how to deal with attachments. This means you need to use Acrobat Reader or pdf.js to get the attachment out of the PDF. Preview on macOS does not display PDF attachments or give you access to them.

I built this because I realised a lot of people convert their notebooks to PDF for sharing by email or archiving for compliance reasons, not for printing. This means you don’t really need to have A4 pages any more. Then I learnt that you can attach files to a PDF, which gave me the idea to attach the original notebook so you can find it again later.

Currently it is a bit tricky to install notebook-as-pdf on Windows because of the library it uses for PDF handling. Switching to a library that is “pure Python”, yet robust (PDF is a crazy format), would be great. Alternatively, I am thinking of making “attach to PDF” an optional feature too. Thoughts (or code) on this would be very welcome.

Hope this is useful for you (or at least an entertaining toy) :smiley:


This is great!

Don’t know if it’s useful, but PyQt is hard to not have if you’re using conda, and it works fine on Windows. Its web rendering has meandering roots that lead back to KHTML, so it’s an alternative to rendering with Chromium but still in the same family tree. Using PyQt is pretty similar to the strategy employed by PhantomJS, just with Python.

The core logic I wrote to render from HTML to PDF is here: https://github.com/davclark/27DaysForms/blob/fe35c8f006ec4ea2c08d4a178dce645662fb1360/html2pdf.py

If you want it, name a license and I’ll add that to the repo. Note that the more up-to-date version is only in the 2017fall branch.

(In case you’re curious, this was to generate certificates of completion for a kind of on-line buddhish program - you can totally ignore the templating stuff and the form data fetch).

I actually think there is a reasonable approach to modern typesetting that uses a stack like this - there are REALLY good tools now for controlling layout in a browser. Feel free to pull me in if you want for a contribution or review. And thanks for this contribution!


I’ll try out the Qt-based rendering. Even if it is just for laughs :slight_smile:

The problem isn’t Chromium for the HTML -> PDF conversion. That seems to work well (I’ve not had any complaints yet).

The tricky part is pikepdf, which is used to attach the original notebook to the PDF. pikepdf is a Python interface to qpdf, which is written in C++. Getting those two installed on Windows seems to trip some people up. There is a conda-forge package for pikepdf, but last time I checked there wasn’t one for Windows.
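For anyone curious what the attachment step looks like, here is a minimal sketch. It is not the code notebook-as-pdf actually uses. It assumes a reasonably recent pikepdf (the `attachments` mapping and `AttachedFileSpec` arrived in version 3.0, as far as I know), and the function name `attach_notebook` is made up for illustration. It also shows one way “attach to PDF” could be made optional: if pikepdf can’t be imported, the PDF is copied unchanged.

```python
import shutil
from pathlib import Path


def attach_notebook(pdf_path, notebook_path, output_path):
    """Write a copy of pdf_path to output_path with the notebook attached.

    Hypothetical helper, not part of notebook-as-pdf. Returns True if the
    notebook was attached, False if pikepdf is not installed and the PDF
    was copied unchanged.
    """
    pdf_path = Path(pdf_path)
    notebook_path = Path(notebook_path)
    output_path = Path(output_path)

    try:
        # Imported lazily so this still works where pikepdf (and qpdf)
        # are hard to install, e.g. on some Windows setups.
        import pikepdf
    except ImportError:
        shutil.copyfile(pdf_path, output_path)
        return False

    with pikepdf.open(pdf_path) as pdf:
        # Build an embedded-file entry from the notebook on disk and
        # register it under the notebook's file name.
        filespec = pikepdf.AttachedFileSpec.from_filepath(pdf, notebook_path)
        pdf.attachments[notebook_path.name] = filespec
        pdf.save(output_path)
    return True
```

Calling `attach_notebook("report.pdf", "report.ipynb", "report-with-notebook.pdf")` (file names are placeholders) writes a new PDF next to the original. With pikepdf installed, the notebook can be read back with `pdf.attachments["report.ipynb"].get_file().read_bytes()`.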

There is also the fantastic pdfrw (pure Python and robust), but the way it handles Python 2 and 3 compatibility (strings vs bytes) meant I couldn’t figure out how to use it to attach a file to a PDF :frowning:

This is why I am looking for a pure Python PDF library to handle the attachment bit.