Notebook-as-PDF, save notebooks as PDFs

Notebook-as-pdf is a new Jupyter extension to save your notebooks as PDFs. It combines three ideas: no page breaks (who needs pages anyway?), rendering with Chromium (bye-bye LaTeX!) and attaching the original notebook to the PDF (hello reproducibility!). Try it on mybinder.org or look at the source code.

The created PDF will have as few pages as possible, in many cases only one. This is useful if you are exporting your notebook to a PDF for sharing with others who will view it on a screen.
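To make "as few pages as possible" concrete, here is a rough sketch of the arithmetic involved. It is not taken from the extension's source: the function name `page_layout` is made up for illustration, and the 200 inch cap is an assumption based on the page size limit that PDF viewers such as Acrobat commonly enforce. The only hard fact used is that one CSS pixel is 1/96 of an inch.

```python
import math

CSS_PX_PER_INCH = 96        # CSS defines 1px as 1/96 inch
MAX_PAGE_HEIGHT_IN = 200    # assumed cap; viewers commonly limit pages to 200 inches


def page_layout(content_height_px, max_page_height_in=MAX_PAGE_HEIGHT_IN):
    """Return (number_of_pages, page_height_px) for a rendered notebook.

    Hypothetical helper: use one tall page when the content fits under the
    cap, otherwise split it into the smallest number of equally tall pages.
    """
    max_px = max_page_height_in * CSS_PX_PER_INCH
    pages = max(1, math.ceil(content_height_px / max_px))
    page_height_px = math.ceil(content_height_px / pages)
    return pages, page_height_px


# A typical notebook fits on a single tall page.
print(page_layout(3000))
# A very long one needs more than one page under the assumed cap.
print(page_layout(40000))
```

Under these assumptions most notebooks end up on a single page, and only very long ones get split.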

There is an example notebook in the repo which renders to a PDF that looks like this:

To make it easier to reproduce the contents of the PDF at a later date, the original notebook is attached to the PDF. Not all PDF viewers know how to deal with attachments, so you need to use Acrobat Reader or pdf.js to get the attachment out of the PDF. Preview on OS X does not give you access to PDF attachments.

I built this because I realised a lot of people convert their notebooks to PDF for sharing by email or archiving for compliance reasons, not for printing. This means you don't really need to have A4 pages any more. Then I learnt that you can attach files to a PDF, which gave me the idea to attach the original notebook so you can find it again later.

Currently it is a bit tricky to install notebook-as-pdf on Windows because of the library it uses for PDF handling. Switching to a library that is "pure Python", yet robust (PDF is a crazy format), would be great. Alternatively, I am thinking of making "attach to PDF" an optional feature too. Thoughts (or code) on this would be very welcome.
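A minimal sketch of what an optional "attach to PDF" step could look like. This is only an illustration, not the extension's code: it assumes the pure-Python pypdf library as the attachment backend, and `attach_notebook` is a hypothetical helper name. If pypdf cannot be imported, the PDF is copied through unchanged.

```python
import shutil
from pathlib import Path

try:
    # Assumption: pypdf is the optional, pure-Python backend.
    from pypdf import PdfReader, PdfWriter
except ImportError:
    PdfReader = PdfWriter = None


def attach_notebook(pdf_path, notebook_path, out_path):
    """Write pdf_path to out_path, embedding the notebook when possible.

    Returns True if the notebook was attached, False if the attachment
    step was skipped because pypdf is not installed.
    """
    if PdfWriter is None:
        # No PDF library available: keep the rendered PDF as it is.
        shutil.copyfile(pdf_path, out_path)
        return False

    writer = PdfWriter()
    writer.append_pages_from_reader(PdfReader(pdf_path))
    notebook = Path(notebook_path)
    # Embed the raw .ipynb bytes under the notebook's own file name.
    writer.add_attachment(notebook.name, notebook.read_bytes())
    with open(out_path, "wb") as f:
        writer.write(f)
    return True
```

With this shape the conversion still produces a PDF on machines where the PDF library is hard to install; it just does not carry the notebook inside it.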

Hope this is useful for you (or at least an entertaining toy) :smiley:

This is great!

Don't know if it's useful, but PyQt is hard to not have if you're using conda, and it works fine on Windows. Its web renderer comes from the same meandering lineage that goes back to KHTML, so it's an alternative to rendering with Chromium but still in the same family tree. Using PyQt is pretty similar to the strategy employed by PhantomJS, just with Python.

The core logic I wrote to render from HTML to PDF is here: https://github.com/davclark/27DaysForms/blob/fe35c8f006ec4ea2c08d4a178dce645662fb1360/html2pdf.py

If you want it, name a license and I'll add that to the repo. Note that the more up-to-date version is only in the 2017fall branch.

(In case youā€™re curious, this was to generate certificates of completion for a kind of on-line buddhish program - you can totally ignore the templating stuff and the form data fetch).

I actually think there is a reasonable approach to modern typesetting that uses a stack like this - there are REALLY good tools now for controlling layout in a browser. Feel free to pull me in if you want for a contribution or review. And thanks for this contribution!

I'll try out the Qt-based rendering. Even if it is just for laughs :slight_smile:

The problem isn't Chromium for the HTML -> PDF conversion. That seems to work well (I've not had any complaints yet).

The tricky part is pikepdf, which is used to attach the original notebook to the PDF. pikepdf is a Python interface to qpdf, which is written in C++. Getting those two installed on Windows seems to trip some people up. There is a conda-forge package for pikepdf, but last time I checked there wasn't one for Windows.

There is also the fantastic pdfrw (pure Python and robust), but because of the way it handles Python 2 and 3 compatibility (strings vs bytes) I couldn't figure out how to use it to attach a file to a PDF :frowning:

This is why I am looking for a pure Python PDF library to handle the attachment bit.

The extension has received a few updates in the meantime.

The interesting ones:

  • we generate a table of contents in the PDF based on the main headings in the notebook
  • installing on Windows should be easier
  • works better on Linux and in CI systems

You can try it here, or just look at the example PDF.

Here's a complementary approach for people who are using myst_nb/sphinx. It provides hard page breaks, custom headers and margin sizing, along with the myst cross-referencing etc.

https://github.com/eoas-ubc/paged_html_theme

demo: https://phaustin.github.io/paged_html_theme/

There is a new version of notebook-as-pdf :tada:

Three things worth mentioning:

  1. notebooks aren't executed twice when converting them
  2. the original notebook file is attached to the PDF via PyPDF2
  3. the PDF now contains level 2 headings in its table of contents, as well as level 1 headings
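For the curious, collecting level 1 and level 2 headings from a notebook can be sketched roughly like this. It is an illustration under assumptions, not the extension's actual code: `toc_entries` is a hypothetical helper, and it only looks at ATX-style (`#`, `##`) headings in markdown cells of the notebook JSON, skipping anything inside fenced code.

````python
import json
import re

# Matches "# Title" and "## Title", but not "### Deeper" or "#hashtag".
HEADING = re.compile(r"^(#{1,2})\s+(.*\S)\s*$")


def toc_entries(notebook_json):
    """Return a list of (level, title) tuples for level 1 and 2 headings."""
    nb = json.loads(notebook_json)
    entries = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # The notebook format allows source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        in_fence = False
        for line in source.splitlines():
            if line.lstrip().startswith("```"):
                in_fence = not in_fence
                continue
            if in_fence:
                continue
            match = HEADING.match(line)
            if match:
                entries.append((len(match.group(1)), match.group(2)))
    return entries
````

Each `(level, title)` pair could then become one entry in the PDF outline, with level 2 entries nested under the preceding level 1 entry.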

As usual, give it a test drive on mybinder.org and if you find bugs or have ideas what could be improved: create an issue :grinning:

Hi, I’m trying to convert my notebook to PDF in Visual Studio Code. I was able to install and use your extension, but I cannot open the PDF file it creates; I get ‘There was an error opening this document. The root object is missing or invalid’.
While converting the file I get no error, just a warning:
[NbConvertApp] WARNING | Alternative text is missing on 39 image(s).
Do you know what might be the reason?
Thank you