Papermill 2.0 and NBClient 0.1 Releases!

A huge amount of work from dozens of developers went out between yesterday and today. I’m happy to announce that papermill 2.0 is now available and that a new Jupyter library, nbclient 0.1, has been released. Papermill 2.0 now uses nbclient instead of nbconvert as its default execution dependency.

NBClient

I’ll start with the new nbclient library. It extracts nbconvert’s ExecutePreprocessor into a standalone library. In this first release it mirrors the execution capabilities and arguments of the ExecutePreprocessor, with a slightly cleaner interface (see the docs link above for how to use it). The package name is meant to mirror jupyter_client: where jupyter_client manages a kernel, nbclient manages a notebook. It captures how to set up a jupyter_client connection and execute a notebook over that connection. The intention is to have a core library that can be iterated on more quickly and developed independently of the larger nbconvert library.

In relation to papermill, nbclient is a low-opinion, in-memory notebook execution library with all the low-level primitives for executing cells in place, whereas papermill is a high-opinion, more specialized extension of nbclient with plug-and-play options for various io, modifications, and execution patterns.

When deciding which library to use, I would recommend nbclient when you’re developing a new notebook execution interface (UI, toolchain, etc.) that has opinions which differ from papermill’s, or when you already have established tooling that doesn’t mesh with papermill’s interface, since nbclient is lower-level. If you’re unsure, or just trying to execute a notebook and don’t have pre-existing opinions on the subject, I’d use papermill first.
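
For a sense of what the lower-level interface looks like, here is a minimal sketch (not taken from the release notes). It assumes nbformat, nbclient and a python3 kernel are installed. The helper names make_notebook and execute_in_memory are hypothetical, and the timeout value is arbitrary.

```python
def make_notebook(sources):
    """Build a minimal nbformat v4 notebook as a plain dict from code strings."""
    cells = [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": src,
        }
        for src in sources
    ]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def execute_in_memory(nb_dict, timeout=600, kernel_name="python3"):
    """Run every cell in place with nbclient and return the executed notebook.

    nbformat and nbclient are imported lazily so that make_notebook stays
    usable even where they are not installed.
    """
    import nbformat
    from nbclient import NotebookClient

    nb = nbformat.from_dict(nb_dict)
    client = NotebookClient(nb, timeout=timeout, kernel_name=kernel_name)
    client.execute()  # mutates nb: outputs are attached to each code cell
    return nb
```

Calling execute_in_memory(make_notebook(["print(1 + 1)"])) should hand back the same notebook with outputs filled in. With papermill the comparable step is usually papermill.execute_notebook(input_path, output_path, parameters=...), which works on paths and handles parameter injection and saving for you.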

Ok now to papermill’s changes!

Papermill 2.0.0

Papermill 2.0 has a number of awesome features from many different contributors. We used the major version change mostly to signify the move to Python 3 only, but we also took the opportunity to merge PRs with small interaction changes. No major functionality should change with this release, but many minor improvements might impact specific execution patterns. We’ll keep an eye on issues and post bug fixes ASAP if any of these cause larger unexpected problems.

Features

  • Papermill is now Python 3.5+ only!

  • nbconvert is no longer a dependency of papermill; the smaller, newly released nbclient is now the execution dependency.

  • Support added for parameterizing C# kernels

  • Support added for parameterizing F# kernels

  • sys.exit(0) now respected by papermill

  • Python parameters are now black formatted (in python versions >= 3.6)

  • Notebook documents are saved periodically now rather than solely on cell completion.

  • A cell --execute-timeout option was added.

  • HDFS io support added with hdfs:// scheme (with papermill[hdfs] install).

Fixes

  • Fixed metadata writing on markdown and raw cells to follow v4.4 schema correctly

  • Azure Blob Storage support fixed for newer blob storage. azure-storage-blob >= 12.1.0 is now supported; support for older versions was dropped.

  • IOPub timeouts now raise an exception instead of a warning.

Interaction Changes (more details)

  • The nbconvert dependency has been replaced with nbclient. This means the default engine is now nbclient rather than nbconvert, and the NBConvertEngine class no longer exists. Extensions that extended this class may need a small update to target the new class.

  • sys.exit(0) in Python kernels now transfers the exit code to papermill, meaning papermill will gracefully stop the notebook execution and not raise an exception to the user. sys.exit(1) and other exceptions still raise as expected and give the papermill process a nonzero exit status.

  • When generating parameters for Python (when running on 3.6+), papermill now formats them with a pass of black before injecting them into the notebook, so they print more cleanly.

  • Older Azure Blob Storage support was dropped: azure-storage-blob < 12.1.0

  • The --autosave-cell-every option now controls the minimum time between notebook saves during cell execution. This interval backs off exponentially if a save takes more than 25% of the autosave-cell-every value. Setting --autosave-cell-every to 0 disables this feature.

  • The --execute-timeout option can be set to enable a per-cell execution timeout limit.

  • IOPub timeouts used to only warn and attempt to continue execution. Such a timeout can be triggered, for example, by printing ‘0’ in a very large for-loop without any sleeps. The side effect of best-effort execution was that outputs and failures could be lost in the IOPub timeout event, and notebooks would “succeed” when they were actually failing. We chose to change this pattern from a warning to an error for papermill. To fix the issue when it occurs, you need to reduce the number of print or display messages per second produced in your notebooks.
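
One way to lower the message rate mentioned in the last item is to emit output in batches with a short pause in between, instead of printing on every iteration. The following is only a sketch of that idea, not something papermill provides. The name print_batched and the batch_size and min_interval defaults are hypothetical, and suitable values depend on your kernel and timeout settings.

```python
import time


def print_batched(lines, batch_size=100, min_interval=0.05, out=print, sleep=time.sleep):
    """Write lines in joined batches, pausing between batches.

    Returns the number of write calls made, which is what this sketch
    tries to keep small compared with one print per line.
    """
    batch = []
    writes = 0
    for line in lines:
        batch.append(str(line))
        if len(batch) >= batch_size:
            out("\n".join(batch))  # one write for the whole batch
            writes += 1
            batch.clear()
            sleep(min_interval)  # give the output channel time to drain
    if batch:
        out("\n".join(batch))  # flush whatever is left over
        writes += 1
    return writes
```

In a notebook cell, print_batched(range(1_000_000), batch_size=10_000) would replace a loop that calls print a million times. The out and sleep arguments exist only so the helper can be exercised without real output or delays.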

Please open an issue if you run into anything unexpected with the large changelist. We hope you enjoy all the new capabilities.

Best,
Jupyter and nteract teams

This is excellent work - thanks all!
