Generating reports for Jupyter notebooks

#1

In Deploying JupyterHub for Education @lawasser made a nice description of what she’d want out of a “report generation tool” for Jupyter notebooks. I wonder if this is a specific enough and common enough issue to warrant its own thread here, in case others would like to chime in.

From her post, it sounds like there are a few things that you’d want to go from notebook -> HTML report.

  1. Export to HTML or PDF, doing the following things:
  2. Hide things in the final report. E.g., metadata in a cell should selectively remove things like:
    • The stderr
    • Image outputs
    • Code blocks
  3. Strip input/output numbers, or anything that is unique to the “interactive” session vs. just the cell contents and outputs.
  4. Do some fancier things with some cells (such as adding captions to images).
  5. Make the output pretty to look at.

What else would be important? Most of those things I think are pretty doable (the one exception being how far down the rabbit hole you’d wanna go creating new features for #4). Perhaps we can get a nice design spec on a tool that would be useful! I’m thinking it could be a ‘bundler extension’, which basically just means a new option under “Download as” that’d output a nicely-formatted HTML file.
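To make the "strip session-specific stuff" part of the list concrete, here is a rough sketch using only the standard library. It assumes the notebook is already loaded as a plain dict in the nbformat v4 layout; the function name `strip_session_details` and the example notebook are made up for illustration and are not part of any existing tool.

```python
import copy
import json


def strip_session_details(notebook):
    """Return a copy of a notebook dict without execution counts or stderr output."""
    cleaned = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        # The In[ ] number only makes sense in the interactive session.
        cell["execution_count"] = None
        kept = []
        for output in cell.get("outputs", []):
            # Drop anything that was written to stderr (warnings, progress bars).
            if output.get("output_type") == "stream" and output.get("name") == "stderr":
                continue
            # execute_result outputs carry their own Out[ ] number.
            if "execution_count" in output:
                output["execution_count"] = None
            kept.append(output)
        cell["outputs"] = kept
    return cleaned


# Hypothetical two-cell notebook, just enough structure for the demo.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Title"},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 7,
            "source": "1 + 1",
            "outputs": [
                {"output_type": "stream", "name": "stderr", "text": "a warning\n"},
                {
                    "output_type": "execute_result",
                    "execution_count": 7,
                    "metadata": {},
                    "data": {"text/plain": "2"},
                },
            ],
        },
    ],
}

report_ready = strip_session_details(example)
print(json.dumps(report_ready["cells"][1], indent=2))
```

A real tool would probably do this inside an nbconvert preprocessor rather than on raw dicts, but the logic would look much the same.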

#2

That would be a cool way of exposing this. I think most of the requirements can be achieved with nbconvert + a new HTML template. This means that the “hard” part is making it accessible to people who don’t know how to construct their own template and making it “configurable” through the existing notebook (classic and lab) interface.

  • Do bundler extensions work in lab as well?
  • Could we use tags as a way to expose the choices to users?

#3

I threw up the code from that GIF into a little python module here:

Worth iterating on? It’s pretty lightweight so would be easy to extend, modify, etc.

re: your questions, I think that bundler extensions should “just work” with lab (at least, it seemed like bundlers were a way to abstract away the concept so that it wasn’t notebook-interface-dependent).

re: tags, yep that’s what I’m imagining. nbreport already uses the presence of some tags to choose what to do with cells. You could also use metadata for the cell to add stuff like caption:"my caption"
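As a rough illustration of the tags-plus-metadata idea (this is not nbreport's actual code), something like the following could decide what to show for a cell and build a captioned figure. The tag names `hide_input` and `hide_output` and both function names are hypothetical; only the `caption` metadata key comes from the discussion above.

```python
from html import escape

HIDE_INPUT_TAG = "hide_input"    # hypothetical tag name
HIDE_OUTPUT_TAG = "hide_output"  # hypothetical tag name


def describe_cell(cell):
    """Work out what a report should show for one cell, based on its metadata."""
    metadata = cell.get("metadata", {})
    tags = set(metadata.get("tags", []))
    return {
        "show_input": HIDE_INPUT_TAG not in tags,
        "show_output": HIDE_OUTPUT_TAG not in tags,
        "caption": metadata.get("caption"),
    }


def figure_html(image_src, caption=None):
    """Wrap an image in <figure>, adding alt text and a caption when one is set."""
    src = escape(image_src)
    if caption:
        text = escape(caption)
        return (
            f'<figure><img src="{src}" alt="{text}">'
            f"<figcaption>{text}</figcaption></figure>"
        )
    return f'<figure><img src="{src}"></figure>'


# A code cell tagged to hide its input, with a caption stored in cell metadata.
cell = {
    "cell_type": "code",
    "metadata": {"tags": ["hide_input"], "caption": "my caption"},
    "source": "plot()",
    "outputs": [],
}

plan = describe_cell(cell)
html_snippet = figure_html("output_1.png", plan["caption"])
print(plan)
print(html_snippet)
```

Keeping the "what to show" decision separate from the HTML generation would make it easier to reuse the same tags for other output formats later.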

#4

@choldgraf just a note about how rmarkdown works and how you could keep this simple!
For jupyter notebooks, i’ve added a caption tag to the jupyter template.
The code is as follows:

{% block data_png %}
<figure>
{% if cell.metadata.caption %}
<img src = "{{ output.metadata.filenames['image/png'] | path2url }}" alt = "{{ cell.metadata.caption }}">
<figcaption>{{ cell.metadata.caption }}</figcaption>
{% else %}
<img src = "{{ output.metadata.filenames['image/png'] | path2url }}">
{% endif %}
</figure>
{% endblock data_png %}

it relies on a caption metadata input. This could be built into a sweet notebook cell interface like what hide_code has to be easier to add… it produces a caption like you see on this page when built:

Here is the beauty of this all. if you publish to HTML and also allow a user to specify a css style sheet now… VOILÀ, you have beautiful customization at your fingertips :slight_smile:

I LOVE YOUR suggestion of a download_as implementation. so much easier than having to make a special button that might not work. I would also love to be able to run it at the CLI!
Please let me know how i can help with this effort. students would LOVE IT. and so would anyone doing reproducible work who might write a paper or just a report in this envt in the future.

#5

I love it! My only question is: why isn’t there a “try it on binder” badge in that repo? :smile:

#6

I had one last thought. One thing about rmarkdown that makes it flexible is having the yaml header at the top. I suppose on one hand this clutters up the file at the top. On the other, you can easily add new elements if you wish. They use that to provide a css file that will style the output html file. They also use it to add functionality like bibliographies, etc. Not sure if this is ideal vs storing things in the first cell via the metadata… but just wanted to chime in, as providing the ability to customize a report using a standard approach is nice.

#7

OK there’s a proof-of-concept Binder here: https://mybinder.org/v2/gh/choldgraf/nbreport/master?filepath=example%2Fan_example_notebook.ipynb try going to File -> Download As -> NBReport. It should generate an HTML file in the same folder as the ipynb file.

btw: thanks for the suggestion on the caption! feature added to nbreport :slight_smile:

#8

I guess the styling should be left to pandoc, right? What bugs me most is that you still cannot by default tag the visibility of individual code cells in the notebook itself (yes, there is an extension for it, but try installing that in a singularity container … :wink:). It just seems to be such a sensible thing to have.
Also, any reporting tool should make it easy to switch between html/pdf output -> R Markdown + knitr is just so darn flexible…

Would be so awesome to have :wink:

#9

It really is just so sad that I cannot get a publication-quality report from a jupyter notebook as of now. To me, the main issues are:

  • no support for yaml headers (can be integrated as plain markdown, but look ugly in the notebook itself)
  • no official way of hiding input cells (the extension also only allows hiding in html, not pdf)
  • difficult (if not impossible?) to get automatically generated markdown tables (cf. Jupyter Notebook + R + nbconvert ... tables?)

In essence, I’d dream of using the notebook format for interactive development jointly with my non-coding collaborators and then just use nbconvert to get a .md file (which I can process using pandoc to whatever output format I need for submission). Cross-referencing would also be nice, but is not supported by pandoc, so probably not going to happen.

#10

@kkmann have you seen Jupyter Book? (jupyter.org/jupyter-book)

I think that it gets the closest to what you’re talking about. Still doesn’t support “single document” outputs, though there’s an open PR that prototypes this already!

#11

yeah, great initiative, especially when integrated with binder. ATM bookdown for R has a few advantages from a publishing perspective (also makes it super easy to edit the entire markdown sources).

I guess my main point is that jupyter is the go-to framework for interactive and literate programming (I do not like the R Studio markdown-based notebooks at all) but the R Markdown framework is so much more convenient when it comes to publishing your results in a polished way. It is kind of frustrating that I cannot develop my analyses in a jupyter notebook and then just convert to .md (hiding the code ;)) and create a publication-ready article using pandoc + template.
All the bits and pieces seem to be there x)

I wasn’t sure whether I missed something, will open an issue on GitHub, maybe someone of the jupyter notebook crew is interested in pursuing this a bit; any suggestions who might be interested?

#12

There are definitely people interested in this sort of work. I think it’s mainly a matter of resources. Most of the relevant work is or would be in nbconvert and there’s a lot of interesting activity in this direction driven by papermill.

I’d say the main thing is sketching out a use case and what exactly the pain points are for that use case and how it could be improved. nbconvert is where I expect that issue mostly belongs, though if it’s about UI for creating the relevant metadata, then JupyterLab is probably it.

For instance, nbconvert does support filtering tags via TagRemovePreprocessor. This allows you to mark cells which:

  1. should have input or output hidden, or
  2. should be excluded entirely

However, there is not a default tag list, so you must set them yourself in nbconvert config, e.g. with

c.TagRemovePreprocessor.remove_input_tags = {'remove-input'}

nbclassic has a tags cell toolbar for managing cell tags to make adding and removing these tags more convenient.
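To show roughly what tag-based removal does to a notebook, here is a standard-library imitation working on a plain notebook dict. It is not nbconvert's implementation: the function `apply_tag_removal` is hypothetical, its keyword names simply mirror the `remove_input_tags`, `remove_all_outputs_tags` and `remove_cell_tags` options, and blanking the source is a simplification of however nbconvert actually hides input.

```python
import copy


def apply_tag_removal(
    notebook,
    remove_input_tags=frozenset(),
    remove_all_outputs_tags=frozenset(),
    remove_cell_tags=frozenset(),
):
    """Return a copy of the notebook with tagged inputs, outputs or cells removed."""
    result = copy.deepcopy(notebook)
    kept_cells = []
    for cell in result.get("cells", []):
        tags = set(cell.get("metadata", {}).get("tags", []))
        if tags & set(remove_cell_tags):
            continue  # drop the whole cell
        if cell.get("cell_type") == "code":
            if tags & set(remove_all_outputs_tags):
                cell["outputs"] = []
            if tags & set(remove_input_tags):
                cell["source"] = ""  # simplification: just blank the code
        kept_cells.append(cell)
    result["cells"] = kept_cells
    return result


# Hypothetical notebook with one cell per kind of tag.
notebook = {
    "metadata": {},
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["remove-input"]},
         "source": "plot()", "outputs": [{"output_type": "stream", "name": "stdout", "text": "ok\n"}]},
        {"cell_type": "code", "metadata": {"tags": ["remove-output"]},
         "source": "noisy()", "outputs": [{"output_type": "stream", "name": "stdout", "text": "lots\n"}]},
        {"cell_type": "markdown", "metadata": {"tags": ["remove-cell"]},
         "source": "Notes to self"},
    ],
}

filtered = apply_tag_removal(
    notebook,
    remove_input_tags={"remove-input"},
    remove_all_outputs_tags={"remove-output"},
    remove_cell_tags={"remove-cell"},
)
print([cell["source"] for cell in filtered["cells"]])
```

Since there is no default tag list, the tag names here are whatever you choose to pass in, just as with the config option above.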

Here’s an example notebook and its markdown output showing a cell with excluded input.

As for the yaml header, notebooks do have document-level metadata, which is editable and the logical place for such information.

I think nbreport is a super cool demo of what’s possible, so the question for me is to some degree, does this belong in nbconvert itself, which is a hugely powerful and complex project, or a more use-case-targeting thing like nbreport that uses nbconvert, adding a bunch of specific choices and customizations that nbconvert is built to enable.

In terms of where work should happen, it would be:

  1. nbconvert for core functionality (if any is deemed missing or needing improvement)
  2. either nbconvert or a wrapper like nbreport for the polished CLI/API
  3. jupyterlab if there’s UI work needed to help creating notebooks with the right structure

#13

@minrk - my thoughts are that this could/should be prototyped and documented well first, as opposed to building directly into nbconvert.

On the JupyterLab side, I found that the cell tags extension was enough for adding the necessary metadata, and nbreport (as well as jupyter-book) are basically lightweight templates paired with some nice CSS. That’s how I generated the content here: https://jupyter.org/jupyter-book/features/hiding.html

I wonder - since including YAML metadata with files is a common pattern in SSGs, if there would be value in a package similar to the cell tags package, but that exposes a block where you can add arbitrary YAML key/values. These would then be wrapped up in the notebook JSON w/ some validating, and then a little nbconvert template would ensure that anything in the yaml header would be put first in the output HTML or markdown document.
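A very rough sketch of the "put the YAML header first" step, assuming the key/values live under a notebook-level metadata key. The key name `yaml_header` and the function `yaml_front_matter` are hypothetical, and the values are written with `json.dumps`, which relies on JSON values generally being readable as YAML and on the keys being simple names.

```python
import json


def yaml_front_matter(notebook, key="yaml_header"):
    """Build a front-matter block from notebook-level metadata, or '' if none is set."""
    header = notebook.get("metadata", {}).get(key, {})
    if not header:
        return ""
    lines = ["---"]
    for name, value in header.items():
        # JSON-encoded scalars and lists are used here as YAML flow values.
        lines.append(f"{name}: {json.dumps(value)}")
    lines.append("---")
    return "\n".join(lines) + "\n\n"


# Hypothetical notebook metadata carrying the header entries.
notebook = {
    "metadata": {
        "yaml_header": {
            "title": "My report",
            "css": "style.css",
            "tags": ["a", "b"],
        }
    },
    "cells": [],
}

markdown_body = "# Results\n"  # stand-in for the converted notebook body
document = yaml_front_matter(notebook) + markdown_body
print(document)
```

In practice the same thing could be done in an nbconvert template, with the validation step deciding which keys are allowed before anything gets written out.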
