Auto-generating ipynb files for documentation with sphinx-gallery and jupytext

choldgraf · September 18, 2019, 4:14pm

I just heard from @mwouts about this interesting patch/deploy of sphinx-gallery for building Binder links in Sphinx documentation. Here’s the blog post about it:

tl;dr

Sphinx-Gallery generates an .ipynb file for each example and creates a Binder link that points to it. However, this has the problem that you must host the documentation on gh-pages, not readthedocs. The PlasmaPy folks decided to patch this with Jupytext so that, in Binder, their sphinx-gallery .py files are automatically converted to ipynb when the Binder session launches. This lets you host the documentation on readthedocs!

betatim · September 19, 2019, 5:08am

Looks interesting. Could you explain a bit why without this patch you have to host things on GH pages? I thought the way that sphinx-gallery works is to generate the notebooks when you build the HTML and then have a link to the repo in the “launch” badge at the bottom. Meaning where you host the built output shouldn’t matter?

Advert: scikit-learn now uses mybinder.org for their examples https://scikit-learn.org/dev/auto_examples/plot_anomaly_comparison.html#sphx-glr-auto-examples-plot-anomaly-comparison-py via sphinx-gallery (of course).

mwouts · September 19, 2019, 7:49pm

I’d like to add that the PlasmaPy gallery relies on Jupytext’s capacity to open Sphinx Gallery scripts directly as notebooks in Jupyter. No .ipynb file is created at any point in this process. See for instance: when you click on the Binder link at the bottom of the Magnetostatic Fields example, Binder opens the script as a notebook in Jupyter. And that notebook… has a .py extension!

Being able to open scripts or Markdown documents as notebooks on Binder is just a matter of adding jupytext to the project's (or Binder's) requirements file. And if you want to state explicitly that all scripts should be treated as Sphinx Gallery scripts, and that you want the rST content to be converted to Markdown, then you can add a .jupyter/jupyter_notebook_config.py file:

https://github.com/PlasmaPy/PlasmaPy/blob/b2295d5d26318ce2f3c69b6ef1503f9a8490ebaf/.jupyter/jupyter_notebook_config.py

c.NotebookApp.contents_manager_class = "jupytext.TextFileContentsManager"
c.ContentsManager.preferred_jupytext_formats_read = "py:sphinx"
c.ContentsManager.sphinx_convert_rst2md = True
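
To make "opening a script as a notebook" more concrete, here is a deliberately simplified, standard-library-only sketch of the kind of split involved. It is not Jupytext's implementation: the function name `script_to_notebook`, the separator rule (a line of 20 or more `#`), and the example script are assumptions made for illustration, and the text cells keep their rST content unchanged instead of being converted to Markdown.

```python
import ast
import json
import re

# Assumption: a text block opens with a line made only of 20 or more '#'.
SEPARATOR = re.compile(r"^#{20,}\s*$")


def script_to_notebook(source):
    """Split a Sphinx-Gallery style script into notebook cells (simplified)."""
    cells = []

    def add(cell_type, lines):
        text = "\n".join(lines).strip("\n")
        if not text.strip():
            return  # skip empty cells
        cell = {"cell_type": cell_type, "metadata": {}, "source": text}
        if cell_type == "code":
            cell.update(execution_count=None, outputs=[])
        cells.append(cell)

    # The module docstring becomes the first text cell.
    tree = ast.parse(source)
    docstring = ast.get_docstring(tree)
    lines = source.splitlines()
    if docstring is not None:
        add("markdown", docstring.splitlines())
        lines = lines[tree.body[0].end_lineno:]

    code, text, in_text = [], [], False
    for line in lines:
        if SEPARATOR.match(line):
            # A separator closes the current cell and opens a text block.
            add("code", code)
            add("markdown", text)
            code, text, in_text = [], [], True
        elif in_text and line.startswith("#"):
            # Strip the comment marker and one optional space.
            text.append(line[2:] if line.startswith("# ") else line[1:])
        else:
            if in_text:
                add("markdown", text)
                text, in_text = [], False
            code.append(line)
    add("code", code)
    add("markdown", text)

    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


# Hypothetical example script, built line by line for clarity.
EXAMPLE = "\n".join([
    '"""',
    "Demo title",
    "==========",
    "",
    "A short description.",
    '"""',
    "import math",
    "",
    "#" * 79,
    "# Compute something.",
    "",
    "print(math.sqrt(16))",
    "",
])

notebook = script_to_notebook(EXAMPLE)
print([cell["cell_type"] for cell in notebook["cells"]])
print(json.dumps(notebook)[:60])
```

The point of the sketch is only that the notebook structure can be derived from the script on the fly, which is why no .ipynb file has to exist in the repository.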

In the case of the scikit-learn repo, I think the challenge is the same, i.e. deploying Binder on the main repository, but the setup is a bit different. There, the .ipynb files are created explicitly in .binder/postBuild:

https://github.com/scikit-learn/scikit-learn/blob/f8a3f4b90143da5ab34ffbbf052a8f41d226bc18/.binder/postBuild#L26


TMP_CONTENT_DIR=/tmp/scikit-learn
mkdir -p $TMP_CONTENT_DIR
cp -r examples .binder $TMP_CONTENT_DIR
# delete everything in current directory including dot files and dot folders
find . -delete


# Generate notebooks and remove other files from examples folder
GENERATED_NOTEBOOKS_DIR=.generated-notebooks
cp -r $TMP_CONTENT_DIR/examples $GENERATED_NOTEBOOKS_DIR


find $GENERATED_NOTEBOOKS_DIR -name '*.py' -exec sphx_glr_python_to_jupyter.py '{}' +
NON_NOTEBOOKS=$(find $GENERATED_NOTEBOOKS_DIR -type f | grep -v '\.ipynb')
rm -f $NON_NOTEBOOKS


# Put the .binder folder back (may be useful for debugging purposes)
mv $TMP_CONTENT_DIR/.binder .
# Final clean up
rm -rf $TMP_CONTENT_DIR


# This is for compatibility with binder sphinx-gallery integration: this makes
# sure that the binder links generated by sphinx-gallery are correct even tough
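
For readers who find the shell version hard to follow, the core "copy the examples, convert every script, keep only the notebooks" step can be sketched in Python. This is a minimal sketch under stated assumptions, not the scikit-learn script: `generate_notebooks` and `placeholder_convert` are hypothetical names, and the placeholder converter merely stands in for `sphx_glr_python_to_jupyter.py` by wrapping the whole script in a single code cell.

```python
import json
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List


def generate_notebooks(examples_dir: Path, out_dir: Path,
                       convert: Callable[[Path], Path]) -> List[Path]:
    """Copy the examples, convert each .py file, then drop non-notebooks."""
    shutil.copytree(examples_dir, out_dir)
    for script in sorted(out_dir.rglob("*.py")):
        convert(script)
    # Remove every file that is not a generated notebook.
    for path in sorted(out_dir.rglob("*")):
        if path.is_file() and path.suffix != ".ipynb":
            path.unlink()
    return sorted(out_dir.rglob("*.ipynb"))


def placeholder_convert(script: Path) -> Path:
    """Hypothetical stand-in for the real converter: one code cell per script."""
    notebook = {
        "cells": [{
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": script.read_text(),
        }],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
    target = script.with_suffix(".ipynb")
    target.write_text(json.dumps(notebook))
    return target


# Usage on a throwaway directory tree (all paths are made up for the demo).
with tempfile.TemporaryDirectory() as tmp:
    examples = Path(tmp) / "examples"
    (examples / "cluster").mkdir(parents=True)
    (examples / "cluster" / "plot_demo.py").write_text("print('hello')\n")
    (examples / "README.txt").write_text("Gallery\n")
    notebooks = generate_notebooks(
        examples, Path(tmp) / ".generated-notebooks", placeholder_convert
    )
    generated = [p.relative_to(tmp).as_posix() for p in notebooks]

print(generated)
```

Because this runs while the Binder image is built, the notebooks end up inside the Docker image without ever being committed to the git repository.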

lesteve · September 20, 2019, 11:44am

betatim: “Looks interesting. Could you explain a bit why without this patch you have to host things on GH pages? I thought the way that sphinx-gallery works is to generate the notebooks when you build the HTML and then have a link to the repo in the “launch” badge at the bottom. Meaning where you host the built output shouldn’t matter?”

Binder needs a repo where your notebooks live.
The sphinx-gallery/Binder integration assumes that you have a repo where you put the output of your doc build. Notebooks are generated by sphinx-gallery as part of the doc build. Using GH pages is a way to have both a repo (for Binder) and a website for your doc.
When you are using ReadTheDocs you only have the website for your doc. You cannot really use Binder because you don’t have a repo where your notebooks live. Yes, you could set up a CI job that would push to an external GitHub repo, but this defeats the simplicity of using ReadTheDocs in the first place.
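
To illustrate why the repo matters, here is a small sketch of how a Binder launch link is put together. It assumes the `https://mybinder.org/v2/gh/<org>/<repo>/<ref>?filepath=<path>` URL pattern; the helper name `binder_url` and the org, repo, and file names are hypothetical. Every part of the link refers to something in a git repository, and nothing refers to the website where the HTML is hosted.

```python
from urllib.parse import quote


def binder_url(org, repo, ref, filepath, hub="https://mybinder.org"):
    """Build a Binder launch link for one file in a GitHub repository."""
    return (
        f"{hub}/v2/gh/{org}/{repo}/{quote(ref, safe='')}"
        f"?filepath={quote(filepath)}"
    )


# A notebook that was pushed to a gh-pages branch by the doc build.
notebook_link = binder_url(
    "example-org", "example-docs", "gh-pages",
    "notebooks/auto_examples/plot_demo.ipynb",
)

# With the Jupytext approach the link can target the script itself.
script_link = binder_url(
    "example-org", "example-project", "master", "examples/plot_demo.py"
)

print(notebook_link)
print(script_link)
```

If the generated notebooks only exist on the ReadTheDocs-hosted site, there is no org/repo/ref triple that contains them, so no such link can be built.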

My 2 cents:

For a simple repo where I have only some Python examples and I want them on Binder as notebooks, I would use the jupytext approach.
For a more complex repo where you already use sphinx-gallery, I think the scikit-learn approach (generating notebooks in the postBuild step), although slightly hacky, is good enough. In particular, you don’t have the additional jupytext dependency (so you don’t have to convince reviewers that it is worth adding), and the notebooks on Binder and the notebooks you can download from the example HTML are generated by the same tool (i.e. sphinx-gallery). Minor point: to me it also feels slightly less magical: the notebooks exist in the docker image and you don’t have to configure the Jupyter contents manager to create the notebook from the Python file when the file is loaded.
Maybe there is a way to change sphinx-gallery so that it can do what is needed for the jupytext approach (in particular having the Binder links point to .py files rather than .ipynb files). I think sphinx-gallery is already tricky to configure (and to have a good mental model of as a maintainer), so providing yet another variation may not improve the situation in this respect. Having said that, if someone wants to give this approach a go, please do!

choldgraf · September 21, 2019, 1:17pm

so the tl;dr on that is: “Because Sphinx-Gallery assumes it can link to a notebook that’s in a git repository somewhere, but if the gallery is built on ReadTheDocs, the generated notebooks will only be within the RTD-hosted website, not in a git repository”

@lesteve what do you think about adding a quick docs PR to sphinx-gallery to document the “build the notebooks in a postBuild step so the links work” process? I’m happy to review a PR

lesteve · October 15, 2019, 5:27am

Sorry I dropped the ball on this one.

Adding a PR in the sphinx-gallery doc seems a reasonable idea, although if I am being honest it feels a bit too hacky to be featured as an official work-around. I am afraid I am unlikely to do it in the short-term …

If people try the “no notebooks in the git repo but some notebooks in the docker image” approach (for lack of a better name, please suggest one …) à la scikit-learn and have questions or issues, I am more than happy to try to help on this thread.

mgeier · March 13, 2020, 6:38pm

Since it hasn’t yet been mentioned in this discussion, I’d like to make some advertisement for my little Sphinx extension nbsphinx: https://github.com/spatialaudio/nbsphinx.

It allows Sphinx to use Jupyter notebooks as source files.

It also enables support for any files that Jupytext can handle, see https://nbsphinx.readthedocs.io/en/0.5.1/custom-formats.html. I’m working on a concrete example in https://github.com/spatialaudio/nbsphinx/pull/408.

It also makes it easy to add Binder links, see https://nbsphinx.readthedocs.io/en/0.5.1/prolog-and-epilog.html.

And soon, it will also get its own gallery feature: https://github.com/spatialaudio/nbsphinx/pull/392.
