Auto-generating ipynb files for documentation with sphinx-gallery and jupytext

I just heard from @mwouts about an interesting patch to a sphinx-gallery deployment for building Binder links in Sphinx documentation. Here’s the blog post about it:

tl;dr

Sphinx-Gallery generates a .ipynb file for each example and creates a Binder link that points to it. The drawback is that you must host the documentation on gh-pages, not readthedocs. The PlasmaPy folks’ patch uses Jupytext so that, in Binder, their sphinx-gallery .py files are automatically converted to notebooks when the Binder session launches. This lets you host the documentation on readthedocs!


Looks interesting. Could you explain a bit why without this patch you have to host things on GH pages? I thought the way that sphinx-gallery works is to generate the notebooks when you build the HTML and then have a link to the repo in the “launch” badge at the bottom. Meaning where you host the built output shouldn’t matter?


Advert: scikit-learn now uses mybinder.org for their examples 🙂 https://scikit-learn.org/dev/auto_examples/plot_anomaly_comparison.html#sphx-glr-auto-examples-plot-anomaly-comparison-py via sphinx-gallery (of course).


I’d like to add that the PlasmaPy gallery relies on Jupytext’s ability to open Sphinx Gallery scripts directly as notebooks in Jupyter. No .ipynb file is created at any point in this process. For instance, when you click on the Binder link at the bottom of the Magnetostatic Fields example, Binder opens the script as a notebook in Jupyter. And that notebook… has a .py extension!
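To make "opening a script as a notebook" a bit more concrete, here is a toy sketch in plain Python (standard library only). It is not Jupytext's implementation, only an illustration under simplifying assumptions: the leading docstring becomes a Markdown cell, a `# %%` line or a line of 20 or more `#` characters starts a text block, and everything else is code. The function names and the example script are made up for illustration, and the rST is passed through untouched rather than converted to Markdown.

```python
import json


def split_cells(source):
    """Split a Sphinx-Gallery style script into (kind, text) pairs (toy version)."""
    cells = []
    text = source.lstrip()
    # Assumption: the script opens with a triple-double-quoted docstring.
    if text.startswith('"""'):
        end = text.index('"""', 3)
        cells.append(("markdown", text[3:end].strip()))
        text = text[end + 3:]

    kind, buf = "code", []

    def flush():
        # Store the current block, skipping blocks that are empty.
        body = "\n".join(buf).strip("\n")
        if body.strip():
            cells.append((kind, body))

    for line in text.splitlines():
        is_separator = line.startswith("# %%") or (
            len(line) >= 20 and set(line) == {"#"}
        )
        if is_separator:
            # A separator closes the current block and opens a text block.
            flush()
            kind, buf = "markdown", []
        elif kind == "markdown" and line.startswith("#"):
            # Comment lines after a separator are text; drop the "# " prefix.
            buf.append(line[2:] if line.startswith("# ") else line[1:])
        elif kind == "markdown":
            # The first non-comment line ends the text block.
            flush()
            kind, buf = "code", [line]
        else:
            buf.append(line)
    flush()
    return cells


def to_notebook(cells):
    """Wrap (kind, text) pairs in a minimal nbformat 4 dictionary."""
    nb_cells = []
    for kind, text in cells:
        cell = {"cell_type": kind, "metadata": {}, "source": text}
        if kind == "code":
            cell.update(execution_count=None, outputs=[])
        nb_cells.append(cell)
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


# Hypothetical example script, not taken from any real gallery.
EXAMPLE = '''"""
Hypothetical example
====================
A made-up title block.
"""
import math

# %%
# Compute something simple.
x = math.sqrt(16)
print(x)
'''

notebook = to_notebook(split_cells(EXAMPLE))
print(json.dumps(notebook, indent=1))
```

With Jupytext this conversion happens in memory when Jupyter opens the file, which is why no .ipynb ever needs to exist on disk or in the repo.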

Being able to open scripts or Markdown documents as notebooks on Binder is just a matter of adding jupytext to the project’s (or Binder’s) requirements file. And if you want to state explicitly that all scripts should be treated as Sphinx Gallery scripts, and that you want the rST content to be converted to Markdown, then you can add a .jupyter/jupyter_notebook_config.py file:

In the case of the scikit-learn repo, I think the challenge is the same, i.e. deploying Binder on the main repository, but the setup is a bit different. There, the .ipynb files are created explicitly in .binder/postBuild:


Looks interesting. Could you explain a bit why without this patch you have to host things on GH pages? I thought the way that sphinx-gallery works is to generate the notebooks when you build the HTML and then have a link to the repo in the “launch” badge at the bottom. Meaning where you host the built output shouldn’t matter?

  • binder needs a repo where your notebooks live
  • The sphinx-gallery/binder integration assumes that you have a repo where you put the output of your doc build. Notebooks are generated by sphinx-gallery as part of the doc build. Using GH pages is a way to have both a repo (for binder) and a website for your doc.
  • when you are using ReadTheDocs you only have the website for your doc. You cannot really use binder because you don’t have a repo where your notebooks live. Yes, you could set up a CI job that would push to an external GitHub repo, but this defeats the simplicity of using ReadTheDocs in the first place.

My 2 cents:

  • for a simple repo where I have only some python examples and I want them on binder as notebook, I would use the jupytext approach.
  • for a more complex repo where you already use sphinx-gallery, I think the scikit-learn approach (generating notebooks in the postBuild step), although slightly hacky, is good enough. In particular, you don’t have the additional jupytext dependency (so you don’t have to convince reviewers that it is worth adding), and the notebooks on binder and the notebooks you can download from the example HTML are generated by the same tool (i.e. sphinx-gallery). Minor point: to me it also feels slightly less magical: the notebooks exist in the docker image and you don’t have to configure the Jupyter contents manager to create the notebook from the python file when the file is loaded.
  • maybe there is a way to change sphinx-gallery to be able to do what is needed for the jupytext approach (in particular having the binder links point to .py files rather than .ipynb files). I think sphinx-gallery is already tricky to configure (and to have a good mental model of as a maintainer), so providing yet another variation may not improve the situation in this respect. Having said that, if someone wants to give this approach a go, please do!

so the tl;dr on that is: “Because Sphinx-Gallery assumes it can link to a notebook that’s in a git repository somewhere, but if the gallery is built on ReadTheDocs, the generated notebooks will only be within the RTD-hosted website, not in a git repository”
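To see why the link needs a repository, it may help to look at what a Binder link encodes: a repo, a ref, and a path inside that repo. Below is a minimal sketch based on the public mybinder.org URL pattern with the `filepath` query parameter. The `binder_url` helper and the org, repo, and path names are hypothetical, and this is not sphinx-gallery's actual code.

```python
from urllib.parse import quote, urlencode


def binder_url(org, repo, ref, filepath, binderhub_url="https://mybinder.org"):
    """Build a Binder launch link for a file inside a GitHub repo (illustrative helper)."""
    base = "{}/v2/gh/{}/{}/{}".format(
        binderhub_url.rstrip("/"),
        quote(org),
        quote(repo),
        quote(ref, safe=""),
    )
    # The file to open is passed as a query parameter, relative to the repo root.
    return base + "?" + urlencode({"filepath": filepath})


# Hypothetical names: a gh-pages branch holding generated notebooks...
generated = binder_url(
    "example-org", "example-repo", "gh-pages", "auto_examples/plot_demo.ipynb"
)
# ...versus the main branch holding only the original scripts.
script = binder_url("example-org", "example-repo", "main", "examples/plot_demo.py")

print(generated)
print(script)
```

The first link only works if the generated .ipynb is committed somewhere (for instance a gh-pages branch) or produced inside the Binder image. The second points at a file that already lives in the main repo, but it only opens as a notebook if something like Jupytext is installed in the Binder environment.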

@lesteve what do you think about adding a quick docs PR to sphinx-gallery to document the “build the notebooks in a postBuild step so the links work” process? I’m happy to review a PR

Sorry I dropped the ball on this one.

Adding a PR in the sphinx-gallery doc seems a reasonable idea, although if I am being honest it feels a bit too hacky to be featured as an official work-around. I am afraid I am unlikely to do it in the short-term …

If people try the “no notebooks in the git repo but some notebooks in the docker image” approach à la scikit-learn (for lack of a better name, please suggest one …) and have questions/issues, I am more than happy to try to help on this thread.
