Possible to send markdown text to a markdown cell in a new notebook via Papermill? Other options?

I am using papermill to parameterize the creation of a batch of notebooks. These notebooks are meant to be stubs, so not everything in them is complete. One of my inputs to Papermill is raw markdown-formatted text that I got from a conversion via pandoc. Ultimately, I want this raw markdown text to show up as a typical markdown cell in the new notebook. I could get it to display the rendered markdown as output using this, but then it cannot be edited further. And of course, I could just use print() to get the raw markdown to show up as text that I can then copy and paste into a new markdown cell in the generated notebook and render as markdown. That would work, but it sounds tedious for a lot of notebooks, and it feels like I am missing a step where I could go directly from markdown text stored as a Python variable to a markdown cell in a new notebook.

Discussing it with Jürgen Hermann (@jhermann), we came up with some options. Options discussed:

My present solution is to use notedown to first make an unrun notebook that holds the markdown as markdown cells, and then run the notebooks via Papermill so the required output is added. But I am still curious whether there is a more direct way to inject programmatically generated markdown text into a new notebook and have it be in a markdown cell. Suggestions?

It seems like this could be pretty straightforward to do just with a lightweight bit of Python code to manually insert markdown into a generated notebook. Have you given that a shot?

That is definitely an option. I am doing a lot of changes in multiple parts of the notebook. And because I knew I would need papermill to run it, I guess I was relying on that avoiding touching the json myself.

You wouldn’t necessarily need to touch the raw JSON; you could use nbformat to grab the output notebooks and insert markdown as needed. E.g.

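import nbformat as nbf

# Read the generated notebook without converting its format version
ntbk = nbf.read("myntbk.ipynb", nbf.NO_CONVERT)
for cell in ntbk.cells:
    # Swap the placeholder text for the real markdown
    if cell.source == "<insert_placeholder>":
        cell.source = "# My markdown\n\nGoes here"
# Save the modified notebook back to disk
nbf.write(ntbk, "myntbk.ipynb")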
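
If the notebook has no placeholder cell to overwrite, a new markdown cell can be added instead. The following is a minimal sketch, assuming nbformat's v4 helpers; the function name add_markdown_cell and the sample text are made up for illustration and are not from the thread.

```python
import nbformat
from nbformat.v4 import new_code_cell, new_markdown_cell, new_notebook


def add_markdown_cell(nb, markdown_text, position=0):
    """Insert markdown_text as an editable markdown cell at the given index.

    Hypothetical helper for illustration; not part of nbformat itself.
    """
    nb.cells.insert(position, new_markdown_cell(markdown_text))
    return nb


# Stand-in for a notebook that already holds some code.
nb = new_notebook(cells=[new_code_cell("print('hello')")])

# Raw markdown held in a Python variable, e.g. the pandoc output.
md_text = "# Stub notebook\n\nText converted by pandoc goes here."
add_markdown_cell(nb, md_text)

# Check the result against the notebook schema, then serialize it.
nbformat.validate(nb)
notebook_json = nbformat.writes(nb)
```

Writing the result with nbformat.write(nb, "stub.ipynb") would give a notebook whose first cell is ordinary, editable markdown rather than rendered output.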

That looks useful to know how to do. I think that and some of the other abilities of nbconvert were things I was missing that had me thinking this had to be more easily automated and also explained why Papermill didn’t have that ability. Thanks.

I ended up using Python to build up a string for each notebook I wanted to make from each starting markdown. I saved that as markdown, with the specific code blocks that should ultimately become code cells tagged specially with {.python .input}. The other code blocks among the markdown text will be rendered as literal markdown code blocks by notedown, which is what I needed to happen. With the markdown built up, all I needed for each markdown file was the following, which makes a new notebook via notedown and executes it via nbconvert so the code output shows in the resulting notebook:

!notedown --match=strict input.md > {notebook_name}
!jupyter nbconvert --to notebook --execute {notebook_name} --output={notebook_name}

notedown made it easy to inject a mix of code and markdown into the resulting notebook from markdown, which is where I was starting from.


Another option, if you want to use papermill but add some injected code, is to inherit from the nbconvert engine, register it as “markdown-nbconvert”, and do the cell source injection as @choldgraf suggested before performing the rest of the execution as normal. There are instructions for how to register these extensions here. Then from papermill you could call papermill input.ipynb output.ipynb --engine markdown-nbconvert -p foo bar -p etc etc


Thanks, MSeal. That would allow me to return to primarily using papermill. One thing I am not understanding, though, is the order of injecting the markdown. Wouldn’t I want to do it after the new notebook has executed so I don’t mess up my template? I see under here, it specifically says, “could apply post-processing to the executed notebook.”

You could do either before or after. The execute_managed_notebook function is wrapping the notebook execution so manipulating the nb_man.nb (this is the underlying nbformat object) can happen as a pre or post processing step.

Also note that with papermill you typically want to keep the input and output paths separate. That way your original template/input isn’t edited no matter what papermill does. Injecting the markdown before it executes only affects the in-memory representation. It won’t edit the source file in any way unless you point the output back to that source.


Thanks for the useful tip. I just ran

!notedown --match=strict input.md > {notebook_name}

but I am getting this error.

Failed validating 'oneOf' in notebook['properties']['source']:

On instance['source']:
None

Any suggestions?

Hmmm…I haven’t used notedown recently. I see the Github issues listing a few things but nothing like yours. I suspect some dependency that notedown uses now needs something it isn’t providing. If you are working with environments, maybe you can pin some things it uses back to older versions. If you decide to go the troubleshooting route, you may want to post an issue on the Github page.

This post at Notedown’s issues page makes me think pandoc and Jupytext would be the way to go now. I have used Jupytext with lots of success; however, I think the tagging system for what are to be code cells is different from Notedown’s. I think I only have one or two pipelines that were using Notedown, and I suspect in my case it would be fairly easy to convert them to more actively supported tools. Maybe you aren’t as fortunate?

Thank you very much, I will give it a try