Normally, Chris Holdgraf’s nbclean is my go-to tool for this type of thing; however, I couldn’t see an easy way to parse just the markdown with it. Fortunately, nbformat can handle it.
This answer by Chris Holdgraf outlined how to do something similar. Adapting that code to keep only the markdown cells looks like this:
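import nbformat as nbf

# Read the notebook without converting it to another nbformat version
ntbk = nbf.read("old_notebook.ipynb", nbf.NO_CONVERT)

# Collect only the markdown cells
cells_to_keep = []
for cell in ntbk.cells:
    if cell.cell_type == "markdown":
        cells_to_keep.append(cell)

# Note: new_ntbk is the same object as ntbk, not a copy
new_ntbk = ntbk
new_ntbk.cells = cells_to_keep

# Write the markdown-only notebook to a new file
nbf.write(new_ntbk, "new_md_only_notebook.ipynb", version=nbf.NO_CONVERT)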
This was helpful for dissecting ntbk.
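For a quick look at what a notebook object like ntbk contains, a minimal sketch along these lines may help. It builds a small throwaway notebook in memory with the nbformat.v4 helper functions instead of reading a file; the cell contents are made up purely for illustration.

```python
from collections import Counter

from nbformat.v4 import new_code_cell, new_markdown_cell, new_notebook

# Hypothetical stand-in for a notebook loaded with nbf.read(...)
ntbk = new_notebook(cells=[
    new_markdown_cell("# Title"),
    new_code_cell("print('hello')"),
    new_markdown_cell("Some notes."),
])

# Tally the cells by their cell_type ("markdown", "code", or "raw")
cell_type_counts = Counter(cell.cell_type for cell in ntbk.cells)
print(dict(cell_type_counts))

# Show the index, type, and first source line of each cell
for index, cell in enumerate(ntbk.cells):
    first_line = cell.source.splitlines()[0] if cell.source else ""
    print(index, cell.cell_type, repr(first_line))
```

The same loop should work on a notebook read from disk, since each entry in ntbk.cells exposes cell_type and source in the same way.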
(A shorter version of the same code, using a list comprehension, is below:)
import nbformat as nbf
ntbk = nbf.read("old_notebook.ipynb", nbf.NO_CONVERT)
new_ntbk = ntbk
new_ntbk.cells = [cell for cell in ntbk.cells if cell.cell_type == "markdown"]
nbf.write(new_ntbk, "new_md_only_notebook.ipynb", version=nbf.NO_CONVERT)
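If the original notebook object needs to stay intact, the filtering can be wrapped in a small function that works on a copy. The following is only a sketch: keep_markdown_cells is a hypothetical helper name, not part of nbformat, and the demo notebook is built in memory and written to a temporary directory so that no real files are touched.

```python
import copy
import os
import tempfile

import nbformat as nbf
from nbformat.v4 import new_code_cell, new_markdown_cell, new_notebook


def keep_markdown_cells(ntbk):
    """Return a copy of ntbk that contains only its markdown cells.

    Hypothetical helper for illustration; not an nbformat function.
    """
    new_ntbk = copy.deepcopy(ntbk)  # leave the caller's notebook unchanged
    new_ntbk.cells = [
        cell for cell in new_ntbk.cells if cell.cell_type == "markdown"
    ]
    return new_ntbk


# Made-up demo notebook standing in for old_notebook.ipynb
original = new_notebook(cells=[
    new_markdown_cell("# Title"),
    new_code_cell("x = 1"),
    new_markdown_cell("Closing remarks."),
])

filtered = keep_markdown_cells(original)

# Round-trip through a temporary file to confirm the result is writable
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "new_md_only_notebook.ipynb")
    nbf.write(filtered, out_path, version=nbf.NO_CONVERT)
    reloaded = nbf.read(out_path, nbf.NO_CONVERT)

remaining_types = [cell.cell_type for cell in reloaded.cells]
print(remaining_types)
```

Copying first matters because, in the code above, new_ntbk = ntbk only creates a second name for the same object, so assigning to new_ntbk.cells also changes ntbk.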