Extract specific cells from students' notebooks

The JSON content and the .ipynb file are one and the same; the only difference is how you access them. Opening the file in Jupyter shows it as a notebook, whereas opening it in a text editor or reading it in as a file stream in Python lets you see the JSON directly.
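To make the "it is just JSON" point concrete, here is a minimal sketch using only the standard library. The function name `read_cells_from_json` is my own for illustration, and it assumes a version 4 notebook, where the cells sit in a top-level `cells` list.

```python
import json
from pathlib import Path


def read_cells_from_json(path):
    """Return (cell_type, source) pairs by parsing the .ipynb file as plain JSON."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    cells = []
    for cell in raw.get("cells", []):
        source = cell.get("source", "")
        # In the on-disk JSON, "source" may be a list of lines or a single string.
        if isinstance(source, list):
            source = "".join(source)
        cells.append((cell["cell_type"], source))
    return cells
```

Having to join the `source` list yourself is one of the small details that a notebook-aware library would otherwise take care of.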

I would actually suggest working one level up from the JSON by using nbformat to access the notebook elements and then the cell content you want. It already handles parsing notebooks. nbformat lets you access the cell contents as strings, so you don’t have to worry about how they are encoded in the JSON. Search this forum for ‘nbformat’ to find a number of examples of using it that I’ve linked to. In particular, this rather complex-looking example links to some of my uses of it.

Your implementation would just include the reading part (see mainly the first few lines here), and you’d then be able to parse the cells in a number of ways to extract the particular ones and the content you want.
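As a rough sketch of that reading step followed by one way of picking out cells, something like the following could work. The helper name `extract_cells` and the idea of filtering on a cell tag are my assumptions for illustration; your own hook might instead be a marker comment or a heading in the cell text.

```python
import nbformat


def extract_cells(path, cell_type="code", tag=None):
    """Return the source strings of cells matching a type and, optionally, a tag."""
    # as_version=4 asks nbformat to upgrade older notebooks to the v4 structure.
    nb = nbformat.read(path, as_version=4)
    selected = []
    for cell in nb.cells:
        if cell.cell_type != cell_type:
            continue
        # Tags, when present, live in the cell metadata under "tags".
        if tag is not None and tag not in cell.metadata.get("tags", []):
            continue
        selected.append(cell.source)  # already a single string
    return selected
```

For example, `extract_cells("student1.ipynb", tag="answer")` would return the code cells that carry a hypothetical `answer` tag, where `student1.ipynb` is a placeholder file name.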

Alternatively, you could use Jupytext to convert the .ipynb file to a Python script. You could then access the content you want by further parsing the .py file with Python, reading it in as you would a text file.

Which of those three you choose depends on what cell content you are looking for and how easy it is to find the hooks you need to extract it in a form that works for your downstream uses.

I could also easily see what you describe in your post as a Snakefile that lets you run a Snakemake pipeline to do that process for students’ notebooks. The advantage there is that Snakemake defines recipes to process each file all the way through the pipeline, and you can add files later without running the steps on all the input files again, just the new ones. That may be nice since your students may not hand things in all at the same time. And even if they do, certain notebooks may require some massaging to get them to process right, and that way you aren’t rerunning all the steps on the ones that already worked on the first pass. If that seems unclear or unfamiliar, I’d be glad to help you more.
