Modify the ASCII cell output to an HTML table

I’m using a Spark kernel inside the notebook. Unlike with Python, the resulting output table is not beautified. When I run myDataFrame.show(10), it prints an ASCII table, which is very hard to read because the result can be nested. I was thinking a tabular output would be a good first step, or optionally a JSON array to handle the nestedness. How can I intercept this output and reformat it as an interactive HTML table?

Have you tried something like this:

myDataFrame.limit(10).toPandas()

See here and here. According to here, you’d need Pandas installed and available. However, I’m unsure which kernel they are referring to, since the first SO post just says Jupyter. Or are pyspark and the spark kernel different?

Thanks for the idea. I forgot to add that I’m working with Scala Spark and not pyspark, and the Scala API doesn’t have a toPandas() method.
In general, how can I format the output of a cell before it is painted on screen?
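For reference, in an IPython-based kernel the usual hook for this is the rich display protocol: if the last expression in a cell is an object with a `_repr_html_` method, the notebook renders the returned HTML instead of the plain text repr. Below is a minimal sketch of that mechanism. The `HtmlTable` class and its sample data are hypothetical, and this may well not carry over to a sparkmagic Scala session, where the code runs remotely rather than in a local Python kernel.

```python
import html


class HtmlTable:
    """Hypothetical wrapper: holds column names and rows, renders itself as HTML."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]

    def _repr_html_(self):
        # IPython-based front ends call this method to get a rich representation.
        head = "".join(f"<th>{html.escape(str(c))}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><tr>{head}</tr>{body}</table>"


# Nested values are simply stringified and escaped in this sketch.
table = HtmlTable(["id", "tags"], [[1, ["a", "b"]], [2, []]])
rendered = table._repr_html_()
print(rendered)
```

In a notebook backed by an IPython kernel, leaving `table` as the last line of a cell should be enough to get the table rendered.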

Can you try putting the code credited to Aivean shown here in a cell and running that cell and then try:

myDataFrame.showHTML(10, 300)

The problem with that idea is that I cannot tell whether it is particular to the almond kernel.

Neat idea! But as you guessed, the sparkmagic kernel I’m using does not provide (afaik) an output handler like the publish used by Aivean. I raised a feature request for this on the sparkmagic GitHub at https://github.com/jupyter-incubator/sparkmagic/issues/670#issue-705831198

But I was thinking of implementing a kernel-agnostic magic command (if one doesn’t exist already) that intercepts the output of myScalaDataFrame.show() and transforms it into an HTML table. Any thoughts on that?

Sorry, I haven’t made any magic commands.

If you want something less fancy and working now… does write work in your kernel? If so, you could send the output of .show() to a file with the following, like in step 3 here:

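import contextlib, io
buf = io.StringIO()
with contextlib.redirect_stdout(buf):  # show() prints the table and returns None, so capture stdout
    myDataFrame.show(10)
with open("filename.html", "a") as text_file:  # append the captured ASCII table to the file
    text_file.write(buf.getvalue())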

Or try:

 %store myDataFrame.show(10) >filename.html

Then you can adjust the ASCII to HTML using this or pandoc, perhaps? And then display that HTML in your notebook if the kernel allows that?
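As a rough illustration of the ASCII-to-HTML step, here is a minimal sketch using only the Python standard library. It assumes the usual grid layout printed by show() (border lines starting with `+`, data lines wrapped in `|`), treats the first data line as the header, and does not handle cell values that themselves contain `|`. The function name `ascii_table_to_html` and the sample table are made up for the example.

```python
import html


def ascii_table_to_html(text):
    """Convert a +---+ / |a|b| style grid into an HTML table string."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Keep only data lines; skip +---+ borders and trailing notes.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        # Drop the outer pipes, then split on the inner ones.
        rows.append([cell.strip() for cell in line[1:-1].split("|")])
    if not rows:
        return "<table></table>"
    header, body = rows[0], rows[1:]
    parts = ["<table>"]
    parts.append("<tr>" + "".join(f"<th>{html.escape(c)}</th>" for c in header) + "</tr>")
    for row in body:
        parts.append("<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>")
    parts.append("</table>")
    return "\n".join(parts)


# Hypothetical sample in the shape show() typically prints.
sample = """+---+-----+
| id| name|
+---+-----+
|  1|alice|
|  2|  bob|
+---+-----+
only showing top 2 rows
"""
html_table = ascii_table_to_html(sample)
print(html_table)
```

Whether the resulting string can then be displayed depends on the kernel; in an IPython kernel it could be passed to IPython.display.HTML.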

This idea is cool too. Thanks for sharing these. However, I think they expect myDataFrame to be a local Python data structure. In my case, all data frames live in a Spark context managed by Livy, which the local kernel cannot access. So I think hooking into the output handler of the Jupyter notebook ecosystem would be the only way. I’m assuming this is possible using magic commands.