How to obtain HTML string from Markdown, as Jupyter does it?

I’d like to get the HTML that Jupyter produces for a piece of Markdown, so that I can put it in a widget.

There is this question, 2 years old, no answer:

stackoverflow .com/questions/51935234/can-i-get-html-string-from-ipython-display-markdown

I have also seen discourse.jupyter .org/t/possible-to-render-a-markdowncell-inside-a-widget-not-a-notebook-extension/6698 → github .com/jupyter-widgets/ipywidgets/issues/2428#issuecomment-500084610 , which basically just does import markdown; however, I’d like to use the same engine that Jupyter uses to make the Markdown to HTML conversion.

So, I was trying to look into it a bit myself; typically, if I do this:

from IPython.display import display, Markdown as md

myvar = 10
my_md  = md(f"""There are {myvar} **elements** """)
display(my_md)

… then everything works fine.

However, now I’d like to put this output inside a widget, and that is a problem - because if I do:

from IPython.display import display, Markdown as md
from ipywidgets import widgets, Layout

myvar = 10
my_md  = md(f"""There are {myvar} **elements** """)

display(widgets.HTML("<b>hello</b>")) # works fine
display(widgets.HTML( my_md  )) # error

… it results in:

TraitError: The 'value' trait of a HTML instance expected a unicode string, not the Markdown <IPython.core.display.Markdown object>.

So, a widget wants an HTML string - which is why I want to convert Markdown to HTML.

So, I’ve seen that github .com/ipython/ipython/blob/master/IPython/display.py has a display_markdown function.

But display_markdown just calls _display_mimetype('text/markdown', objs, **kwargs)

And _display_mimetype just calls display(*objs, raw=raw, metadata=metadata, include=[mimetype])

There are 4 definitions in IPython source of a display function; the “biggest” one is here: github .com/ipython/ipython/blob/master/IPython/core/display_functions.py#L88 - but it apparently just calls publish_display_data.

publish_display_data just calls display_pub.publish, which I believe is defined here: github .com/ipython/ipython/blob/master/IPython/core/displaypub.py#L61

And the entirety of the publish method seems to be:

        handlers = {}
        if self.shell is not None:
            handlers = getattr(self.shell, 'mime_renderers', {})

        for mime, handler in handlers.items():
            if mime in data:
                handler(data[mime], metadata.get(mime, None))
                return

        if 'text/plain' in data:
            print(data['text/plain'])

So, I might get handlers as the mime_renderers attribute of self.shell, or I might not; if I do get it, I should iterate through it, to find the right handler for the mime type, in this case I believe text/markdown.
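
To make that dispatch concrete, here is a minimal standalone sketch of the logic quoted above. The function name publish_sketch and the lambda renderer are hypothetical stand-ins of mine, not IPython's actual classes; the bundle values are the ones from my example. It only illustrates the lookup order: the first registered handler whose mime type is present in the data wins, otherwise text/plain is printed.

```python
def publish_sketch(data, metadata=None, mime_renderers=None):
    """Stand-in for the dispatch in the quoted publish method (not IPython's class)."""
    metadata = metadata or {}
    handlers = mime_renderers or {}

    # The first registered handler whose mime type is present in the data wins.
    for mime, handler in handlers.items():
        if mime in data:
            handler(data[mime], metadata.get(mime, None))
            return mime

    # No handler matched: fall back to printing the plain-text representation.
    if "text/plain" in data:
        print(data["text/plain"])
        return "text/plain"
    return None


bundle = {
    "text/plain": "<IPython.core.display.Markdown object>",
    "text/markdown": "There are 10 **elements** ",
}

seen = []  # collects whatever the hypothetical renderer receives
used = publish_sketch(
    bundle,
    mime_renderers={"text/markdown": lambda text, meta: seen.append(text)},
)
fallback = publish_sketch(bundle)  # no renderers registered
```

Note that even with a handler registered, the handler receives the raw Markdown text; nothing in this path converts it to HTML.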

But I cannot find where mime_renderers is populated, so I cannot see what the actual handler for text/markdown is. At this point, the only remaining thing I could do on my own is to find some way to set a breakpoint in the publish method and print out handler, to see what it does - and I’m not exactly sure how I’d do this.

So, is there anyone here who could point out what is actually used to do the Markdown → HTML conversion in Jupyter? And even better, could I somehow retrieve the HTML string corresponding to a Markdown object?

OK, got a bit further - but not to the solution, so just want to document this.

First of all, it is possible to start Jupyter on the server under pdb, to be able to debug; in my case, I had to look up the actual command string via ps axf, then stop the jupyter service, and then I could start “jupyter” manually with pdb, so it looked like this on the command line:

$ /home/jupyter/Jupyter/notebook/bin/python -m pdb /home/jupyter/Jupyter/notebook/bin/jupyter-notebook --config=/home/jupyter/.jupyter/jupyter_notebook_config.py
> /home/jupyter/Jupyter/notebook/bin/jupyter-notebook(3)<module>()
-> import re
(Pdb)

Unfortunately, not every breakpoint you set at this point will actually trigger.

More of a surprise for me is that you can issue a pdb breakpoint directly in a Jupyter cell - and you get a small GUI textbox to interact with pdb there.

This way, it is a bit easier to track down what happens. And what I’ve seen, is this:

When Jupyter starts, you can set a breakpoint on init_settings:

from notebook.notebookapp import NotebookWebApplication
b NotebookWebApplication.init_settings

Along those lines, I learned that when Jupyter initially renders the main page (with the file list), it is done via IPythonHandler.render_template:

from notebook.base.handlers import IPythonHandler
b IPythonHandler.render_template

# only does the tree - but not individual ipynb:
# ns = {'page_title': 'Home Page - Select or create a notebook', 'notebook_path': '',
# <Template 'tree.html'> .render(**ns)

… however, this does not handle actual .ipynb. The .ipynb file, as such, is handled by the Tornado webserver - seen from a high level, through a template:

from tornado.web import RequestHandler
b RequestHandler._execute

# this one also handles .css files with StaticFileHandler.get; / with TreeHandler.get ...
# when .ipynb, it ends in self.path_kwargs, {'path': '/teststart.ipynb'}, handler is NotebookHandler.get of <notebook.notebook.handlers.NotebookHandler; but it might get web.py(2994)_get_cached_version() ...
# class NotebookHandler(IPythonHandler): -> self.write(self.render_template('notebook.html',

But ultimately, we can break inside the cell, and see what display(my_md) (as in OP, or display(mdout) as on the screenshot) would have done. Essentially, this function prepares a message with the Markdown text data, and then uses the display_pub publisher of the InteractiveShell (which, for Jupyter in a browser, is a ZMQInteractiveShell) to send this message to the browser. So, here is an edited snippet of my pdb session:

ipdb> p display
<function display at 0x7f8cc6858940>
ipdb> b display
Breakpoint 1 at /home/jupyter/Jupyter/notebook/lib/python3.8/site-packages/IPython/core/display.py:131
ipdb> c

> /home/jupyter/Jupyter/notebook/lib/python3.8/site-packages/IPython/core/display.py(281)display()

ipdb> p InteractiveShell.initialized()
True
ipdb> p display_id
None
ipdb> p objs
(<IPython.core.display.Markdown object>,)

ipdb> p InteractiveShell.instance()
<ipykernel.zmqshell.ZMQInteractiveShell object at 0x7f8cc54e6310>
ipdb> p InteractiveShell.instance().display_formatter
<IPython.core.formatters.DisplayFormatter object at 0x7f8cc4484490>
ipdb> p InteractiveShell.instance().display_formatter.format
<bound method DisplayFormatter.format of <IPython.core.formatters.DisplayFormatter object at 0x7f8cc4484490>>
ipdb> InteractiveShell.instance().display_formatter.format({'text/markdown': 'hello'})
({'text/plain': "{'text/markdown': 'hello'}"}, {})

ipdb> p raw
False

format_dict, md_dict = format(obj, include=include, exclude=exclude)
ipdb> p format_dict
{'text/plain': '<IPython.core.display.Markdown object>', 'text/markdown': 'There are 10 **elements** '}
ipdb> p md_dict
{}
ipdb> p include
None
ipdb> p exclude
None


publish_display_data(data=format_dict, metadata=md_dict, **kwargs)
display_pub = InteractiveShell.instance().display_pub

ipdb> p display_pub
<ipykernel.zmqshell.ZMQDisplayPublisher object at 0x7f9498baa4f0>

display_pub.publish( data=data, metadata=metadata, ...

> /home/jupyter/Jupyter/notebook/lib/python3.8/site-packages/ipykernel/zmqshell.py(87)publish()

ipdb> p data
{'text/plain': '<IPython.core.display.Markdown object>', 'text/markdown': 'There are 10 **elements** '}
ipdb> p metadata
{}

ipdb> p content
{'data': {'text/plain': '<IPython.core.display.Markdown object>', 'text/markdown': 'There are 10 **elements** '}, 'metadata': {}, 'transient': {}}

ipdb> p msg_type
'display_data'

--> 125         msg = self.session.msg(
    126             msg_type, json_clean(content),
    127             parent=self.parent_header

> /home/jupyter/Jupyter/notebook/lib/python3.8/site-packages/jupyter_client/session.py(632)msg()

--> 632     def msg(
    633         self,
    634         msg_type: str,

ipdb> p msg
{'header': {'msg_id': '3483f1bc-056fb71471d9ea57e18f4859_8115_233', 'msg_type': 'display_data', 'username': 'jupyter', 'session': '3483f1bc-056fb71471d9ea57e18f4859', 'date': datetime.datetime(2021, 9, 2, 8, 25, 43, 310540, tzinfo=datetime.timezone.utc), 'version': '5.3'}, 'msg_id': '3483f1bc-056fb71471d9ea57e18f4859_8115_233', 'msg_type': 'display_data', 'parent_header': {'msg_id': 'cc2337816f1a4881960342d01f63e29e', 'username': 'username', 'session': '0824348cbbc049da90301820dc611574', 'msg_type': 'execute_request', 'version': '5.2', 'date': datetime.datetime(2021, 9, 2, 8, 17, 10, 301038, tzinfo=datetime.timezone.utc)}, 'content': {'data': {'text/plain': '<IPython.core.display.Markdown object>', 'text/markdown': 'There are 10 **elements** '}, 'metadata': {}, 'transient': {}}, 'metadata': {}}

    656         return msg

zmqshell.py(138)publish():

--> 138         self.session.send(
    139             self.pub_socket, msg, ident=self.topic,
    140         )

ipdb> p self.pub_socket
<ipykernel.iostream.BackgroundSocket object at 0x7f949d401310>
ipdb> p self.topic
b'display_data'

> /home/jupyter/Jupyter/notebook/lib/python3.8/site-packages/jupyter_client/session.py(737)send()

ipdb> p msg
{'header': {'msg_id': '3483f1bc-056fb71471d9ea57e18f4859_8115_233', 'msg_type': 'display_data', 'username': 'jupyter', 'session': '3483f1bc-056fb71471d9ea57e18f4859', 'date': datetime.datetime(2021, 9, 2, 8, 25, 43, 310540, tzinfo=datetime.timezone.utc), 'version': '5.3'}, 'msg_id': '3483f1bc-056fb71471d9ea57e18f4859_8115_233', 'msg_type': 'display_data', 'parent_header': {'msg_id': 'cc2337816f1a4881960342d01f63e29e', 'username': 'username', 'session': '0824348cbbc049da90301820dc611574', 'msg_type': 'execute_request', 'version': '5.2', 'date': datetime.datetime(2021, 9, 2, 8, 17, 10, 301038, tzinfo=datetime.timezone.utc)}, 'content': {'data': {'text/plain': '<IPython.core.display.Markdown object>', 'text/markdown': 'There are 10 **elements** '}, 'metadata': {}, 'transient': {}}, 'metadata': {}}
ipdb> p buffers
[]

ipdb> p to_send
[b'display_data', b'<IDS|MSG>', b'24f5f5b435815a230e8208664440b11f8698406608c55f7f7d68d615091469b9', b'{"msg_id": "3483f1bc-056fb71471d9ea57e18f4859_8115_233", "msg_type": "display_data", "username": "jupyter", "session": "3483f1bc-056fb71471d9ea57e18f4859", "date": "2021-09-02T08:25:43.310540Z", "version": "5.3"}', b'{"msg_id": "cc2337816f1a4881960342d01f63e29e", "username": "username", "session": "0824348cbbc049da90301820dc611574", "msg_type": "execute_request", "version": "5.2", "date": "2021-09-02T08:17:10.301038Z"}', b'{}', b'{"data": {"text/plain": "<IPython.core.display.Markdown object>", "text/markdown": "There are 20 **elements** of both `the_x`, and `the_y`;\\n\\nThose values are plotted on the diagram on the left"}, "metadata": {}, "transient": {}}']

    840             tracker = DONE
--> 841             stream.send_multipart(to_send, copy=copy)


> /home/jupyter/Jupyter/notebook/lib/python3.8/site-packages/ipykernel/iostream.py(271)send_multipart()

    271     def send_multipart(self, *args, **kwargs):
    272         """Schedule send in IO thread"""
--> 273         return self.io_thread.send_multipart(*args, **kwargs)

> /home/jupyter/Jupyter/notebook/lib/python3.8/site-packages/ipykernel/iostream.py(218)send_multipart()

--> 223         self.schedule(lambda : self._really_send(*args, **kwargs))

ipdb> p self._really_send
<bound method IOPubThread._really_send of <ipykernel.iostream.IOPubThread object at 0x7f949d4012b0>>
ipdb> b self._really_send
Breakpoint 2 at /home/jupyter/Jupyter/notebook/lib/python3.8/site-packages/ipykernel/iostream.py:225

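Here is where it gets tricky - you can set a breakpoint in _really_send, but it won’t break. I first blamed the lambda there, but a more likely cause is that the scheduled function runs in the separate IO thread, where a debugger started from the cell does not trace. But, just for the record, here is what the break looks like in self.schedule: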

> /home/jupyter/Jupyter/notebook/lib/python3.8/site-packages/ipykernel/iostream.py(206)schedule()

    211         if self.thread.is_alive():
--> 212             self._events.append(f)
    213             # wake event thread (message content is ignored)
    214             self._event_pipe.send(b'')

ipdb> p f
<function IOPubThread.send_multipart.<locals>.<lambda> at 0x7f946ea214c0>
ipdb> p self._events
deque([])
ipdb> p self
<ipykernel.iostream.IOPubThread object at 0x7f949d4012b0>
ipdb> p self._event_pipe
<zmq.Socket(zmq.PUSH) at 0x7f949d410e20>


ipdb> n  ################ HTML was output at this point!

So, as soon as self._events.append(f) executes, where f is an IOPubThread.send_multipart lambda function - I get the HTML rendered.
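
For what it's worth, the schedule pattern can be sketched on its own with just the standard library. MiniIOThread below is a hypothetical stand-in of mine, not ipykernel's IOPubThread, and it uses a queue.Queue where the real code uses a deque plus a wake-up pipe. It only illustrates that a callable queued this way runs in a different thread than the one that called send_multipart, which might be why a breakpoint set from the cell does not trigger there (pdb's tracing is installed per thread).

```python
import queue
import threading


class MiniIOThread:
    """Hypothetical stand-in for a background IO thread running scheduled callables."""

    def __init__(self):
        self._events = queue.Queue()
        self.sent = []  # records (thread name, message parts) for inspection
        self.thread = threading.Thread(target=self._run, name="io-thread", daemon=True)
        self.thread.start()

    def _run(self):
        # Event loop of the background thread: run callables until told to stop.
        while True:
            f = self._events.get()
            if f is None:
                break
            f()

    def schedule(self, f):
        # Called from the main thread; only enqueues, does not execute f.
        self._events.put(f)

    def _really_send(self, parts):
        # Executes in the background thread.
        self.sent.append((threading.current_thread().name, parts))

    def send_multipart(self, parts):
        self.schedule(lambda: self._really_send(parts))

    def close(self):
        self._events.put(None)
        self.thread.join()


io = MiniIOThread()
io.send_multipart([b"display_data", b"..."])
io.close()
sender_thread, parts = io.sent[0]
```

After close() returns, sender_thread holds the name of the thread that actually executed _really_send, which is the background thread and not the caller's.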

So, unfortunately, I’m still none the wiser as to where this Markdown → HTML conversion occurs, or whether it occurs in Python at all?!
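
What the trace above does show is that the kernel side only collects representations of the object into a dictionary keyed by mime type. Here is a rough sketch of that idea, under my own assumptions: FakeMarkdown, REPR_METHODS and format_bundle are hypothetical names, and the only thing borrowed from IPython is the convention that an object offers its Markdown source through a _repr_markdown_ method.

```python
class FakeMarkdown:
    """Hypothetical stand-in for IPython.display.Markdown."""

    def __init__(self, data):
        self.data = data

    def _repr_markdown_(self):
        # Returns the Markdown source unchanged; no HTML is produced here.
        return self.data


# Mime types and the method that would supply each of them (illustrative subset).
REPR_METHODS = {
    "text/markdown": "_repr_markdown_",
    "text/html": "_repr_html_",
}


def format_bundle(obj):
    """Build a mime bundle similar in shape to format_dict in the pdb session."""
    bundle = {"text/plain": repr(obj)}
    for mime, method_name in REPR_METHODS.items():
        method = getattr(obj, method_name, None)
        if method is not None:
            bundle[mime] = method()
    return bundle


bundle = format_bundle(FakeMarkdown("There are 10 **elements** "))
```

The resulting bundle has a text/markdown entry but no text/html entry, which matches format_dict in the session: the Markdown leaves Python as Markdown.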

In any case, it can be seen that _really_send calls send_multipart; the problem is which send_multipart - in my Jupyter installation there are 7 definitions of send_multipart:

Jupyter/notebook/lib/python3.8/site-packages/ipykernel/inprocess/socket.py:32
Jupyter/notebook/lib/python3.8/site-packages/ipykernel/iostream.py:218
Jupyter/notebook/lib/python3.8/site-packages/ipykernel/iostream.py:271
Jupyter/notebook/lib/python3.8/site-packages/zmq/_future.py:240
Jupyter/notebook/lib/python3.8/site-packages/zmq/green/core.py:271
Jupyter/notebook/lib/python3.8/site-packages/zmq/eventloop/zmqstream.py:253
Jupyter/notebook/lib/python3.8/site-packages/zmq/sugar/socket.py:543

I suspect it is this one, in ./lib/python3.8/site-packages/ipykernel/inprocess/socket.py:

    def send_multipart(self, msg_parts, flags=0, copy=True, track=False):
        msg_parts = list(map(zmq.Message, msg_parts))
        self.queue.put_nowait(msg_parts)
        self.message_sent += 1

… but still, there is no mention of markdown in the zmq library, so I doubt any Markdown → HTML conversion happens here…
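
That fits with what to_send looked like in the session: a list of byte frames, with the content still being JSON that carries the Markdown text. A minimal sketch of how such a frame list can be assembled, assuming the layout seen above (topic, the <IDS|MSG> delimiter, a signature, then header, parent header, metadata and content as JSON) and an HMAC-SHA256 signature over the four JSON frames as described in the Jupyter messaging documentation. The serialize function name and the key are hypothetical, and the header here is heavily trimmed.

```python
import hashlib
import hmac
import json

DELIM = b"<IDS|MSG>"


def serialize(topic, header, parent_header, metadata, content, key):
    """Assemble wire frames in the order seen in to_send (illustrative only)."""
    frames = [
        json.dumps(part).encode("utf-8")
        for part in (header, parent_header, metadata, content)
    ]
    # Sign the four JSON frames, in order, with HMAC-SHA256.
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for frame in frames:
        mac.update(frame)
    signature = mac.hexdigest().encode("ascii")
    return [topic, DELIM, signature, *frames]


to_send = serialize(
    b"display_data",
    {"msg_type": "display_data"},  # trimmed header, for illustration
    {},
    {},
    {
        "data": {"text/markdown": "There are 10 **elements** "},
        "metadata": {},
        "transient": {},
    },
    key=b"hypothetical-key",
)
```

Decoding the last frame again with json.loads gives back the same text/markdown string, so nothing on this path has touched the Markdown.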

Actually, I found a comment in ./lib/python3.8/site-packages/jupyter_nbextensions_configurator/static/nbextensions_configurator/render/render.js:

    var render_markdown = function(md_contents, relative_url_root) {
        var div = $('<div>');
        // the bulk of this functon is adapted from
        // notebook/js/textcell.Markdowncell.render
        // with the addition of code to absolutify relative href/src attributes

… and since I’m discussing markdown code cells here, this is likely handled in ./lib/python3.8/site-packages/notebook/static/notebook/js/notebook.js:


    /**
     * Re-render the output of a code cell.
     */
    Notebook.prototype.render_cell_output = function (code_cell) {
        var cell_data = code_cell.toJSON();
        var cell_index = this.find_cell_index(code_cell);
        var trusted = code_cell.output_area.trusted;
        this.clear_output(cell_index);
        code_cell.output_area.trusted = trusted;
        code_cell.fromJSON(cell_data);
    };

… but still, nothing specific about markdown to HTML conversion.


EDIT: Ok, I think I finally have it confirmed, that the Markdown → HTML conversion of a Python statement like display(my_md) in a code cell, actually happens in JavaScript - more specifically, in the append_markdown function, in

Jupyter/notebook/lib/python3.8/site-packages/notebook/static/notebook/js/outputarea.js

I guess, that same file is here in the source:

You can set a JavaScript breakpoint in the browser on the first line of append_markdown function, re-run the cell with display(my_md) - and it will break, and you will be able to see both the original Markdown text, and the resulting HTML - which is a result of a call to JavaScript markdown.render function.

Ultimately, I just realized that these messages have a tag (self.topic), which here was display_data; so I just looked up display_data in the .js files of my Jupyter installation - and found it, also in the file mentioned above:

...
    OutputArea.prototype.handle_output = function (msg) {
        var json = {};
        var msg_type = json.output_type = msg.header.msg_type;
        var content = msg.content;
        switch(msg_type) {
        ...
        case "update_display_data":
        case "display_data":
        ...

… which was a clear sign, that JavaScript takes over at least a part of the rendering of the page; and then I kind of luckily stumbled upon append_markdown, and could confirm it with a breakpoint.

So, it’s great that now I know what is going on - not so great that I cannot use this Markdown->HTML conversion in Python; and it would have been nice, because this conversion also includes MathJax rendering, which I’d need in this case (otherwise I’d have just written HTML directly, and not bothered so much to see what’s going on).

Oh well…

Markdown to html conversion happens in JavaScript using the marked.js library.

See notebook/textcell.js at 2cfff07a39fa486a3f05c26b400fa26e1802a053 · jupyter/notebook · GitHub for markdown cells, at least.

See notebook/outputarea.js at 2cfff07a39fa486a3f05c26b400fa26e1802a053 · jupyter/notebook · GitHub for the code responsible for rendering an output of type markdown.
