OK, got a bit further - but not to the solution, so just want to document this.
First of all, it is possible to start Jupyter on the server under pdb, to be able to debug. In my case, I had to look up the actual command string via ps axf, then stop the Jupyter service, and then I could start jupyter manually with pdb, so it looked like this on the command line:
$ /home/jupyter/Jupyter/notebook/bin/python -m pdb /home/jupyter/Jupyter/notebook/bin/jupyter-notebook --config=/home/jupyter/.jupyter/jupyter_notebook_config.py
> /home/jupyter/Jupyter/notebook/bin/jupyter-notebook(3)<module>()
-> import re
(Pdb)
Unfortunately, not every breakpoint you set in this pdb instance will actually trigger.
More of a surprise for me is that you can issue a pdb breakpoint directly in a Jupyter cell, and you get a small GUI textbox to interact with pdb there.
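For the record, a minimal sketch of what such a cell could look like. I'm assuming the built-in breakpoint() here (Python 3.7+; import pdb; pdb.set_trace() should behave much the same), and describe_elements is just a hypothetical stand-in for whatever the cell computes:

```python
def describe_elements(count):
    """Hypothetical helper standing in for whatever the cell computes."""
    text = f"There are {count} **elements**"
    # In a notebook cell this pauses execution and opens the debugger prompt.
    # Setting the environment variable PYTHONBREAKPOINT=0 makes it a no-op.
    breakpoint()
    return text
```

Calling describe_elements(10) in a cell should then stop at the breakpoint() line, from where you can step (s, n) into whatever follows.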
This way, it is a bit easier to track down what happens. And what I’ve seen is this:
When Jupyter starts, you can set a breakpoint on init_settings:
from notebook.notebookapp import NotebookWebApplication
b NotebookWebApplication.init_settings
Along those lines, I learned that when Jupyter initially renders the main page (with the file list), it does so via IPythonHandler.render_template:
from notebook.base.handlers import IPythonHandler
b IPythonHandler.render_template
# only does the tree - but not individual ipynb:
# ns = {'page_title': 'Home Page - Select or create a notebook', 'notebook_path': '',
# <Template 'tree.html'> .render(**ns)
… however, this does not handle actual .ipynb files. The .ipynb file, as such, is handled by the Tornado webserver - seen from a high level, through a template:
from tornado.web import RequestHandler
b RequestHandler._execute
# this one also handles .css files with StaticFileHandler.get; / with TreeHandler.get ...
# when .ipynb, it ends in self.path_kwargs, {'path': '/teststart.ipynb'}, handler is NotebookHandler.get of <notebook.notebook.handlers.NotebookHandler; but it might get web.py(2994)_get_cached_version() ...
# class NotebookHandler(IPythonHandler): -> self.write(self.render_template('notebook.html',
But ultimately, we can break inside the cell, and see what display(my_md) (as in OP, or display(mdout) as on the screenshot) would have done. Essentially, this function prepares a message with the Markdown text data, and then uses the display_pub publisher of the InteractiveShell, which in the case of Jupyter in a browser is a ZMQInteractiveShell, to somehow send this message to the browser. So, here is an edited snippet of my pdb session:
ipdb> p display
<function display at 0x7f8cc6858940>
ipdb> b display
Breakpoint 1 at /home/jupyter/Jupyter/notebook/lib/python3.8/site-packages/IPython/core/display.py:131
ipdb> c
> /home/jupyter/Jupyter/notebook/lib/python3.8/site-packages/IPython/core/display.py(281)display()
ipdb> p InteractiveShell.initialized()
True
ipdb> p display_id
None
ipdb> p objs
(<IPython.core.display.Markdown object>,)
ipdb> p InteractiveShell.instance()
<ipykernel.zmqshell.ZMQInteractiveShell object at 0x7f8cc54e6310>
ipdb> p InteractiveShell.instance().display_formatter
<IPython.core.formatters.DisplayFormatter object at 0x7f8cc4484490>
ipdb> p InteractiveShell.instance().display_formatter.format
<bound method DisplayFormatter.format of <IPython.core.formatters.DisplayFormatter object at 0x7f8cc4484490>>
ipdb> InteractiveShell.instance().display_formatter.format({'text/markdown': 'hello'})
({'text/plain': "{'text/markdown': 'hello'}"}, {})
ipdb> p raw
False
format_dict, md_dict = format(obj, include=include, exclude=exclude)
ipdb> p format_dict
{'text/plain': '<IPython.core.display.Markdown object>', 'text/markdown': 'There are 10 **elements** '}
ipdb> p md_dict
{}
ipdb> p include
None
ipdb> p exclude
None
publish_display_data(data=format_dict, metadata=md_dict, **kwargs)
display_pub = InteractiveShell.instance().display_pub
ipdb> p display_pub
<ipykernel.zmqshell.ZMQDisplayPublisher object at 0x7f9498baa4f0>
display_pub.publish( data=data, metadata=metadata, ...
> /home/jupyter/Jupyter/notebook/lib/python3.8/site-packages/ipykernel/zmqshell.py(87)publish()
ipdb> p data
{'text/plain': '<IPython.core.display.Markdown object>', 'text/markdown': 'There are 10 **elements** '}
ipdb> p metadata
{}
ipdb> p content
{'data': {'text/plain': '<IPython.core.display.Markdown object>', 'text/markdown': 'There are 10 **elements** '}, 'metadata': {}, 'transient': {}}
ipdb> p msg_type
'display_data'
--> 125 msg = self.session.msg(
126 msg_type, json_clean(content),
127 parent=self.parent_header
> /home/jupyter/Jupyter/notebook/lib/python3.8/site-packages/jupyter_client/session.py(632)msg()
--> 632 def msg(
633 self,
634 msg_type: str,
ipdb> p msg
{'header': {'msg_id': '3483f1bc-056fb71471d9ea57e18f4859_8115_233', 'msg_type': 'display_data', 'username': 'jupyter', 'session': '3483f1bc-056fb71471d9ea57e18f4859', 'date': datetime.datetime(2021, 9, 2, 8, 25, 43, 310540, tzinfo=datetime.timezone.utc), 'version': '5.3'}, 'msg_id': '3483f1bc-056fb71471d9ea57e18f4859_8115_233', 'msg_type': 'display_data', 'parent_header': {'msg_id': 'cc2337816f1a4881960342d01f63e29e', 'username': 'username', 'session': '0824348cbbc049da90301820dc611574', 'msg_type': 'execute_request', 'version': '5.2', 'date': datetime.datetime(2021, 9, 2, 8, 17, 10, 301038, tzinfo=datetime.timezone.utc)}, 'content': {'data': {'text/plain': '<IPython.core.display.Markdown object>', 'text/markdown': 'There are 10 **elements** '}, 'metadata': {}, 'transient': {}}, 'metadata': {}}
656 return msg
zmqshell.py(138)publish():
--> 138 self.session.send(
139 self.pub_socket, msg, ident=self.topic,
140 )
ipdb> p self.pub_socket
<ipykernel.iostream.BackgroundSocket object at 0x7f949d401310>
ipdb> p self.topic
b'display_data'
> /home/jupyter/Jupyter/notebook/lib/python3.8/site-packages/jupyter_client/session.py(737)send()
ipdb> p msg
{'header': {'msg_id': '3483f1bc-056fb71471d9ea57e18f4859_8115_233', 'msg_type': 'display_data', 'username': 'jupyter', 'session': '3483f1bc-056fb71471d9ea57e18f4859', 'date': datetime.datetime(2021, 9, 2, 8, 25, 43, 310540, tzinfo=datetime.timezone.utc), 'version': '5.3'}, 'msg_id': '3483f1bc-056fb71471d9ea57e18f4859_8115_233', 'msg_type': 'display_data', 'parent_header': {'msg_id': 'cc2337816f1a4881960342d01f63e29e', 'username': 'username', 'session': '0824348cbbc049da90301820dc611574', 'msg_type': 'execute_request', 'version': '5.2', 'date': datetime.datetime(2021, 9, 2, 8, 17, 10, 301038, tzinfo=datetime.timezone.utc)}, 'content': {'data': {'text/plain': '<IPython.core.display.Markdown object>', 'text/markdown': 'There are 10 **elements** '}, 'metadata': {}, 'transient': {}}, 'metadata': {}}
ipdb> p buffers
[]
ipdb> p to_send
[b'display_data', b'<IDS|MSG>', b'24f5f5b435815a230e8208664440b11f8698406608c55f7f7d68d615091469b9', b'{"msg_id": "3483f1bc-056fb71471d9ea57e18f4859_8115_233", "msg_type": "display_data", "username": "jupyter", "session": "3483f1bc-056fb71471d9ea57e18f4859", "date": "2021-09-02T08:25:43.310540Z", "version": "5.3"}', b'{"msg_id": "cc2337816f1a4881960342d01f63e29e", "username": "username", "session": "0824348cbbc049da90301820dc611574", "msg_type": "execute_request", "version": "5.2", "date": "2021-09-02T08:17:10.301038Z"}', b'{}', b'{"data": {"text/plain": "<IPython.core.display.Markdown object>", "text/markdown": "There are 20 **elements** of both `the_x`, and `the_y`;\\n\\nThose values are plotted on the diagram on the left"}, "metadata": {}, "transient": {}}']
840 tracker = DONE
--> 841 stream.send_multipart(to_send, copy=copy)
> /home/jupyter/Jupyter/notebook/lib/python3.8/site-packages/ipykernel/iostream.py(271)send_multipart()
271 def send_multipart(self, *args, **kwargs):
272 """Schedule send in IO thread"""
--> 273 return self.io_thread.send_multipart(*args, **kwargs)
> /home/jupyter/Jupyter/notebook/lib/python3.8/site-packages/ipykernel/iostream.py(218)send_multipart()
--> 223 self.schedule(lambda : self._really_send(*args, **kwargs))
ipdb> p self._really_send
<bound method IOPubThread._really_send of <ipykernel.iostream.IOPubThread object at 0x7f949d4012b0>>
ipdb> b self._really_send
Breakpoint 2 at /home/jupyter/Jupyter/notebook/lib/python3.8/site-packages/ipykernel/iostream.py:225
Here is where it gets tricky - you can set a breakpoint in _really_send, but it won’t break, likely because of the lambda there. But, just for the record, here is what the break looks like in self.schedule:
> /home/jupyter/Jupyter/notebook/lib/python3.8/site-packages/ipykernel/iostream.py(206)schedule()
211 if self.thread.is_alive():
--> 212 self._events.append(f)
213 # wake event thread (message content is ignored)
214 self._event_pipe.send(b'')
ipdb> p f
<function IOPubThread.send_multipart.<locals>.<lambda> at 0x7f946ea214c0>
ipdb> p self._events
deque([])
ipdb> p self
<ipykernel.iostream.IOPubThread object at 0x7f949d4012b0>
ipdb> p self._event_pipe
<zmq.Socket(zmq.PUSH) at 0x7f949d410e20>
ipdb> n ################ HTML was output at this point!
So, as soon as self._events.append(f) executes, where f is an IOPubThread.send_multipart lambda function, I get the HTML rendered.
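One possible explanation for the breakpoint that never fires, which I have not verified against ipykernel itself: schedule only queues the callable, and a separate IO thread runs it later. Since Python installs the debugger's trace function per thread, a breakpoint set from the thread I'm debugging would not be seen over there. A minimal sketch of that queue-and-wake pattern, using only the standard library (a threading.Event stands in for the ZMQ event pipe, and really_send is a hypothetical stand-in):

```python
import threading
from collections import deque

events = deque()            # stand-in for IOPubThread._events
wake = threading.Event()    # stand-in for the _event_pipe wake-up
ran_on = []                 # records which thread ran each callback

def schedule(f):
    """Queue a callable for the worker thread, then wake it."""
    events.append(f)
    wake.set()

def worker():
    """Wait for a wake-up, then run everything that is queued."""
    wake.wait()
    while events:
        events.popleft()()

def really_send(payload):
    # Hypothetical stand-in for IOPubThread._really_send.
    ran_on.append((threading.current_thread().name, payload))

io_thread = threading.Thread(target=worker, name="io-thread")
io_thread.start()
schedule(lambda: really_send("display_data"))
io_thread.join()
print(ran_on)
```

The recorded thread name is the worker's, not the caller's, which would fit with what I saw: stepping over the append in the main thread is enough for the message to go out.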
So, unfortunately, I’m still none the wiser as to where this Markdown → HTML conversion occurs; or if it occurs in Python at all?!
In any case, it can be seen that _really_send calls send_multipart; the problem is which send_multipart - in my Jupyter installation there are 7 instances of def send_multipart:
Jupyter/notebook/lib/python3.8/site-packages/ipykernel/inprocess/socket.py:32
Jupyter/notebook/lib/python3.8/site-packages/ipykernel/iostream.py:218
Jupyter/notebook/lib/python3.8/site-packages/ipykernel/iostream.py:271
Jupyter/notebook/lib/python3.8/site-packages/zmq/_future.py:240
Jupyter/notebook/lib/python3.8/site-packages/zmq/green/core.py:271
Jupyter/notebook/lib/python3.8/site-packages/zmq/eventloop/zmqstream.py:253
Jupyter/notebook/lib/python3.8/site-packages/zmq/sugar/socket.py:543
I suspect it is this one, in ./lib/python3.8/site-packages/ipykernel/inprocess/socket.py:
def send_multipart(self, msg_parts, flags=0, copy=True, track=False):
    msg_parts = list(map(zmq.Message, msg_parts))
    self.queue.put_nowait(msg_parts)
    self.message_sent += 1
… but still, there is no mention of markdown in the zmq library, so I doubt any Markdown → HTML conversion happens here…
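That fits with the to_send list in the pdb session above: as far as I can tell it is just the Jupyter wire format (topic, the <IDS|MSG> delimiter, an HMAC signature, then header, parent header, metadata and content as JSON), and the content still carries the raw Markdown. A rough sketch of that layout using only the standard library; the key, the shortened headers and the serialize name are my own placeholders, and SHA-256 is assumed as the signature scheme:

```python
import hashlib
import hmac
import json

DELIM = b"<IDS|MSG>"

def serialize(topic, header, parent_header, metadata, content, key):
    """Build the list of frames for one message (sketch of the wire format)."""
    frames = [
        json.dumps(part).encode("utf-8")
        for part in (header, parent_header, metadata, content)
    ]
    # HMAC over the four JSON frames; SHA-256 is an assumption here.
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for frame in frames:
        mac.update(frame)
    signature = mac.hexdigest().encode("ascii")
    return [topic, DELIM, signature] + frames

content = {
    "data": {
        "text/plain": "<IPython.core.display.Markdown object>",
        "text/markdown": "There are 10 **elements** ",
    },
    "metadata": {},
    "transient": {},
}
to_send = serialize(
    b"display_data",
    {"msg_type": "display_data", "version": "5.3"},     # shortened header
    {"msg_type": "execute_request", "version": "5.2"},  # shortened parent header
    {},
    content,
    key=b"hypothetical-session-key",
)
```

The last frame is plain JSON with the text/markdown string untouched, so nothing on this path needs to know anything about Markdown.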
Actually, I found a comment in ./lib/python3.8/site-packages/jupyter_nbextensions_configurator/static/nbextensions_configurator/render/render.js:
var render_markdown = function(md_contents, relative_url_root) {
    var div = $('<div>');
    // the bulk of this functon is adapted from
    // notebook/js/textcell.Markdowncell.render
    // with the addition of code to absolutify relative href/src attributes
… and since I’m discussing markdown code cells here, this is likely handled in ./lib/python3.8/site-packages/notebook/static/notebook/js/notebook.js:
/**
 * Re-render the output of a code cell.
 */
Notebook.prototype.render_cell_output = function (code_cell) {
    var cell_data = code_cell.toJSON();
    var cell_index = this.find_cell_index(code_cell);
    var trusted = code_cell.output_area.trusted;
    this.clear_output(cell_index);
    code_cell.output_area.trusted = trusted;
    code_cell.fromJSON(cell_data);
};
… but still, nothing specific about markdown to HTML conversion.
EDIT: OK, I think I finally have it confirmed that the Markdown → HTML conversion of a Python statement like display(my_md) in a code cell actually happens in JavaScript - more specifically, in the append_markdown function, in
Jupyter/notebook/lib/python3.8/site-packages/notebook/static/notebook/js/outputarea.js
I guess that same file is also in the upstream notebook source.
You can set a JavaScript breakpoint in the browser on the first line of the append_markdown function, re-run the cell with display(my_md) - and it will break, and you will be able to see both the original Markdown text, and the resulting HTML - which is the result of a call to the JavaScript markdown.render function.
Ultimately, I just realized that these messages have a tag (self.topic) which here was display_data; so I just looked up display_data in the .js files of my Jupyter installation - and found it, also in the file mentioned above:
...
OutputArea.prototype.handle_output = function (msg) {
    var json = {};
    var msg_type = json.output_type = msg.header.msg_type;
    var content = msg.content;
    switch(msg_type) {
    ...
    case "update_display_data":
    case "display_data":
    ...
… which was a clear sign that JavaScript takes over at least a part of the rendering of the page; and then I kind of luckily stumbled upon append_markdown, and could confirm it with a breakpoint.
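To summarize my current understanding of what the browser side does, here is a rough Python mirror of that dispatch. This is not the actual JavaScript: the HANDLERS table and its ordering are my assumption, and of the handler names only append_markdown is one I actually confirmed with a breakpoint:

```python
# Hypothetical priority table: MIME type -> name of the front-end handler.
# Only "append_markdown" is confirmed above; "append_text" is an assumed name.
HANDLERS = {
    "text/markdown": "append_markdown",
    "text/plain": "append_text",
}

def handle_output(msg):
    """Pick the front-end handler for a display message (illustrative only)."""
    if msg["header"]["msg_type"] not in ("display_data", "update_display_data"):
        return None
    data = msg["content"]["data"]
    # Take the first MIME type in the table that the message provides.
    for mime, handler in HANDLERS.items():
        if mime in data:
            return mime, handler
    return None

msg = {
    "header": {"msg_type": "display_data"},
    "content": {
        "data": {
            "text/plain": "<IPython.core.display.Markdown object>",
            "text/markdown": "There are 10 **elements** ",
        },
        "metadata": {},
    },
}
choice = handle_output(msg)
```

The point is only that the kernel ships every representation it has, and the front end decides which one to render and how.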
So, it’s great that now I know what is going on - not so great that I cannot use this Markdown->HTML conversion in Python; and it would have been nice, because this conversion also includes MathJax rendering, which I’d need in this case (otherwise I’d have just written HTML directly, and not bothered so much to see what’s going on).
Oh well…