I’m trying to understand the JupyterLab architecture. One thing that I’m currently unclear on is the actual processes and HTTP servers involved. I see both Typescript code and Python code in the repo, and mention of Tornado in the Python code. Is there both a node.js HTTP server and a Tornado server in a running JupyterLab instance?
The server side is written in Python. The main server code (as of the upcoming JupyterLab 3.0) lives in the jupyter_server package.
This is a good question and one that really could use some fleshing out in the documentation. I hope others chime in here if this needs improvement or clarification.
When you launch a Jupyter web application (in this case `jupyter lab`), one process is started. It is important to note that a Jupyter web application uses a client/server model and encompasses several components* (the user interface, web server, APIs, and kernel). JupyterLab is the main process and everything runs as part of it except:
- The kernels (managed as sub-processes of main)
- The browser running the jupyter notebook UI
Within this main process, a web server (Tornado) is started. The package containing code for this is jupyter_server (a dependency of JupyterLab). Aside from starting the web server, jupyter_server provides the APIs and core services to applications (e.g., get the contents of a directory, rename a file, get active kernels, get active sessions, etc.). The UI calls these APIs on the user's behalf and most users never know they exist.
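To make that concrete, here is a minimal sketch of calling one of those REST APIs directly. The server URL (`http://localhost:8888`) and token (`secret`) are placeholders for illustration, not values from this thread:

```python
"""Sketch: calling jupyter_server's REST API directly, the same way
the Lab UI does behind the scenes."""
import json
import urllib.request


def api_url(base: str, path: str, token: str) -> str:
    """Build a jupyter_server REST endpoint URL with token auth."""
    return f"{base}/api/{path}?token={token}"


def list_contents(base: str, token: str) -> dict:
    """GET /api/contents -- the call behind the Lab file browser."""
    with urllib.request.urlopen(api_url(base, "contents", token)) as resp:
        return json.loads(resp.read())


# With a server started as `jupyter lab --ServerApp.token=secret`,
# list_contents("http://localhost:8888", "secret") returns a JSON
# listing of the server's root directory. Other endpoints follow the
# same pattern: /api/kernels (active kernels), /api/sessions, etc.
```

You can see the same requests in your browser's developer tools while using JupyterLab.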
Opening a notebook document starts a kernel as a child process of the jupyter-lab main process. Each kernel (notebook) is its own process. If you have 3 notebooks open, you have 1 main process and 3 kernels/subprocesses running. This is referred to as the two-process kernel architecture. The kernel is responsible for running code depending on the type of notebook (e.g., Python, R, Julia). The main process uses the packages jupyter_server and jupyter_client to establish communication between the browser and kernel over websockets. These websockets allow the UI to react to updates from the kernel (e.g., after kernel code has executed or an error has occurred) and to issue commands from the browser to the kernel (e.g., run the code in a cell).
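The messages that travel over those websockets follow the Jupyter messaging protocol. As a sketch, this is roughly the shape of the `execute_request` message the UI sends when you run a cell (field values here are illustrative; jupyter_client builds these for you):

```python
"""Sketch of a Jupyter protocol execute_request message (v5.x wire format)."""
import uuid
from datetime import datetime, timezone


def execute_request(code: str, session: str) -> dict:
    """Build an execute_request message like jupyter_client does."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,       # unique id for this message
            "session": session,               # id of the client session
            "username": "user",
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",                 # protocol version
        },
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code,                     # the cell source to run
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": True,
        },
    }


msg = execute_request("1 + 1", session=uuid.uuid4().hex)
# The kernel's replies (execute_result, stream, error, ...) arrive on the
# iopub channel with a parent_header echoing this header's msg_id, which
# is how the UI matches output back to the cell that ran.
```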
*Sometimes the user-interface, web server and APIs are collectively referred to as the “front-end” in the documentation where Jupyter is just one of many front-ends that can interact with the kernel. In this context, the kernel is a separate entity and not part of the jupyter web application but rather managed by jupyter.
Node.js is used for JupyterLab extensions but I am not sure if it actually starts an http server. I don’t think so but I could be wrong.
- 1 main process (started by `jupyter lab`) that starts the HTTP server (Tornado).
- 1 sub-process for each kernel started
- 1 process for the browser
The browser talks to the HTTP server. The server issues commands to the system (e.g., start a kernel, read a directory, save a file). When a notebook (kernel) is opened, a new process is started for that kernel and the server sets up a websocket so that the browser and kernel can talk to one another directly.
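That flow can be sketched as two steps. The base URL, kernel id, and token below are placeholders:

```python
"""Sketch: the endpoints involved when the browser opens a notebook."""


def kernel_ws_url(base: str, kernel_id: str, token: str) -> str:
    """The websocket channel endpoint the browser connects to for a kernel."""
    ws_base = base.replace("https://", "wss://").replace("http://", "ws://")
    return f"{ws_base}/api/kernels/{kernel_id}/channels?token={token}"


# Flow, as described above:
# 1. Browser: POST {base}/api/kernels         -> server starts a kernel subprocess
#                                                and returns its id
# 2. Browser: connect to the websocket below  -> messages flow browser <-> kernel
print(kernel_ws_url("http://localhost:8888", "abcd-1234", "secret"))
```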
Jupyter uses python for the server, and the typescript that you see is generally sent over to be run on the browser.
One thing that makes this messy is that it turns out that because jupyterlab is nicely modular, you can replace one piece with something else. For example, the folks at Xeus have written a kernel that’s entirely in C++ that plugs into the rest of the infrastructure.
The Lab frontend is written in TypeScript, and so needs to be compiled when building a new version of JupyterLab. Node.js is used to build/compile the frontend, but is not used during regular running of JupyterLab.