How do we get multi-language (computing language) support in a Jupyter notebook?

I have used Jupyter notebooks for more than a year and found they are great for learning by doing and for showing what we have done. Along the way, I stumbled upon the inflexibility of including C++ and Python code at the same time. Technically speaking, the “cells” we include are either C++ or Python, not both in the same notebook. Is there a way to get around this limitation? I am open to contributing back if this is desired behavior.
You can imagine a use case in the AI/ML domain with C/C++ snippets on the host, Python for ML, and MiniZinc for constraint solving, all in the same notebook as a single problem-solving workflow.
Thank you.

These are solid critiques of the existing system.

At present, the most-supported option is choosing a “host” language, such as ipykernel extended with custom magics, or a polyglot kernel like the Calysto metakernel or allthekernels. This approach works within the existing system and is portable between multiple clients and tools, at the cost of the higher complexity of managing two potentially entirely unrelated kernel environments.
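As a concrete illustration of the custom-magic route, here is a minimal sketch of a cell magic that compiles and runs a C++ snippet from inside an ipykernel session. The `%%cpp` name, the use of `g++`, and the temporary-file layout are illustrative assumptions, not an existing Jupyter feature:

```python
# A sketch of a custom cell magic that compiles a C++ cell with g++ and
# runs the result. Run this once in a notebook cell, then use %%cpp.
import pathlib
import subprocess
import tempfile

from IPython.core.magic import register_cell_magic

@register_cell_magic
def cpp(line, cell):
    """Compile the cell body with g++ and print the program's stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "cell.cpp"
        src.write_text(cell)
        exe = pathlib.Path(tmp) / "cell"
        subprocess.run(["g++", str(src), "-o", str(exe)], check=True)
        result = subprocess.run([str(exe)], capture_output=True, text=True)
        print(result.stdout, end="")
```

This keeps Python as the host: the C++ “cells” are really Python cells handed off to an external toolchain, which is exactly why the approach stays portable across clients.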

Even representing a polyglot notebook is a challenge at present, as the notebook format would need to duplicate much of the required notebook-level metadata at the cell level, mirroring the problem of custom markup languages (such as not-widely-adopted, or even dynamic, extensions to Markdown). Some initial work is ongoing to support this (and more), as part of a major re-thinking of what a “cell” is, starting with more standards-alignment work.
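To make the duplication concrete, here is an abbreviated sketch of today’s nbformat layout (as a Python dict): the language lives once at the notebook level, so a polyglot notebook would have to repeat something like the per-cell “language” key below, which is hypothetical and not part of the current spec:

```python
# Abbreviated nbformat structure. Today, kernelspec and language_info are
# notebook-level; cells carry only generic metadata.
notebook = {
    "metadata": {
        "kernelspec": {"name": "python3", "display_name": "Python 3"},
        "language_info": {"name": "python"},  # one language per file
    },
    "cells": [
        {"cell_type": "code", "source": "print('hi')", "metadata": {}},
        # A polyglot notebook would need something like this per cell
        # (the "language" key is a hypothetical extension):
        {"cell_type": "code", "source": "std::cout << 1;",
         "metadata": {"language": "cpp"}},
    ],
}
```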

Adding multiple kernel sessions, managed by the client and/or server, as part of a single document is a much deeper endeavor, as it would touch many more parts of the architecture and expose the inevitably thorny issue of sharing data between kernels. The ideal would of course be a not-horribly-inefficient-out-of-the-box way, as opposed to e.g. serializing everything to a lowest-common-denominator format like JSON.
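To see why the lowest-common-denominator route hurts, consider this sketch of passing a numeric array between kernels by round-tripping through JSON text, versus simply sharing the underlying buffer (which is only possible today within a single process):

```python
# Sketch of the "serialize everything to JSON" cost for cross-kernel data.
import json
import numpy as np

# An 8 MB array of float64 values.
data = np.arange(1_000_000, dtype=np.float64)

# JSON route: every element is formatted as text, then re-parsed on the
# other side, inflating the payload and copying everything element by element.
as_text = json.dumps(data.tolist())
roundtripped = np.array(json.loads(as_text))

# Zero-copy route (within one process): reinterpret the same buffer.
view = memoryview(data)
```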

Oddly, where this might occur soonest would be in the WebAssembly/JupyterLite space, where there is ongoing upstream specification work to make this feasible, which, once widely implemented, would allow proper, standards-based interchange of data… but for this specific use case, self-hosting gcc or clang in WASM is its own challenge.


@bollwyvl Thanks for the custom magics resource. It works perfectly for learning, though it may not be scalable for production.
Appreciate your help.

Well, I suppose that wasn’t the initial question… it all depends on the meaning of “production” for a specific use case. If “production” means “one analyst working with production data in a managed JupyterHub setting with locked-down, managed dependencies,” then if it runs, it is by definition production.

In other settings, notebooks-with-hot-kernels are a rather peril-fraught way to get to robust, performant, reproducible outcomes in the loop with users-at-scale, HPC workloads, ETL jobs, or cron. The more exotic a setup one needs (multiple kernels, magics, whatever), and the more complex the interplay between packages not managed at a distribution level (e.g. yum, apt, nix, or conda), the overall system delivery mechanism (containers, VMs, etc.), and continuous supply-chain vulnerability scanning and abuse mitigation, the greater the likelihood that someone’s pager will go off in the middle of the night because someone shipped a breaking change in an npm package.

Using notebooks as a way to prove out and document concepts, and then transitioning the embedded thought process to well-packaged, tested, versioned software artifacts, whether by extraction or at rest (à la importnb), with a minimum of magic, can substantially reduce the dependency tree and the risks implicit in cargo-culting, copy-paste, debug-by-print, and out-of-order execution, all of which the notebook mindset joyfully embraces.


@bollwyvl I have a use case where a single session will be used for multiple languages. In fact, my primary language is Python, but I will use the same database session used in Python to execute SQL code, very similar to Databricks.

How can I do it today? If I specify python in my kernel spec, then frontends like the Jupyter extension for VS Code do not show “SQL” as an option in the cell language selector.

> How can I do it today?

Please help us understand more about what “it” is in this case.


What I want to achieve
SQL and python in the same notebook handled by the same kernel.

Method 1
JupyterLab already has a dropdown to select Markdown, code, or raw for a cell. If an extension could contribute its own dropdown options and get a chance to preprocess cell content, then I could wrap the SQL cell content in `db.sql("<SQL cell content here>").to_pandas()` upon execution and achieve this with ipykernel, as in the sketch below.
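For what it’s worth, the preprocessing half of Method 1 is possible kernel-side today with IPython’s input transformers, without any frontend dropdown: rewrite any cell that starts with a marker line into the wrapped call. The `-- sql` marker and the `db` object below are illustrative assumptions:

```python
# A sketch of Method 1 using IPython's input-transformer hook (IPython 7+).
# Cells beginning with "-- sql" are rewritten into db.sql(...).to_pandas()
# before compilation; the marker and the `db` object are assumptions.
from IPython import get_ipython

def sql_cells_to_python(lines):
    """Rewrite a cell whose first line is '-- sql' into a db.sql(...) call."""
    if lines and lines[0].strip().lower() == "-- sql":
        query = "".join(lines[1:])
        return [f"db.sql({query!r}).to_pandas()\n"]
    return lines

# Cleanup transformers run before the cell is parsed as Python, which is
# required here since a raw SQL cell is not valid Python syntax.
get_ipython().input_transformers_cleanup.append(sql_cells_to_python)
```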

Method 2
I can create my own wrapper kernel around ipykernel that handles multiple languages, given that the wire message contains language info.
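A minimal sketch of Method 2, following the pattern from the official wrapper-kernel documentation: subclass ipykernel’s Kernel base class and dispatch on a per-cell language hint. Note that the standard execute_request message does not carry a language field today, so the metadata lookup below assumes a frontend that adds one (and ipykernel 6+ for `get_parent`):

```python
# A sketch of Method 2: a wrapper kernel dispatching on a per-cell language
# hint. The "language" key in the request metadata is hypothetical; nothing
# in the current protocol puts it there.
from ipykernel.kernelapp import IPKernelApp
from ipykernel.kernelbase import Kernel

class PolyglotKernel(Kernel):
    implementation = "polyglot-sketch"
    implementation_version = "0.1"
    language = "python"
    language_info = {"name": "python", "mimetype": "text/x-python",
                     "file_extension": ".py"}
    banner = "Python + SQL wrapper kernel (sketch)"

    def do_execute(self, code, silent, store_history=True,
                   user_expressions=None, allow_stdin=False, **kwargs):
        # Hypothetical: read a language hint the client put in the
        # execute_request metadata (requires ipykernel 6+ for get_parent).
        metadata = self.get_parent().get("metadata", {})
        if metadata.get("language") == "sql":
            code = f"db.sql({code!r}).to_pandas()"
        # ... hand `code` to a real Python executor here; this sketch
        # just echoes what would be run.
        if not silent:
            self.send_response(self.iopub_socket, "stream",
                               {"name": "stdout", "text": code + "\n"})
        return {"status": "ok", "execution_count": self.execution_count,
                "payload": [], "user_expressions": {}}

if __name__ == "__main__":
    IPKernelApp.launch_instance(kernel_class=PolyglotKernel)
```

A kernel like this would then be registered with its own kernel spec, like any other kernel.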

Background
I have this tool, which is completely custom for now. Users can open their CSV or Parquet files and run SQL queries on them in the browser using WASM, with no backend. But Jupyter-in-WASM (JupyterLite) already does 99% of that work for me, and much better, so I want to avoid redoing so much. I am happy to contribute if there is a plan and an approach.

https://lab.ducklake.io/