Multilingual Jupyter Notebook by maintaining separate Markdown text

Hello there,

I wonder whether anybody has come across some kind of plugin/extension/… for Jupyter Notebooks that would allow Markdown cells to be in several languages, e.g. English and German? When I google that, I only get results about several programming languages, not natural languages. With a switch at the top, the reader could change between the two (or even more) languages, as is common practice on many webpages. Such a feature would make it much easier to use Jupyter Notebooks in mixed groups where people have different mother tongues. Currently I maintain several Jupyter Notebooks in parallel: the programming code is the same, the Markdown cells differ.

Best wishes

I imagine that there could be multiple cells, each with a different language assigned in metadata, and then a filter extension that shows only the markdown cells matching the chosen language. I believe it should be very easy to create such an extension for JupyterLab (adding a toolbar language switch button that would also include a “Show all languages” option). The hiding of cells would occur by iterating over the notebook cells and adding a CSS class that hides the non-matching ones.
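
The selection logic described above can be sketched outside JupyterLab in a few lines of Python. This is only an illustration under stated assumptions: the metadata key `lang` and the function `filter_cells` are hypothetical names, and a real extension would toggle a CSS class in the browser rather than drop cells from the document.

```python
import copy

LANG_KEY = "lang"  # hypothetical cell-metadata key, not part of the notebook format


def filter_cells(notebook, language=None):
    """Return a copy of `notebook` keeping only Markdown cells for `language`.

    Code cells and Markdown cells without a language tag are always kept.
    Passing language=None corresponds to "Show all languages".
    """
    result = copy.deepcopy(notebook)
    if language is None:
        return result
    result["cells"] = [
        cell
        for cell in result["cells"]
        if cell["cell_type"] != "markdown"
        or cell.get("metadata", {}).get(LANG_KEY) in (None, language)
    ]
    return result


# A tiny notebook-like dict; a real file would be loaded with json.load().
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {"lang": "en"}, "source": "# Hello World"},
        {"cell_type": "markdown", "metadata": {"lang": "de"}, "source": "# Hallo Welt"},
        {"cell_type": "code", "metadata": {}, "source": "print(1 + 1)"},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

german = filter_cells(notebook, "de")
print([cell["source"] for cell in german["cells"]])
```

Untagged Markdown cells are kept for every language, which gives the fallback behavior for notebooks that are only partly translated.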

I would be happy to help if you decide to create such an extension.

That sounds like a very concise plan, and the proposed implementation should allow a very suitable fallback behavior. I am not an experienced JupyterLab extension developer, so please don’t expect too much from the first draft. If you, @krassowski, currently have capacity and a design idea, I call no dibs on that idea. Currently I am rather busy, but in the middle of March I could dive into the topic of how to write a JupyterLab plugin.

I’m busy too, with my weekends taken by another extension, but I would be happy to help if you encounter any issues. Feel free to ping me on GitHub.

@1kastner I was redirected here from an issue I created today on GitHub: “View and render markdown content based on locale” (Issue #12375, jupyterlab/jupyterlab). Its goal is similar to your suggestion. What do you think? Any pointers would also help.

An interesting avenue to pursue would be adding this as an extension to jupyterlab-markup and then a similar plugin for nbconvert (and/or myst).

When writing, one could imagine all of the languages present in the source, rather than messing with metadata.

Alternately, using classic .po-based tools might be a better play: these are already in use for internationalizing jupyter software and documentation, and have version-control-based tooling. Reducing the barrier to entry for enabling these file-based workflows would be good across the board.

Expanding internationalization into more parts of interactive scientific computing might be a rich part of an (existing, or in progress) funding proposal, for someone so motivated…

To be honest, I did not pursue this idea further, and the two notebooks (which, just as you describe in the GitHub post, differed only in the language of the markup text) actually diverged over time because they belonged to different seminars/…, so the content also changed slightly. Yet, I am still a big fan of the idea!

I do not fully understand what the plugin jupyterlab-markup provides. The explanation on GitHub is scarce and there is no documentation link. Why should exactly this plugin be extended?

The road of .po-based tools looks promising to me, especially if you say that they are already in use! I have heard about them but have no experience with these tools myself. From your description, it seems we would give up having one central Jupyter Notebook; instead, the translation would be stored in another file, typically edited by some other software. This would harm the end-user’s experience of “everything in one UI”: I could no longer quickly edit my markdown texts. This would be especially harmful when using features like links, HTML code, etc. To re-establish the principle of “everything in one UI” and integrate the editing of .po files into JupyterLab, quite a lot of coding would be required…

> explanation on GitHub is scarce

Try the binder! An example of a downstream extension of it is jupyterlab-myst.

> Why should exactly this plugin be extended?

If only concentrating on markdown (and not e.g. code comments), that plugin replaces marked, the markdown renderer used everywhere in JupyterLab, with an alternative called markdown-it. Unlike marked, markdown-it is much more extensible (though there are still some challenges).

Out of the box it already supports things like mermaidjs, which is now supported by GitHub-flavored markdown.

In this approach, each translation would appear as close to its neighboring content as possible, inside the same field of the notebook format. The idea would be to offer a new markdown syntax (or ideally reuse an existing one, but I don’t know of one), e.g.

```{@en title}
# Hello World
```

```{@de title}
# Hallo Welt
```

The extension would then render only the blocks of the specific language(s) chosen by user preference, or present like-named language blocks as e.g. tabs.

But now there’s a new syntax that no other tooling understands.

This syntax could then also be implemented in nbconvert, or markdown-it-py so that this behavior could get out to a sphinx site with myst-nb or the whole jupyter-book contraption.
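
As a rough illustration of what such a step could look like on the Python side, here is a minimal sketch that keeps only the blocks of one language. It assumes the hypothetical fence syntax shown above; `BLOCK` and `select_language` are made-up names, and a real implementation would more likely be a markdown-it-py plugin or an nbconvert preprocessor than a regular expression.

```python
import re

FENCE = "`" * 3  # three backticks, built here so the example is easy to embed

# Matches one language-tagged block of the hypothetical form
#   <fence>{@LANG optional-name}
#   ...body...
#   <fence>
BLOCK = re.compile(
    r"^" + FENCE + r"\{@(?P<lang>[\w-]+)(?:[ \t]+(?P<name>[^}\n]*))?\}[ \t]*\n"
    r"(?P<body>.*?)"
    r"^" + FENCE + r"[ \t]*$\n?",
    re.DOTALL | re.MULTILINE,
)


def select_language(source, language):
    """Unwrap blocks tagged with `language` and drop blocks of other languages."""

    def replace(match):
        return match.group("body") if match.group("lang") == language else ""

    return BLOCK.sub(replace, source)


source = "\n".join(
    [
        FENCE + "{@en title}",
        "# Hello World",
        FENCE,
        "",
        FENCE + "{@de title}",
        "# Hallo Welt",
        FENCE,
        "",
    ]
)

print(select_language(source, "de").strip())
```

Text outside the tagged blocks passes through unchanged, so language-neutral content such as formulas or images would only need to be written once.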

> one central Jupyter Notebook

As an intermediate solution that keeps the information “close,” but not inside the same field, I actually explored a prototype back in the classic Notebook, using cell metadata. Unfortunately, it was not open source… but worse yet, was also not based on any standard.

This required a bunch of metadata to work well (e.g. keeping track of the source that had been translated for each language) and didn’t work very well with revision-control-based workflows, and probably wouldn’t work very well with the CRDT-based stuff unless the internationalization metadata scheme became a well-known part of the notebook format (like kernel_info).
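
To make the bookkeeping concrete, here is one way such tracking could work. It is not the prototype mentioned above (which was never published); the `translations` metadata layout, the `source_digest` field, and the function names are all assumptions made for illustration.

```python
import hashlib


def digest(text):
    """Fingerprint of a source text, used to notice later edits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_languages(cell):
    """List languages whose translation was made from an older source text.

    Assumes a hypothetical layout:
    cell["metadata"]["translations"][lang] = {"text": ..., "source_digest": ...}
    """
    current = digest(cell["source"])
    translations = cell.get("metadata", {}).get("translations", {})
    return sorted(
        lang
        for lang, entry in translations.items()
        if entry["source_digest"] != current
    )


cell = {
    "cell_type": "markdown",
    "source": "# Hello World!",  # edited after the German translation was written
    "metadata": {
        "translations": {
            "de": {"text": "# Hallo Welt", "source_digest": digest("# Hello World")},
            "fr": {"text": "# Bonjour le monde !", "source_digest": digest("# Hello World!")},
        }
    },
}

print(stale_languages(cell))
```

Every edit to the source text invalidates the stored digests, which hints at why this kind of metadata produces noisy diffs under revision control.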

> stored in another file

The advantage to this approach is… it already exists! By knowing the identity of the whole file, and how to extract what needs to be translated given the entire document, one could extract markdown, code comments (with knowledge of the kernel) and a raft of other aspects, and be able to know exactly what needed to be changed.
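
A minimal sketch of the extraction step, limited to markdown cells, might look like the following. It writes bare `msgid`/`msgstr` pairs by hand and omits the header entry that real .po files carry; the `cell-N` reference convention and the function names are assumptions, and an established gettext library would be the better choice in practice.

```python
def po_escape(text):
    """Escape a string for use inside a double-quoted PO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def notebook_to_po(notebook, path="notebook.ipynb"):
    """Collect the markdown cells of a notebook dict as untranslated PO entries."""
    entries = []
    seen = set()
    for index, cell in enumerate(notebook["cells"]):
        if cell["cell_type"] != "markdown":
            continue
        source = cell["source"]
        if isinstance(source, list):  # the notebook format also allows a list of lines
            source = "".join(source)
        if not source.strip() or source in seen:
            continue  # skip empty cells; a catalog must not repeat a msgid
        seen.add(source)
        entries.append(
            f'#: {path}:cell-{index}\nmsgid "{po_escape(source)}"\nmsgstr ""\n'
        )
    return "\n".join(entries)


notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Hello World"},
        {"cell_type": "code", "metadata": {}, "source": "print(1 + 1)"},
        {"cell_type": "markdown", "metadata": {}, "source": ["See the ", '"result" above.']},
    ]
}

print(notebook_to_po(notebook))
```

Code comments would need kernel-specific parsing, as noted above, so they are left out of this sketch.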

> some other software

Right, the idea would be to enable the editing of these files, in the right context, in JupyterLab, which is much better at doing more than one thing at a time.

There are a number of existing pieces of software from which to draw inspiration, and this would result in something that improved the usefulness of JupyterLab for more than just writing special syntax in markdown.

If this worked well with jupyterlab-git and jupyterlab-pull-requests, and even jupyterlite, then there would already be a semi-portable workflow for doing translation work and looking at history.

For example, we currently rely on a SaaS solution to manage the effort for internationalizing JupyterLab… this is great for a big open source project, but would not scale to thousands of individual groups working on individual notebooks.

Yes, this is what I am looking for. Support for multiple languages in a single markdown cell. I do believe that some of the existing libraries supporting *.po could be used.