Hi all!
Over at the Wikimedia Foundation, we’re working on a project called Wikifunctions, part of Abstract Wikipedia, which aims to create a library of user-written functions callable from Wikipedia page markup. These functions are intended to pull data from our Wikidata RDF store or from user-contributed data files, process it, and format it into suitable table layouts or natural-language text fragments.
I’ve written up an early proposal for building this around Jupyter kernels, which I think would be a good way to balance our requirements for isolation between separate functions, support for multiple implementation languages, and scaling & security, without reinventing the wheel on scripting-language execution services.
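To make that concrete, here’s a minimal sketch of the per-function execution model using jupyter_client’s blocking API (the function body and names are purely illustrative, not part of the proposal):

```python
# Minimal sketch: one kernel hosts one user-written function, and the
# caller drives it over the standard Jupyter messaging protocol.
from jupyter_client.manager import start_new_kernel

km, kc = start_new_kernel(kernel_name="python3")
try:
    # Install a (hypothetical) user-written function into the kernel.
    kc.execute(
        "def format_population(n):\n"
        "    return f'{n:,} inhabitants'",
        reply=True,
    )
    # Invoke it via user_expressions and read back the rendered result.
    reply = kc.execute(
        "pass", reply=True,
        user_expressions={"out": "format_population(8_961_989)"},
    )
    out = reply["content"]["user_expressions"]["out"]
    print(out["data"]["text/plain"])  # -> '8,961,989 inhabitants'
finally:
    kc.stop_channels()
    km.shutdown_kernel()
```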
There are a number of performance trade-offs where we gain speed by making it explicit that functions can be non-deterministic in some ways (for example, a function implemented in a language with mutable state can return different results on subsequent calls). We can still keep separate functions from polluting each other’s environments by spinning up each function in the call graph on its own kernel.
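For concreteness, this is the kind of non-determinism I mean (a contrived Python example; the function name is made up):

```python
# Hidden mutable state inside a kernel: the same call can yield
# different results on repeated invocations.
_calls = 0

def population_estimate(city):
    global _calls
    _calls += 1
    # Contrived: the result depends on state, not just the argument.
    return f"{city}: call #{_calls}"

print(population_estimate("Berlin"))  # 'Berlin: call #1'
print(population_estimate("Berlin"))  # 'Berlin: call #2' (same input!)
```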
I’d love to hear some feedback from folks more familiar with the low-level kernel interfaces and implementations.
This would involve spinning up a lot of kernels, each running for a short time: every function used in a render session gets its own interpreter context for the duration of that session. A typical article render should last a fraction of a second, might call a few dozen separate functions, and might make a few hundred to a few thousand individual invocations (both numbers could grow as heavier use is made of the function system).
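Roughly, I picture the lifecycle per render session like this (a sketch only; the language mapping and function names are hypothetical):

```python
# Lifecycle sketch: spin up one kernel per function for the render
# session, run all the invocations, then tear everything down.
from contextlib import ExitStack
from jupyter_client.manager import start_new_kernel

def render_session(functions):
    """functions: mapping of function name -> kernel to run it on."""
    clients = {}
    with ExitStack() as stack:
        for name, kernel_name in functions.items():
            km, kc = start_new_kernel(kernel_name=kernel_name)
            stack.callback(km.shutdown_kernel)
            stack.callback(kc.stop_channels)
            clients[name] = kc
        # ...a few hundred to a few thousand invocations against
        # `clients` would happen here during the page render...
    # Leaving the block shuts every kernel down with the session.

render_session({"format_date": "python3", "infobox_table": "python3"})
```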
In particular, we could potentially get a big benefit from running multiple kernels in one process & thread, so they can make synchronous calls to each other without a context switch. But my impression is that current kernels are designed for async communication over a socket and don’t implement a direct-call API, so this would be “hard”. It would also reduce the safety isolation between kernels and make it harder to use standard Docker images as the kernel runtimes.
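For what it’s worth, ipykernel does ship an in-process kernel (aimed at embedding and testing) that shows what a direct-call path can look like for Python, with exactly the isolation caveat I mentioned. A sketch (API details vary a bit across ipykernel versions):

```python
# In-process kernel: no sockets, no context switch -- and no process
# isolation, since the kernel's state lives in our own interpreter.
from ipykernel.inprocess import InProcessKernelManager

km = InProcessKernelManager()
km.start_kernel()
kc = km.client()
kc.start_channels()
kc.execute("x = 40 + 2")              # runs synchronously, in-process
print(km.kernel.shell.user_ns["x"])   # -> 42: state is directly reachable
kc.stop_channels()
km.shutdown_kernel()
```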
I think I can reduce the overhead of socket communication by co-locating all the kernels in a call session on the same server, which still costs a context switch per call but avoids network latency. Serialized arguments could be passed through a shared memory segment to reduce copying overhead.
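A sketch of the shared-memory handoff using the Python 3.8+ stdlib (the message shape here is made up):

```python
# Caller side: serialize the arguments once into a shared segment, so
# only a small handle has to cross the socket to the kernel.
import json
from multiprocessing import shared_memory

payload = json.dumps({"fn": "format_date", "args": ["2021-05-04"]}).encode()
shm = shared_memory.SharedMemory(create=True, size=len(payload))
shm.buf[: len(payload)] = payload
handle = {"shm": shm.name, "len": len(payload)}  # goes over the socket

# Kernel side: map the same segment and deserialize without copying
# the payload over the wire.
peer = shared_memory.SharedMemory(name=handle["shm"])
request = json.loads(bytes(peer.buf[: handle["len"]]))
peer.close()

shm.close()
shm.unlink()  # creator releases the segment after the call completes
```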
For debugging sessions and generating stack traces, we’ll need a “meta-kernel” that acts as an intermediary for the Jupyter debugging protocol, forwarding data requests and commands on to whichever kernel implements a given level of the call stack.
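A very rough sketch of the routing idea (the frame-to-kernel mapping and message plumbing are hypothetical; debug traffic rides the control channel as debug_request/debug_reply messages carrying DAP payloads):

```python
# Meta-kernel routing sketch: forward a DAP command to whichever
# kernel owns the requested level of the call stack.
def route_debug_request(dap_request, clients, frame_to_kernel):
    """dap_request: DAP dict; clients: kernel name -> blocking client."""
    frame_id = dap_request.get("arguments", {}).get("frameId")
    kc = clients[frame_to_kernel.get(frame_id, "root")]
    msg = kc.session.msg("debug_request", dap_request)
    kc.control_channel.send(msg)   # debugging uses the control channel
    return kc.control_channel.get_msg(timeout=5)["content"]
```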
We haven’t settled on this yet, but it looks very promising and I’m hoping to research it further.
Thanks for any feedback you may have!