I want to integrate AI into the GoNB kernel (a Jupyter kernel for the Go language) in a manner more ergonomic than copying and pasting from a separate window into the notebook. I would also like to be able to include the current content of the notebook (or parts of it) as context.
I’m wondering how to integrate with JupyterLab; a few ideas come to mind:
(A) Using AutoComplete
I could create a special prefix, something like “%ai:create a function that does X”, and whenever the user asks for an auto-complete on that line, the kernel would call the LLM server and offer the generated code as the completion (a rough kernel-side sketch follows the list below). Two problems with this:
LLMs, especially if run locally, can be slow, and the request would lock the notebook (I think)
The generated code might be too large for an autocomplete suggestion?
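Roughly, the kernel-side hook could look like the sketch below. All names here (handleAICompletion, CompleteReply, askLLM) are hypothetical, not GoNB’s real internals; the idea is just to intercept the prefix inside the complete_request handling and fall back to the normal completion path otherwise.

```go
package aisketch

import "strings"

// Sketch only: handleAICompletion, CompleteReply and askLLM are illustrative
// names, not GoNB's real internals.
const aiPrefix = "%ai:"

// CompleteReply mirrors the fields of a Jupyter complete_reply message.
type CompleteReply struct {
	Matches     []string
	CursorStart int
	CursorEnd   int
}

// askLLM stands in for whatever client talks to the model (Ollama, OpenAI, ...).
func askLLM(prompt string) (string, error) {
	return "func doX() {\n\t// ... generated code ...\n}", nil
}

// handleAICompletion intercepts the "%ai:" prefix in a completion request and
// offers the generated code as the (single) completion match.
func handleAICompletion(line string, cursorPos int) (CompleteReply, bool, error) {
	if !strings.HasPrefix(line, aiPrefix) {
		return CompleteReply{}, false, nil // not an AI request: use the normal completion path
	}
	code, err := askLLM(strings.TrimPrefix(line, aiPrefix))
	if err != nil {
		return CompleteReply{}, true, err
	}
	// Replace the whole "%ai:..." line with the generated code.
	return CompleteReply{Matches: []string{code}, CursorStart: 0, CursorEnd: cursorPos}, true, nil
}
```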
(B) Special Command
I could create a special command (like the ! used to execute a bash command in the Python kernel) that takes my AI question, and when the cell containing it is executed, the output of the LLM (with the generated code) would automatically be copied to the clipboard. (I suppose the kernel could output some fancy JavaScript that achieves that in the client’s browser; a sketch follows below.)
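A minimal sketch of that JavaScript trick, assuming the frontend executes application/javascript outputs (JupyterLab ships a renderer for that MIME type, though navigator.clipboard also needs a secure context and browsers may impose focus or permission restrictions). publishDisplayData is a hypothetical stand-in for however the kernel publishes a display_data message:

```go
package aisketch

import "fmt"

// Sketch only: publishDisplayData stands in for however the kernel publishes a
// display_data message on IOPub; it is not GoNB's real API.
func publishDisplayData(mimeBundle map[string]string) { /* send over IOPub */ }

// copyToClipboard asks the browser to put the generated code into the clipboard.
func copyToClipboard(generatedCode string) {
	// %q produces a double-quoted, escaped literal that is acceptable as a
	// JavaScript string for typical code; JSON-encoding would be more robust.
	js := fmt.Sprintf("navigator.clipboard.writeText(%q);", generatedCode)
	publishDisplayData(map[string]string{
		"application/javascript": js,
		"text/plain":             generatedCode, // also show the code that was copied
	})
}
```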
(C) Contextual Help
Similar to (A), but instead activated with Control+I (the contextual help tab), using the text under the cursor as input to the LLM. Similar concerns about being slow, and it would still require a copy&paste from the user.
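Protocol-wise this would map onto inspect_request / inspect_reply, which is what the Contextual Help tab uses. A minimal sketch, reusing the hypothetical askLLM helper from the first sketch, with InspectReply only mirroring the reply’s shape:

```go
package aisketch

// Sketch only: InspectReply mirrors the shape of a Jupyter inspect_reply;
// handleAIInspect is an illustrative name, askLLM is the same hypothetical
// helper as in the earlier sketch.
type InspectReply struct {
	Found bool
	Data  map[string]string // MIME bundle rendered in the Contextual Help tab
}

func handleAIInspect(question string) (InspectReply, error) {
	answer, err := askLLM(question) // still slow, and the user still has to copy&paste
	if err != nil {
		return InspectReply{}, err
	}
	return InspectReply{
		Found: true,
		Data: map[string]string{
			"text/markdown": answer,
			"text/plain":    answer,
		},
	}, nil
}
```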
Do any other ideas / suggestions come to mind that would be cool to have in a notebook?
I should have investigated more before asking: the Jupyter AI plugin implements this for Python and provides lots of ideas, although it requires a plugin instead of being implemented in the kernel.
Any comments (what works, what doesn’t) from folks using it are also very welcome.
(1) Not if you run it in a subprocess or a separate thread. You would need some special kernel message for that, possibly on the control channel.
(2) You can use the inline completions API to display multi-line suggestions, if that is what you are asking about. In fact, the only implementation that will ship in JupyterLab 4.1/Notebook 7.1 queries the kernel for history to provide inline completions; see https://github.com/jupyterlab/jupyterlab/pull/15160
Yes, definitely: the LLM runs in a separate process, and in the kernel it’s trivial to run the request in a separate thread (or goroutine). But still, from when the user presses “tab” (for an autocomplete) until the reply comes back, it may take tens of seconds (even more on slow computers without a GPU). So it would be nice if the user could keep doing other things in the notebook while the LLM is generating the code.
What would the control-channel message you suggested look like, so that it could be sent asynchronously as an autocomplete?
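In the meantime, a workaround that needs no new message type at all would be to start the generation in a goroutine, answer the completion request immediately with a placeholder, and offer the cached result the next time the user hits tab on the same prompt. A rough sketch (all names made up):

```go
package aisketch

import "sync"

// Sketch only: keep the kernel responsive by starting generation in a goroutine,
// replying to the completion request immediately, and offering the cached result
// the next time the user asks to complete the same "%ai:" prompt.
type aiCompleter struct {
	mu      sync.Mutex
	pending map[string]bool
	done    map[string]string
	ask     func(prompt string) (string, error) // e.g. an Ollama or OpenAI client
}

func newAICompleter(ask func(string) (string, error)) *aiCompleter {
	return &aiCompleter{pending: map[string]bool{}, done: map[string]string{}, ask: ask}
}

// Complete returns the generated code if it is ready, or a placeholder while the
// goroutine is still working, so the notebook never blocks on the LLM.
func (c *aiCompleter) Complete(prompt string) (code string, ready bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if code, ok := c.done[prompt]; ok {
		return code, true
	}
	if !c.pending[prompt] {
		c.pending[prompt] = true
		go func() { // generation runs in the background
			generated, err := c.ask(prompt)
			c.mu.Lock()
			defer c.mu.Unlock()
			delete(c.pending, prompt)
			if err == nil {
				c.done[prompt] = generated
			}
		}()
	}
	return "/* still generating, request the completion again in a moment */", false
}
```

It is not as nice as having the result pushed to the frontend when it is ready, which is where the dedicated message would come in.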
This sounds cool! But looking at the API I didn’t understand: does one need to implement an extension in TypeScript, register an IInlineCompletionProvider, and handle the communication with the kernel oneself, or can it be done solely in the kernel, presumably with some extended message protocol for that functionality, a variation of complete_request / complete_reply?
Sorry, I’ve never worked with TypeScript before, so I’m trying to get away without having to learn another ecosystem
I was browsing through krassowski/jupyterlab-transformers-completer, but there seems to be no Python code, so I’m assuming it’s purely a TypeScript extension with no kernel interaction. For GoNB, I probably need the context that is stored only in the kernel; that’s why I mentioned above that there needs to be some communication with the kernel…
You could use JavaScript too. Seriously though, while I can imagine the potential for having a new dedicated kernel message that would be supported out of the box, I am not sure there is consensus that the AI should reside in the kernel. If you would like to take this further, it might be worth discussing it as a Jupyter enhancement pre-proposal (an issue on GitHub - jupyter/enhancement-proposals: Enhancement proposals for the Jupyter Ecosystem).
Regarding the concern that it may take tens of seconds from pressing “tab” until the reply comes back, this is why:
streaming is implemented; compare the GIF in https://github.com/jupyterlab/jupyter-ai/pull/465, where even the OpenAI models take 3-4 seconds to complete the reply when it is shown all at once, versus the streaming mode where the user instantaneously sees the proposed tokens as they are generated (so there is no feeling of waiting 10 seconds; if you do not like what is generated, you continue typing or switch to the next suggestion).
the implementation proposed in jupyter-ai is not tied to the kernel (which means it can run on a dedicated machine / your own cluster), although technically it can also run on the same computer as the Jupyter server (which I tried, and it works too).
Thanks again for all the links, Michal! I’ll look into TypeScript, the Jupyter framework, and the jupyterlab-transformers-completer example.
The LLM connection, I think, is also not very complicated, just very costly/slow. I was planning to use Ollama for now, assuming things will run on the local machine, but later I could connect to OpenAI/Google/other providers. The more complicated part here is deciding what context to provide (which cells’ code to include) so that the model can make a good prediction.
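For the Ollama part, I imagine something along these lines, using its documented /api/generate HTTP endpoint in streaming mode (one JSON object per line); the model name and URL are placeholders and error handling is trimmed:

```go
package aisketch

import (
	"bufio"
	"bytes"
	"encoding/json"
	"net/http"
)

// Sketch of a minimal streaming client for a local Ollama server.
type generateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Stream bool   `json:"stream"`
}

type generateChunk struct {
	Response string `json:"response"`
	Done     bool   `json:"done"`
}

// streamGenerate calls the model and invokes onToken for every chunk as it
// arrives, so partial output can be forwarded instead of waiting for the full reply.
func streamGenerate(prompt string, onToken func(string)) error {
	body, _ := json.Marshal(generateRequest{Model: "codellama", Prompt: prompt, Stream: true})
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		var chunk generateChunk
		if err := json.Unmarshal(scanner.Bytes(), &chunk); err != nil {
			return err
		}
		onToken(chunk.Response)
		if chunk.Done {
			break
		}
	}
	return scanner.Err()
}
```

The onToken callback is where the kernel could forward partial output (or feed an inline-completion stream), and whatever context-selection logic I end up with would simply be concatenated into the prompt before this call.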
Recently I open-sourced a project that is a little bit similar to Copilot or Cursor for the JupyterLab environment. It does not have fast code completion, but it has more precise control over cells and LLM chat completions. Please check it out.