Single-user setups for local client & remote kernel


My requirements

I am interested in a setup where I run Jupyter kernels on a remote/headless system (or in a Docker container, or both). I would like to interact with these kernels using a locally running client such as JupyterLab, Jupyter Console, or a Jupyter-compatible code editor like VS Code or DataSpell.

I understand that JupyterLab can be hosted on the remote system itself, but the other clients cannot be. I would prefer to use my own JupyterLab instance anyway, with all of the plugins and settings that I like to use.

Ideally, I would also be able to disconnect and reconnect at will, so that my connection to the kernel can be restored if I need to turn off my PC, or if my network connection drops.

What options currently exist to address this problem? What options are in active development that are expected to become available in the near future?

Simple solution: static kernel ports and static connection file?

The easiest solution to this problem would be the ability to start a kernel on static, user-provided ports, instead of having ports randomly selected from a range. That way I could control exactly which ports to expose through SSH tunnels, and I could control the name of the connection file so that it can be fetched automatically. Is this currently possible with Jupyter?
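To make this concrete, here is roughly what I have in mind, sketched with jupyter_client (untested; the port numbers and file path are just placeholders I picked):

```python
# Sketch: start a kernel on fixed, user-chosen ports via jupyter_client,
# so the same SSH tunnels work every time. Ports and path are placeholders.
from jupyter_client import KernelManager

km = KernelManager(kernel_name="python3")
km.connection_file = "/tmp/kernel-static.json"  # predictable file name
# Pin every channel to a known port instead of letting jupyter_client
# pick random ones. The kernel still binds to localhost; the idea is to
# forward these ports over SSH.
km.shell_port = 9000
km.iopub_port = 9001
km.stdin_port = 9002
km.control_port = 9003
km.hb_port = 9004
km.start_kernel()
```

If something like this works, I could run it on the remote host, forward ports 9000-9004 over SSH, and point a local client at a copy of the connection file.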

Provisioners?

I am aware of the “provisioner” system (https://jupyter-client.readthedocs.io/en/stable/provisioning.html) that is in development, but it is still in the proof-of-concept development stage. Can I solve this problem with the provisioner system?

Even if I have to do some hands-on yak-shaving, I am willing to put in the effort if it will benefit the community to have a simple provisioner implementation. I am sure I’m not the only one with this use case, so hopefully any work I put into this area would benefit many people.

Jupyter Kernel Gateway

I am also aware of Jupyter Kernel Gateway (https://jupyter-kernel-gateway.readthedocs.io). But it looks like a standard Jupyter protocol client is not able to interact with a Gateway-provided kernel, because the protocol is not the standard Jupyter protocol. Jupyter Notebook specifically has special support for Gateway connections (jupyter notebook --gateway-url=...), but that isn’t the client I want to use.

Totally DIY: SSH, SCP, parsing logs

As a last resort, I can write a script to start a kernel over SSH, parse the name of the connection file from the logs (ugh!), and then copy the connection file to the local system. But even then, JupyterLab does not yet support connecting to an existing kernel. This seems to be what the 5-years-unmaintained remote_ikernel (https://pypi.org/project/remote_ikernel/) tool attempted to do, and perhaps the even-older rk (https://github.com/korniichuk/rk) tool. I remember trying to work with a setup like this around 2018, and the experience was generally so poor that I gave up and used Jupyter Notebook on the remote system.
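For the local end of such a DIY setup, I assume something like this would work once the connection file has been copied over and the ports tunneled (again untested; the path is a placeholder):

```python
# Sketch: connect a local client to an already-running remote kernel,
# given a copy of its connection file and SSH tunnels for its ports.
from jupyter_client import BlockingKernelClient

kc = BlockingKernelClient()
kc.load_connection_file("/tmp/kernel-static.json")  # copied via scp
kc.start_channels()
kc.wait_for_ready(timeout=10)
kc.execute_interactive("print('hello from the remote kernel')")
kc.stop_channels()
```

(I believe `jupyter console --existing /tmp/kernel-static.json` would do the same from the command line.)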


Hi @creepingphlox - thanks for your question and welcome to the community!

Your depiction of the landscape in this area is fairly complete (although I’m sure there are other options given the breadth of this ecosystem).

I am aware of the “provisioner” system that is in development, but it is still in the proof-of-concept development stage.

Just to clarify, the Kernel Provisioner framework available from jupyter_client is not a proof-of-concept and has been available since the 7.0 release. The quoted reference was with respect to some remote provisioners that are direct descendants of Jupyter Enterprise Gateway’s process proxies, from which the inspiration for kernel provisioners was derived. As a result, you might check out EG. It doesn’t currently support provisioners, but will eventually. Out of the box, EG provides Kubernetes, Docker, Hadoop YARN, and (essentially) SSH process proxies to spawn remote kernels.

Can I solve this problem with the provisioner system?

If you wanted to take the “provisioner” approach (which is in line with where we want to head), Kernel Gateway would be the mechanism to use, but you’d either want to bring the aforementioned remote provisioners up to date (although what’s there should work to some degree) or implement your own provisioner to do what you want to do.
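To give a feel for what that entails, a provisioner is a subclass of KernelProvisionerBase that implements a handful of lifecycle methods. A bare skeleton might look like this (the class name and the SSH idea are hypothetical; only the base class and method signatures come from jupyter_client >= 7.0):

```python
# Skeleton of a hypothetical SSH-launching kernel provisioner. Only the
# abstract API (KernelProvisionerBase and its methods) comes from
# jupyter_client; everything else here is a placeholder.
from typing import Any, Dict, List, Optional

from jupyter_client.provisioning import KernelProvisionerBase


class SshProvisioner(KernelProvisionerBase):
    """Launch and manage a kernel on a remote host over SSH (sketch)."""

    @property
    def has_process(self) -> bool:
        ...  # True while the remote kernel process is alive

    async def launch_kernel(self, cmd: List[str], **kwargs: Any) -> Dict[str, Any]:
        ...  # wrap `cmd` in an ssh invocation; return the connection info

    async def poll(self) -> Optional[int]:
        ...  # None while running, else the exit code

    async def wait(self) -> Optional[int]:
        ...  # block until the remote process exits

    async def send_signal(self, signum: int) -> None:
        ...  # forward a signal to the remote process

    async def kill(self, restart: bool = False) -> None:
        ...

    async def terminate(self, restart: bool = False) -> None:
        ...
```

A kernelspec then opts in through its metadata, e.g. `"metadata": {"kernel_provisioner": {"provisioner_name": "ssh-provisioner"}}`, where that name is registered under the `jupyter_client.kernel_provisioners` entry point.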

Jupyter Notebook specifically has special support for Gateway connections (jupyter notebook --gateway-url=...), but that isn’t the client I want to use.

Jupyter Server (which was forked directly from Notebook’s server code and is used by JupyterLab >= 3) supports the same --gateway-url (and associated configurable) options for using a “Gateway” (Kernel or Enterprise), but you should also know that, when configured, local kernels are not available. Only kernels hosted on the gateway server are managed - albeit, you could spawn kernels local to that server.
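For example, assuming the gateway is reachable at http://remote-host:8888 (a placeholder address), the configurable equivalent looks like:

```python
# jupyter_server_config.py - point Jupyter Server (and thus JupyterLab)
# at a remote gateway; equivalent to passing --gateway-url on the
# command line. The URL is a placeholder.
c.GatewayClient.url = "http://remote-host:8888"
```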


Thanks for the reply!

Kernel Provisioner & Kernel Gateway

Thanks for clarifying the readiness state of the provisioner system. Unfortunately (and sorry if I’m just being dense), I don’t think I understand how it helps to use it with Kernel Gateway.

My goal is to be able to use arbitrary Jupyter clients running on my own PC, connected to headless Jupyter kernels running on some other device. I would prefer not to be limited only to the clients that support the Gateway protocol, since that erases a lot of the freedom and flexibility to use arbitrary clients.

In browsing the provisioner docs, I got the impression that you’d be able to build “remote management” tooling with it: starting, stopping, listing, and reconnecting to kernels running on a remote machine. Am I mistaken?

Jupyter Server

I browsed the Jupyter Server docs, but it doesn’t seem like it solves the immediate problem I am facing.

Are you saying that a Jupyter Server can act like a “plain” Jupyter kernel from the perspective of other clients? That would be pretty cool, and would definitely solve my problem. Is the intention that the Gateway protocol become the new standard for remote connections, leaving the old Jupyter protocol for local-only connections? If so, some kind of relay server would seem like an essential part of the ecosystem.

Or did I completely misunderstand?

Or did I completely misunderstand?

You’re doing fine. This is a complicated area of the ecosystem since it doesn’t natively support remote kernels. I hope this response may fill in some gaps.

I don’t think I understand how it helps to use it with Kernel Gateway.

The decision as to whether or not a Gateway server (Kernel or Enterprise) should be used is a function of how the provisioner (or process proxy, in the case of Enterprise Gateway) works, along with the requirements for where the client application (Jupyter Server or Notebook server running with a front end, nbclient applications, etc.) needs to reside. For example, if the remote kernel runs in its own Kubernetes pod, remote from the server, yet the server needs to reside on the user’s desktop, then you’d want to configure a Gateway running within the K8s cluster, since that particular provisioner (process proxy) requires that the “launching server” (i.e., the Gateway server) reside within Kubernetes.

Other remote provisioners may not impose these kinds of restrictions and could be leveraged without additional configuration settings outside of those required by the provisioner (process proxy).

I would prefer not to be limited only to the clients that support the Gateway protocol, since that erases a lot of the freedom and flexibility to use arbitrary clients.

The Gateway server doesn’t impose any different protocol. It merely exposes the same kernel (and kernelspec) API endpoints that Jupyter Server (Notebook) exposes. That said, Enterprise Gateway will recognize additional items within the kernel’s start request body that enhance EG’s behavior.

To provide some background, Jupyter Kernel Gateway was created to detach the user’s notebook server from a compute cluster. However, JKG only runs kernels local to itself - just like the rest of the Jupyter ecosystem. Enterprise Gateway introduced the notion of process proxies, which enlist the services of resource-managed clusters to manage a kernel’s life cycle, where the kernel’s execution (and resources) is remote from the EG server. It accomplished this by extending the only thing it could at the time - the KernelManager class hierarchy - which limits its utility for other aspects of the ecosystem. Kernel provisioners essentially make EG’s process-proxy functionality available to the rest of the Jupyter ecosystem (and enable the development of others) by plugging into the base KernelManager (in jupyter_client).

I hope this helps in some way.

The Gateway server doesn’t impose any different protocol. It merely exposes the same kernel (and kernelspec) API endpoints that Jupyter Server (Notebook) exposes.

Aha, I think this was what I was missing.

So I can run Jupyter Kernel Gateway on the remote machine where I want to run my Jupyter kernel. And then from my local system I can send a request to POST /api/kernels (as per the API spec) to start a kernel. So JKG takes care of the “remote management” aspect (starting, stopping, etc.).
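Concretely, I imagine driving it looks something like this (untested sketch; the address is a placeholder, and a real deployment may also require an auth token header):

```python
# Sketch: drive a Kernel Gateway's kernel-management REST API directly.
# The host and port are placeholders.
import requests

base = "http://remote-host:8888"

# Start a kernel (same endpoint Jupyter Server exposes).
kernel = requests.post(f"{base}/api/kernels", json={"name": "python3"}).json()
print("started:", kernel["id"])

# List running kernels.
print(requests.get(f"{base}/api/kernels").json())

# Shut the kernel down when finished.
requests.delete(f"{base}/api/kernels/{kernel['id']}")
```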

It seems like Enterprise Gateway further extends this system so that you can run the kernel and the gateway on separate machines, but I don’t need that right now.

The missing piece now (I think) is that I don’t see how to actually get the connection info for a kernel that is managed by JKG, which would still be necessary in order to actually connect the client to the kernels managed by JKG. I didn’t see any mention of this specifically in the JKG docs.

Also, is it fair to describe “Jupyter Server” as a “kernel management” tool? The association with Jupyter Notebook/Lab made me think that it was some kind of Jupyter client that also acted as a proxy/relay for other Jupyter clients.

“Jupyter Server” is Jupyter Notebook without a front end bundled in its installation. That is, it does not include a “client” web application and is merely a web server. JupyterLab and its various extensions utilize the REST API exposed by Jupyter Server. “Jupyter Server” is more than kernel management, in that it also provides content services for maintaining notebook files within a directory structure and exposes a Session manager through which most interactions occur.

“Gateway Servers” (Kernel or Enterprise) only expose the “kernel management” REST APIs. So, in that sense, they are considered a “kernel management” tool. They do not provide content services.

The missing piece now (I think) is that I don’t see how to actually get the connection info for a kernel that is managed by JKG, which would still be necessary in order to actually connect the client to the kernels managed by JKG. I didn’t see any mention of this specifically in the JKG docs.

A kernel’s connection information is maintained by the “launching server”. So, in the case of a gateway configuration, where the gateway is remote from Jupyter Server, the connection information is not directly available to Jupyter Server. However, in this case, Jupyter Server’s KernelManager is configured to proxy requests to, and gather results from, the Gateway, thereby allowing the various client applications of Jupyter Server to use remote kernels.

In what sense is the kernel’s connection information “necessary in order to actually connect the client to the kernels managed by JKG”? Are you referring to client applications that are not browser-based? Can you provide concrete examples of such applications?

Exactly this. I am thinking of Jupyter Console for one thing, as well as other third-party Jupyter protocol clients such as nvim-ipy.

Exactly this. I am thinking of Jupyter Console for one thing, as well as other third-party Jupyter protocol clients such as nvim-ipy.

If by ‘Jupyter Console’ you’re referring to the console tiles on the JupyterLab launcher that are 1:1 with the kernel tiles, yes, those are fine since they leverage the same KernelManager hierarchy that the kernel tiles use.

I’m unfamiliar with nvim-ipy. Generally speaking, if the application uses jupyter_client to launch a kernel, then it’s a candidate to support remote kernels via provisioners. However, that application’s location relative to the kernel is a function of the provisioner’s requirements. Since Gateway integration is essentially only available for the web-based applications residing on Jupyter Server [*], it really isn’t an option to help with keeping the “client application” on the user’s desktop.

[*] There may be another option, depending on how the application is configured, but I don’t want to mention that here as this topic is already too complicated.