The Enterprise Gateway project allows you to spawn kernels on a number of different platforms for distributed computing.
What exactly is the enterprise gateway, in simple terms? What scale must a business be at to benefit from one?
Is it a way for companies to connect multiple servers and have services be accessible from one log-in window? A centralized place to manage access to applications/data? Is it a way to avoid moving data? When do I need one?
Who at an organization is responsible for setting it up? How does it differ from a Docker or Kubernetes deployment of JupyterHub?
I tried reading the docs after seeing the announcement, but it’s not really clear to me (perhaps because I’m not the intended audience, but I’d like to help work on language that introduces facets of the Jupyter ecosystem in clearer terms with fewer presumptions of background knowledge).
At a high level, Enterprise Gateway makes it possible to run notebook kernels across a compute cluster. It does this by leveraging the cluster's underlying resource manager to determine where each kernel should run. As a result, the Notebook server is no longer susceptible to resource exhaustion from kernels, because kernels no longer run locally on the Notebook server.
This configuration benefits organizations with large compute clusters, where a single data scientist or analyst may keep multiple notebooks active at the same time. Previously, the kernels behind those notebooks could together consume all the resources of the Notebook server they shared. In addition, specialized servers with GPUs and other high-end compute configurations can be reserved for kernel activity only.
Installing the NB2KG server extension on the Notebook server redirects all kernel management to the Enterprise Gateway server, which then uses a pluggable architecture to spawn, locate, and manage the life cycle of kernels across the compute cluster. Enterprise Gateway currently supports the Hadoop YARN, Kubernetes, Docker Swarm, Dask YARN, and IBM Spectrum Conductor resource managers, along with a simple round-robin distributed mode that uses SSH to launch kernels on remote hosts.
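To make the round-robin idea concrete, here is a minimal sketch of that kind of placement policy. It is not Enterprise Gateway's actual code: the class name `RoundRobinPlacer` and the host names are hypothetical, and a real deployment would also need to launch the kernel over SSH and track its life cycle, which this toy example leaves out.

```python
from itertools import cycle
from typing import Iterable, Iterator


class RoundRobinPlacer:
    """Toy placement policy (hypothetical): hand out hosts in a fixed rotation."""

    def __init__(self, hosts: Iterable[str]) -> None:
        host_list = list(hosts)
        if not host_list:
            raise ValueError("at least one host is required")
        # cycle() repeats the host list endlessly, in order.
        self._hosts: Iterator[str] = cycle(host_list)

    def next_host(self) -> str:
        """Return the host on which the next kernel would be started."""
        return next(self._hosts)


# Hypothetical host names; five kernel requests spread over three nodes.
placer = RoundRobinPlacer(["node-a", "node-b", "node-c"])
placements = [placer.next_host() for _ in range(5)]
print(placements)
```

With a resource manager such as YARN or Kubernetes, the placement decision is made by the resource manager itself rather than by a simple rotation like this one.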
Cluster administrators would most likely be the ones to configure an Enterprise Gateway installation.
Although there is quite a lot of information in our documentation, we completely agree that it could be better organized, and we would like to break it down into role-based topics for administrators, data scientists, and other stakeholders. We would greatly appreciate contributions in this area, as well as in others.
Sounds great @choldgraf - thank you.