Jupyter Notebook connecting to existing Spark/Yarn Cluster

I was mostly going through the issue and noticed some other requirements. That said, I have not actually validated these.

Jupyter Notebook today only supports running Spark on YARN in client mode.

In client mode, the Spark driver (which is responsible for task scheduling and the coordination of data aggregations/shuffles) runs locally in your pod, while the executors are launched on the Spark cluster.
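To make the client/cluster distinction concrete, here is a small sketch that builds a `spark-submit` command line for YARN. The `yarn_submit_cmd` helper is a hypothetical name invented for this example; the flags themselves (`--master`, `--deploy-mode`, `--num-executors`) are standard `spark-submit` options. In client mode the driver stays wherever you run the command (your pod); in cluster mode YARN hosts the driver too.

```python
def yarn_submit_cmd(app, deploy_mode="client", executors=2):
    """Build a spark-submit invocation for YARN.

    deploy_mode="client": driver runs locally (e.g., in the notebook pod),
    executors run on the cluster.
    deploy_mode="cluster": driver also runs inside the cluster.
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        "--num-executors", str(executors),
        app,
    ]

# Client mode: what Jupyter Notebook effectively uses today.
print(" ".join(yarn_submit_cmd("job.py")))
```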

If you want all of the Spark processing to be executed in the cluster, then you want to use Spark on YARN in cluster mode, in which case the kernels run remotely and you must use Jupyter Enterprise Gateway to enable remote kernel lifecycle management.
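As a sketch of the Enterprise Gateway setup: the notebook server is pointed at a gateway URL, and the gateway then launches and manages the kernels remotely (e.g., as YARN cluster-mode applications). The hostname and port below are placeholders; `--gateway-url` is a real Jupyter Notebook option.

```shell
# Forward all kernel requests from this notebook server to a
# Jupyter Enterprise Gateway instance (placeholder host/port).
jupyter notebook --gateway-url=http://my-gateway-host:8888
```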

This page can give you more details about the Spark Driver.

This Stack Overflow post also explains a little more about client versus cluster mode.