Jupyter Notebook connecting to existing Spark/Yarn Cluster

I was mostly going through the issue and noticed some other requirements. That said, I have not actually validated these.

Jupyter Notebook today only supports running Spark on YARN in client mode.

In client mode, the Spark driver (which is responsible for task scheduling and the coordination of data aggregations/shuffles) runs locally in your pod, while the executors are launched on the Spark cluster.
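To make the client/cluster distinction concrete, here is a small sketch that builds a `spark-submit` command line for YARN. The `yarn_submit_cmd` helper is a hypothetical name invented for this example; the flags themselves (`--master`, `--deploy-mode`, `--num-executors`) are standard `spark-submit` options. In client mode the driver stays wherever you run the command (your pod); in cluster mode YARN hosts the driver too.

```python
def yarn_submit_cmd(app, deploy_mode="client", executors=2):
    """Build a spark-submit invocation for YARN.

    deploy_mode="client": driver runs locally (e.g., in the notebook pod),
    executors run on the cluster.
    deploy_mode="cluster": driver also runs inside the cluster.
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        "--num-executors", str(executors),
        app,
    ]

# Client mode: what Jupyter Notebook effectively uses today.
print(" ".join(yarn_submit_cmd("job.py")))
```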

If you want all of the Spark processing to be executed in the cluster, then you want to use Spark on YARN in cluster mode, in which case the kernels run remotely and you must use Jupyter Enterprise Gateway to enable remote kernel lifecycle management.
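As a sketch of the Enterprise Gateway setup: the notebook server is pointed at a gateway URL, and the gateway then launches and manages the kernels remotely (e.g., as YARN cluster-mode applications). The hostname and port below are placeholders; `--gateway-url` is a real Jupyter Notebook option.

```shell
# Forward all kernel requests from this notebook server to a
# Jupyter Enterprise Gateway instance (placeholder host/port).
jupyter notebook --gateway-url=http://my-gateway-host:8888
```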

This page can give you more details about the Spark Driver.

This Stack Overflow post also explains a little more about client versus cluster mode.