Executing PySpark code in a Jupyter Notebook on Z2JK, with the Spark session in "client" deploy mode and the Spark executors running in their own dedicated Kubernetes cluster

Has anyone successfully executed PySpark code in a Jupyter Notebook using Z2JK, where the user configures the Spark session with the deploy mode set to "client" and the Spark executors run in their own dedicated Kubernetes cluster?

We can't get this to work because the Spark executors are unable to communicate back to the Spark driver, which runs inside the Jupyter Notebook PySpark Docker image (jupyter/pyspark-notebook:latest).

That is because the executors need a stable, resolvable host name for the Spark driver (i.e., the Jupyter Notebook PySpark K8s pod).
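For context, this is roughly the kind of client-mode session we are trying to build from the notebook (a minimal sketch; the API server URL, image, namespace, user name, and port numbers are placeholders, not our actual values):

```python
from pyspark.sql import SparkSession

# Sketch of a client-mode SparkSession whose executors run on a separate
# Kubernetes cluster. All concrete values below are placeholders.
spark = (
    SparkSession.builder
    .master("k8s://https://EXECUTOR-CLUSTER-APISERVER:6443")
    .config("spark.submit.deployMode", "client")
    .config("spark.kubernetes.namespace", "spark-executors")
    .config("spark.kubernetes.container.image", "apache/spark:3.5.0")
    .config("spark.executor.instances", "2")
    # The executors connect back to the driver using these settings; the host
    # name must be resolvable and reachable from the executor cluster,
    # which is exactly the part that fails for us.
    .config("spark.driver.host", "jupyter-USERNAME.jhub.svc.cluster.local")
    .config("spark.driver.port", "29413")
    .config("spark.blockManager.port", "29414")
    .config("spark.driver.bindAddress", "0.0.0.0")
    .getOrCreate()
)
```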

The only thing I can think of is to modify KubeSpawner so that it creates a "headless" K8s Service for each spawned single-user pod. Has anyone done this before, and what are the pitfalls and issues with doing so?
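What I have in mind is roughly the following, untested sketch in jupyterhub_config.py. It assumes KubeSpawner's modify_pod_hook and the kubernetes Python client; the service name and ports are made up, and this would only help if the executor cluster can actually resolve the resulting DNS name:

```python
# jupyterhub_config.py -- untested sketch
from kubernetes import client, config

config.load_incluster_config()  # assumes the hub itself runs inside the cluster


def add_headless_service(spawner, pod):
    """Create a headless Service so the spawned single-user pod gets a stable DNS name."""
    v1 = client.CoreV1Api()
    svc = client.V1Service(
        metadata=client.V1ObjectMeta(
            # Illustrative name; real user names may need escaping to be
            # valid Kubernetes object names.
            name=f"spark-driver-{spawner.user.name}",
            namespace=spawner.namespace,
        ),
        spec=client.V1ServiceSpec(
            cluster_ip="None",             # headless: DNS resolves straight to the pod IP
            selector=pod.metadata.labels,  # select the pod being spawned
            ports=[
                client.V1ServicePort(name="spark-driver", port=29413),
                client.V1ServicePort(name="spark-blkmgr", port=29414),
            ],
        ),
    )
    try:
        v1.create_namespaced_service(spawner.namespace, svc)
    except client.exceptions.ApiException as e:
        if e.status != 409:  # 409 = Service already exists (e.g. on pod restart)
            raise
    return pod


c.KubeSpawner.modify_pod_hook = add_headless_service
```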


I am facing the same issue. The hostname is not the problem, since you can get it with a Python call from inside the notebook (see the snippet below).
I am trying to expose a port on the single-user pod, but without luck so far.
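For example, from inside the notebook (this only shows that the pod's own name and IP are easy to obtain; it does not make them reachable from the executor cluster):

```python
import socket

# Run inside the notebook: the single-user pod's hostname and IP,
# which can then be fed into spark.driver.host.
driver_hostname = socket.gethostname()
driver_ip = socket.gethostbyname(driver_hostname)
print(driver_hostname, driver_ip)
```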