How to set PYSPARK_PYTHON/PYSPARK_DRIVER_PYTHON

We are getting the ‘Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.’ error when running PySpark code.

Can someone please help us?
Thanks.
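
For context, this is how we understand the interpreter is supposed to be pinned before the SparkContext is created; the interpreter path below is a placeholder for illustration, not the actual path in our images:

    import os

    # Point both the driver and the Python workers launched by the executors
    # at the same interpreter version (path is a placeholder, depends on the image).
    os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.8"
    os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3.8"

    # The equivalent Spark properties, if set through SparkConf instead:
    #   spark.pyspark.python        (PYSPARK_PYTHON)
    #   spark.pyspark.driver.python (PYSPARK_DRIVER_PYTHON)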

  • User Server Docker image: tried all of the versions below

    • pyspark-notebook:python-3.8.8
    • pyspark-notebook:spark-3.2.1
    • pyspark-notebook:ubuntu-20.04
  • Spark Cluster version: 3.2.1

    • Workers run Python 3.8
  • Spark Code

    import socket

    import pyspark
    from pyspark.sql import SparkSession
    from pyspark.sql import SQLContext
    from pyspark import SparkConf, SparkContext

    # Connect to the standalone Spark master service
    conf = SparkConf()
    conf.setMaster("spark://spark-cluster-master-svc.studio.svc.cluster.local:7077")
    conf.setAppName("MyApp")
    # Advertise the notebook pod's IP so executors can reach the driver
    conf.set('spark.driver.host', socket.gethostbyname(socket.gethostname()))
    conf.set('spark.executor.instances', '2')

    sc = SparkContext(conf=conf)
    rdd = sc.parallelize(range(0, 2))
    rdd.sum()
    sc.stop()
    
  • Error

     An error occurred while calling 
     z:org.apache.spark.api.python.PythonRDD.collectAndServe.
     : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 
     0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 4) 
     (10.201.37.11 executor 1): org.apache.spark.api.python.PythonException: Traceback 
     (most recent call last):
       File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/worker.py", line 481, in main
         raise RuntimeError(("Python in worker has different version %s than that in " +
     RuntimeError: Python in worker has different version 3.8 than that in driver 3.9, 
     PySpark cannot run with different minor versions. Please check environment 
     variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
     at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:555)
    
  • The original Python version mismatch is resolved by using the ‘jupyter/pyspark-notebook:python-3.8.8’ image as the driver (the single-user server)
  • But the Spark worker nodes still weren’t able to report back to the driver (the single-user server); see the networking sketch below
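
In case it is relevant, here is a sketch of the driver-side networking settings we believe the executors need in order to reach a driver running inside a notebook pod. The port numbers are made up for illustration and are not values from our setup:

    import socket
    from pyspark import SparkConf

    conf = SparkConf()
    # Advertise the pod's IP to the cluster and listen on all interfaces
    conf.set('spark.driver.host', socket.gethostbyname(socket.gethostname()))
    conf.set('spark.driver.bindAddress', '0.0.0.0')
    # Fixed ports (placeholder values) so they can be exposed on the pod/service
    conf.set('spark.driver.port', '29413')
    conf.set('spark.driver.blockManager.port', '29414')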

Has anyone seen this before?
Any help resolving it would be appreciated.