We are getting a ‘Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.’ error when running PySpark code.

Can someone please help us?

  • User server Docker image (tried all of the versions below):

    • pyspark-notebook:python-3.8.8
    • pyspark-notebook:spark-3.2.1
    • pyspark-notebook:ubuntu-20.04
  • Spark cluster version: 3.2.1

    • Workers: Python 3.8
  • Spark Code

     import socket

     import pyspark
     from pyspark.sql import SparkSession
     from pyspark.sql import SQLContext
     from pyspark import SparkConf, SparkContext

     conf = SparkConf()
     # The config key was blank in our original snippet; the intent was to
     # set the driver host so executors can reach the notebook pod.
     conf.set('spark.driver.host', socket.gethostbyname(socket.gethostname()))
     conf.set('spark.executor.instances', '2')
     sc = SparkContext(conf=conf)
     rdd = sc.parallelize(range(0, 2))
  • Error

     An error occurred while calling 
     : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 
     0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 4) 
     ( executor 1): org.apache.spark.api.python.PythonException: Traceback 
     (most recent call last):
       File "/opt/bitnami/spark/python/lib/", line 481, in main
         raise RuntimeError(("Python in worker has different version %s than that in " +
     RuntimeError: Python in worker has different version 3.8 than that in driver 3.9, 
     PySpark cannot run with different minor versions. Please check environment 
     variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
     at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:555)
  • The original Python version mismatch was resolved by using the ‘jupyter/pyspark-notebook:python-3.8.8’ image as the driver (the single-user server).
  • However, the Spark worker nodes were then unable to report back to the driver (the single-user server).

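In case it helps pin down the mismatch: the error says the executors run Python 3.8 while the driver runs 3.9, so both interpreters need to be pinned to the same minor version before the SparkContext is created. A minimal sketch — the worker interpreter path here is an assumption and must be adjusted to wherever Python 3.8 lives in the executor image:

```python
import os
import sys

# Pin the worker-side interpreter. "/usr/bin/python3.8" is an assumed path;
# point this at the actual Python 3.8 binary inside the executor image.
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.8"

# Use the notebook's own interpreter on the driver side, so the driver
# version is whatever the single-user server is actually running.
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

# Sanity check: this driver version must match the workers' (e.g. "3.8").
print(f"driver python: {sys.version_info.major}.{sys.version_info.minor}")
```

Both environment variables have to be set before `SparkContext(conf=conf)` runs, otherwise the executors fall back to whatever `python` resolves to in their image.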
Has anyone seen this? Any help resolving it would be appreciated.