How to set PYSPARK_PYTHON/PYSPARK_DRIVER_PYTHON

We are getting the ‘Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.’ error when running PySpark code.

Can someone please help us?
Thanks.
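
For context, this is how we understand the interpreter is supposed to be pinned before the SparkContext is created; the interpreter path below is a placeholder for illustration, not the actual path in our images:

    import os

    # Point both the driver and the Python workers launched by the executors
    # at the same interpreter version (path is a placeholder, depends on the image).
    os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.8"
    os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3.8"

    # The equivalent Spark properties, if set through SparkConf instead:
    #   spark.pyspark.python        (PYSPARK_PYTHON)
    #   spark.pyspark.driver.python (PYSPARK_DRIVER_PYTHON)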

  • User Server Docker image: tried all of the versions below

    • pyspark-notebook:python-3.8.8
    • pyspark-notebook:spark-3.2.1
    • pyspark-notebook:ubuntu-20.04
  • Spark Cluster version: 3.2.1

    • Workers run Python 3.8
  • Spark Code

    import socket

    import pyspark
    from pyspark.sql import SparkSession
    from pyspark.sql import SQLContext
    from pyspark import SparkConf, SparkContext

    # Connect to the standalone Spark master service
    conf = SparkConf()
    conf.setMaster("spark://spark-cluster-master-svc.studio.svc.cluster.local:7077")
    conf.setAppName("MyApp")
    # Advertise the notebook pod's IP so executors can reach the driver
    conf.set('spark.driver.host', socket.gethostbyname(socket.gethostname()))
    conf.set('spark.executor.instances', '2')

    sc = SparkContext(conf=conf)
    rdd = sc.parallelize(range(0, 2))
    rdd.sum()
    sc.stop()
    
  • Error

     An error occurred while calling 
     z:org.apache.spark.api.python.PythonRDD.collectAndServe.
     : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 
     0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 4) 
     (10.201.37.11 executor 1): org.apache.spark.api.python.PythonException: Traceback 
     (most recent call last):
       File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/worker.py", line 481, in main
         raise RuntimeError(("Python in worker has different version %s than that in " +
     RuntimeError: Python in worker has different version 3.8 than that in driver 3.9, 
     PySpark cannot run with different minor versions. Please check environment 
     variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
     at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:555)
    
  • The original Python version mismatch is resolved by using the ‘jupyter/pyspark-notebook:python-3.8.8’ image as the driver (the single-user server)
  • But the Spark worker nodes still weren’t able to report back to the driver (the single-user server); see the networking sketch below
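
In case it is relevant, here is a sketch of the driver-side networking settings we believe the executors need in order to reach a driver running inside a notebook pod. The port numbers are made up for illustration and are not values from our setup:

    import socket
    from pyspark import SparkConf

    conf = SparkConf()
    # Advertise the pod's IP to the cluster and listen on all interfaces
    conf.set('spark.driver.host', socket.gethostbyname(socket.gethostname()))
    conf.set('spark.driver.bindAddress', '0.0.0.0')
    # Fixed ports (placeholder values) so they can be exposed on the pod/service
    conf.set('spark.driver.port', '29413')
    conf.set('spark.driver.blockManager.port', '29414')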

Has anyone seen this before?
Any help resolving it would be appreciated.