When a Spark call is made from the driver node (single-user server), we get the error below. Does anyone know how to resolve it?
Python in worker has different version 3.8 than that in driver 3.9,
PySpark cannot run with different minor versions. Please check environment
variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
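The message means the notebook kernel (driver) runs CPython 3.9 while the Spark workers run 3.8; PySpark requires the same major.minor version on both sides. One fix is to run the notebook kernel on Python 3.8; the other is to point the executors at a 3.9 interpreter, if one actually exists in the worker image. A minimal sketch of the second option, applied before the SparkContext is created; the interpreter path below is a placeholder, not something the bitnami image is known to ship:

import os
from pyspark import SparkConf

# Interpreter the executors should use; this path is an assumption and must
# actually exist inside the bitnami/spark worker containers.
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.9"

conf = SparkConf()
# Equivalent Spark config knob; keep it consistent with the env var above.
conf.set("spark.pyspark.python", "/usr/bin/python3.9")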
- We have deployed both JupyterHub and Spark in a Kubernetes (K8s) cluster namespace
- The Spark cluster master and worker nodes use the bitnami/spark:3.0.0 container image
- The driver node (single-user server) uses the jupyter/pyspark-notebook:ubuntu-20.04 container image
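Since the two images ship different interpreters, a quick way to confirm the mismatch is to run the same snippet inside the Jupyter pod and inside a Spark worker pod; this only assumes python3 is on the PATH in both containers:

import sys
print(sys.version_info[:3])   # e.g. (3, 9, x) in the Jupyter image vs. (3, 8, x) in bitnami/spark:3.0.0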
PySpark Code
import socket

from pyspark import SparkConf, SparkContext

conf = SparkConf()
conf.setMaster("spark://spark-cluster-master-svc.studio.svc.cluster.local:7077")
conf.setAppName("MyApp")
# Advertise the driver pod's IP so the executors can connect back to it.
conf.set('spark.driver.host', socket.gethostbyname(socket.gethostname()))
conf.set('spark.executor.instances', '2')

sc = SparkContext(conf=conf)
rdd = sc.parallelize(range(0, 2))
rdd.sum()   # fails here with the Python version mismatch error above
sc.stop()
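Once both sides run the same minor version, a small round trip (run before sc.stop()) can confirm that the executors report the same interpreter as the driver; worker_python_version is just an illustrative helper name, not part of any API:

import sys

def worker_python_version(_):
    import sys
    return sys.version_info[:2]   # plain tuple, safe to ship back to the driver

driver_version = sys.version_info[:2]
executor_versions = sc.parallelize(range(2), 2).map(worker_python_version).distinct().collect()
print(driver_version, executor_versions)   # expect matching (major, minor) tuples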