When a Spark call is made from the driver node (single-user server), we get the error below. Does anyone know how to resolve this?
```
Python in worker has different version 3.8 than that in driver 3.9, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
```
- We have deployed both JupyterHub and Spark in a K8s cluster namespace
- The Spark cluster master and worker nodes use the bitnami/spark:3.0.0 container image
- The driver node (single-user server) uses the jupyter/pyspark-notebook:ubuntu-20.04 container image
This is the code we run (note that `socket` must be imported for `spark.driver.host`):

```python
import socket

from pyspark import SparkConf, SparkContext

conf = SparkConf()
conf.setMaster("spark://spark-cluster-master-svc.studio.svc.cluster.local:7077")
conf.setAppName("MyApp")
# Advertise the pod's IP so the executors can reach the driver
conf.set("spark.driver.host", socket.gethostbyname(socket.gethostname()))
conf.set("spark.executor.instances", "2")

sc = SparkContext(conf=conf)
rdd = sc.parallelize(range(0, 2))
rdd.sum()
sc.stop()
```
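For context, the error means the driver and worker interpreters differ at the major.minor level (3.9 vs 3.8), and it suggests pointing `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` at matching interpreters. A minimal sketch of what we understand that to mean, run before creating the `SparkContext` (the worker interpreter path here is an assumption; the real paths inside the bitnami and jupyter images would need to be checked):

```python
import os
import sys

# Both variables must resolve to interpreters with the same major.minor
# version. The worker path below is an assumed placeholder -- verify the
# actual interpreter location inside the bitnami/spark image.
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"  # assumed worker path

# PySpark compares only the major.minor part of the version string,
# so this is the value that must match on both sides:
driver_minor = "%d.%d" % sys.version_info[:2]
print(driver_minor)
```

Is aligning the images (or the interpreters inside them) the right fix here, or can the mismatch be bridged through configuration alone?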