Configure the number of Spark executor pods using custom kernels and Enterprise Gateway for Kubernetes

I have been toying with Enterprise Gateway and Spark, and I could successfully get things to run the way I want (including with a custom kernel image and kernel spec image). Unfortunately, I'm pretty new to both Kubernetes and Spark, so I am not very confident in my understanding, or whether my solution is already optimal.

As far as I understand it, Enterprise Gateway currently does a spark-submit in cluster mode to launch the kernel (which is also the Spark driver). Is this understanding correct? Of course, this means that the launch already sets the number of executors according to the SPARK_OPTS in my kernel.json.
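For context, the executor count is currently hard-coded in the SPARK_OPTS of my kernel.json, roughly like this (heavily abbreviated, with illustrative values rather than my exact spec):

```json
{
  "language": "python",
  "env": {
    "SPARK_HOME": "/opt/spark",
    "SPARK_OPTS": "--master k8s://https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT} --deploy-mode cluster --name ${KERNEL_USERNAME}-${KERNEL_ID} --conf spark.executor.instances=2"
  }
}
```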

Is there a way to make this more dynamic, i.e. let the notebook user choose how many executors to spawn, or, even better, allow using a SparkConf within the notebook to adjust this?

If my understanding is correct, the second option may be tricky unless I somehow switch to client mode, right? However, I would also be happy with the first option; I'd just prefer not to need n different kernel specs for the same kernel that differ only in the value of num_executors.

Take a look at spark-context-initialization-mode in System Architecture — Jupyter Enterprise Gateway 3.3.0.dev0 documentation. Basically, setting it to none allows the user to configure a lot of the settings, including the number of executors, when they create the Spark session object. I know it works for Python and Scala; hopefully one of those is a kernel you are using.
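For example, with the initialization mode set to none, the user can build the session inside the notebook and choose the executor count there. Here is a minimal PySpark sketch; the app name and values are just placeholders:

```python
from pyspark.sql import SparkSession

# With --spark-context-initialization-mode none, no SparkContext is created at
# kernel launch, so the notebook user creates the session and picks the settings.
spark = (
    SparkSession.builder
    .appName("my-notebook-session")            # placeholder application name
    .config("spark.executor.instances", "4")   # number of executor pods to request
    .config("spark.executor.memory", "2g")     # any other per-executor settings
    .getOrCreate()
)

sc = spark.sparkContext  # available afterwards if you need the raw SparkContext
```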
