How to manage dependencies between different nodes?

aldu29 · February 3, 2023, 4:36pm

Hi,
I use JEG with Spark to provide a multi-tenant, scalable notebook environment on k8s.
It works pretty well except for Python dependencies.
The problem: some users install custom dependencies from their notebooks (pip install…), and these dependencies are only installed on the Spark driver node. When executors are launched, they can run on different pods/nodes from the driver, where these dependencies are not present.
At the moment we don’t have a distributed filesystem on k8s, and we would prefer to avoid adding one. Does something exist to tackle this problem?

kevin-bates · February 3, 2023, 5:22pm

Hi @aldu29 - EG doesn’t provide anything for this directly. I think most folks either extend the images to include their dependencies and use the same image for both driver and executors, or introduce volume mounts, etc. That exercise, because deployments vary so widely, is left to the operator/deployer.
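
As a rough sketch of the "same image for both driver and executors" option: the Spark-on-Kubernetes property `spark.kubernetes.container.image` sets the image for both roles at once. The helper names and the image tag below are hypothetical, and how these properties reach spark-submit in a particular EG kernelspec is not shown here.

```python
def same_image_conf(image):
    """Return Spark properties that run the driver and executors from one image."""
    # spark.kubernetes.container.image applies to both driver and executor pods.
    return {"spark.kubernetes.container.image": image}


def to_submit_args(conf):
    """Flatten a property dict into spark-submit style --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical image that already contains the extra Python packages.
conf = same_image_conf("registry.example.com/team/pyspark-kernel:1.0")
print(" ".join(to_submit_args(conf)))
```

With this approach, dependencies are fixed at image build time, so a `pip install` run from a notebook would still only affect the driver pod.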

With EG 3.x we support the podTemplateFile attribute on the Spark driver and executor configs, so those environments get the full benefit of customizing the kernel-pod.yaml template, which can be configured just about however you like on a per-kernel(spec) basis.
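
For illustration only, here is a minimal sketch of what a pod template with a shared volume mount could look like at the Spark level, using the `spark.kubernetes.driver.podTemplateFile` and `spark.kubernetes.executor.podTemplateFile` properties. The claim name, mount path, container name, and helper functions are assumptions, and the way EG's kernel-pod.yaml template feeds into this is not covered.

```python
import tempfile
from pathlib import Path

# Minimal pod template: one container with a volume mounted from a PVC.
# The claim must be readable from every node where executors can be scheduled.
POD_TEMPLATE = """\
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: {container}
      volumeMounts:
        - name: shared-deps
          mountPath: {mount_path}
  volumes:
    - name: shared-deps
      persistentVolumeClaim:
        claimName: {claim}
"""


def write_pod_template(directory, claim, mount_path, container="spark"):
    """Render the template to <directory>/pod-template.yaml and return its path."""
    path = Path(directory) / "pod-template.yaml"
    path.write_text(
        POD_TEMPLATE.format(container=container, mount_path=mount_path, claim=claim)
    )
    return path


def pod_template_conf(template_path):
    """Point both driver and executor pods at the same template file."""
    return {
        "spark.kubernetes.driver.podTemplateFile": str(template_path),
        "spark.kubernetes.executor.podTemplateFile": str(template_path),
    }


# Hypothetical claim name and mount path.
template = write_pod_template(
    tempfile.mkdtemp(), claim="python-deps", mount_path="/opt/shared-deps"
)
conf = pod_template_conf(template)
print(template.read_text())
```

Packages installed into the mounted path would then be visible to every pod that mounts the same claim, provided the Python path on the executors includes that directory.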

You might also try opening a discussion in the EG repo in hopes that folks might see that item instead (i.e., spread the bait).

aldu29 · February 3, 2023, 5:28pm

OK, thanks @kevin-bates.
I’ll try something based on a volume mount and open a discussion in the EG repo.
Thanks for the advice.
