I use JEG (Jupyter Enterprise Gateway) with Spark to provide a multi-tenant, scalable notebook environment on Kubernetes (k8s).
It works pretty well except for Python dependencies.
The problem: some users install custom dependencies from their notebooks (pip install…), and these dependencies are installed only on the Spark driver node. Executors can be scheduled on different pods/nodes from the driver, so the dependencies are not present there.
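One way to confirm the symptom is to ask the driver and the executors the same question: can this package be imported here? The helper below is a hypothetical sketch (`probe` is not part of EG or Spark) that uses only the standard library and reports, per host, which top-level modules are importable.

```python
import importlib.util
import socket


def probe(modules):
    """Return (hostname, module, importable) for each top-level module name.

    Use top-level names only (e.g. "requests", not "requests.adapters"):
    find_spec on a dotted name imports the parent package first and raises
    ModuleNotFoundError if that parent is missing.
    """
    host = socket.gethostname()
    return [(host, name, importlib.util.find_spec(name) is not None) for name in modules]


if __name__ == "__main__":
    # Run locally: this is what the driver process sees.
    for row in probe(["json", "surely_not_a_real_module_xyz"]):
        print(row)
```

In a notebook, running the same function inside executor tasks shows the difference, for example `sc.parallelize(range(4), 4).mapPartitions(lambda _: probe(["requests"])).collect()`, where `sc` is the notebook's SparkContext and `"requests"` stands in for whatever package was pip-installed on the driver. On k8s the hostname is the pod name, so each row tells you which pod answered.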
At the moment we don’t have any distributed filesystem on k8s, and we would prefer to avoid adding one. Does something exist to tackle this problem?
Hi @aldu29, EG doesn’t provide anything for this directly. I think what most folks do is either extend the images to include their dependencies and use the same image for both driver and executors, or introduce volume mounts, etc. Because deployments vary so much, that exercise is left to the operator/deployer.
With EG 3.x we support the podTemplateFile attribute on the Spark driver and executor configs, so those environments can get the full benefit of customizing the kernel-pod.yaml template, which can be configured just about however you like on a per-kernel(spec) basis.
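As a rough illustration of the volume-mount route, the sketch below builds a pod template that mounts one volume and puts it on `PYTHONPATH`, plus the `--conf` arguments that point Spark's `spark.kubernetes.driver.podTemplateFile` and `spark.kubernetes.executor.podTemplateFile` at it. Everything else is an assumption: the claim name, mount path, template path and the `deps_pod_template` / `template_conf_args` helpers are hypothetical, the claim must be mountable from every node that can run an executor (which usually means a ReadWriteMany volume), and you should check that your image's entrypoint does not overwrite `PYTHONPATH`. How these arguments reach Spark in a given EG kernelspec is also left to the deployer.

```python
import json

# Hypothetical volume name used inside the template.
DEPS_VOLUME = "py-deps"


def deps_pod_template(claim_name: str, mount_path: str) -> dict:
    """Pod template that mounts a shared volume and adds it to PYTHONPATH.

    Spark merges a pod template with the spec it generates and, by default,
    treats the first container as the Spark container.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "spec": {
            "containers": [
                {
                    "name": "spark",
                    # Assumption: the image does not reset PYTHONPATH at startup.
                    "env": [{"name": "PYTHONPATH", "value": mount_path}],
                    "volumeMounts": [{"name": DEPS_VOLUME, "mountPath": mount_path}],
                }
            ],
            "volumes": [
                {"name": DEPS_VOLUME, "persistentVolumeClaim": {"claimName": claim_name}}
            ],
        },
    }


def template_conf_args(template_path: str) -> list:
    """Point both the driver and the executors at the same template file."""
    return [
        "--conf", f"spark.kubernetes.driver.podTemplateFile={template_path}",
        "--conf", f"spark.kubernetes.executor.podTemplateFile={template_path}",
    ]


if __name__ == "__main__":
    template = deps_pod_template("notebook-py-deps", "/opt/py-deps")
    # JSON is valid YAML, so this output can be saved as the template file.
    print(json.dumps(template, indent=2))
    print(" ".join(template_conf_args("/opt/templates/deps-pod.yaml")))
```

Using one template for both roles keeps the driver and executors looking at the same directory; with that in place, users would install into the mount (e.g. pip install --target /opt/py-deps …) instead of the driver's own site-packages.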
You might also try opening a discussion in the EG repo, where more folks might see it (i.e., spread the bait).
OK, thanks @kevin-bates.
I'll try something based on volume mounts and open a discussion in the EG repo.
Thanks for the advice