Help running Spark jobs on a cluster that is external to Kubernetes


I have a JupyterHub deployment on kubernetes using the z2jh helm charts.

I’m running into an issue when trying to run Spark jobs from a user’s notebook against a Spark cluster that is set up outside of my Kubernetes environment (in this case a MapR cluster).

I can successfully submit the job to YARN, however the job fails because the user’s notebook pod IP is not resolvable from the external cluster, so the executors can’t talk back to the driver running on that pod.

I believe I’m missing something in my setup, maybe ingress or proxy that will allow the user spawned notebook pods to be reachable outside of the kubernetes cluster they are running on.

Any help or hints are greatly appreciated.



To provide the notebook IP, you need to expose the pod’s IP in a pod environment variable and pass it to the spark-submit configuration. In `extraConfig`, override `modify_pod_hook` like this:

```yaml
10-passpodip: |
  from kubernetes import client

  def modify_pod_hook(spawner, pod):
      # Use the Kubernetes Downward API to expose the pod's own IP
      # to the notebook container as MY_POD_IP.
      pod.spec.containers[0].env.append(
          client.V1EnvVar(
              name="MY_POD_IP",
              value_from=client.V1EnvVarSource(
                  field_ref=client.V1ObjectFieldSelector(field_path="status.podIP")
              ),
          )
      )
      return pod

  c.KubeSpawner.modify_pod_hook = modify_pod_hook
```
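Inside the notebook, the driver can then advertise that IP to the external cluster. A minimal sketch (assuming PySpark in client mode against YARN; `spark.driver.host` and `spark.driver.bindAddress` are standard Spark properties, and the reachability of the pod IP from the MapR nodes is still a networking prerequisite):

```python
import os

from pyspark.sql import SparkSession

# MY_POD_IP is injected by the modify_pod_hook above via the Downward API.
pod_ip = os.environ["MY_POD_IP"]

spark = (
    SparkSession.builder
    .master("yarn")                                # submit to the external YARN cluster
    .appName("notebook-job")
    # Advertise the pod IP so executors on the MapR cluster
    # connect back to the driver at a routable address.
    .config("spark.driver.host", pod_ip)
    # Bind locally on all interfaces inside the pod.
    .config("spark.driver.bindAddress", "0.0.0.0")
    .getOrCreate()
)
```

Note this only helps if the pod IP is actually routable from the MapR nodes (e.g. via a CNI that exposes pod IPs, or host networking); otherwise the executors still cannot reach the driver.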