Help running Spark jobs on a cluster that is external to K8s

Hi,

I have a JupyterHub deployment on kubernetes using the z2jh helm charts.

I’m running into an issue when trying to run Spark jobs from a user’s notebook on a Spark cluster that is set up outside of my Kubernetes environment (in this case a MapR cluster).

I can successfully submit the job to YARN; however, the job fails because the user’s notebook pod IP is not resolvable from the cluster, so the executors can’t talk back to the driver running on that pod.

I believe I’m missing something in my setup, maybe an ingress or proxy, that would allow the user-spawned notebook pods to be reachable from outside the Kubernetes cluster they are running on.

Any help or hints are greatly appreciated.

Thanks

Gavin

Hi,
To provide the notebook pod IP, you need to add the pod IP as an environment variable on the pod and pass it into your spark-submit (or SparkConf) configuration.
In extraConfig, override modify_pod_hook like this:

10-passpodip: |
  from kubernetes import client

  def modify_pod_hook(spawner, pod):
      # Expose the pod's own IP as MY_POD_IP via the downward API (status.podIP).
      pod.spec.containers[0].env.append(
          client.V1EnvVar(
              name="MY_POD_IP",
              value_from=client.V1EnvVarSource(
                  field_ref=client.V1ObjectFieldSelector(field_path="status.podIP"))))
      return pod

  c.KubeSpawner.modify_pod_hook = modify_pod_hook
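
With that in place, the notebook can read MY_POD_IP and advertise it as the driver address. Below is a minimal sketch of the notebook side, assuming PySpark in YARN client mode and that your MapR/YARN nodes can route to pod IPs; the app name and port numbers are placeholders, not values from this thread:

import os
from pyspark.sql import SparkSession

# MY_POD_IP is injected by the modify_pod_hook above via the downward API.
pod_ip = os.environ["MY_POD_IP"]

spark = (
    SparkSession.builder
    .appName("notebook-on-mapr")                    # placeholder app name
    .master("yarn")                                 # client mode: driver runs in the notebook pod
    .config("spark.driver.host", pod_ip)            # address executors use to reach the driver
    .config("spark.driver.bindAddress", "0.0.0.0")  # listen on all pod interfaces
    .config("spark.driver.port", "40000")           # optional: pin ports (placeholders)
    .config("spark.blockManager.port", "40001")
    .getOrCreate()
)

Pinning spark.driver.port and spark.blockManager.port is optional, but it makes it easier to open only those ports if a firewall sits between the MapR nodes and the pod network.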

Hi Gavin, I am trying to deploy JupyterHub in Kubernetes and connect it to a remote MapR cluster; however, I can’t figure out how.

Could you point me to any documentation on how you did it? Many thanks.