How to run a single-user pod in headless mode so that I can run Spark jobs on an external Hadoop cluster

rajatonit · April 8, 2022, 1:27pm

Hi,

I am running an on-prem k8s cluster with JupyterHub deployed on it.

I can successfully submit a job to a YARN queue. However, the job fails because the user's notebook pod IP is not resolvable from the Hadoop cluster, so YARN can't talk back to the Spark driver running on that pod. I get an error like:

Caused by: java.io.IOException: Failed to connect to podIP:33630
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:287)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: pod-ip

I believe I’m missing something in my setup that would allow YARN to talk back to the spawned notebook pods on the Kubernetes cluster. I have read that I might need to run the single-user image or the hub image in headless mode, but I am not sure how to do that. I have looked through the KubeSpawner and JupyterHub configuration for a parameter that would let me achieve this, with no luck.

Any help or hints are greatly appreciated.

For now I am passing the Spark driver the internal Kubernetes pod IP of JupyterHub by setting "spark.driver.host" to str(socket.gethostbyname(socket.gethostname())). With that I can connect to YARN, but the YARN ResourceManager is unable to talk back to the Spark driver in the single-user pod.
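To make the workaround above concrete, here is a minimal sketch of what I mean. The helper names `resolve_pod_ip` and `driver_conf` are only illustrative, and the fixed driver port 40000 and the 0.0.0.0 bind address are assumptions for the example, not values from my actual setup. The configuration keys themselves are standard Spark properties.

```python
import socket


def resolve_pod_ip() -> str:
    """Return the IP that this host's name resolves to.

    Inside a notebook pod this is normally the pod IP, which is
    only routable from within the Kubernetes cluster network.
    """
    try:
        return socket.gethostbyname(socket.gethostname())
    except OSError:
        # Fallback for machines whose hostname does not resolve
        # (an assumption for local testing, not expected in a pod).
        return "127.0.0.1"


def driver_conf(host: str, driver_port: int = 40000) -> dict:
    """Build Spark properties for a client-mode driver on YARN.

    driver_port is a hypothetical fixed port; without it Spark picks
    a random one (such as the 33630 seen in the stack trace).
    """
    return {
        "spark.master": "yarn",
        "spark.submit.deployMode": "client",
        # Address the YARN side uses to call back to the driver.
        "spark.driver.host": host,
        # Address the driver listens on inside the pod (assumed).
        "spark.driver.bindAddress": "0.0.0.0",
        "spark.driver.port": str(driver_port),
    }


if __name__ == "__main__":
    for key, value in driver_conf(resolve_pod_ip()).items():
        print(f"{key}={value}")
```

Each key/value pair can be passed to `SparkSession.builder.config(key, value)` in the notebook. The sketch only reproduces the current behaviour: the driver advertises the pod IP, which is exactly the address the external Hadoop cluster cannot reach.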

Thanks!
