How do I communicate with an EMR cluster from a local JupyterHub cluster?

I have a JupyterHub cluster running on my domain. I want it set up so that when a user runs a cell, the computation happens on my EMR cluster and the output is displayed below the cell.

From what I have learned so far, there are generally two ways to communicate with an EMR cluster:

  1. SSH into the cluster, run a JupyterHub instance on it, and execute commands there.
  2. Create a Python script, push it to an S3 bucket, and then use the add_job_flow_steps method to run the script from the S3 bucket.
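For reference, option 2 can be sketched with boto3 roughly as follows. This is a minimal sketch, assuming boto3 is installed, AWS credentials are configured, and the cluster has Spark installed; the helper names build_spark_step, submit_script and wait_for_step, as well as the bucket, key, region and cluster ID, are hypothetical placeholders. It also shows why this route doesn't give you inline cell output: a step runs asynchronously on the cluster, and its output ends up in the cluster's logs (the step and YARN container logs, pushed to the S3 log URI if one is configured) rather than in the notebook.

```python
def build_spark_step(script_s3_uri, step_name="notebook-cell"):
    """Build an EMR step definition that runs a PySpark script stored in S3.

    Uses EMR's command-runner.jar to invoke spark-submit on the cluster.
    """
    return {
        "Name": step_name,
        # Keep the cluster running even if this step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri],
        },
    }


def submit_script(local_path, bucket, key, cluster_id, region="us-east-1"):
    """Upload a local script to S3 and add it as a step on an EMR cluster.

    Returns the new step ID. Requires boto3 and valid AWS credentials.
    """
    import boto3  # imported lazily so build_spark_step works without boto3

    s3 = boto3.client("s3", region_name=region)
    s3.upload_file(local_path, bucket, key)

    emr = boto3.client("emr", region_name=region)
    response = emr.add_job_flow_steps(
        JobFlowId=cluster_id,
        Steps=[build_spark_step(f"s3://{bucket}/{key}")],
    )
    return response["StepIds"][0]


def wait_for_step(cluster_id, step_id, region="us-east-1"):
    """Block until the step finishes; the step's output stays in the cluster logs."""
    import boto3

    emr = boto3.client("emr", region_name=region)
    emr.get_waiter("step_complete").wait(ClusterId=cluster_id, StepId=step_id)
```

Usage would look something like `step_id = submit_script("job.py", "my-bucket", "scripts/job.py", "j-XXXXXXXXXXXXX")` followed by `wait_for_step("j-XXXXXXXXXXXXX", step_id)`; the waiter raises `botocore.exceptions.WaiterError` if the step fails.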

But what I want is this: if a user runs code in a JupyterHub notebook on my domain, the computation should take place on EMR and the output should be shown below the cell, like any other query.

Well, you will have to implement a spawner for Amazon EMR. A quick Google search didn't turn up any existing spawner implementations. With such a spawner, each user's notebook server would be launched on the EMR cluster, and JupyterHub would proxy it to the user. Here are the docs on how to implement custom spawners.
