IPython Clusters tab: create a new profile

Hello everyone.

I have some experience working with JupyterHub, specifically TLJH with SSHSpawner. Now I am trying to use IPython Parallel clusters with OpenMPI.

My setup consists of a remote host running Debian with an installation of IPython Parallel, OpenMPI, and mpi4py. The TLJH is running on Ubuntu with a working IPython Parallel installation. I tested this by going into My Server, opening the IPython Clusters tab, and starting the default cluster.

I tested a controller outside of JupyterHub and was able to configure it and connect it to my remote host's ipengines. How can I configure a new ipython profile to control it directly from the IPython Clusters tab, and how can I connect to running clusters from inside a Notebook?

Thanks.

I'm not 100% sure I followed your description, but since you are using TLJH with SSHSpawner, and only describe two machines, does that mean you are running notebook servers on the same machine as your ipython cluster? Or is there a third machine?

How can I configure a new ipython profile to control it directly from the IPython Clusters tab

To control a 'remote' cluster, you make two choices:

  1. where does your controller run, and
  2. where/how to launch the engines

From your description, Iā€™m assuming you have

  1. a machine local, where your notebook server is running
  2. a machine remote, where you want your engines to run
  3. you want the controller on remote
  4. you want to start the engine on remote with mpi
  5. you want to be able to start/stop the cluster from the clusters tab (i.e. the only inputs are profile and n)

The first step is to create a profile. I'm going to call it remotempi:

[local]$ ipython profile create --parallel remotempi

This will create ~/.ipython/profile_remotempi/ with various config files (if you leave out --parallel, the only difference is that it won't generate the empty IPython Parallel config files, which aren't required anyway).
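
If you want to check what was generated, it should look roughly like this (the exact file list can vary a bit between IPython versions):

[local]$ ls ~/.ipython/profile_remotempi/
ipcluster_config.py  ipcontroller_config.py  ipengine_config.py
ipython_config.py    ipython_kernel_config.py  ...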

The second step is to edit ~/.ipython/profile_remotempi/ipcluster_config.py. Add:

# local:.ipython/profile_remotempi/ipcluster_config.py
c = get_config()  # noqa
# tell it to launch the controller via ssh
c.Cluster.controller_launcher_class = "ssh"
# the host to ssh to
c.SSHControllerLauncher.hostname = "remote"
# tell the controller to listen on all ips, so the client can connect to it
# (alternately, use ssh tunneling from the client)
c.Cluster.controller_args = ["--ip=*"]
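
Since the controller will be launched via ssh remote, it can be worth a quick sanity check (a sketch, assuming passwordless SSH is already set up) that the remote side has ipyparallel on the PATH for non-interactive shells:

[local]$ ssh remote which ipcontroller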

Next, you'll want to tell it to launch the engines with MPI and ssh. This is more tedious than it should be.

On remote, create a profile with the same name (the names don't need to match, but there is less to configure if they do):

[remote]$ ipython profile create --parallel remotempi

And again on remote, you only need one config option, again in ~/.ipython/profile_remotempi/ipcluster_config.py:

# remote:.ipython/profile_remotempi/ipcluster_config.py
c = get_config()  # noqa

# launch engines with mpiexec
c.Cluster.engine_launcher_class = "mpi"

Now, back on local, we want to tell it to launch engines on remote by delegating to ipcluster engines. That is, local will run ssh remote ipcluster engines --profile remotempi, which amounts to mpiexec ipengine on remote.

# back in local:.ipython/profile_remotempi/ipcluster_config.py

c.Cluster.engine_launcher_class = "sshproxy"
c.SSHProxyEngineSetLauncher.hostname = "remote"

At this point, we have:

  1. a local profile remotempi
  2. a remote profile remotempi
  3. controller launched on remote via ssh, that is connectable from local
  4. starting engines, ultimately via something like ssh remote mpiexec ipengine

(the ssh launchers take care of things like distributing connection files between the two machines).
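
If you want to try the profile out before going through the clusters tab, starting it by hand from the command line is roughly equivalent to what the tab's start button does (using the profile above):

[local]$ ipcluster start -n 4 --profile remotempi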

how can I connect to running clusters from inside a Notebook?

With the above config, you should be able to start clusters with profile='remotempi':

import ipyparallel as ipp

cluster = ipp.Cluster(profile="remotempi")
rc = cluster.start_and_connect_sync()

Or, if you've already started the cluster, load the cluster info from a file and skip the start step:

cluster = ipp.Cluster.from_file(profile="remotempi")
rc = cluster.connect_client_sync()
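
Once connected, a quick way to confirm that the engines really are the remote ones is to ask each engine where it is running (a minimal check, reusing the rc client from above):

import socket

print(rc.ids)                                # engine ids registered with the controller
print(rc[:].apply_sync(socket.gethostname))  # should report the remote host's name for each engine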

I've put together a demo repo with all of this using docker-compose, which may be useful. It has a lot of boilerplate to set up the machines with ssh, keys, etc., which you probably have already done.


Thanks a lot for your thorough explanation. Sorry if my description was quite vague. I will add some details.

My system consists of one local host, which acts as the central hub, and 9 remote hosts. Now I am trying to change the implementation completely: use a local spawner, but keep access to the remote platforms in the form of IPython clusters. The SSHSpawner part was scrapped, but I mention it to show that there is already a valid certificate present in the system.

I have one question regarding your instructions:

Do I have to create the profile inside the Jupyterlab-user account in Linux, or should any user (excluding root) work fine?

I actually tested this part with a user on the central hub and was able to bring up the controller and connect to the remote engines. This profile was then copied into my jupyter user folder (/home/jupyter-user/.ipython/). The new cluster shows up, but it shows "error starting cluster". I am not sure if it is related to the JSON config file. When starting the cluster from JupyterHub, I am unable to locate the ipcontroller-engine.json file; it is not generated inside the security folder of the profile. Not sure if this is related to an issue inside my config file.

EDIT:

Sorry for being obtuse about it. I tested starting and connecting to the cluster as you suggested in the repo and got a nice error about SSH public keys not matching. I will sort it out; hopefully it is only that. Thanks a lot.

Edit 2:

Finally got it to do something, but now I get this error:

RuntimeWarning: IPython could not determine IPs for remote_hostname: [Errno -3] Temporary failure in name resolution

which I noticed is the same error as in the Broadcast View page of the ipyparallel 8.6.0.dev documentation.
I will continue tinkering; please let me know if there is a simple solution to this. Sorry for the spam. I updated the hosts file on my central hub because I am too lazy to set up a DNS server. Now something, which must be a configuration issue (maybe MPI on the remote is not working as it should), is triggering EngineError: Engine set stopped: {'exit_code': -1, 'pid': 15733, 'identifier': … }

Thanks.

On a side note:

I noticed that clusters started inside the notebook remain visible in the GUI, and since each run from the notebook uses a different ID, a new cluster appears in the GUI every time. All of them appear as running, and on stop they disappear completely.

Do I have to create the profile inside the Jupyterlab-user account in Linux, or should any user (excluding root) work fine?

It should be created as the user account that is running JupyterLab.

I actually updated the minrk/ipyparallel-ssh-mpi-demo repo on GitHub so you don't have to pre-create the remote profile at all, if you add:

c.SSHProxyEngineSetLauncher.ipcluster_args = ["--engines=mpi"]

I'm not quite sure I get what you're asking, but the default cluster id for a cluster launched in Python is a random string (this allows multiple notebooks that create and manage their own clusters with default settings to avoid collisions). You can tell it to use the empty string instead (cluster_id=""). If you launch from the clusters tab, it will use the empty string. If you use the JupyterLab extension instead of the old clusters tab, you can start clusters with any cluster id via the UI.
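
For example (a small sketch, reusing the remotempi profile from earlier in the thread), pinning the cluster id to the empty string makes the notebook and the clusters tab refer to the same cluster:

import ipyparallel as ipp

# use the empty cluster id so this is the same cluster the clusters tab starts/stops
cluster = ipp.Cluster(profile="remotempi", cluster_id="")
rc = cluster.start_and_connect_sync()

# or attach to one that was already started from the clusters tab
cluster = ipp.Cluster.from_file(profile="remotempi", cluster_id="")
rc = cluster.connect_client_sync()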


Yeah, that would pretty much solve everything. Thanks a lot. Just one last question: to make a cluster out of several remote hosts running several engines, do I need to run a controller on each of them? I have been testing the
c.SSHControllerSetLauncher.engines = {}

option without any luck. This should supersede

c.SSHControllerSetLauncher.hostname

right? Or is its functionality completely different?

Thanks

Thanks for the demo, it was a great guide. I will just add some more information on the process to instantiate both the 1:1 cluster and the 1:many one.

1:1: This was done using the SSH controller launcher on the specified hostname. An SSHProxy engine launcher then starts the engines on that specific system.

The configuration is the same as in the demo, namely:

#ipcluster_config.py

# local:.ipython/profile_remotempi/ipcluster_config.py
c = get_config()  # noqa
# tell it to launch the controller via ssh
c.Cluster.controller_launcher_class = "ssh"
c.SSHControllerLauncher.hostname = "remote"
# tell the controller to listen on all ips, so the client can connect to it
# (alternately, use ssh tunneling from the client)
c.Cluster.controller_args = ["--ip=*"]

# SSH Proxy engine launcher
c.Cluster.engine_launcher_class = "sshproxy"
c.SSHProxyEngineSetLauncher.hostname = "remote"
c.SSHProxyEngineSetLauncher.ipcluster_args = ["--engines=mpi"]

Now, the 1:many case is a bit different but much simpler. Everything must be reachable from the central hub, and the Jupyter user must be able to ssh to the nodes without a password.

The ipcluster_config.py file looks like this:

#ipcluster_config.py

c = get_config()  #noqa

# controller config
c.Cluster.controller_ip = '*'

c.SSHLauncher.user = 'remote'
# engine config

c.Cluster.engine_launcher_class = 'ssh'

c.SSHEngineSetLauncher.engine_args = ['--engines=mpi --profile-dir=/home/remote/.ipython/profile_ssh1']
c.SSHEngineSetLauncher.engines = {'remote@node0': 2, 'remote@node1': 2}

c.SSHEngineSetLauncher.remote_profile_dir = '/home/remote/.ipython/profile_ssh1'
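
With a config like this, connecting from a notebook works the same way as before; a hedged sketch, assuming the config above lives in a local profile named ssh1:

import ipyparallel as ipp
import socket

cluster = ipp.Cluster(profile="ssh1")  # hypothetical local profile name holding the config above
rc = cluster.start_and_connect_sync()
print(rc[:].apply_sync(socket.gethostname))  # hostnames should come back from node0 and node1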

Issues:

c.SSHEngineSetLauncher.user = 'remote' does not pass the user parameter down to the hosts in c.SSHEngineSetLauncher.engines: I see that it tries to use jupyter-user@node0 instead of the substituted user.

c.SSHEngineSetLauncher.remote_profile_dir will not be included in the engine parameters. If you want to pass this parameter, you must add it to c.SSHEngineSetLauncher.engine_args, e.g. ['--engines=mpi --profile-dir=profile_ssh1'].

Not sure if these two issues are real issues or just deprecated configuration options that are not supported.