IPython parallel setup on Carver at NERSC

IPython parallel is one of the easiest ways to spawn several Python sessions on a supercomputing cluster and process jobs in parallel.

On Carver, the basic setup is to run a controller on the login node and submit the engines to the compute nodes via PBS.

First, create your configuration files by running:

ipython profile create --parallel

Then, in ~/.config/ipython/profile_default/ipcluster_config.py, you only need to set:

c.IPClusterStart.controller_launcher_class = 'LocalControllerLauncher'
c.IPClusterStart.engine_launcher_class = 'PBS'
c.PBSLauncher.batch_template_file = u'~/.config/ipython/profile_default/pbs.engine.template'

You also need to allow connections to the controller from other hosts, by setting the following in ~/.config/ipython/profile_default/ipcontroller_config.py:

c.HubFactory.ip = '*'

The batch_template_file setting in ipcluster_config.py must point to the PBS engine template.

Next, a couple of examples of PBS templates, for 2 or 8 processes per node:
The IPython configuration does not seem to be flexible enough to accept a parameter for the number of processes per node.
So I wrote a bash script that takes the processes per node and the total number of nodes as parameters:

ipc 8 2 # 2 nodes with 8ppn, 16 total engines
ipc 2 3 # 3 nodes with 2ppn, 6 total engines
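
To make the idea concrete, here is a minimal Python sketch of what such a helper could do: fill a PBS batch script with the requested processes per node and node count. This is not the original ipc script or template. The PBS directives, the walltime default, the mpirun launch line and the names PBS_TEMPLATE and render_engine_script are all assumptions chosen for illustration, and would need to be adapted to the actual queue setup on Carver.

```python
# Hypothetical stand-in for the `ipc` helper: render a PBS batch script
# for a given processes-per-node count and node count.
# The directives and the mpirun launch line are assumptions, not the
# original template used in this post.
PBS_TEMPLATE = """#!/bin/sh
#PBS -N ipengine
#PBS -l nodes={nodes}:ppn={ppn}
#PBS -l walltime={walltime}
cd $PBS_O_WORKDIR
mpirun -np {engines} ipengine
"""


def render_engine_script(ppn, nodes, walltime="00:30:00"):
    """Return (script_text, total_engines) for the requested layout."""
    engines = ppn * nodes  # one engine per requested process slot
    script = PBS_TEMPLATE.format(
        nodes=nodes, ppn=ppn, walltime=walltime, engines=engines
    )
    return script, engines


if __name__ == "__main__":
    # Same layout as `ipc 8 2`: 2 nodes with 8 processes per node.
    script, engines = render_engine_script(ppn=8, nodes=2)
    print(script)
```

The rendered text could be written to the file that batch_template_file points to before starting the cluster, so that a single template does not have to hard-code the processes per node.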

Once the engines are running, jobs can be submitted by opening an IPython shell on the login node and running:

from IPython.parallel import Client
rc = Client()

lview = rc.load_balanced_view() # default load-balanced view

def serial_func(argument):
    # placeholder: do the actual work on a single argument here
    pass

parallel_result = lview.map(serial_func, list_of_arguments)

The serial function is sent to the engines and executed for each element of the list of arguments.

If the function returns a value, then it is transferred back to the login node.
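
As a rough sketch of how the returned values can be collected, the helper below submits the map without blocking and then waits for all results. It assumes the view's map accepts block=False and returns an object with a blocking get() method, as the load-balanced view does. The names collect_results and square are made up for this example, and the last part only works with a running controller and engines.

```python
def collect_results(view, func, arguments):
    """Map func over arguments on the given view and wait for all results."""
    # block=False returns immediately with an asynchronous result object
    async_result = view.map(func, arguments, block=False)
    # get() blocks until every engine has sent its return value back
    return list(async_result.get())


def square(x):
    # trivial example function, executed once per element on the engines
    return x * x


if __name__ == "__main__":
    # Requires a running controller and engines (see the setup above).
    from IPython.parallel import Client

    rc = Client()
    lview = rc.load_balanced_view()
    print(collect_results(lview, square, range(8)))
```

Because all return values end up in the memory of the session that calls get(), large results are what make the login node a poor place for the interactive session.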

If the returned values consume a lot of memory, it is also possible to keep the controller on the login node but run the interactive IPython session in an interactive job.