Parallel job execution on an SGE cluster environment

Quick start

The most important class in the pyabc.sge package is SGE. Its map method automatically parallelizes function evaluations across an SGE/UGE cluster. The SGE class can be used in standalone mode or in combination with the ABCSMC class (see Usage notes below).

The package is easy to use. For example:

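from pyabc.sge import SGE

# Request priority -200 and 3G of memory for the cluster jobs
sge = SGE(priority=-200, memory="3G")

def f(x):
    return x * 2

tasks = [1, 2, 3, 4]

# Evaluate f on each task as a cluster job and collect the results
result = sge.map(f, tasks)

print(result)  # [2, 4, 6, 8]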

Job scheduling is done via either an SQLite database or a Redis instance. Redis is recommended because it is more robust, particularly on distributed file systems with slow I/O.

Note

A configuration file at ~/.parallel is required. See the SGE class documentation for its format.
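The broker choice (SQLite or Redis) is made in this configuration file. The sketch below writes a ~/.parallel-style INI file with Python's configparser. The section and key names ([DIRECTORIES] TMP, [BROKER] TYPE, [SGE] QUEUE/PARALLEL_ENVIRONMENT/PRIORITY, [REDIS] HOST) and the example values are assumptions recalled from the example in the SGE class docstring; verify them against your installed pyabc version before relying on them. The helper write_parallel_config is hypothetical and writes to a temporary path so that an existing ~/.parallel is not overwritten.

```python
import configparser
import os
import tempfile


def write_parallel_config(path, broker="REDIS", redis_host="127.0.0.1",
                          queue="p.openmp", parallel_env="openmp",
                          priority=-500, tmp_dir="/tmp"):
    """Write a ~/.parallel-style INI file (hypothetical helper).

    Assumption: section and key names follow the example in the SGE
    class docstring; check them against your pyabc version.
    """
    if broker not in ("REDIS", "SQLITE"):
        raise ValueError("broker must be 'REDIS' or 'SQLITE'")
    config = configparser.ConfigParser()
    config.optionxform = str  # keep the upper-case key names as written
    config["DIRECTORIES"] = {"TMP": tmp_dir}
    config["BROKER"] = {"TYPE": broker}
    config["SGE"] = {
        "QUEUE": queue,
        "PARALLEL_ENVIRONMENT": parallel_env,
        "PRIORITY": str(priority),
    }
    if broker == "REDIS":
        # Only needed when the Redis broker is selected
        config["REDIS"] = {"HOST": redis_host}
    with open(path, "w") as fh:
        config.write(fh)
    return config


# Demo: write to a temporary file instead of the real ~/.parallel
demo_path = os.path.join(tempfile.mkdtemp(), ".parallel")
write_parallel_config(demo_path, broker="REDIS")
```

To use it for real, pass os.path.expanduser("~/.parallel") as the path, after backing up any existing file.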

The function pyabc.sge.sge_available can be used to check whether an SGE cluster is available on the current machine.
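A common pattern is to use the cluster when it is present and fall back to local execution otherwise, for example while developing on a laptop. The minimal sketch below assumes that sge_available takes no arguments and returns a bool; get_map and double are hypothetical names for illustration. If pyabc cannot be imported, the sketch behaves as if no cluster is available.

```python
try:
    from pyabc.sge import SGE, sge_available
except ImportError:
    # pyabc is not installed here: behave as if no cluster is present
    SGE = None

    def sge_available():
        return False


def double(x):
    # Module-level function so it can be pickled and sent to cluster jobs
    return x * 2


def get_map(**sge_kwargs):
    """Return SGE(...).map on a cluster, otherwise a local list-returning map."""
    if sge_available():
        return SGE(**sge_kwargs).map
    return lambda f, tasks: list(map(f, tasks))


mapper = get_map(priority=-200, memory="3G")
result = mapper(double, [1, 2, 3, 4])
```

Because both branches take a function and an iterable of tasks, the calling code does not need to know where the jobs actually run.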

Check the API documentation for more details.

Information about running jobs

Run python -m pyabc.sge.job_info_redis to get a formatted overview of the current execution state when the Redis broker is used. Run python -m pyabc.sge.job_info_redis --help for more details.

Usage notes

The SGE class can be used in standalone mode to parallelize jobs across a cluster, completely independently of the rest of the pyABC package. It can also be combined, for instance, with the pyabc.sampler.MappingSampler class for simple parallelization of ABC-SMC runs across an SGE cluster.
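The combination works because a mapping-based sampler only needs a map-like callable, so sge.map can replace Python's builtin map. The toy below illustrates that idea with plain rejection ABC; it is a hypothetical, self-contained sketch and not pyABC's MappingSampler or its ABC-SMC algorithm. The model (prior Uniform(0, 5), deterministic simulator y = theta, observed value 2.0, tolerance 0.5) and the names simulate_and_accept and rejection_sample are made up for illustration. With pyABC itself, the wiring is typically along the lines of MappingSampler(map_=sge.map) passed as the sampler argument of ABCSMC; the keyword names here are assumptions to check against the API documentation.

```python
import random


def simulate_and_accept(seed, observed=2.0, eps=0.5):
    """One rejection-ABC trial: sample theta from the prior, simulate, accept or reject.

    Module-level so that a cluster map could pickle it.
    """
    rng = random.Random(seed)       # per-task RNG: reproducible, independent trials
    theta = rng.uniform(0.0, 5.0)   # draw from the toy prior Uniform(0, 5)
    simulated = theta               # trivial deterministic toy simulator
    accepted = abs(simulated - observed) < eps
    return theta, accepted


def rejection_sample(map_, n_trials):
    """Run n_trials independent trials through any map-like callable."""
    results = map_(simulate_and_accept, range(n_trials))
    return [theta for theta, accepted in results if accepted]


# Local run with the builtin map; on a cluster, sge.map could be passed instead
accepted = rejection_sample(map, 1000)
```

Each trial is independent and seeded by its task index, which is what makes the work safe to distribute across cluster jobs.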