# Multi-core and Distributed Sampling¶

The choice of sampler determines how parallelization is performed. See also the explanation of the samplers.

class pyabc.sampler.Sample(record_all_sum_stats: bool = False)

Bases: object

A Sample is created and filled during the sampling process by the Sampler.

Parameters: record_all_sum_stats (bool) – If True, record the summary statistics of rejected particles as well; if False, record only accepted particles.
all_sum_stats

Get all summary statistics.

Returns: all_sum_stats (List) – Concatenation of all the all_sum_stats lists of all particles added and accepted to this sample via append().
append(particle: pyabc.population.Particle)

Add new particle to the sample.

Parameters: particle (Particle) – Sampled particle containing all information needed later.
get_accepted_population() → pyabc.population.Population
Returns: population (Population) – A population of only the accepted particles.
n_accepted

Returns: n_accepted (int) – Number of accepted particles.

class pyabc.sampler.Sampler

Bases: abc.ABC

Abstract Sampler base class.

Produce valid particles: pyabc.parameters.ValidParticle.

Parameters:

- nr_evaluations (int) – Set after sampling a population; counts the total number of model evaluations. This can be used to calculate the acceptance rate.
- sample_factory (SampleFactory) – A factory to create empty samples.
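For instance, the acceptance rate of a generation can be computed from nr_evaluations and the population size (a minimal illustration; the numbers are made up):

```python
def acceptance_rate(population_size: int, nr_evaluations: int) -> float:
    """Fraction of model evaluations that resulted in an accepted particle."""
    return population_size / nr_evaluations

# e.g. a population of 100 accepted particles obtained from 2500 evaluations
rate = acceptance_rate(100, 2500)
```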
sample_until_n_accepted(n, simulate_one) → pyabc.sampler.base.Sample

Performs the sampling, i.e. the creation of a new generation (population) of particles.

Parameters:

- n (int) – The number of samples to be accepted, i.e. the population size.
- simulate_one (Callable[[A], Particle]) – A function which internally performs the whole process of sampling parameters, simulating data, and comparing to the observed data to check for acceptance, as indicated via the particle.accepted flag.

Returns: sample (pyabc.sampler.Sample) – The generated sample, which contains the new population.
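As an illustration of this contract, the core of a single-process implementation can be sketched as follows. This is a simplified sketch, not pyabc's actual code; the Particle and Sample stand-ins are reduced to the fields used here:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Particle:
    # minimal stand-in for pyabc.population.Particle
    parameter: dict
    accepted: bool

@dataclass
class Sample:
    # minimal stand-in for pyabc.sampler.Sample
    particles: List[Particle] = field(default_factory=list)

    def append(self, particle: Particle) -> None:
        self.particles.append(particle)

    @property
    def n_accepted(self) -> int:
        return sum(p.accepted for p in self.particles)

def sample_until_n_accepted(n: int, simulate_one: Callable[[], Particle]) -> Sample:
    """Call simulate_one until the sample holds n accepted particles."""
    sample = Sample()
    while sample.n_accepted < n:
        sample.append(simulate_one())
    return sample
```

The parallel samplers below differ mainly in where the simulate_one calls are executed; the sample-until-n-accepted contract stays the same.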
class pyabc.sampler.SingleCoreSampler

Bases: pyabc.sampler.base.Sampler

Sample on a single core. No parallelization.

sample_until_n_accepted(n, simulate_one)

Performs the sampling, i.e. the creation of a new generation (population) of particles.

Parameters:

- n (int) – The number of samples to be accepted, i.e. the population size.
- simulate_one (Callable[[A], Particle]) – A function which internally performs the whole process of sampling parameters, simulating data, and comparing to the observed data to check for acceptance, as indicated via the particle.accepted flag.

Returns: sample (pyabc.sampler.Sample) – The generated sample, which contains the new population.
class pyabc.sampler.MulticoreParticleParallelSampler(n_procs=None, daemon=True)

Bases: pyabc.sampler.multicorebase.MultiCoreSampler

Samples on multiple cores using the multiprocessing module. This sampler is optimized for low latencies and is efficient, even if the individual model evaluations are fast.

Requires no pickling of the sample_one, simulate_one and accept_one functions. This is achieved using fork on Linux (see Sampler).

The simulation results are still pickled as they are transmitted from the worker processes back to the parent process. Depending on the kind of summary statistics, this can be fast or slow. If your summary statistics are only a dict with a couple of numbers, the overhead should not be substantial. However, if your summary statistics are large numpy arrays or similar, this could cause noticeable overhead.

Parameters: n_procs (int, optional) – If set to None, the number of cores is determined according to pyabc.sge.nr_cores_available().

Warning

Windows support is not tested. As there is no fork on Windows, this sampler might not work.

sample_until_n_accepted(n, simulate_one)

Performs the sampling, i.e. the creation of a new generation (population) of particles.

Parameters:

- n (int) – The number of samples to be accepted, i.e. the population size.
- simulate_one (Callable[[A], Particle]) – A function which internally performs the whole process of sampling parameters, simulating data, and comparing to the observed data to check for acceptance, as indicated via the particle.accepted flag.

Returns: sample (pyabc.sampler.Sample) – The generated sample, which contains the new population.
class pyabc.sampler.MappingSampler(map=<class 'map'>, mapper_pickles=False)

Bases: pyabc.sampler.base.Sampler

Parallelize via a map operation. This sampler can be applied in a multi-core or in a distributed setting.

Parameters:

- map (map-like function) – A function which works like the built-in map. Essentially any generic map operation is possible. Candidates include:
  - multiprocessing.pool.Pool.map (see https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool),
  - pyabc.sge.SGE's map method; this mapper is useful in SGE-like environments where you don't want to start workers which run forever,
  - Dask's distributed.Client.map (see https://distributed.readthedocs.io/en/latest/api.html#client),
  - IPython parallel's map (see http://ipyparallel.readthedocs.io/en/latest/task.html#quick-and-easy-parallelism),

  and many other implementations. Each mapped function call samples until it gets one accepted particle. This can have a performance impact if one of the sample tasks runs very long while all the other tasks are already finished: the sampler then has to wait until the last sample task is finished.
- mapper_pickles (bool, optional) – Whether the mapper handles pickling itself, or the MappingSampler class should handle serialization. The default is False. While this setting is compatible with a larger range of map functions, its performance can be suboptimal, as possibly too much serialization and deserialization is done, which can limit overall performance if the model evaluations are comparatively fast. The passed map function might implement more efficient serialization; for example, for the pyabc.sge.SGE mapper, this option should be set to True for better performance.
sample_until_n_accepted(n, simulate_one)

Performs the sampling, i.e. the creation of a new generation (population) of particles.

Parameters:

- n (int) – The number of samples to be accepted, i.e. the population size.
- simulate_one (Callable[[A], Particle]) – A function which internally performs the whole process of sampling parameters, simulating data, and comparing to the observed data to check for acceptance, as indicated via the particle.accepted flag.

Returns: sample (pyabc.sampler.Sample) – The generated sample, which contains the new population.
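The scheme described above can be sketched as follows: each mapped call runs its own rejection loop until it has one accepted particle. The model, distance, and acceptance threshold here are hypothetical; any map-like function, e.g. multiprocessing.pool.Pool.map, could replace the built-in map:

```python
import random
from typing import Callable

def sample_one_accepted(seed: int) -> dict:
    """Rejection-sample until one particle is accepted (toy model)."""
    rng = random.Random(seed)
    while True:
        theta = rng.uniform(0, 1)    # sample a parameter
        distance = abs(theta - 0.5)  # compare simulation to observed data
        if distance < 0.1:           # acceptance criterion, epsilon = 0.1
            return {"theta": theta, "distance": distance}

def sample_population(n: int, map_function: Callable = map) -> list:
    """One mapped call per requested accepted particle."""
    return list(map_function(sample_one_accepted, range(n)))

population = sample_population(4)
```

Passing a parallel map as map_function distributes the n calls; the wait-for-the-slowest-task effect mentioned above appears exactly at the point where the map results are collected.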
class pyabc.sampler.DaskDistributedSampler(dask_client=None, client_max_jobs=inf, default_pickle=False, batchsize=1)

Bases: pyabc.sampler.eps_mixin.EPSMixin, pyabc.sampler.base.Sampler

Parallelize with dask. This sampler is intended to be used with a pre-configured dask client, but is able to initialize client, scheduler and workers on its own on the local machine for testing/debugging purposes.

Parameters:

- dask_client (dask.Client, optional) – The configured dask Client. If none is provided, a local dask distributed cluster is created.
- client_max_jobs – Maximum number of jobs that can be submitted to the client at a time. If this value is smaller than the maximum number of cores provided by the distributed infrastructure, the infrastructure will not be fully utilized.
- default_pickle – Specify whether the sampler uses Python's default pickle function to communicate the submit function; otherwise, a cloudpickle-based workaround is used to pickle the simulate and evaluate functions. This allows the use of locally defined functions, which cannot be pickled using default pickle, at the cost of an additional pickling overhead. For dask, this workaround should not be necessary and it should be safe to use default_pickle=False.
- batchsize (int, optional) – Number of parameter samples that are evaluated in one remote execution call. Batch submission can be used to reduce the communication overhead for fast (ms–s) model evaluations. Large batch sizes can result in unnecessary model evaluations. By default, batchsize=1, i.e. no batching is done.
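The effect of batchsize can be illustrated independently of dask: parameter samples are grouped so that each remote call evaluates a whole batch, trading fewer round trips for potentially superfluous evaluations (a generic sketch, not pyabc's implementation):

```python
def make_batches(samples: list, batchsize: int) -> list:
    """Group parameter samples so one remote call evaluates one batch."""
    return [samples[i:i + batchsize] for i in range(0, len(samples), batchsize)]

def evaluate_batch(batch: list) -> list:
    # placeholder for one remote execution call evaluating a whole batch
    return [x ** 2 for x in batch]

samples = list(range(10))
batches = make_batches(samples, batchsize=4)  # 3 remote calls instead of 10
results = [y for batch in batches for y in evaluate_batch(batch)]
```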
class pyabc.sampler.RedisEvalParallelSampler(host='localhost', port=6379, batch_size=1)

Bases: pyabc.sampler.base.Sampler

Redis-based low-latency sampler. This sampler performs well in distributed environments. It is usually faster than the pyabc.sampler.DaskDistributedSampler for short model evaluation runtimes; the longer the model evaluation times, the smaller the advantage becomes. It requires a running Redis server as broker.

This sampler requires workers to be started via the command abc-redis-worker. An example call might look like abc-redis-worker --host=123.456.789.123 --runtime=2h to connect to a Redis server on IP 123.456.789.123 and to terminate the worker after it finishes the first population that ends later than 2 hours after worker start; the actual runtime can thus be longer than 2h. See abc-redis-worker --help for its options.

Use the command abc-redis-manager to retrieve info and stop the running workers.

Start as many workers as you wish. Workers can be dynamically added during the ABC run.

Parameters:

- host (str, optional) – IP address or name of the Redis server. Default is "localhost".
- port (int, optional) – Port of the Redis server. Default is 6379.
- batch_size (int, optional) – Number of model evaluations the workers perform before contacting the Redis server. Defaults to 1. Increase this value if model evaluation times are short or the number of workers is large, to reduce communication overhead.
n_worker()

Get the number of connected workers.

Returns: Number of workers connected.
sample_until_n_accepted(n, simulate_one)

Performs the sampling, i.e. the creation of a new generation (population) of particles.

Parameters:

- n (int) – The number of samples to be accepted, i.e. the population size.
- simulate_one (Callable[[A], Particle]) – A function which internally performs the whole process of sampling parameters, simulating data, and comparing to the observed data to check for acceptance, as indicated via the particle.accepted flag.

Returns: sample (pyabc.sampler.Sample) – The generated sample, which contains the new population.
class pyabc.sampler.MulticoreEvalParallelSampler(n_procs=None, daemon=True)

Bases: pyabc.sampler.multicorebase.MultiCoreSampler

Multicore Evaluation parallel sampler.

Implements the same strategy as pyabc.sampler.RedisEvalParallelSampler or pyabc.sampler.DaskDistributedSampler.

However, parallelization is restricted to a single machine with multiple processes. This sampler has very low communication overhead and is thus suitable for short running model evaluations.

Requires no pickling of the sample_one, simulate_one and accept_one functions. This is achieved using fork on Linux (see Sampler).

The simulation results are still pickled as they are transmitted from the worker processes back to the parent process. Depending on the kind of summary statistics, this can be fast or slow. If your summary statistics are only a dict with a couple of numbers, the overhead should not be substantial. However, if your summary statistics are large numpy arrays or similar, this could cause noticeable overhead.

Parameters: n_procs (int, optional) – If set to None, the number of cores is determined according to pyabc.sge.nr_cores_available().
sample_until_n_accepted(n, simulate_one)

Performs the sampling, i.e. the creation of a new generation (population) of particles.

Parameters:

- n (int) – The number of samples to be accepted, i.e. the population size.
- simulate_one (Callable[[A], Particle]) – A function which internally performs the whole process of sampling parameters, simulating data, and comparing to the observed data to check for acceptance, as indicated via the particle.accepted flag.

Returns: sample (pyabc.sampler.Sample) – The generated sample, which contains the new population.
class pyabc.sampler.RedisEvalParallelSamplerServerStarter(host='localhost', port=6379, batch_size=1)

Bases: pyabc.sampler.redis_eps.sampler.RedisEvalParallelSampler

class pyabc.sampler.ConcurrentFutureSampler(cfuture_executor=None, client_max_jobs=200, default_pickle=True, batchsize=1)

Bases: pyabc.sampler.eps_mixin.EPSMixin, pyabc.sampler.base.Sampler

Parallelize with an arbitrary executor that implements the Python concurrent.futures.Executor interface. Specifically, it needs to implement a submit method that can evaluate arbitrary function handles and returns a concurrent.futures.Future result object.

Parameters:

- cfuture_executor (concurrent.futures.Executor, required) – Configured object that implements the concurrent.futures.Executor interface.
- client_max_jobs – Maximum number of jobs that can be submitted to the client at a time. If this value is smaller than the maximum number of cores provided by the distributed infrastructure, the infrastructure will not be fully utilized.
- default_pickle – Specify whether the sampler uses Python's default pickle function to communicate the submit function; otherwise, a cloudpickle-based workaround is used to pickle the simulate and evaluate functions. This allows the use of locally defined functions, which cannot be pickled using default pickle, at the cost of an additional pickling overhead.
- batchsize (int, optional) – Number of parameter samples that are evaluated in one remote execution call. Batch submission can be used to reduce the communication overhead for fast (ms–s) model evaluations. Large batch sizes can result in unnecessary model evaluations. By default, batchsize=1, i.e. no batching is done.
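A minimal sketch of the required interface, using the standard library's ThreadPoolExecutor (the simulate function is hypothetical; any object with a compatible submit method returning Futures would do):

```python
from concurrent.futures import Executor, ThreadPoolExecutor

def simulate(theta: float) -> float:
    # hypothetical model evaluation
    return 2 * theta

def run_jobs(executor: Executor, parameters: list) -> list:
    """Submit one job per parameter and collect the Future results."""
    futures = [executor.submit(simulate, theta) for theta in parameters]
    return [future.result() for future in futures]

with ThreadPoolExecutor(max_workers=4) as executor:
    results = run_jobs(executor, [0.0, 1.0, 2.0])
```

Swapping in concurrent.futures.ProcessPoolExecutor, or any third-party executor with the same submit/Future protocol, requires no changes to run_jobs.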