# Multi-core and Distributed Sampling

The choice of sampler determines how parallelization is performed. See also the explanation of the samplers.

class pyabc.sampler.Sampler

Bases: abc.ABC

Abstract Sampler base class.

Produces valid particles of type pyabc.parameters.ValidParticle.

nr_evaluations_

int – Set after each population; counts the total number of model evaluations. This can be used to calculate the acceptance rate.

sample_until_n_accepted(sample_one: Callable[[], A], simulate_one: Callable[[A], ValidParticle], accept_one: Callable[[ValidParticle], bool], n: int)

Parameters:

- sample_one (Callable[[], A]) – A function which takes no arguments and returns a proposal parameter $$\theta$$.
- simulate_one (Callable[[A], ValidParticle]) – A function which takes as sole argument a proposal parameter $$\theta$$, as returned by sample_one, and returns the summary statistics $$s$$.
- accept_one (Callable[[ValidParticle], bool]) – A function which takes as sole argument the summary statistics $$s$$, as returned by simulate_one, and returns True if the simulated sample is accepted and False otherwise.
- n (int) – The number of samples to be accepted, i.e. the population size.

Returns:

- valid_particles – The list of accepted particles.
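To make the contract concrete, here is a minimal single-core reference loop (a sketch with toy stand-in callables, not pyabc internals; the identity "model" and the 0.5 acceptance threshold are illustrative assumptions):

```python
import random

def sample_until_n_accepted(sample_one, simulate_one, accept_one, n):
    """Minimal reference loop: propose, simulate, test, until n acceptances."""
    accepted = []
    nr_evaluations = 0
    while len(accepted) < n:
        theta = sample_one()             # propose a parameter
        particle = simulate_one(theta)   # compute summary statistics
        nr_evaluations += 1
        if accept_one(particle):
            accepted.append(particle)
    return accepted, nr_evaluations

# Toy example: accept uniform draws below 0.5.
random.seed(0)
particles, evals = sample_until_n_accepted(
    sample_one=lambda: random.random(),
    simulate_one=lambda theta: theta,    # identity "model"
    accept_one=lambda s: s < 0.5,
    n=5,
)
```

The ratio n / evals corresponds to the acceptance rate that nr_evaluations_ lets you compute.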
class pyabc.sampler.SingleCoreSampler

Bases: pyabc.sampler.base.Sampler

Sample on a single core. No parallelization.

sample_until_n_accepted(sample_one, simulate_one, accept_one, n)
Parameters:

- sample_one (Callable[[], A]) – A function which takes no arguments and returns a proposal parameter $$\theta$$.
- simulate_one (Callable[[A], ValidParticle]) – A function which takes as sole argument a proposal parameter $$\theta$$, as returned by sample_one, and returns the summary statistics $$s$$.
- accept_one (Callable[[ValidParticle], bool]) – A function which takes as sole argument the summary statistics $$s$$, as returned by simulate_one, and returns True if the simulated sample is accepted and False otherwise.
- n (int) – The number of samples to be accepted, i.e. the population size.

Returns:

- valid_particles – The list of accepted particles.
class pyabc.sampler.MulticoreParticleParallelSampler(n_procs=None)

Bases: pyabc.sampler.multicorebase.MultiCoreSampler

Samples on multiple cores using the multiprocessing module. This sampler is optimized for low latencies and is efficient, even if the individual model evaluations are fast.

Requires no pickling of the sample_one, simulate_one and accept_one functions. This is achieved using fork on Linux (see Sampler).

The simulation results are still pickled as they are transmitted from the worker processes back to the parent process. Depending on the kind of summary statistics, this can be fast or slow. If your summary statistics are only a dict with a couple of numbers, the overhead should not be substantial. However, if your summary statistics are large numpy arrays or similar, this could cause substantial overhead.

Parameters: n_procs (int, optional) – The number of worker processes. If set to None, the number of cores is determined according to pyabc.sge.nr_cores_available().

Warning

Windows support is not tested. As there is no fork on Windows, this sampler might not work.

sample_until_n_accepted(sample_one, simulate_one, accept_one, n)
Parameters:

- sample_one (Callable[[], A]) – A function which takes no arguments and returns a proposal parameter $$\theta$$.
- simulate_one (Callable[[A], ValidParticle]) – A function which takes as sole argument a proposal parameter $$\theta$$, as returned by sample_one, and returns the summary statistics $$s$$.
- accept_one (Callable[[ValidParticle], bool]) – A function which takes as sole argument the summary statistics $$s$$, as returned by simulate_one, and returns True if the simulated sample is accepted and False otherwise.
- n (int) – The number of samples to be accepted, i.e. the population size.

Returns:

- valid_particles – The list of accepted particles.
class pyabc.sampler.MappingSampler(map=<class 'map'>, mapper_pickles=False)

Bases: pyabc.sampler.base.Sampler

Parallelize via a map operation. This sampler can be applied in a multi-core or in a distributed setting.

Parameters:

- map (map-like function) – A function which works like the built-in map. Essentially any generic map operation can be used. Possible candidates include:
  - multiprocessing.pool.Pool.map (see https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool),
  - pyabc.sge.SGE's map method, which is useful in SGE-like environments where you don't want to start workers which run forever,
  - Dask's distributed.Client.map (see https://distributed.readthedocs.io/en/latest/api.html#client),
  - IPython parallel's map (see http://ipyparallel.readthedocs.io/en/latest/task.html#quick-and-easy-parallelism),

  and many other implementations. Each of the mapped function calls samples until it gets one accepted particle. This can have a performance impact if one sample task runs very long while all the other tasks are already finished: the sampler then has to wait until the last sample task is done.
- mapper_pickles (bool, optional) – Whether the mapper handles pickling itself, or the MappingSampler class should handle serialization. The default is False. While this setting is compatible with a larger range of map functions, its performance can be suboptimal, as possibly too much serialization and deserialization is done, which can limit overall performance if the model evaluations are comparatively fast. The passed map function might implement more efficient serialization. For example, for the pyabc.sge.SGE mapper, this option should be set to True for better performance.
sample_until_n_accepted(sample_one, simulate_one, accept_one, n)

Parameters:

- sample_one (Callable[[], A]) – A function which takes no arguments and returns a proposal parameter $$\theta$$.
- simulate_one (Callable[[A], ValidParticle]) – A function which takes as sole argument a proposal parameter $$\theta$$, as returned by sample_one, and returns the summary statistics $$s$$.
- accept_one (Callable[[ValidParticle], bool]) – A function which takes as sole argument the summary statistics $$s$$, as returned by simulate_one, and returns True if the simulated sample is accepted and False otherwise.
- n (int) – The number of samples to be accepted, i.e. the population size.

Returns:

- valid_particles – The list of accepted particles.
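The "one accepted particle per mapped call" strategy can be sketched in plain Python (a toy model and acceptance criterion, not pyabc code; the built-in map runs serially, but any of the map implementations listed above could be swapped in):

```python
import random

def one_accepted_particle(seed):
    """Each mapped task runs its own rejection loop until one acceptance."""
    rng = random.Random(seed)
    while True:
        theta = rng.random()   # sample_one: propose a parameter
        s = theta              # simulate_one: identity toy "model"
        if s < 0.5:            # accept_one: toy acceptance criterion
            return s

# One map call per requested particle; a multiprocessing.Pool.map or a
# distributed.Client.map would evaluate these tasks in parallel.
particles = list(map(one_accepted_particle, range(5)))
```

Because each call loops until its own acceptance, a single unlucky task can keep the whole population waiting, which is exactly the performance caveat noted above.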
class pyabc.sampler.DaskDistributedSampler(dask_client=None, client_max_jobs=inf, default_pickle=False, batchsize=1)

Bases: pyabc.sampler.eps_mixin.EPSMixin, pyabc.sampler.base.Sampler

Parallelize with dask. This sampler is intended to be used with a pre-configured dask client, but is able to initialize client, scheduler and workers on its own on the local machine for testing/debugging purposes.

Parameters:

- dask_client (dask.distributed.Client, optional) – The configured dask client. If none is provided, a local dask distributed cluster is created.
- client_max_jobs – Maximum number of jobs that can be submitted to the client at a time. If this value is smaller than the maximum number of cores provided by the distributed infrastructure, the infrastructure will not be fully utilized.
- default_pickle – Specify whether the sampler uses Python's default pickle to communicate functions to the workers; if not, a cloudpickle-based workaround is used to pickle the simulate and evaluate functions. This allows the use of locally defined functions, which cannot be pickled by the default pickle, at the cost of additional pickling overhead. For dask, this workaround should not be necessary and it should be safe to use default_pickle=False.
- batchsize (int, optional) – Number of parameter samples that are evaluated in one remote execution call. Batch submission can be used to reduce the communication overhead for fast (ms to s) model evaluations. Large batch sizes can result in unnecessary model evaluations. By default, batchsize=1, i.e. no batching is done.
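The batchsize trade-off can be illustrated with a plain-Python sketch (a toy acceptance test and a stand-in for the remote call, not the dask code path): batching amortizes the per-call communication overhead across many evaluations, at the risk of evaluating proposals that are no longer needed.

```python
import random

def evaluate_batch(thetas):
    """Stand-in for one remote execution call: evaluates a whole batch."""
    return [(t, t < 0.5) for t in thetas]  # (summary statistic, accepted?)

def sample_batched(n, batchsize, rng):
    """Count how many 'remote calls' are needed for n acceptances."""
    accepted, n_calls = [], 0
    while len(accepted) < n:
        thetas = [rng.random() for _ in range(batchsize)]
        n_calls += 1
        accepted += [s for s, ok in evaluate_batch(thetas) if ok]
    return accepted[:n], n_calls

acc_unbatched, calls_unbatched = sample_batched(20, 1, random.Random(1))
acc_batched, calls_batched = sample_batched(20, 10, random.Random(1))
```

With a batch size of 10, far fewer "remote calls" are needed for the same population, but the last batch may contain evaluations that were not required.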
class pyabc.sampler.RedisEvalParallelSampler(host='localhost', port=6379)

Bases: pyabc.sampler.base.Sampler

Redis based low latency sampler. This sampler performs well in distributed environments. It is usually faster than the pyabc.sampler.DaskDistributedSampler for short model evaluation runtimes; the longer the model evaluation times, the smaller the advantage becomes. It requires a running Redis server as broker.

This sampler requires workers to be started via the command abc-redis-worker. An example call might look like abc-redis-worker --host=123.456.789.123 --runtime=2h to connect to a Redis server on IP 123.456.789.123 and to terminate the worker after finishing the first population that ends after 2 hours of worker runtime. The actual runtime might therefore be longer than 2h. See abc-redis-worker --help for its options.

Use the command abc-redis-manager to retrieve info and stop the running workers.

Start as many workers as you wish. Workers can be dynamically added during the ABC run.

Parameters:

- host (str, optional) – IP address or name of the Redis server. Default is "localhost".
- port (int, optional) – Port of the Redis server. Default is 6379.
sample_until_n_accepted(sample_one, simulate_one, accept_one, n)
Parameters:

- sample_one (Callable[[], A]) – A function which takes no arguments and returns a proposal parameter $$\theta$$.
- simulate_one (Callable[[A], ValidParticle]) – A function which takes as sole argument a proposal parameter $$\theta$$, as returned by sample_one, and returns the summary statistics $$s$$.
- accept_one (Callable[[ValidParticle], bool]) – A function which takes as sole argument the summary statistics $$s$$, as returned by simulate_one, and returns True if the simulated sample is accepted and False otherwise.
- n (int) – The number of samples to be accepted, i.e. the population size.

Returns:

- valid_particles – The list of accepted particles.
class pyabc.sampler.MulticoreEvalParallelSampler(n_procs=None)

Bases: pyabc.sampler.multicorebase.MultiCoreSampler

Multicore Evaluation parallel sampler.

Implements the same strategy as pyabc.sampler.RedisEvalParallelSampler or pyabc.sampler.DaskDistributedSampler.

However, parallelization is restricted to a single machine with multiple processes. This sampler has very low communication overhead and is thus suitable for short running model evaluations.

Requires no pickling of the sample_one, simulate_one and accept_one functions. This is achieved using fork on Linux (see Sampler).

The simulation results are still pickled as they are transmitted from the worker processes back to the parent process. Depending on the kind of summary statistics, this can be fast or slow. If your summary statistics are only a dict with a couple of numbers, the overhead should not be substantial. However, if your summary statistics are large numpy arrays or similar, this could cause substantial overhead.

Parameters: n_procs (int, optional) – The number of worker processes. If set to None, the number of cores is determined according to pyabc.sge.nr_cores_available().
sample_until_n_accepted(sample_one, simulate_one, accept_one, n)
Parameters:

- sample_one (Callable[[], A]) – A function which takes no arguments and returns a proposal parameter $$\theta$$.
- simulate_one (Callable[[A], ValidParticle]) – A function which takes as sole argument a proposal parameter $$\theta$$, as returned by sample_one, and returns the summary statistics $$s$$.
- accept_one (Callable[[ValidParticle], bool]) – A function which takes as sole argument the summary statistics $$s$$, as returned by simulate_one, and returns True if the simulated sample is accepted and False otherwise.
- n (int) – The number of samples to be accepted, i.e. the population size.

Returns:

- valid_particles – The list of accepted particles.
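The evaluation-parallel strategy differs from the particle-parallel one: workers propose and evaluate continuously, and all stop once n particles have been accepted in total, so no worker is left waiting on one long-running rejection loop. A hedged sketch using threads (the actual sampler uses worker processes via multiprocessing and fork; the toy model and acceptance threshold are illustrative only):

```python
import random
import threading

def eval_parallel(n, n_workers=4):
    """Workers evaluate continuously; all stop once n are accepted in total."""
    accepted = []
    lock = threading.Lock()

    def worker(seed):
        rng = random.Random(seed)
        while True:
            theta = rng.random()          # sample_one
            s = theta                     # simulate_one: identity toy "model"
            ok = s < 0.5                  # accept_one
            with lock:
                if len(accepted) >= n:    # global stop criterion
                    return
                if ok:
                    accepted.append(s)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return accepted

particles = eval_parallel(10)
```

The shared counter guarded by the lock corresponds to the cross-process bookkeeping the real sampler performs; it is what keeps the communication overhead low while still stopping promptly at n acceptances.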
class pyabc.sampler.RedisEvalParallelSamplerServerStarter(host='localhost', port=6379)

Bases: pyabc.sampler.redis_eps.sampler.RedisEvalParallelSampler

class pyabc.sampler.ConcurrentFutureSampler(cfuture_executor=None, client_max_jobs=200, default_pickle=True, batchsize=1)

Bases: pyabc.sampler.eps_mixin.EPSMixin, pyabc.sampler.base.Sampler

Parallelize with an arbitrary backend that implements the Python concurrent.futures executor interface. Specifically, it needs to implement a submit function that can evaluate arbitrary function handles and return a concurrent future result object.

Parameters:

- cfuture_executor (concurrent.futures.Executor, required) – Configured object that implements the concurrent.futures.Executor interface.
- client_max_jobs – Maximum number of jobs that can be submitted to the client at a time. If this value is smaller than the maximum number of cores provided by the distributed infrastructure, the infrastructure will not be fully utilized.
- default_pickle – Specify whether the sampler uses Python's default pickle to communicate functions to the workers; if not, a cloudpickle-based workaround is used to pickle the simulate and evaluate functions. This allows the use of locally defined functions, which cannot be pickled by the default pickle, at the cost of additional pickling overhead.
- batchsize (int, optional) – Number of parameter samples that are evaluated in one remote execution call. Batch submission can be used to reduce the communication overhead for fast (ms to s) model evaluations. Large batch sizes can result in unnecessary model evaluations. By default, batchsize=1, i.e. no batching is done.
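The executor interface this sampler relies on can be demonstrated with the standard library (a toy task, not pyabc code; ThreadPoolExecutor stands in for whatever executor you would actually configure):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import random

def one_accepted_particle(seed):
    """Task submitted to the executor: reject until one acceptance."""
    rng = random.Random(seed)
    while True:
        theta = rng.random()
        if theta < 0.5:        # toy acceptance criterion
            return theta

# Any object exposing a concurrent.futures-style submit() works here;
# a ProcessPoolExecutor or a distributed equivalent could be swapped in.
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(one_accepted_particle, i) for i in range(5)]
    particles = [f.result() for f in as_completed(futures)]
```

Each submit call returns a Future whose result method blocks until the task completes, which is the only behavior the sampler needs from the backend.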