pyabc.sampler
Parallel sampling
Parallel multi-core and distributed sampling.
The choice of sampler determines how parallelization is performed. See also the explanation of the samplers.
Note
pyABC allows parallelizing the sampling process via various samplers. If you also want to parallelize single model simulations, make sure that both levels of parallelization work well together. In particular, if the environment variable OMP_NUM_THREADS is not set, pyABC sets it to a default of 1. For multi-processed sampling (the default, at least on Linux systems), the environment variable PYABC_NUM_PROCS can be used to determine how many jobs the sampling is parallelized over.
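For example, the two levels can be configured via environment variables before starting the analysis (a minimal sketch; the value 4 is an arbitrary choice):

```python
import os

# Pin each model simulation to one OpenMP thread (pyABC's default when
# OMP_NUM_THREADS is unset) and parallelize sampling over 4 processes.
# These should be set before numerical libraries are loaded.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["PYABC_NUM_PROCS"] = "4"
print(os.environ["PYABC_NUM_PROCS"])  # 4
```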
- class pyabc.sampler.ConcurrentFutureSampler(cfuture_executor=None, client_max_jobs: int = 200, default_pickle: bool = True, batch_size: int = 1)[source]
Bases: EPSMixin, Sampler
Parallelize with an arbitrary executor that implements the Python concurrent.futures.Executor interface. Specifically, it needs to implement a submit function that can evaluate arbitrary function handles and return a concurrent futures result object.
- Parameters:
cfuture_executor ("concurrent.futures.Executor") – Configured object that implements the concurrent.futures.Executor interface
client_max_jobs – Maximum number of jobs that can be submitted to the client at a time. If this value is smaller than the maximum number of cores provided by the distributed infrastructure, the infrastructure will not be fully utilized.
default_pickle – Specify if the sampler uses Python's default pickle function to communicate the submit function to python; if this is the case, a cloudpickle-based workaround is used to pickle the simulate and evaluate functions. This allows utilization of locally defined functions, which cannot be pickled using default pickle, at the cost of an additional pickling overhead.
batch_size – Number of parameter samples that are evaluated in one remote execution call. Batch submission can be used to reduce the communication overhead for fast (ms-s) model evaluations. Large batch sizes can result in unnecessary model evaluations. By default, batch_size=1, i.e. no batching is done.
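Any object satisfying the Executor contract can be passed. A minimal sketch using the standard library's thread pool as a stand-in (the sampler construction is commented out, as it requires pyabc to be installed):

```python
from concurrent.futures import ThreadPoolExecutor

# Any concurrent.futures.Executor works, e.g. ProcessPoolExecutor or a
# cluster-backed executor; the thread pool here is just a stand-in.
executor = ThreadPoolExecutor(max_workers=4)

# Hypothetical usage with pyabc (not executed here):
# from pyabc.sampler import ConcurrentFutureSampler
# sampler = ConcurrentFutureSampler(executor, client_max_jobs=200, batch_size=1)

# The contract the sampler relies on: submit returns a future with result().
future = executor.submit(pow, 2, 10)
print(future.result())  # 1024
executor.shutdown()
```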
- class pyabc.sampler.DaskDistributedSampler(dask_client: Client = None, client_max_jobs: int = inf, default_pickle: bool = True, batch_size: int = 1)[source]
Bases: EPSMixin, Sampler
Parallelize with dask. This sampler is intended to be used with a pre-configured dask client, but is able to initialize client, scheduler and workers on its own on the local machine for testing/debugging purposes.
- Parameters:
dask_client – The configured dask Client. If none is provided, then a local dask distributed cluster is created.
client_max_jobs – Maximum number of jobs that can be submitted to the client at a time. If this value is smaller than the maximum number of cores provided by the distributed infrastructure, the infrastructure will not be fully utilized.
default_pickle – Specify if the sampler uses Python's default pickle function to communicate the submit function to python; if this is the case, a cloudpickle-based workaround is used to pickle the simulate and evaluate functions. This allows utilization of locally defined functions, which cannot be pickled using default pickle, at the cost of an additional pickling overhead. For dask, this workaround should not be necessary and it should be safe to use default_pickle=False.
batch_size – Number of parameter samples that are evaluated in one remote execution call. Batch submission can be used to reduce the communication overhead for fast (ms-s) model evaluations. Large batch sizes can result in unnecessary model evaluations. By default, batch_size=1, i.e. no batching is done.
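A minimal usage sketch, assuming dask.distributed is installed (the local cluster here is only for testing; in production, a pre-configured client pointed at a scheduler would be passed):

```python
from dask.distributed import Client

# A local test cluster; in production, pass the scheduler address instead,
# e.g. Client("tcp://scheduler:8786").
client = Client(n_workers=2, threads_per_worker=1)

# Hypothetical usage with pyabc (not executed here):
# from pyabc.sampler import DaskDistributedSampler
# sampler = DaskDistributedSampler(dask_client=client, batch_size=10)

# The underlying contract: submit returns a future with result().
print(client.submit(sum, [1, 2, 3]).result())  # 6
client.close()
```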
- class pyabc.sampler.MappingSampler(map_=<class 'map'>, mapper_pickles: bool = False)[source]
Bases: Sampler
Parallelize via a map operation. This sampler can be applied in a multi-core or in a distributed setting.
- Parameters:
map_ (map-like function) –
A function which works like the built-in map. It can be any generic map operation. Possible candidates include:
multiprocessing.pool.Pool.map (see https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool)
pyabc.sge.SGE's map method. This mapper is useful in SGE-like environments where you don't want to start workers which run forever.
Dask's distributed.Client.map (see https://distributed.readthedocs.io/en/latest/api.html#client)
IPython parallel's map (see http://ipyparallel.readthedocs.io/en/latest/task.html#quick-and-easy-parallelism)
and many other implementations.
Each of the mapped function calls samples until it gets one accepted particle. This can have a performance impact if one sample task runs very long while all other tasks have already finished: the sampler then has to wait until the last sample task is finished.
mapper_pickles (bool, optional) –
Whether the mapper handles the pickling itself or the MappingSampler class should handle serialization.
The default is False. While this setting is compatible with a larger range of map functions, its performance can be suboptimal, as possibly too much serialization and deserialization is done, which could limit overall performance if the model evaluations are comparatively fast. The passed map function might implement more efficient serialization. For example, for the pyabc.sge.SGE mapper, this option should be set to True for better performance.
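A minimal sketch of the map contract the sampler builds on; the built-in map runs serially, while a multiprocessing pool would run the same calls on multiple cores (the pyabc construction is commented out, as it requires pyabc to be installed):

```python
from multiprocessing import Pool  # optional multi-core map provider

def square(x):
    return x * x

# Hypothetical usage with pyabc (not executed here):
# from pyabc.sampler import MappingSampler
# with Pool(4) as pool:
#     sampler = MappingSampler(map_=pool.map, mapper_pickles=False)

# The contract: map_(func, iterable) -> iterable of results, in order.
print(list(map(square, [1, 2, 3])))  # [1, 4, 9]
```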
- class pyabc.sampler.MulticoreEvalParallelSampler(n_procs: int = None, daemon: bool = True, pickle: bool = None, check_max_eval: bool = False)[source]
Bases: MultiCoreSampler
Multicore Evaluation parallel sampler.
Implements the same strategy as pyabc.sampler.RedisEvalParallelSampler or pyabc.sampler.DaskDistributedSampler. However, parallelization is restricted to a single machine with multiple processes. This sampler has very low communication overhead and is thus suitable for short-running model evaluations.
Requires no pickling of the sample_one, simulate_one and accept_one functions. This is achieved using fork on Linux (see Sampler). The simulation results are still pickled as they are transmitted from the worker processes back to the parent process. Depending on the kind of summary statistics, this can be fast or slow. If your summary statistics are only a dict with a couple of numbers, the overhead should not be substantial. However, if your summary statistics are large numpy arrays or similar, this can cause overhead.
- Parameters:
n_procs (int, optional) – If set to None, the number of cores is determined according to pyabc.sge.nr_cores_available().
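The result-transfer cost noted above can be gauged directly: the pickled size of the summary statistics is what travels from worker to parent. A rough stdlib sketch (the sizes are illustrative, not measurements of pyABC itself):

```python
import pickle

# Small dict of numbers: cheap to transfer between processes.
small = pickle.dumps({"mean": 1.0, "var": 2.0})
# A large array-like result: proportionally more transfer overhead.
large = pickle.dumps(list(range(100_000)))
print(len(small) < len(large))  # True
```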
- class pyabc.sampler.MulticoreParticleParallelSampler(n_procs: int = None, daemon: bool = True, pickle: bool = None, check_max_eval: bool = False)[source]
Bases: MultiCoreSampler
Samples on multiple cores using the multiprocessing module. This sampler is optimized for low latencies and is efficient, even if the individual model evaluations are fast.
Requires no pickling of the sample_one, simulate_one and accept_one functions. This is achieved using fork on Linux (see Sampler). The simulation results are still pickled as they are transmitted from the worker processes back to the parent process. Depending on the kind of summary statistics, this can be fast or slow. If your summary statistics are only a dict with a couple of numbers, the overhead should not be substantial. However, if your summary statistics are large numpy arrays or similar, this can cause overhead.
- Parameters:
n_procs (int, optional) – If set to None, the number of cores is determined according to pyabc.sge.nr_cores_available().
Warning: Windows support is not tested; as there is no fork on Windows, this sampler might not work.
- class pyabc.sampler.RedisEvalParallelSampler(host: str = 'localhost', port: int = 6379, password: str = None, batch_size: int = 1, look_ahead: bool = False, look_ahead_delay_evaluation: bool = True, max_n_eval_look_ahead_factor: float = 10.0, wait_for_all_samples: bool = False, adapt_look_ahead_proposal: bool = False, log_file: str = None)[source]
Bases: RedisSamplerBase
Redis-based dynamic-scheduling low-latency sampler.
This sampler performs well in distributed environments. It is usually faster than the pyabc.sampler.DaskDistributedSampler for short model evaluation runtimes; the longer the model evaluation times, the smaller this advantage becomes. It requires a running Redis server as broker.
This sampler requires workers to be started via the command abc-redis-worker. An example call might look like abc-redis-worker --host=123.456.789.123 --runtime=2h to connect to a Redis server on IP 123.456.789.123 and to terminate the worker after finishing the first population that ends 2 hours after worker start (so the actual runtime might be longer than 2h). See abc-redis-worker --help for its options.
Use the command abc-redis-manager to retrieve info on and stop the running workers.
Start as many workers as you wish. Workers can be dynamically added during the ABC run.
Currently, a server (specified via host and port) can only meaningfully handle one ABCSMC analysis at a time.
- Parameters:
host – IP address or name of the Redis server. Default is “localhost”.
port – Port of the Redis server. Default is 6379.
password – Password for a protected server. Default is None (no protection).
batch_size – Number of model evaluations the workers perform before contacting the Redis server. Defaults to 1. Increase this value if model evaluation times are short or the number of workers is large, to reduce communication overhead.
look_ahead – Whether to already start sampling for the next generation, based on preliminary results, although the current generation has not completely finished yet. This increases parallel efficiency, but can lead to a higher Monte Carlo error.
look_ahead_delay_evaluation – In look-ahead mode, acceptance can be delayed until the final acceptance criteria for generation t have been decided. This is mandatory if the routine has adaptive components (distance, epsilon, …) besides the transition kernel. If not needed, enabling it may lead to worse performance, especially if evaluation is costly compared to simulation, because evaluation happens sequentially on the main thread. Only effective if look_ahead is True.
max_n_eval_look_ahead_factor – In delayed evaluation, only this factor times the previous number of samples are generated; afterwards, the workers wait. Does not apply if evaluation is not delayed. This ensures that only a reasonable number of evaluations is performed, as for short-running models the number of evaluations could otherwise explode unnecessarily.
wait_for_all_samples – Whether to wait for all simulations in an iteration to finish. If not, the sampler only waits for the simulations that were started prior to the last started particle of the first n acceptances. Waiting for all should not be needed; this option exists for study purposes.
adapt_look_ahead_proposal – In look-ahead mode, adapt the preliminary proposal based on previous acceptances. In theory, as long as proposal >> prior, everything is fine. However, in practice, given a finite sample size, in some cases the preliminary proposal may be biased towards earlier-accepted particles, which can induce a similar bias in the next accepted population. Thus, if parameter-dependent simulation time heterogeneity is to be expected, i.e. if different plausible parameter space regions come with different simulation times, this flag should be set to False. If no such heterogeneity is expected, this flag can be set to True, which can result in improved performance due to a more tailored proposal distribution. Only effective if look_ahead is True.
log_file – A file for a dedicated sampler history. Updated in each iteration. This log file is complementary to the logging realized via the logging module.
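The effect of batch_size on broker traffic can be sketched with simple arithmetic (illustrative numbers, not measurements):

```python
# With n model evaluations per generation, workers contact the Redis broker
# roughly n / batch_size times; larger batches trade communication overhead
# for potentially superfluous evaluations at the end of a generation.
n_evals = 1000
for batch_size in (1, 10, 50):
    print(f"batch_size={batch_size}: ~{n_evals // batch_size} broker contacts")
```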
- __init__(host: str = 'localhost', port: int = 6379, password: str = None, batch_size: int = 1, look_ahead: bool = False, look_ahead_delay_evaluation: bool = True, max_n_eval_look_ahead_factor: float = 10.0, wait_for_all_samples: bool = False, adapt_look_ahead_proposal: bool = False, log_file: str = None)[source]
- check_analysis_variables(distance_function: Distance, eps: Epsilon, acceptor: Acceptor) None [source]
Check that analysis variables are appropriate for sampling.
- clear_generation_t(t: int) None [source]
Clean up after generation t has finished.
- Parameters:
t (The time for which to clear.)
- create_sample(id_results: List[Tuple], n: int) Sample [source]
Create a single sample result. Order the results by starting point to avoid a bias towards short-running simulations (dynamic scheduling).
- generation_t_was_started(t: int) bool [source]
Check whether generation t was started already.
- Parameters:
t (The time for which to check.)
- maybe_start_next_generation(t: int, n: int, id_results: List, all_accepted: bool, ana_vars: AnalysisVars) None [source]
Start the next generation early, if that looks reasonable.
- Parameters:
t (The current time.)
n (The current population size.)
id_results (The so-far returned samples.)
all_accepted (Whether all particles are accepted.)
ana_vars (Analysis variables.)
- class pyabc.sampler.RedisEvalParallelSamplerServerStarter(password: str = None, batch_size: int = 1, wait_for_all_samples: bool = False, look_ahead: bool = False, look_ahead_delay_evaluation: bool = True, max_n_eval_look_ahead_factor: float = 10.0, adapt_look_ahead_proposal: bool = False, workers: int = 2, processes_per_worker: int = 1, daemon: bool = True, catch: bool = True, log_file: str = None)[source]
Bases: RedisEvalParallelSampler
Simple routine to start a Redis server and workers for the dynamic sampler. For the arguments, see the base class.
- __init__(password: str = None, batch_size: int = 1, wait_for_all_samples: bool = False, look_ahead: bool = False, look_ahead_delay_evaluation: bool = True, max_n_eval_look_ahead_factor: float = 10.0, adapt_look_ahead_proposal: bool = False, workers: int = 2, processes_per_worker: int = 1, daemon: bool = True, catch: bool = True, log_file: str = None)[source]
- class pyabc.sampler.RedisStaticSampler(host: str = 'localhost', port: int = 6379, password: str = None, log_file: str = None)[source]
Bases: RedisSamplerBase
Redis based static scheduling sampler.
- clear_generation_t(t: int) None [source]
Clean up after generation t has finished.
- Parameters:
t (The time for which to clear.)
- class pyabc.sampler.RedisStaticSamplerServerStarter(password: str = None, workers: int = 2, processes_per_worker: int = 1, daemon: bool = True, catch: bool = True, log_file: str = None)[source]
Bases: RedisStaticSampler
Simple routine to start a Redis server and workers for the static sampler. For the arguments, see the base class.
- class pyabc.sampler.Sampler[source]
Bases: ABC
Abstract Sampler base class.
Produce valid particles: pyabc.parameters.ValidParticle.
- nr_evaluations_
This is set after a population and counts the total number of model evaluations. This can be used to calculate the acceptance rate.
- sample_factory
A factory to create empty samples.
- show_progress
Whether to show progress within a generation. Some samplers support this, e.g. by showing a progress bar. Set via:
>>> sampler = Sampler()
>>> sampler.show_progress = True
- analysis_id
A universal unique id of the analysis, automatically generated by the inference routine.
- check_analysis_variables(distance_function: Distance, eps: Epsilon, acceptor: Acceptor) None [source]
Raise if any analysis variable does not conform with the sampler. This check serves in particular to ensure that all components are fit for look-ahead sampling. Default: do nothing.