Parallel Approximate Bayesian Computation  Sequential Monte Carlo¶
The ABCSMC class is the most central class of the pyABC package. Most of the other classes serve to configure it. (I.e. the other classes implement a Strategy pattern.)

class
pyabc.smc.
ABCSMC
(models: Union[typing.List[pyabc.model.Model], pyabc.model.Model], parameter_priors: Union[typing.List[pyabc.random_variables.Distribution], pyabc.random_variables.Distribution, typing.Callable], distance_function, population_size: Union[pyabc.populationstrategy.PopulationStrategy, int] = 100, summary_statistics: Callable[model_output, dict] = <function identity>, model_prior: pyabc.random_variables.RV = None, model_perturbation_kernel: pyabc.random_variables.ModelPerturbationKernel = None, transitions: List[pyabc.transition.base.Transition] = None, eps: pyabc.epsilon.Epsilon = None, sampler=None)¶ Bases:
object
Approximate Bayesian Computation  Sequential Monte Carlo (ABCSMC).
This is an implementation of an ABCSMC algorithm similar to [1]
Parameters:  models (list of models, single model, single function or list of functions) –
 If models is a function, then the function should have a single parameter, which is of dictionary type, and should return a single dictionary, which contains the simulated data.
 If models is a list of functions, then the first point applies to each function.
 Models can also be a list of Model instances or a single Model instance.
This model’s output is passed to the summary statistics calculation. Per default, the model is assumed to already return the calculated summary statistics. Accordingly, the default summary_statistics function is just the identity.
 parameter_priors (List[Distribution]) – A list of prior distributions for the models’ parameters. Each list entry is the prior distribution for the corresponding model.
 distance_function (DistanceFunction, optional) – Measures the distance of the tentatively sampled particle to the measured data.
 population_size (int, PopulationStrategy, optional) – Specify the size of the population.
If
population_specification
is anint
, then the size is constant. Adaptive population sizes are also possible by passing apyabc.populationstrategy.PopulationStrategy
object. The default is 100 particles per population.  summary_statistics (Callable[[model_output], dict]) – A function which takes the raw model output as returned by any ot the models and calculates the corresponding summary statistics. Note that the default value is just the identity function. I.e. the model is assumed to already calculate the summary statistics. However, in a model selection setting it can make sense to have the model produce some kind or raw output and then take the same summary statistics function for all the models.
 model_prior (RV, optional) – A random variable giving the prior weights of the model classes.
The default is a uniform prior over the model classes,
RV("randint", 0, len(models))
.  model_perturbation_kernel (ModelPerturbationKernel) – Kernel which governs with which probability to switch from one model to another model for a given sample while generating proposals for the subsequent population from the current population.
 transitions (List[Transition], Transition, optional) – A list of
pyabc.transition.Transition
objects or a singlepyabc.transition.Transition
in case of a single model. Defaults to multivariate normal transitions for every model.  eps (Epsilon, optional) – Accepts any
pyabc.epsilon.Epsilon
subclass. The default is thepyabc.epsilon.MedianEpsilon
which adapts automatically. The object passed here determines how the acceptance threshold scheduling is performed.  sampler (Sampler, optional) – In some cases, a mapper implementation will require initialization
to run properly, e.g. database connection, grid setup, etc..
The sampler is an object that encapsulates this information.
The default sampler
pyabc.sampler.MulticoreEvalParallelSampler
will parallelize across the cores of a single machine only.

stop_if_only_single_model_alive
¶ bool – Defaults to False. Set this to true if you want to stop ABCSMC automatically as soon as only a single model has survived.
[1] Toni, Tina, and Michael P. H. Stumpf. “SimulationBased Model Selection for Dynamical Systems in Systems and Population Biology.” Bioinformatics 26, no. 1 (2010): 104–10. doi:10.1093/bioinformatics/btp619. 
load
(db: str, abc_id: int = 1) → int¶ Load an ABCSMC run for continuation.
Parameters:  db (str) – A SQLAlchemy database identifier pointing to the database from which to continue a run.
 abc_id (int, optional) – The id of the ABCSMC run in the database which is to be continued.
The default is 1. If more than one ABCSMC run is stored, use
the
abc_id
parameter to indicate which one to continue.
Note
The Epsilon’s and distance function’s initialize methods are not called when an ABCSMC run is loaded.

new
(db: str, observed_sum_stat: dict = None, *, gt_model: int = None, gt_par: dict = None, meta_info=None) → int¶ Make a new ABCSMC run.
Parameters:  db (str) –
Has to be a valid SQLAlchemy database identifier. This indicates the database to be used (and created if necessary and possible) for the ABCSMC run.
To use an inmemory database pass “sqlite://”. Note that inmemory databases are only available on the master mode. If workers are started on different nodes they won’t be able to access the database. This should not be a problem in most scenarios. The inmemory option is mainly useful for benchmarking (and maybe) for testing.
 observed_sum_stat (dict, optional) –
This is the really important parameter here. It is of the form
{'statistic_1': val_1, 'statistic_2': val_2, ... }
.The dictionary provided here represents the measured data. Particle during ABCSMC sampling are compared against the summary statistics provided here.
The summary statistics’ values can be integers, floats, strings and everything which is a numpy array or can be converted to one (e.g. lists). In addition, pandas.DataFrames can also be used as summary statistics. Note that storage of pandas DataFrames in pyABC’s database is still considered experimental.
This parameter is optional, as the distance function might implement comparison to the observed data on its own. Not giving this parameter is equivalent to passing an empty dictionary
{}
.  gt_model (int, optional) – This is only meta data stored to the database, but not actually used for the ABCSMC algorithm If you want to predict your ABCSMC procedure against synthetic samples, you can use this parameter to indicate the ground truth model number. This helps with further analysis. If you use actually measured data (and don’t know the ground truth) you don’t have to set this.
 gt_par (dict, optional) – Similar to
ground_truth_model
, this is only for recording purposes in the database, but not used in the ABCSMC algorithm. This stores the parameters of the ground truth model if it was synthetically obtained. Don’t give this parameter if you don’t know the ground truth.  meta_info (dict, optional) – Can contain an arbitrary number of keys, only for recording purposes. Store arbitrary meta information in this dictionary. Can be used for really anything. This dictionary is stored in the database.
 db (str) –

run
(minimum_epsilon: float, max_nr_populations: int, min_acceptance_rate: float = 0.0, **kwargs) → pyabc.storage.history.History¶ Run the ABCSMC model selection until either of the stopping criteria is met.
Parameters:  minimum_epsilon (float) – Stop if epsilon is smaller than minimum epsilon specified here.
 max_nr_populations (int) – The maximum number of populations. Stop if this number is reached.
 min_acceptance_rate (float, optional) – Minimal allowed acceptance rate. Sampling stops if a population has a lower rate.
Population after population is sampled and particles which are close enough to the observed data are accepted and added to the next population. If an adaptive Epsilon is specified (this is the default), then the acceptance threshold decreases from population to population automatically in a data dependent way.
Sampling of further populations is stopped, when either of the three stopping criteria is met:
 the maximum number of populations
max_nr_populations
is reached,  the acceptance threshold for the last sampled population was
smaller than
minimum_epsilon
,  or the acceptance rate dropped below
acceptance_rate
.
The value of
minimum_epsilon
determines the quality of the ABCSMC approximation. The smaller the better. But sampling time also increases with decreasingminimum_epsilon
.This method can be called repeatedly to sample further populations after sampling was stopped once.

set_data
(observed_summary_statistics: dict, abc_options: Union[dict, str], ground_truth_model: int = 1, ground_truth_parameter: dict = None)¶ This method is an alias for the
new
method. This method is deprecated and to be removed in future releases. Use thenew
method instead. Note that the argument order has changed!Warning
The method “set_data” is deprecated.
 models (list of models, single model, single function or list of functions) –