pyabc.inference

Inference

ABC inference routines. This is the analysis core module of pyABC.

class pyabc.inference.ABCSMC(models: list[~pyabc.model.model.Model] | ~pyabc.model.model.Model | ~collections.abc.Callable, parameter_priors: list[~pyabc.random_variables.random_variables.Distribution] | ~pyabc.random_variables.random_variables.Distribution | ~collections.abc.Callable, distance_function: ~pyabc.distance.base.Distance | ~collections.abc.Callable = None, population_size: ~pyabc.populationstrategy.populationstrategy.PopulationStrategy | int = 100, summary_statistics: ~collections.abc.Callable[[~pyabc.inference.smc.model_output], dict] = <function identity>, model_prior: ~pyabc.random_variables.random_variables.RV = None, model_perturbation_kernel: ~pyabc.transition.model.ModelPerturbationKernel = None, transitions: list[~pyabc.transition.base.Transition] | ~pyabc.transition.base.Transition = None, eps: ~pyabc.epsilon.base.Epsilon = None, sampler: ~pyabc.sampler.base.Sampler = None, acceptor: ~pyabc.acceptor.acceptor.Acceptor = None, stop_if_only_single_model_alive: bool = False, max_nr_recorded_particles: int = inf)[source]

Bases: object

Approximate Bayesian Computation - Sequential Monte Carlo (ABCSMC).

This is an implementation of an ABCSMC algorithm similar to [1].

The ABCSMC class is the most central class of the pyABC package. Most of the other classes serve to configure it (i.e. the other classes implement a strategy pattern).

Parameters:

models –
Can be a list of models, a single model, a list of functions, or a single function.
- If models is a function, then the function should have a single parameter, which is of dictionary type, and should return a single dictionary, which contains the simulated data.
- If models is a list of functions, then the first point applies to each function.
- Models can also be a list of Model instances or a single Model instance.
The model’s output is passed to the summary statistics calculation. By default, the model is assumed to already return the calculated summary statistics. Accordingly, the default summary_statistics function is just the identity. Note that the sampling and evaluation of particles happens in the model’s methods, so overriding these offers a great deal of flexibility, in particular the freedom to use or ignore the distance_function, summary_statistics, and eps parameters here.
parameter_priors – A list of prior distributions for the models’ parameters. Each list entry is the prior distribution for the corresponding model.
distance_function – Measures the distance of the tentatively sampled particle to the measured data.
population_size – Specify the size of the population. If population_size is an int, then the size is constant. Adaptive population sizes are also possible by passing a pyabc.populationstrategy.PopulationStrategy object. The default is 100 particles per population.
summary_statistics – A function which takes the raw model output as returned by any of the models and calculates the corresponding summary statistics. Note that the default value is just the identity function, i.e. the model is assumed to already calculate the summary statistics. However, in a model selection setting it can make sense to have the models produce some kind of raw output and then apply the same summary statistics function to all of them.
model_prior – A random variable giving the prior weights of the model classes. The default is a uniform prior over the model classes, RV("randint", 0, len(models)).
model_perturbation_kernel – Kernel which governs with which probability to switch from one model to another model for a given sample while generating proposals for the subsequent population from the current population.
transitions – A list of pyabc.transition.Transition objects or a single pyabc.transition.Transition in case of a single model. Defaults to multivariate normal transitions for every model.
eps – Accepts any pyabc.epsilon.Epsilon subclass. The default is the pyabc.epsilon.MedianEpsilon which adapts automatically. The object passed here determines how the acceptance threshold scheduling is performed.
sampler – In some cases, a mapper implementation will require initialization to run properly, e.g. database connection, grid setup, etc. The sampler is an object that encapsulates this information. The default sampler pyabc.sampler.MulticoreEvalParallelSampler will parallelize across the cores of a single machine only.
acceptor – Takes a distance function, summary statistics and an epsilon threshold to decide about acceptance of a particle. Argument accepts any subclass of pyabc.acceptor.Acceptor, or a function convertible to an acceptor. Defaults to a pyabc.acceptor.UniformAcceptor.
stop_if_only_single_model_alive – Defaults to False. Set this to True if you want to stop ABCSMC automatically as soon as only a single model has survived.
max_nr_recorded_particles – Defaults to inf. Set this to the maximum number of accepted and rejected particles that methods like the AdaptivePNormDistance function use to update themselves each iteration.
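
The function-based form of models, summary_statistics and distance_function described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not pyABC's own code: the toy Gaussian model, the key names ("mean", "samples", "sample_mean", "sample_std") and the fixed seed are all assumptions made for the example. It also assumes that a callable distance receives the simulated and the observed summary statistics as two dictionaries.

```python
import random
import statistics


def model(parameters: dict) -> dict:
    """Toy model: takes one parameter dictionary, returns one data dictionary."""
    rng = random.Random(0)  # fixed seed only to keep this sketch reproducible
    mean = parameters["mean"]  # hypothetical parameter name
    return {"samples": [rng.gauss(mean, 1.0) for _ in range(50)]}


def summary_statistics(model_output: dict) -> dict:
    """Reduce the raw model output to a dictionary of summary statistics."""
    samples = model_output["samples"]
    return {
        "sample_mean": statistics.fmean(samples),
        "sample_std": statistics.stdev(samples),
    }


def distance(x: dict, x_0: dict) -> float:
    """Absolute difference between simulated and observed sample means."""
    return abs(x["sample_mean"] - x_0["sample_mean"])


# Simulate once and compare against (made-up) observed summary statistics.
simulated = summary_statistics(model({"mean": 2.0}))
observed = {"sample_mean": 2.5, "sample_std": 1.0}
print(distance(simulated, observed))
```

Functions of this shape would be passed as the models, summary_statistics and distance_function arguments. If the model already returns summary statistics, the separate summary_statistics function can be left at its identity default.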

__init__(models: list[~pyabc.model.model.Model] | ~pyabc.model.model.Model | ~collections.abc.Callable, parameter_priors: list[~pyabc.random_variables.random_variables.Distribution] | ~pyabc.random_variables.random_variables.Distribution | ~collections.abc.Callable, distance_function: ~pyabc.distance.base.Distance | ~collections.abc.Callable = None, population_size: ~pyabc.populationstrategy.populationstrategy.PopulationStrategy | int = 100, summary_statistics: ~collections.abc.Callable[[~pyabc.inference.smc.model_output], dict] = <function identity>, model_prior: ~pyabc.random_variables.random_variables.RV = None, model_perturbation_kernel: ~pyabc.transition.model.ModelPerturbationKernel = None, transitions: list[~pyabc.transition.base.Transition] | ~pyabc.transition.base.Transition = None, eps: ~pyabc.epsilon.base.Epsilon = None, sampler: ~pyabc.sampler.base.Sampler = None, acceptor: ~pyabc.acceptor.acceptor.Acceptor = None, stop_if_only_single_model_alive: bool = False, max_nr_recorded_particles: int = inf)[source]

check_terminate(t: int, acceptance_rate: float) → bool[source]

Check whether any termination criterion is met.

Parameters:

t (int) – Current time point.
acceptance_rate (float) – Acceptance rate in current generation.

Returns:

terminate – Whether to terminate (True) or not (False).

Return type:

bool
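
As a rough illustration of what checking "any termination criterion" can look like, the sketch below combines three stopping rules. It is not pyABC's implementation: the function should_terminate is hypothetical, its parameter names are borrowed from the initialize_components_before_run signature, and both the exact comparison rules and the zero-based interpretation of t are assumptions.

```python
from typing import Optional


def should_terminate(
    t: int,
    acceptance_rate: float,
    current_epsilon: float,
    minimum_epsilon: float = 0.0,
    max_nr_populations: Optional[int] = None,
    min_acceptance_rate: float = 0.0,
) -> bool:
    """Hypothetical helper: return True if any stopping rule is satisfied."""
    # The acceptance threshold has reached the requested minimum.
    if current_epsilon <= minimum_epsilon:
        return True
    # The allowed number of populations is used up (t assumed zero-based).
    if max_nr_populations is not None and t + 1 >= max_nr_populations:
        return True
    # Sampling has become too inefficient to continue.
    if acceptance_rate < min_acceptance_rate:
        return True
    return False


print(should_terminate(t=2, acceptance_rate=0.3, current_epsilon=0.8,
                       max_nr_populations=3))
```

Returning as soon as one rule fires keeps each criterion independent, so further rules (for example a wall-time limit) could be added without touching the existing ones.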

initialize_components_before_run(minimum_epsilon: float, max_nr_populations: int, min_acceptance_rate: float, max_total_nr_simulations: int, max_walltime: timedelta, min_eps_diff: float) → int[source]

Initialize everything before starting a run.

In particular sets variables corresponding to arguments, and performs sampling based initialization for e.g. distance and epsilon, if these are adaptive.

All parameters are as for run().

Returns:

t0 – The initial time point from which to start the next generation.

Return type:

int

load(db: str, abc_id: int = 1, observed_sum_stat: dict = None) → History[source]

Load an ABC-SMC run for continuation.

Parameters:

db (str) – A SQLAlchemy database identifier pointing to the database from which to continue a run.
abc_id (int, optional) – The id of the ABC-SMC run in the database which is to be continued. The default is 1. If more than one ABC-SMC run is stored, use the abc_id parameter to indicate which one to continue.
observed_sum_stat (dict, optional) – The observed summary statistics. This field should be used only if the summary statistics cannot be reproduced exactly from the database (in particular when they are no numpy or pandas objects, e.g. when they were generated in R). If None, then the summary statistics are read from the history.

new(db: str, observed_sum_stat: dict = None, *, gt_model: int = None, gt_par: dict = None, meta_info=None) → History[source]

Make a new ABCSMC run.

Parameters:

db (str) –
Has to be a valid SQLAlchemy database identifier. This indicates the database to be used (and created if necessary and possible) for the ABC-SMC run.

To use an in-memory database pass “sqlite://”. Note that in-memory databases are only available on the master node. If workers are started on different nodes they won’t be able to access the database. This should not be a problem in most scenarios. The in-memory option is mainly useful for benchmarking and possibly for testing.
observed_sum_stat (dict, optional) –
This is the really important parameter here. It is of the form {'statistic_1': val_1, 'statistic_2': val_2, ... }.

The dictionary provided here represents the measured data. Particles sampled during ABCSMC are compared against the summary statistics provided here.

The summary statistics’ values can be integers, floats, strings and everything which is a numpy array or can be converted to one (e.g. lists). In addition, pandas.DataFrames can also be used as summary statistics. Note that storage of pandas DataFrames in pyABC’s database is still considered experimental.

This parameter is optional, as the distance function might implement comparison to the observed data on its own. Not giving this parameter is equivalent to passing an empty dictionary {}.
gt_model (int, optional) – This is only metadata stored in the database; it is not used by the ABCSMC algorithm. If you want to test your ABCSMC procedure against synthetic samples, you can use this parameter to indicate the ground truth model number. This helps with further analysis. If you use actually measured data (and don’t know the ground truth) you don’t have to set this.
gt_par (dict, optional) – Similar to gt_model, this is only for recording purposes in the database, but not used in the ABCSMC algorithm. This stores the parameters of the ground truth model if the data was synthetically obtained. Don’t give this parameter if you don’t know the ground truth.
meta_info (dict, optional) – Can contain an arbitrary number of keys, only for recording purposes. Store arbitrary meta information in this dictionary. Can be used for really anything. This dictionary is stored in the database.

Returns:

history – The history, with set history.id, which is the id under which the generated ABCSMC run entry in the database can be identified.

Return type:

History
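
The arguments to new() can be prepared as in the following sketch, which uses only the standard library. The file name abcsmc_example.db, the statistic names and all values are made up for illustration. The file-backed identifier follows the usual SQLAlchemy SQLite URL form, "sqlite:///" followed by the path.

```python
import os
import tempfile

# In-memory database: only visible to the process that created it.
in_memory_db = "sqlite://"

# File-backed SQLite database in the system's temporary directory.
db_file = os.path.join(tempfile.gettempdir(), "abcsmc_example.db")
file_db = "sqlite:///" + db_file

# Observed summary statistics: values may be ints, floats, strings,
# or anything convertible to a numpy array, such as a list.
observed_sum_stat = {
    "mean": 2.5,
    "count": 50,
    "label": "experiment_a",
    "trajectory": [0.1, 0.4, 0.9],
}

# Optional bookkeeping, only recorded in the database.
gt_par = {"mean": 2.5}
meta_info = {"note": "synthetic data, toy example"}

print(file_db)
```

With an ABCSMC instance named abc (a hypothetical variable name), these values would be passed as abc.new(file_db, observed_sum_stat, gt_par=gt_par, meta_info=meta_info). Note that gt_model, gt_par and meta_info are keyword-only in the signature above.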

run_generation(t: int) → dict[source]

Run a single generation.

Parameters:

t (int) – Generation time index to run for.

Returns:

ret – Dictionary with entries “successful” indicating whether the generation terminated successfully, and potentially “acceptance_rate”.

Return type:

dict