pyabc.distance

Distances

Distance functions or metrics measure closeness of observed and sampled data. This module implements various commonly used distance functions for ABC, featuring a few advanced concepts.

For custom distance functions, either pass a plain function to ABCSMC, or subclass the pyabc.Distance class.
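
As a minimal sketch of the plain-function route, the following defines a distance that takes the simulated and the observed summary statistics as dictionaries and returns a float. The function name `l1_distance` and the key `"y"` are illustrative assumptions, not part of pyABC.

```python
def l1_distance(x: dict, x_0: dict) -> float:
    """Sum of absolute differences over the keys of the observed data."""
    return float(sum(abs(x[key] - x_0[key]) for key in x_0))


simulated = {"y": 2.5}
observed = {"y": 1.0}
print(l1_distance(simulated, observed))  # 1.5
```

A function of this shape would then be handed to ABCSMC in place of a Distance object.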

class pyabc.distance.AcceptAllDistance[source]

Bases: Distance

A mock distance function that always returns -1, so that any sample is accepted for any sane epsilon object.

Can be used for testing.

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters:
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns:

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type:

float

class pyabc.distance.AdaptiveAggregatedDistance(distances: List[Distance], initial_weights: List = None, factors: List | dict = None, adaptive: bool = True, scale_function: Callable = None, log_file: str = None)[source]

Bases: AggregatedDistance

Adapt the weights of AggregatedDistances automatically over time.

Parameters:
  • distances – As in AggregatedDistance.

  • initial_weights – Weights to be used in the initial iteration. List with a weight for each distance function.

  • factors – As in AggregatedDistance.

  • adaptive – True: Adapt weights after each iteration. False: Adapt weights only once at the beginning in initialize(). This corresponds to a pre-calibration.

  • scale_function – Function that takes a np.ndarray of shape (n_sample,), namely the values obtained by applying one of the distances on a set of samples, and returns a single float, namely the weight to apply to this distance function. Default: span.

  • log_file – A log file to store weights for each time point in. Weights are currently not stored in the database. The data are saved in json format and can be retrieved via pyabc.storage.load_dict_from_json.
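
To illustrate the scale_function contract described above, here is a sketch of a span-like function applied to plain lists of distance values. It is an assumption of this sketch that the weight is the inverse of the span, so that distances with a wider spread are down-weighted; `span` and `weight_from_scale` are hypothetical stand-ins, not the library's implementation.

```python
from typing import Sequence


def span(values: Sequence[float]) -> float:
    """Range of the distance values: max minus min."""
    return max(values) - min(values)


def weight_from_scale(scale: float) -> float:
    """Invert the scale; a zero scale carries no information, so weight 0."""
    return 0.0 if scale == 0 else 1.0 / scale


# Values of two hypothetical distance functions on the same four samples.
d_small = [0.1, 0.2, 0.3, 0.5]
d_large = [10.0, 30.0, 20.0, 50.0]

weights = [weight_from_scale(span(d)) for d in (d_small, d_large)]
print(weights)
```

With such weights, both distances contribute on a comparable scale to the aggregated sum.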

__init__(distances: List[Distance], initial_weights: List = None, factors: List | dict = None, adaptive: bool = True, scale_function: Callable = None, log_file: str = None)[source]
Parameters:
  • distances (List) – The distance functions to apply.

  • weights (Union[List, dict], optional (default = [1,...])) – The weights to apply to the distances when taking the sum. Can be a list with entries in the same order as the distances, or a dictionary of lists, with the keys being the single time points (if the weights should be iteration-specific).

  • factors (Union[List, dict], optional (default = [1,...])) – Scaling factors that the weights are multiplied with. The same structure applies as to weights. If None is passed, a factor of 1 is considered for every summary statistic. Note that in this class, factors are superfluous, as everything can be achieved with weights alone; in subclasses, however, the factors can remain static while weights adapt over time, allowing for greater flexibility.

configure_sampler(sampler) None[source]

Make the sampler also return rejected particles, because these are needed to get a better estimate of the summary statistic variability, avoiding a bias toward accepted ones only.

Parameters:

sampler (Sampler) – The sampler employed.

initialize(t: int, get_sample: Callable[[], Sample], x_0: dict, total_sims: int)[source]

Initialize weights.

is_adaptive() bool[source]

Whether the class is dynamically updated after each generation, based on the last generation’s available data. Default: False.

requires_calibration() bool[source]

Whether the class requires an initial calibration, based on samples from the prior. Default: False.

update(t: int, get_sample: Callable[[], Sample], total_sims: int)[source]

Update weights based on all simulations.

class pyabc.distance.AdaptivePNormDistance(p: float = 1, initial_scale_weights: Dict[str, float] = None, fixed_weights: Dict[str, float] = None, fit_scale_ixs: EventIxs | Collection[int] | int = inf, scale_function: Callable = None, max_scale_weight_ratio: float = None, scale_log_file: str = None, all_particles_for_scale: bool = True, sumstat: Sumstat = None)[source]

Bases: PNormDistance

In the p-norm distance, adapt the weights for each generation, based on the previous simulations. This class is motivated by [1].

Parameters:
  • p – p for p-norm. Required p >= 1, p = np.inf allowed (infinity-norm).

  • initial_scale_weights – Scale weights to be used in the initial iteration. Dictionary with observables as keys and weights as values.

  • fixed_weights – Fixed multiplicative factors the weights are multiplied with, e.g. to account for heterogeneous numbers of data points. Distinguishing between the various weight types only makes sense for adaptive distances.

  • fit_scale_ixs – Generation indices before which to (re)fit the scale weights. Inf (default) means in every generation. For other values see pyabc.EventIxs.

  • scale_function – (data: list, x_0: float) -> scale: float. Computes the scale (i.e. inverse weight s = 1 / w) for a given summary statistic. Here, data denotes the list of simulated summary statistics, and x_0 the observed summary statistic. Implemented are absolute_median_deviation, standard_deviation (default), centered_absolute_median_deviation, centered_standard_deviation.

  • max_scale_weight_ratio – If not None, extreme scale weights will be bounded by the ratio times the smallest non-zero absolute scale weight. Although usually unnecessary in practice, this bound is theoretically required to ensure convergence if weights are refitted in infinitely many iterations.

  • scale_log_file – A log file to store scale weights for each time point in. Weights are currently not stored in the database. The data are saved in json format and can be retrieved via pyabc.storage.load_dict_from_json.

  • all_particles_for_scale – Whether to also include rejected particles for scale calculation (True) or only accepted ones (False).

  • sumstat – Summary statistics. Defaults to an identity mapping.
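
The interplay of scale weights, fixed weights and max_scale_weight_ratio can be sketched in plain Python. This is a simplified illustration under stated assumptions: scale weights are taken as inverse standard deviations of the simulated statistics, and overly large weights are clipped to the ratio times the smallest non-zero weight. The helper names are hypothetical and this is not the library's code.

```python
import statistics
from typing import Dict, List, Optional


def scale_weights(
    sims: Dict[str, List[float]], max_ratio: Optional[float] = None
) -> Dict[str, float]:
    """Inverse-standard-deviation weight per summary statistic."""
    weights = {}
    for key, values in sims.items():
        scale = statistics.pstdev(values)
        weights[key] = 0.0 if scale == 0 else 1.0 / scale
    if max_ratio is not None:
        nonzero = [w for w in weights.values() if w > 0]
        if nonzero:
            upper = max_ratio * min(nonzero)
            weights = {k: min(w, upper) for k, w in weights.items()}
    return weights


def weighted_pnorm(x, x_0, weights, fixed=None, p=1.0):
    """(sum_k |f_k * w_k * (x_k - x_0k)|^p)^(1/p) for finite p."""
    fixed = fixed or {}
    total = sum(
        abs(fixed.get(k, 1.0) * weights[k] * (x[k] - x_0[k])) ** p for k in x_0
    )
    return total ** (1.0 / p)


sims = {"a": [0.0, 2.0], "b": [0.0, 200.0]}  # "b" varies on a larger scale
w = scale_weights(sims)
print(w)
print(weighted_pnorm({"a": 1.0, "b": 100.0}, {"a": 0.0, "b": 0.0}, w))
```

After weighting, a deviation of one standard deviation in either statistic contributes equally to the distance, which is the motivation for adapting the weights.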

__init__(p: float = 1, initial_scale_weights: Dict[str, float] = None, fixed_weights: Dict[str, float] = None, fit_scale_ixs: EventIxs | Collection[int] | int = inf, scale_function: Callable = None, max_scale_weight_ratio: float = None, scale_log_file: str = None, all_particles_for_scale: bool = True, sumstat: Sumstat = None)[source]
configure_sampler(sampler) None[source]

Make the sampler also return rejected particles, because these are needed to get a better estimate of the summary statistic variability, avoiding a bias toward accepted ones only.

Parameters:

sampler (Sampler) – The sampler employed.

fit_scales(t: int, sample: Sample) None[source]

Here the real weight update happens.

get_config() dict[source]

Return configuration of the distance.

Returns:

config – Dictionary describing the distance.

Return type:

dict

get_weights(t: int) ndarray[source]

Compute weights for time t.

Generates weights from the multiple possible contributing factors. Overwrite in subclasses if there are additional weights.

Parameters:

t – Current time point.

Returns:

The combined weights.

initialize(t: int, get_sample: Callable[[], Sample], x_0: dict, total_sims: int) None[source]

Initialize before the first generation.

Called at the beginning by the inference routine, can be used for calibration to the problem. The default is to do nothing.

Parameters:
  • t – Time point for which to initialize the distance.

  • get_sample – Returns on command the initial sample.

  • x_0 – The observed summary statistics.

  • total_sims – The total number of simulations so far.

is_adaptive() bool[source]

Whether the class is dynamically updated after each generation, based on the last generation’s available data. Default: False.

requires_calibration() bool[source]

Whether the class requires an initial calibration, based on samples from the prior. Default: False.

update(t: int, get_sample: Callable[[], Sample], total_sims: int) bool[source]

Update for the upcoming generation t.

Similar to initialize, however called for every subsequent iteration. The default is to do nothing.

Parameters:
  • t – Time point for which to update the distance.

  • get_sample – Returns on demand the last generation’s complete sample.

  • total_sims – The total number of simulations so far.

Returns:

is_updated – Whether the distance has changed compared to beforehand. Depending on the result, the population needs to be updated in ABCSMC before preparing the next generation. Defaults to False.

Return type:

bool

class pyabc.distance.AggregatedDistance(distances: List[Distance | Callable], weights: List | dict = None, factors: List | dict = None)[source]

Bases: Distance

Aggregates a list of distance functions, all of which may work on subparts of the summary statistics. Then computes and returns the weighted sum of the distance values generated by the various distance functions.

All class functions are propagated to the children and the obtained results aggregated appropriately.

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) float[source]

Applies all distance functions and computes the weighted sum of all obtained values.

__init__(distances: List[Distance | Callable], weights: List | dict = None, factors: List | dict = None)[source]
Parameters:
  • distances (List) – The distance functions to apply.

  • weights (Union[List, dict], optional (default = [1,...])) – The weights to apply to the distances when taking the sum. Can be a list with entries in the same order as the distances, or a dictionary of lists, with the keys being the single time points (if the weights should be iteration-specific).

  • factors (Union[List, dict], optional (default = [1,...])) – Scaling factors that the weights are multiplied with. The same structure applies as to weights. If None is passed, a factor of 1 is considered for every summary statistic. Note that in this class, factors are superfluous, as everything can be achieved with weights alone; in subclasses, however, the factors can remain static while weights adapt over time, allowing for greater flexibility.
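
The weighted sum described for this class can be sketched as follows. The function `aggregate` and the two toy distances are hypothetical illustrations written in plain Python; only list-valued weights and factors are covered, not the time-indexed dictionary form.

```python
from typing import Callable, Dict, List, Optional

DistanceFun = Callable[[Dict, Dict], float]


def aggregate(
    distances: List[DistanceFun],
    x: Dict,
    x_0: Dict,
    weights: Optional[List[float]] = None,
    factors: Optional[List[float]] = None,
) -> float:
    """Weighted sum of the individual distance values."""
    n = len(distances)
    weights = weights if weights is not None else [1.0] * n
    factors = factors if factors is not None else [1.0] * n
    return sum(
        f * w * d(x, x_0) for f, w, d in zip(factors, weights, distances)
    )


# Two hypothetical distances, each looking at one part of the statistics.
def d_mean(x, x_0):
    return abs(x["mean"] - x_0["mean"])


def d_var(x, x_0):
    return abs(x["var"] - x_0["var"])


x = {"mean": 1.0, "var": 4.0}
x_0 = {"mean": 0.0, "var": 2.0}
print(aggregate([d_mean, d_var], x, x_0, weights=[1.0, 0.5]))  # 2.0
```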

configure_sampler(sampler)[source]

Note: configure_sampler is applied by all distances sequentially, so care must be taken that they perform no contradictory operations on the sampler.

static format_dict(w, t, n_distances, default_val=1.0)[source]

Normalize weight or factor dictionary to the employed format.

get_config() dict[source]

Return configuration of the distance.

Returns:

config – Dictionary describing the distance.

Return type:

dict

static get_for_t_or_latest(w, t)[source]

Extract values from dict for given time point.
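
One plausible reading of this lookup is sketched below: return the entry for t if there is one, and otherwise fall back to the entry stored under the largest time point. The fallback rule is an assumption inferred from the method name, and `lookup_for_t_or_latest` is a hypothetical stand-in rather than the library's code.

```python
def lookup_for_t_or_latest(w: dict, t: int):
    """Return w[t] if present, else the entry for the largest stored key."""
    if t not in w:
        t = max(w)
    return w[t]


weights = {0: [1.0, 1.0], 3: [0.5, 2.0]}
print(lookup_for_t_or_latest(weights, 0))  # [1.0, 1.0]
print(lookup_for_t_or_latest(weights, 5))  # [0.5, 2.0]
```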

initialize(t: int, get_sample: Callable[[], Sample], x_0: dict, total_sims: int)[source]

Initialize before the first generation.

Called at the beginning by the inference routine, can be used for calibration to the problem. The default is to do nothing.

Parameters:
  • t – Time point for which to initialize the distance.

  • get_sample – Returns on command the initial sample.

  • x_0 – The observed summary statistics.

  • total_sims – The total number of simulations so far.

is_adaptive() bool[source]

Whether the class is dynamically updated after each generation, based on the last generation’s available data. Default: False.

requires_calibration() bool[source]

Whether the class requires an initial calibration, based on samples from the prior. Default: False.

update(t: int, get_sample: Callable[[], Sample], total_sims: int) bool[source]

The sum_stats are passed on to all distance functions, each of which may then update using these. If any update occurred, a value of True is returned indicating that e.g. the distance may need to be recalculated since the underlying distances changed.

class pyabc.distance.BinomialKernel(p: float | Callable, ret_scale: str = 'SCALE_LOG', keys: List[str] = None, pdf_max: float = None)[source]

Bases: StochasticKernel

A kernel with a binomial probability mass function.

Parameters:
  • p (Union[float, Callable]) – The success probability.

  • ret_scale – See StochasticKernel.

  • keys – See StochasticKernel.

  • pdf_max – See StochasticKernel.
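
A binomial probability mass function evaluated on a log scale, in line with the default ret_scale of 'SCALE_LOG', can be sketched with the standard library alone. Which of the simulated and observed values plays the role of the number of trials and which the number of successes is an assumption of this sketch, as are the helper names.

```python
import math


def binom_logpmf(k: int, n: int, p: float) -> float:
    """Log of the binomial pmf: log C(n, k) + k log p + (n - k) log(1 - p)."""
    if k < 0 or k > n:
        return -math.inf
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log(1 - p)


def binomial_log_kernel(x: dict, x_0: dict, p: float) -> float:
    """Sum of log-pmf values over all keys (independent observations)."""
    return sum(binom_logpmf(x_0[key], x[key], p) for key in x_0)


x = {"count": 10}   # simulated number of trials (assumed role)
x_0 = {"count": 3}  # observed number of successes (assumed role)
log_density = binomial_log_kernel(x, x_0, p=0.3)
print(log_density, math.exp(log_density))
```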

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters:
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns:

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type:

float

__init__(p: float | Callable, ret_scale: str = 'SCALE_LOG', keys: List[str] = None, pdf_max: float = None)[source]
initialize(t: int, get_sample: Callable[[], Sample], x_0: dict, total_sims: int)[source]

Remember the summary statistic keys in sorted order, if not set in __init__ already.

class pyabc.distance.Distance[source]

Bases: ABC

Abstract base class for distance objects.

Any object that computes the similarity between observed and simulated data should inherit from this class.

abstract __call__(x: dict, x_0: dict, t: int = None, par: dict = None) float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters:
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns:

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type:

float

configure_sampler(sampler)[source]

Configure the sampler.

This method is called by the inference routine at the beginning. A possible configuration would be to request also the storing of rejected particles. The default is to do nothing.

Parameters:

sampler (Sampler) – The used sampler.

get_config() dict[source]

Return configuration of the distance.

Returns:

config – Dictionary describing the distance.

Return type:

dict

initialize(t: int, get_sample: Callable[[], Sample], x_0: dict, total_sims: int)[source]

Initialize before the first generation.

Called at the beginning by the inference routine, can be used for calibration to the problem. The default is to do nothing.

Parameters:
  • t – Time point for which to initialize the distance.

  • get_sample – Returns on command the initial sample.

  • x_0 – The observed summary statistics.

  • total_sims – The total number of simulations so far.

is_adaptive() bool[source]

Whether the class is dynamically updated after each generation, based on the last generation’s available data. Default: False.

requires_calibration() bool[source]

Whether the class requires an initial calibration, based on samples from the prior. Default: False.

to_json() str[source]

Return JSON encoded configuration of the distance.

Returns:

json_str – JSON encoded string describing the distance. The default implementation is to try to convert the dictionary returned by get_config.

Return type:

str

update(t: int, get_sample: Callable[[], Sample], total_sims: int) bool[source]

Update for the upcoming generation t.

Similar to initialize, however called for every subsequent iteration. The default is to do nothing.

Parameters:
  • t – Time point for which to update the distance.

  • get_sample – Returns on demand the last generation’s complete sample.

  • total_sims – The total number of simulations so far.

Returns:

is_updated – Whether the distance has changed compared to beforehand. Depending on the result, the population needs to be updated in ABCSMC before preparing the next generation. Defaults to False.

Return type:

bool

class pyabc.distance.DistanceWithMeasureList(measures_to_use='all')[source]

Bases: Distance

Base class for distance functions with measure list. This class is not functional on its own.

Parameters:

measures_to_use (Union[str, List[str]]) –

  • If set to “all”, all measures are used. This is the default.

  • If a list is provided, the measures in the list are used.

  • measures refers to the summary statistics.

__init__(measures_to_use='all')[source]
get_config()[source]

Return configuration of the distance.

Returns:

config – Dictionary describing the distance.

Return type:

dict

initialize(t: int, get_sample: Callable[[], Sample], x_0: dict, total_sims: int)[source]

Initialize before the first generation.

Called at the beginning by the inference routine, can be used for calibration to the problem. The default is to do nothing.

Parameters:
  • t – Time point for which to initialize the distance.

  • get_sample – Returns on command the initial sample.

  • x_0 – The observed summary statistics.

  • total_sims – The total number of simulations so far.

class pyabc.distance.FunctionDistance(fun)[source]

Bases: Distance

This is a wrapper around a simple function which calculates the distance. If a function/callable that is not subclassed from pyabc.Distance is passed to the ABCSMC class, it is converted to an instance of the FunctionDistance class.

Parameters:

fun (Callable[[dict, dict], float]) – A Callable accepting as parameters (a subset of) the arguments of the pyabc.Distance.__call__ function. Usually at least the summary statistics x and x_0. Returns the distance between both.
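
How a wrapper might forward only a subset of the __call__ arguments to the wrapped function is sketched below. The use of `inspect.signature` to decide which arguments to pass is an assumption of this sketch, and `FunctionWrapper` is a hypothetical stand-in, not the library class.

```python
import inspect
from typing import Callable


class FunctionWrapper:
    """Hypothetical stand-in: call `fun` with only the arguments it declares."""

    def __init__(self, fun: Callable):
        self.fun = fun
        self.arg_names = set(inspect.signature(fun).parameters)

    def __call__(self, x: dict, x_0: dict, t: int = None, par: dict = None) -> float:
        available = {"x": x, "x_0": x_0, "t": t, "par": par}
        kwargs = {k: v for k, v in available.items() if k in self.arg_names}
        return self.fun(**kwargs)


def squared_error(x, x_0):
    return (x["y"] - x_0["y"]) ** 2


distance = FunctionWrapper(squared_error)
print(distance({"y": 3.0}, {"y": 1.0}, t=0, par={"theta": 0.5}))  # 4.0
```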

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters:
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns:

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type:

float

__init__(fun)[source]
get_config()[source]

Return configuration of the distance.

Returns:

config – Dictionary describing the distance.

Return type:

dict

static to_distance(maybe_distance: Callable | Distance) Distance[source]
Parameters:
  • maybe_distance – Either a Callable as in FunctionDistance, or a pyabc.Distance object.

Return type:

A Distance instance.

class pyabc.distance.FunctionKernel(fun: Callable, ret_scale: str = 'SCALE_LIN', keys: List[str] = None, pdf_max: float = None)[source]

Bases: StochasticKernel

This is a wrapper around a simple function which calculates the probability density.

Parameters:
  • fun (Callable) – A Callable accepting __call__’s parameters. The function should be a pdf or pmf.

  • ret_scale – As in StochasticKernel.

  • keys – As in StochasticKernel.

  • pdf_max – As in StochasticKernel.

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters:
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns:

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type:

float

__init__(fun: Callable, ret_scale: str = 'SCALE_LIN', keys: List[str] = None, pdf_max: float = None)[source]
class pyabc.distance.IndependentLaplaceKernel(scale: Callable | Sequence[float] | float = None, keys: List[str] = None, pdf_max: float = None)[source]

Bases: StochasticKernel

This kernel can be used for efficient computations of large-scale independent Laplace distributions, performing computations directly on a log-scale to avoid numeric issues. In each coordinate, a 1-dim Laplace distribution

\[p(x) = \frac{1}{2b}\exp\left(-\frac{1}{b}|x-a|\right)\]

is assumed.
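
The benefit of working on a log scale can be seen in a short sketch that uses the standard Laplace density p(x) = 1/(2b) exp(-|x - a| / b) per coordinate and sums the log densities. The function name and the example values are illustrative assumptions.

```python
import math
from typing import Sequence


def laplace_logpdf(
    x: Sequence[float], a: Sequence[float], b: Sequence[float]
) -> float:
    """Sum over coordinates of -log(2 b_i) - |x_i - a_i| / b_i."""
    return sum(
        -math.log(2 * b_i) - abs(x_i - a_i) / b_i
        for x_i, a_i, b_i in zip(x, a, b)
    )


# Small example with two coordinates.
small = laplace_logpdf([1.0, 2.0], [0.0, 2.0], [1.0, 0.5])
print(small)

# With many coordinates the density itself underflows to 0.0,
# while the log density remains a finite, usable number.
n = 1000
large = laplace_logpdf([1.0] * n, [0.0] * n, [1.0] * n)
print(large, math.exp(large))
```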

Parameters:
  • scale (Union[array_like, float, Callable], optional (default = ones vector)) – Scale terms b of the distribution. Can also be a Callable taking as arguments the parameters. In that case, pdf_max should also be given if it is supposed to be used. Usually, it will then be given as the density at the observed statistics assuming the minimum allowed variance.

  • keys – As in StochasticKernel.

  • pdf_max – As in StochasticKernel.

__call__(x: dict, x_0: dict, t: int = None, par: dict = None)[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters:
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns:

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type:

float

__init__(scale: Callable | Sequence[float] | float = None, keys: List[str] = None, pdf_max: float = None)[source]
initialize(t: int, get_sample: Callable[[], Sample], x_0: dict, total_sims: int)[source]

Remember the summary statistic keys in sorted order, if not set in __init__ already.

class pyabc.distance.IndependentNormalKernel(var: Callable | Sequence[float] | float = None, keys: List[str] = None, pdf_max: float = None)[source]

Bases: StochasticKernel

This kernel can be used for efficient computations of large-scale independent normal distributions, circumventing the covariance matrix, and performing computations directly on a log-scale to avoid numeric issues.

Parameters:
  • var (Union[array_like, float, Callable], optional (default = ones vector)) – Variances of the distribution (assuming zeros in the off-diagonal of the covariance matrix). Can also be a Callable taking as arguments the parameters. In that case, pdf_max should also be given if it is supposed to be used. Usually, it will then be given as the density at the observed statistics assuming the minimum allowed variance.

  • keys – As in StochasticKernel.

  • pdf_max – As in StochasticKernel.
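
Because the covariance matrix is assumed diagonal, the log density reduces to a sum of one-dimensional terms, which the following sketch computes without ever forming a matrix. The function name is hypothetical, and the handling of a callable var (called with the parameter dictionary) is an assumption based on the description above.

```python
import math
from typing import Callable, Sequence, Union

Var = Union[float, Sequence[float], Callable[[dict], Sequence[float]]]


def independent_normal_logpdf(x, x_0, var: Var, par: dict = None) -> float:
    """Log density of x_0 under N(x, diag(var)), summed per coordinate."""
    if callable(var):
        var = var(par)  # parameter-dependent variances
    if isinstance(var, (int, float)):
        var = [float(var)] * len(x)
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - x0i) ** 2 / v)
        for xi, x0i, v in zip(x, x_0, var)
    )


# Fixed unit variance.
print(independent_normal_logpdf([0.0], [0.0], 1.0))

# Variance derived from a (hypothetical) noise parameter "sigma".
print(
    independent_normal_logpdf(
        [0.0, 0.0], [2.0, 0.0], lambda par: [par["sigma"] ** 2] * 2, {"sigma": 2.0}
    )
)
```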

__call__(x: dict, x_0: dict, t: int = None, par: dict = None)[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters:
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns:

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type:

float

__init__(var: Callable | Sequence[float] | float = None, keys: List[str] = None, pdf_max: float = None)[source]
initialize(t: int, get_sample: Callable[[], Sample], x_0: dict, total_sims: int)[source]

Remember the summary statistic keys in sorted order, if not set in __init__ already.

class pyabc.distance.InfoWeightedPNormDistance(predictor: Predictor, p: float = 1, initial_scale_weights: Dict[str, float] = None, initial_info_weights: Dict[str, float] = None, fixed_weights: Dict[str, float] = None, fit_scale_ixs: EventIxs | Collection | int = inf, fit_info_ixs: EventIxs | Collection | int = None, normalize_by_par: bool = True, scale_function: Callable = None, max_scale_weight_ratio: float = None, max_info_weight_ratio: float = None, scale_log_file: str = None, info_log_file: str = None, info_sample_log_file: str = None, sumstat: Sumstat = None, fd_deltas: List[float] | float = None, subsetter: Subsetter = None, all_particles_for_scale: bool = True, all_particles_for_prediction: bool = True, feature_normalization: str = 'weights', par_trafo: ParTrafoBase = None)[source]

Bases: AdaptivePNormDistance

Weight summary statistics by sensitivity of a predictor y -> theta.

__init__(predictor: Predictor, p: float = 1, initial_scale_weights: Dict[str, float] = None, initial_info_weights: Dict[str, float] = None, fixed_weights: Dict[str, float] = None, fit_scale_ixs: EventIxs | Collection | int = inf, fit_info_ixs: EventIxs | Collection | int = None, normalize_by_par: bool = True, scale_function: Callable = None, max_scale_weight_ratio: float = None, max_info_weight_ratio: float = None, scale_log_file: str = None, info_log_file: str = None, info_sample_log_file: str = None, sumstat: Sumstat = None, fd_deltas: List[float] | float = None, subsetter: Subsetter = None, all_particles_for_scale: bool = True, all_particles_for_prediction: bool = True, feature_normalization: str = 'weights', par_trafo: ParTrafoBase = None)[source]
Parameters:
  • predictor – Predictor model used to quantify the information in data on parameters.

  • initial_info_weights – Initial information weights. Can be passed to avoid (re)-calibration.

  • fit_info_ixs – Generations when to fit the information weights, similar to fit_scale_ixs. Defaults to {9, 15}, which may not always be the smartest choice and may change in the future. In particular, consider making it dependent on the total number of simulations.

  • normalize_by_par – Whether to normalize total sensitivities of each parameter to 1.

  • max_info_weight_ratio – Maximum ratio on information weights, similar to max_scale_weight_ratio.

  • info_log_file – Log file for the information weights.

  • info_sample_log_file – Log file for samples used to train the regression model underlying the information weights, in npy format. Should be only a base file name, will be automatically postfixed by “{t}_{var}.npy”, with var in samples, parameters, weights.

  • fd_deltas – Finite difference step sizes. Can be a float, or a List of floats, in which case component-wise step size selection is employed.

  • subsetter – Sample subset/cluster selection method. Defaults to just taking all samples. In the presence of e.g. multi-modalities it may make sense to reduce.

  • all_particles_for_scale – Whether to use all particles for scale calculation (True) or only accepted ones (False).

  • all_particles_for_prediction – Whether to include rejected particles for fitting predictor models. The same arguments apply as for PredictorSumstat.all_particles, i.e. not using all may yield a better local approximation.

  • feature_normalization – What normalization to apply to the parameters before predictor model fitting. Can be any of “std” (standard deviation), “mad” (median absolute deviation), “weights” (use the inverse scale weights), or “none” (no normalization). It is recommended to match this with the scale_function, e.g. std or mad. Permitting a normalization other than “weights” makes it possible to, e.g., employ outlier down-weighting in the scale function while normalizing differently here, so as not to counteract it.

  • par_trafo – Parameter transformations to use as targets. Defaults to identity.
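
The idea of weighting statistics by predictor sensitivities can be sketched with finite differences on a toy predictor. Everything here is an assumption for illustration: the use of central differences, absolute values, the per-parameter normalization to 1, and the function and variable names; it is not the library's implementation.

```python
from typing import Callable, List, Sequence


def fd_sensitivities(
    predict: Callable[[Sequence[float]], List[float]],
    x0: Sequence[float],
    delta: float,
    normalize_by_par: bool = True,
) -> List[List[float]]:
    """Central finite differences; result[i][j] approximates |d y_j / d x_i|."""
    n_x, n_y = len(x0), len(predict(x0))
    sensis = [[0.0] * n_y for _ in range(n_x)]
    for i in range(n_x):
        up, down = list(x0), list(x0)
        up[i] += delta
        down[i] -= delta
        y_up, y_down = predict(up), predict(down)
        for j in range(n_y):
            sensis[i][j] = abs(y_up[j] - y_down[j]) / (2 * delta)
    if normalize_by_par:
        # Scale each parameter's column so that its sensitivities sum to 1.
        for j in range(n_y):
            total = sum(sensis[i][j] for i in range(n_x))
            if total > 0:
                for i in range(n_x):
                    sensis[i][j] /= total
    return sensis


def toy_predictor(x):
    """Hypothetical fitted model mapping 2 statistics to 2 parameters."""
    return [2.0 * x[0], x[0] + 3.0 * x[1]]


sensis = fd_sensitivities(toy_predictor, [1.0, 1.0], delta=1e-2)
info_weights = [sum(row) for row in sensis]  # one weight per statistic
print(sensis)
print(info_weights)
```

Statistics to which the predictor reacts strongly receive larger information weights, while uninformative statistics are down-weighted.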

static calculate_sensis(predictor: Predictor, fd_deltas: List[float] | float, x0: ndarray, n_x: int, n_y: int, par_trafo: ParTrafoBase, normalize_by_par: bool)[source]

Calculate normalized predictor sensitivities.

Parameters:
  • predictor – Fitted predictor model.

  • fd_deltas – Finite difference step sizes.

  • x0 – Observed data, shape (n_x).

  • n_x – Data dimension.

  • n_y – Transformed parameter dimension.

  • par_trafo – Parameter transformations, shape (n_y).

  • normalize_by_par – Whether to normalize sensitivities by parameters.

Returns:

sensis – Sensitivities, shape (n_x, n_y).

configure_sampler(sampler) None[source]

Make the sampler also return rejected particles, because these are needed to get a better estimate of the summary statistic variability, avoiding a bias toward accepted ones only.

Parameters:

sampler (Sampler) – The sampler employed.

fit_info(t: int, sample: Sample) None[source]

Update information weights from model fits.

get_config() dict[source]

Return configuration of the distance.

Returns:

config – Dictionary describing the distance.

Return type:

dict

get_weights(t: int) ndarray[source]

Compute weights for time t.

Generates weights from the multiple possible contributing factors. Overwrite in subclasses if there are additional weights.

Parameters:

t – Current time point.

Returns:

The combined weights.

initialize(t: int, get_sample: Callable[[], Sample], x_0: dict, total_sims: int) None[source]

Initialize before the first generation.

Called at the beginning by the inference routine, can be used for calibration to the problem. The default is to do nothing.

Parameters:
  • t – Time point for which to initialize the distance.

  • get_sample – Returns on command the initial sample.

  • x_0 – The observed summary statistics.

  • total_sims – The total number of simulations so far.

is_adaptive() bool[source]

Whether the class is dynamically updated after each generation, based on the last generation’s available data. Default: False.

static normalize_sample(sumstats: ndarray, parameters: ndarray, weights: ndarray, s_0: ndarray, t: int, subsetter: Subsetter, feature_normalization: str, scale_weights: Dict[int, ndarray]) Dict[source]

Normalize samples prior to regression model training.

Parameters:
  • sumstats – Model outputs or summary statistics, shape (n_sample, n_x).

  • parameters – Parameter values, shape (n_sample, n_y).

  • weights – Importance sampling weights, shape (n_sample,).

  • s_0 – Observed data, shape (n_x,).

  • t – Time point, only needed together with scale_weights.

  • subsetter – Subset creator.

  • feature_normalization – Method of feature normalization.

  • scale_weights – Dictionary of scale weights, only used if feature_normalization == “weights”.

Returns:

ret – Dictionary with keys x, y, weights, use_ixs, x0.

requires_calibration() bool[source]

Whether the class requires an initial calibration, based on samples from the prior. Default: False.

update(t: int, get_sample: Callable[[], Sample], total_sims: int) bool[source]

Update for the upcoming generation t.

Similar to initialize, however called for every subsequent iteration. The default is to do nothing.

Parameters:
  • t – Time point for which to update the distance.

  • get_sample – Returns on demand the last generation’s complete sample.

  • total_sims – The total number of simulations so far.

Returns:

is_updated – Whether the distance has changed compared to beforehand. Depending on the result, the population needs to be updated in ABCSMC before preparing the next generation. Defaults to False.

Return type:

bool

class pyabc.distance.MinMaxDistance(measures_to_use='all')[source]

Bases: RangeEstimatorDistance

Calculate upper and lower margins as max and min of the parameters. This works surprisingly well for normalization in simple cases.

static lower(parameter_list)[source]

Calculate the lower margin from a list of parameter values.

Parameters:

parameter_list (List[float]) – List of values of a parameter.

Returns:

lower_margin – The lower margin of the range calculated from these parameters.

Return type:

float

static upper(parameter_list)[source]

Calculate the upper margin from a list of parameter values.

Parameters:

parameter_list (List[float]) – List of values of a parameter.

Returns:

upper_margin – The upper margin of the range calculated from these parameters.

Return type:

float

class pyabc.distance.NegativeBinomialKernel(p: float, ret_scale: str = 'SCALE_LOG', keys: List[str] = None, pdf_max: float = None)[source]

Bases: StochasticKernel

A kernel with a negative binomial probability mass function.

Parameters:
  • p (Union[float, Callable]) – The success probability.

  • ret_scale – See StochasticKernel.

  • keys – See StochasticKernel.

  • pdf_max – See StochasticKernel.
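For orientation, the logarithm of a negative binomial probability mass function can be evaluated with the standard library alone. The sketch below uses the parameterization that is common in SciPy (k failures before n successes, success probability p). How the kernel maps simulated and observed summary statistics onto k and n is not stated in this section, so the mapping in `nbinom_log_kernel` (observed value as k, simulated value as n) is an assumption, and both function names are hypothetical.

```python
import math
from typing import Dict


def nbinom_logpmf(k: float, n: float, p: float) -> float:
    """Log of C(k + n - 1, k) * p**n * (1 - p)**k for k >= 0, n > 0, 0 < p < 1."""
    return (
        math.lgamma(k + n)
        - math.lgamma(k + 1)
        - math.lgamma(n)
        + n * math.log(p)
        + k * math.log1p(-p)
    )


def nbinom_log_kernel(x: Dict[str, float], x_0: Dict[str, float], p: float) -> float:
    """Sum of log-pmf values over keys in sorted order.

    Assumption: observed values act as counts k, simulated values as n.
    """
    return sum(nbinom_logpmf(x_0[key], x[key], p) for key in sorted(x_0))


log_value = nbinom_log_kernel({"a": 3.0}, {"a": 2.0}, p=0.5)
```

Working on the log scale, as the default ret_scale of 'SCALE_LOG' suggests, avoids underflow when many independent summary statistics are combined.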

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters:
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns:

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type:

float

__init__(p: float, ret_scale: str = 'SCALE_LOG', keys: List[str] = None, pdf_max: float = None)[source]

initialize(t: int, get_sample: Callable[[], Sample], x_0: dict, total_sims: int)[source]

Remember the summary statistic keys in sorted order, if not set in __init__ already.

class pyabc.distance.NoDistance[source]

Bases: Distance

Implements a kind of null object as distance function. This can be used as a dummy distance function if e.g. integrated modeling is used.

Note

This distance function cannot be evaluated, so currently it is in particular not possible to use an epsilon threshold which requires initialization, because during initialization the distance function is invoked directly and not via the acceptor as usual. Conceptually, this would be possible and can be implemented on request.

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters:
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns:

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type:

float

__init__()[source]

class pyabc.distance.NormalKernel(cov: ndarray = None, ret_scale: str = 'SCALE_LOG', keys: List[str] = None, pdf_max: float = None)[source]

Bases: StochasticKernel

A kernel with a normal, i.e. Gaussian, probability density. This is just a wrapper around scipy.stats.multivariate_normal.

Parameters:
  • cov (array_like, optional (default = identity matrix)) – Covariance matrix of the distribution.

  • ret_scale – As in StochasticKernel.

  • keys – As in StochasticKernel.

  • pdf_max – As in StochasticKernel.

Note

The order of the entries in the mean and cov vectors is assumed to be the same as the one in keys. If keys is None, it is assumed to be the same as the one obtained via sorted(x.keys()) for summary statistics x.

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) float[source]

Return the value of the normal distribution at x - x_0, or its logarithm.
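A minimal sketch of such an evaluation on the log scale, assuming a diagonal covariance matrix so that the standard library suffices. The function `normal_log_kernel` and its `variances` argument are hypothetical names for illustration; the actual class accepts a full covariance matrix.

```python
import math
from typing import Dict, List, Optional


def normal_log_kernel(
    x: Dict[str, float],
    x_0: Dict[str, float],
    variances: Optional[List[float]] = None,
    keys: Optional[List[str]] = None,
) -> float:
    """Log-density of a zero-mean normal with diagonal covariance at x - x_0."""
    if keys is None:
        # Same ordering convention as described in the note above.
        keys = sorted(x.keys())
    if variances is None:
        # Identity covariance, matching the documented default for cov.
        variances = [1.0] * len(keys)
    log_density = 0.0
    for key, var in zip(keys, variances):
        diff = x[key] - x_0[key]
        log_density += -0.5 * (math.log(2 * math.pi * var) + diff**2 / var)
    return log_density


observed = {"s1": 1.0, "s2": -2.0}
at_optimum = normal_log_kernel(observed, observed)
```

The density is largest when the simulated statistics coincide with the observed ones, which is why the value at (x_0, x_0) can serve as pdf_max.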

__init__(cov: ndarray = None, ret_scale: str = 'SCALE_LOG', keys: List[str] = None, pdf_max: float = None)[source]

initialize(t: int, get_sample: Callable[[], Sample], x_0: dict, total_sims: int)[source]

Remember the summary statistic keys in sorted order, if not set in __init__ already.

class pyabc.distance.PCADistance(measures_to_use='all', p: float = 2)[source]

Bases: DistanceWithMeasureList

Calculate distance in whitened coordinates.

A PCA whitening transformation \(W\) is calculated from an initial sample. The distance is measured as the p-norm distance in the transformed space, i.e.

\[d(x,y) = \| Wx - Wy \|_p\]

Parameters:
  • measures_to_use – See DistanceWithMeasureList.

  • p – Exponent of the p-norm, defaults to 2 (Euclidean distance).

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters:
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns:

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type:

float

__init__(measures_to_use='all', p: float = 2)[source]

initialize(t: int, get_sample: Callable[[], Sample], x_0: dict, total_sims: int)[source]

Initialize before the first generation.

Called at the beginning by the inference routine, can be used for calibration to the problem. The default is to do nothing.

Parameters:
  • t – Time point for which to initialize the distance.

  • get_sample – Returns on command the initial sample.

  • x_0 – The observed summary statistics.

  • total_sims – The total number of simulations so far.

requires_calibration() bool[source]

Whether the class requires an initial calibration, based on samples from the prior. Default: False.

class pyabc.distance.PNormDistance(p: float = 1, fixed_weights: Dict[str, float] | Dict[int, Dict[str, float]] = None, sumstat: Sumstat = None)[source]

Bases: Distance

Weighted p-norm distance.

Distance between summary statistics calculated according to

\[d(x, y) = \left [\sum_{i} \left| w_i ( x_i-y_i ) \right|^{p} \right ]^{1/p}\]

E.g.

  • p=1 for a Manhattan or L1 metric,

  • p=2 for a Euclidean or L2 metric,

  • p=np.inf for a Chebyshev, maximum or inf metric.

Parameters:
  • p – p for p-norm. p >= 1, p = np.inf implies max norm.

  • fixed_weights – Weights. Dictionary indexed by time points, or only one entry if the weights should not change over time. Each entry contains a dictionary of numeric weights, indexed by summary statistics labels, or the corresponding array representation. If None is passed, a weight of 1 is considered for every summary statistic. If no entry is available for a given time point, the maximum available time point is selected.

  • sumstat – Summary statistics transformation to apply to the model output.
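The formula above can be sketched in a few lines of plain Python. The function `weighted_p_norm` is a hypothetical stand-alone helper, not the class itself; it assumes dictionary-valued summary statistics with scalar entries and one weight per label.

```python
import math
from typing import Dict, Optional


def weighted_p_norm(
    x: Dict[str, float],
    y: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
    p: float = 1.0,
) -> float:
    """Weighted p-norm distance between two summary statistic dictionaries."""
    terms = []
    for key in sorted(x):
        # A missing weights dictionary means a weight of 1 for every label.
        w = 1.0 if weights is None else weights[key]
        terms.append(abs(w * (x[key] - y[key])))
    if math.isinf(p):
        # The limit p -> infinity yields the maximum norm.
        return max(terms)
    return sum(term**p for term in terms) ** (1.0 / p)


x = {"a": 1.0, "b": 5.0}
y = {"a": 4.0, "b": 1.0}
manhattan = weighted_p_norm(x, y, p=1)
euclidean = weighted_p_norm(x, y, p=2)
chebyshev = weighted_p_norm(x, y, p=math.inf)
```

With differences of 3 and 4, the three calls illustrate how the choice of p changes the result for the same pair of inputs.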

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters:
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns:

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type:

float

__init__(p: float = 1, fixed_weights: Dict[str, float] | Dict[int, Dict[str, float]] = None, sumstat: Sumstat = None)[source]

configure_sampler(sampler) None[source]

Configure the sampler.

This method is called by the inference routine at the beginning. A possible configuration would be to request also the storing of rejected particles. The default is to do nothing.

Parameters:

sampler (Sampler) – The used sampler.

static for_t_or_latest(w: Dict[int, ndarray], t: int) ndarray[source]

Extract values from dict for given time point.

Parameters:
  • w – Weights dictionary.

  • t – Time point to extract weights for.

Returns:

The weights at time t, or those at the maximal key if t is not present.
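The lookup rule described here is simple enough to sketch directly. This is an illustrative re-implementation based only on the description above, using plain lists instead of ndarray values.

```python
from typing import Dict, List


def for_t_or_latest(w: Dict[int, List[float]], t: int) -> List[float]:
    """Return the entry for time t, falling back to the maximal key."""
    if t not in w:
        t = max(w)
    return w[t]


weights = {0: [1.0, 1.0], 3: [0.5, 2.0]}
exact_hit = for_t_or_latest(weights, 0)
fallback = for_t_or_latest(weights, 5)
```

Note that, read literally, the fallback also applies to a missing time point that lies between two available keys, e.g. t=1 in the example resolves to the entry stored under key 3.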

static format_dict(vals: Dict[str, float] | Dict[int, Dict[str, float]], t: int, s_ids: List[str]) Dict[int, float | ndarray][source]

Normalize weight dictionary to the employed format.

Parameters:
  • vals – Possibly unformatted weight values.

  • t – Current time point.

  • s_ids – Summary statistic labels for correct conversion to array.

Returns:

Dictionary indexed by time points, with array values.

get_config() dict[source]

Return configuration of the distance.

Returns:

config – Dictionary describing the distance.

Return type:

dict

get_weights(t: int) ndarray[source]

Compute weights for time t.

Generates weights from the multiple possible contributing factors. Overwrite in subclasses if there are additional weights.

Parameters:

t – Current time point.

Returns:

The combined weights.

initialize(t: int, get_sample: Callable[[], Sample], x_0: dict, total_sims: int) None[source]

Initialize before the first generation.

Called at the beginning by the inference routine, can be used for calibration to the problem. The default is to do nothing.

Parameters:
  • t – Time point for which to initialize the distance.

  • get_sample – Returns on command the initial sample.

  • x_0 – The observed summary statistics.

  • total_sims – The total number of simulations so far.

update(t: int, get_sample: Callable[[], Sample], total_sims: int) bool[source]

Update for the upcoming generation t.

Similar to initialize, however called for every subsequent iteration. The default is to do nothing.

Parameters:
  • t – Time point for which to update the distance.

  • get_sample – Returns on demand the last generation’s complete sample.

  • total_sims – The total number of simulations so far.

Returns:

is_updated – Whether the distance has changed compared to beforehand. Depending on the result, the population needs to be updated in ABCSMC before preparing the next generation. Defaults to False.

Return type:

bool

weights2dict(weights: Dict[int, ndarray]) Dict[int, Dict[str, float]][source]

Create labeled weights dictionary.

Parameters:

weights – Array formatted weight dictionary.

Returns:

weights_dict – Key-value formatted weight dictionary.

class pyabc.distance.PercentileDistance(measures_to_use='all')[source]

Bases: RangeEstimatorDistance

Calculate the normalization from the 20% and 80% percentiles as lower and upper margins.

PERCENTILE = 20

The percentiles
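As an illustration, the two margins can be computed from a list of sampled values as follows. The linear interpolation between order statistics is an assumption (it mirrors a common default), and the helper names are hypothetical.

```python
from typing import List

PERCENTILE = 20  # lower margin; the upper margin uses 100 - PERCENTILE


def percentile(values: List[float], q: float) -> float:
    """q-th percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def lower(parameter_list: List[float]) -> float:
    return percentile(parameter_list, PERCENTILE)


def upper(parameter_list: List[float]) -> float:
    return percentile(parameter_list, 100 - PERCENTILE)


values = [float(v) for v in range(11)]  # 0, 1, ..., 10
margins = (lower(values), upper(values))
```

Compared with min and max, percentile margins are less sensitive to a few extreme simulations in the calibration sample.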

get_config()[source]

Return configuration of the distance.

Returns:

config – Dictionary describing the distance.

Return type:

dict

static lower(parameter_list)[source]

Calculate the lower margin from a list of parameter values.

Parameters:

parameter_list (List[float]) – List of values of a parameter.

Returns:

lower_margin – The lower margin of the range calculated from these parameters.

Return type:

float

static upper(parameter_list)[source]

Calculate the upper margin from a list of parameter values.

Parameters:

parameter_list (List[float]) – List of values of a parameter.

Returns:

upper_margin – The upper margin of the range calculated from these parameters.

Return type:

float

class pyabc.distance.PoissonKernel(ret_scale: str = 'SCALE_LOG', keys: List[str] = None, pdf_max: float = None)[source]

Bases: StochasticKernel

A kernel with a Poisson probability mass function.

Parameters:
  • ret_scale – See StochasticKernel.

  • keys – See StochasticKernel.

  • pdf_max – See StochasticKernel.
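A Poisson log-probability can be sketched with the standard library as shown below. The roles assigned to the inputs (observed value as the count k, simulated value as the mean) are an assumption made for illustration, as are the function names; the simulated values must be strictly positive in this sketch.

```python
import math
from typing import Dict


def poisson_logpmf(k: float, mu: float) -> float:
    """Log of mu**k * exp(-mu) / k! for k >= 0 and mu > 0."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def poisson_log_kernel(x: Dict[str, float], x_0: Dict[str, float]) -> float:
    """Sum of log-pmf values over keys in sorted order.

    Assumption: observed values are counts, simulated values are means.
    """
    return sum(poisson_logpmf(x_0[key], x[key]) for key in sorted(x_0))


log_value = poisson_log_kernel({"a": 2.0, "b": 2.0}, {"a": 0.0, "b": 3.0})
```

Summing log-probabilities corresponds to treating the individual summary statistics as independent.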

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters:
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns:

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type:

float

__init__(ret_scale: str = 'SCALE_LOG', keys: List[str] = None, pdf_max: float = None)[source]

initialize(t: int, get_sample: Callable[[], Sample], x_0: dict, total_sims: int)[source]

Remember the summary statistic keys in sorted order, if not set in __init__ already.

class pyabc.distance.RangeEstimatorDistance(measures_to_use='all')[source]

Bases: DistanceWithMeasureList

Abstract base class for distance functions whose estimate is based on a range.

It defines the two template methods lower and upper.

Hence

\[d(x, y) = \sum_{i \in \text{measures}} \left | \frac{x_i - y_i}{u_i - l_i} \right |\]

where \(l_i\) and \(u_i\) are the lower and upper margin for measure \(i\).
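The range-normalized sum above can be sketched as follows. The helper `range_normalized_distance` and the `samples` dictionary (label to list of calibration values) are hypothetical; the defaults `min` and `max` correspond to the margins described for MinMaxDistance.

```python
from typing import Callable, Dict, List


def range_normalized_distance(
    x: Dict[str, float],
    y: Dict[str, float],
    samples: Dict[str, List[float]],
    lower: Callable[[List[float]], float] = min,
    upper: Callable[[List[float]], float] = max,
) -> float:
    """Sum over measures of |x_i - y_i| / (u_i - l_i)."""
    total = 0.0
    for key, values in samples.items():
        l_i, u_i = lower(values), upper(values)
        # A zero-width range (u_i == l_i) would raise ZeroDivisionError here.
        total += abs((x[key] - y[key]) / (u_i - l_i))
    return total


calibration = {"a": [0.0, 5.0, 10.0], "b": [1.0, 2.0, 3.0]}
dist = range_normalized_distance(
    {"a": 4.0, "b": 2.0}, {"a": 9.0, "b": 3.0}, calibration
)
```

Passing different `lower` and `upper` callables is the template-method idea in miniature: subclasses only decide how the margins are estimated.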

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters:
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns:

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type:

float

__init__(measures_to_use='all')[source]

get_config()[source]

Return configuration of the distance.

Returns:

config – Dictionary describing the distance.

Return type:

dict

initialize(t: int, get_sample: Callable[[], Sample], x_0: dict, total_sims: int)[source]

Initialize before the first generation.

Called at the beginning by the inference routine, can be used for calibration to the problem. The default is to do nothing.

Parameters:
  • t – Time point for which to initialize the distance.

  • get_sample – Returns on command the initial sample.

  • x_0 – The observed summary statistics.

  • total_sims – The total number of simulations so far.

static lower(parameter_list: List[float])[source]

Calculate the lower margin from a list of parameter values.

Parameters:

parameter_list (List[float]) – List of values of a parameter.

Returns:

lower_margin – The lower margin of the range calculated from these parameters.

Return type:

float

requires_calibration() bool[source]

Whether the class requires an initial calibration, based on samples from the prior. Default: False.

static upper(parameter_list: List[float])[source]

Calculate the upper margin from a list of parameter values.

Parameters:

parameter_list (List[float]) – List of values of a parameter.

Returns:

upper_margin – The upper margin of the range calculated from these parameters.

Return type:

float

class pyabc.distance.SlicedWassersteinDistance(sumstat: Sumstat, metric: str = 'sqeuclidean', p: float = 2.0, n_proj: int = 50, seed: int | RandomState = None, emd_1d_args: dict = None)[source]

Bases: Distance

Sliced Wasserstein distance via efficient one-dimensional projections.

As the optimal transport mapping underlying Wasserstein distances can be challenging for high-dimensional problems, this distance reduces multi-dimensional distributions to one-dimensional representations via linear projections, and then averages 1d Wasserstein distances, which can be efficiently calculated by sorting, across the projected distributions.

More explicitly, with \(\mathbb{S}^{d-1} = \{u\in\mathbb{R}^d: \|u\|_2=1\}\) denoting the unit sphere in \(\mathbb{R}^d\) and for \(u\in\mathbb{S}^{d-1}\) denoting by \(u^*(y) = \langle u, y\rangle\) the associated linear form, the Sliced Wasserstein distance of order \(p\) between probability measures \(\mu,\nu\) is defined as:

\[\text{SWD}_p(\mu, \nu) = \underset{u \sim \mathcal{U} (\mathbb{S}^{d-1})}{\mathbb{E}}[W_p^p(u^*_\# \mu, u^*_\# \nu)] ^{\frac{1}{p}}\]

Here, \(u^*_\# \mu\) denotes the push-forward measure of \(\mu\) by the projection \(u\), and \(W_p\) the 1d Wasserstein distance with exponent \(p\) for an underlying distance metric. In practice, the integral is approximated via a Monte-Carlo sample.

This distance is based on [2], the implementation based on and generalized from https://pythonot.github.io/gen_modules/ot.sliced.html.
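The Monte-Carlo approximation can be sketched without external dependencies under simplifying assumptions: both samples have the same size, all points carry uniform weight, and the one-dimensional cost is \(|a - b|^p\). Under these assumptions the 1d Wasserstein distance reduces to matching sorted projections. The function `sliced_wasserstein` is a hypothetical illustration, not the implementation referenced above.

```python
import math
import random
from typing import List, Sequence


def sliced_wasserstein(
    xs: Sequence[Sequence[float]],
    ys: Sequence[Sequence[float]],
    p: float = 2.0,
    n_proj: int = 50,
    seed: int = 0,
) -> float:
    """Monte-Carlo sliced Wasserstein distance for equally sized samples."""
    if len(xs) != len(ys):
        raise ValueError("This sketch assumes equally sized samples.")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_proj):
        # A normalized Gaussian vector is uniformly distributed on the sphere.
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in u))
        u = [c / norm for c in u]
        # Project both samples onto u and sort the projections.
        px = sorted(sum(c * v for c, v in zip(u, pt)) for pt in xs)
        py = sorted(sum(c * v for c, v in zip(u, pt)) for pt in ys)
        # 1d W_p^p for uniform weights: mean of |a - b|^p over sorted pairs.
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px)
    return (total / n_proj) ** (1.0 / p)


shifted = sliced_wasserstein([[0.0], [1.0], [2.0]], [[3.0], [4.0], [5.0]])
```

In one dimension every projection direction is either +1 or -1, so shifting a sample by a constant yields exactly the size of that shift; in higher dimensions the result depends on the sampled directions and hence on the seed.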

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters:
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns:

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type:

float

__init__(sumstat: Sumstat, metric: str = 'sqeuclidean', p: float = 2.0, n_proj: int = 50, seed: int | RandomState = None, emd_1d_args: dict = None)[source]

Parameters:
  • sumstat – Summary statistics. Returns a ndarray of shape (n, dim), where n is the number of samples and dim the sample dimension.

  • metric – Distance to use, e.g. “cityblock”, “sqeuclidean”, “minkowski”.

  • p – Distance exponent, to take the root in the overall distance. Also used in ot.emd2_1d if metric == “minkowski”.

  • n_proj – Number of unit sphere projections used for Monte-Carlo approximation. Per projection, a one-dimensional EMD is calculated.

  • seed – Seed used for numpy random number generation.

  • emd_1d_args – Further keyword arguments passed on to ot.emd2_1d.

initialize(x_0: dict, t: int = None, get_sample: Callable[[], Sample] = None, total_sims: int = None) None[source]

Initialize before the first generation.

Called at the beginning by the inference routine, can be used for calibration to the problem. The default is to do nothing.

Parameters:
  • t – Time point for which to initialize the distance.

  • get_sample – Returns on command the initial sample.

  • x_0 – The observed summary statistics.

  • total_sims – The total number of simulations so far.

update(t: int, get_sample: Callable[[], Sample], total_sims: int) bool[source]

Update for the upcoming generation t.

Similar to initialize, however called for every subsequent iteration. The default is to do nothing.

Parameters:
  • t – Time point for which to update the distance.

  • get_sample – Returns on demand the last generation’s complete sample.

  • total_sims – The total number of simulations so far.

Returns:

is_updated – Whether the distance has changed compared to beforehand. Depending on the result, the population needs to be updated in ABCSMC before preparing the next generation. Defaults to False.

Return type:

bool

class pyabc.distance.StochasticKernel(ret_scale: str = 'SCALE_LIN', keys: List[str] = None, pdf_max: float = None)[source]

Bases: Distance

A stochastic kernel assesses the similarity between observed and simulated summary statistics or data via a probability measure.

Note

The returned value cannot be interpreted as a distance function, but rather as an inverse distance, as it increases as the similarity between observed and simulated summary statistics increases. Thus, a StochasticKernel should only be used together with a StochasticAcceptor.

Parameters:
  • ret_scale (str, optional (default = SCALE_LIN)) – The scale of the value returned in __call__: Given a probability density p(x,x_0), the returned value can be either of p(x,x_0), or log(p(x,x_0)).

  • keys (List[str], optional) – The keys of the summary statistics, specifying the order to be used.

  • pdf_max (float, optional) – The maximum possible probability density function value. Defaults to None and is then computed as the density at (x_0, x_0), where x_0 denotes the observed summary statistics. Must be overridden if pdf_max is to be used in the analysis by the acceptor and the default is not applicable. This value should be in the scale specified by ret_scale already.

__init__(ret_scale: str = 'SCALE_LIN', keys: List[str] = None, pdf_max: float = None)[source]

initialize(t: int, get_sample: Callable[[], Sample], x_0: dict, total_sims: int)[source]

Remember the summary statistic keys in sorted order, if not set in __init__ already.

class pyabc.distance.WassersteinDistance(sumstat: Sumstat, p: float = 2.0, dist: str | Callable = None, emd_args: dict = None)[source]

Bases: Distance

Optimal transport Wasserstein distance between empirical distributions.

The Wasserstein distance, also referred to as Vaserstein, Kantorovich-Rubinstein, or earth mover’s distance, is a metric between probability distributions on a given metric space (M, d). Intuitively, it quantifies the minimum cost of transforming one probability distribution on M into another, with point-wise cost function d.

The Wasserstein distance between discrete distributions \(\mu = \{(x_i,a_i)\}\) and \(\nu = \{(y_i,b_i)\}\) can be expressed as

\[W_p(\mu,\nu) = \left(\sum_{i,j}\gamma^*_{ij}M_{ij}\right)^{1/p}\]

where the optimal transport mapping is given as

\[ \begin{align}\begin{aligned}\gamma^* = \text{argmin}_{\gamma \in \mathbb{R}^{m\times n}} \sum_{i,j}\gamma_{ij}M_{ij}\\s.t. \gamma 1 = a; \gamma^T 1= b; \gamma\geq 0\end{aligned}\end{align} \]

where \(M\in\mathbb{R}^{m\times n}\) is the pairwise cost matrix defining the cost to move mass from bin \(x_i\) to bin \(y_j\), e.g. expressed via a distance metric, \(M_{ij} = \|x_i - y_j\|_p\), and \(a\) and \(b\) are histograms weighting samples (e.g. uniform).

Its application in ABC is based on [3]. For further information see e.g. https://en.wikipedia.org/wiki/Wasserstein_metric.
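For very small, equally sized samples with uniform weights, an optimal transport plan can be chosen as a permutation, so the definition can be checked by brute force. The sketch below does exactly that, assuming the cost \(M_{ij}\) is the Euclidean distance raised to the power p. The function `wasserstein_bruteforce` is hypothetical and has factorial run time; practical implementations solve the linear program with a dedicated solver instead.

```python
import itertools
import math
from typing import Sequence


def wasserstein_bruteforce(
    xs: Sequence[Sequence[float]],
    ys: Sequence[Sequence[float]],
    p: float = 2.0,
) -> float:
    """Exact W_p for tiny, equally sized, uniformly weighted samples."""
    n = len(xs)
    if len(ys) != n:
        raise ValueError("This sketch assumes equally sized samples.")
    # Assumption: cost matrix M_ij = ||x_i - y_j||_2 ** p.
    cost = [[math.dist(a, b) ** p for b in ys] for a in xs]
    # Each permutation is a transport plan with mass 1/n per matched pair.
    best = min(
        sum(cost[i][perm[i]] for i in range(n))
        for perm in itertools.permutations(range(n))
    )
    return (best / n) ** (1.0 / p)


unit_shift = wasserstein_bruteforce([(0.0, 0.0), (2.0, 0.0)], [(0.0, 1.0), (2.0, 1.0)])
```

Because the distance compares distributions rather than ordered vectors, reordering the points of either sample leaves the result unchanged.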

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters:
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns:

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type:

float

__init__(sumstat: Sumstat, p: float = 2.0, dist: str | Callable = None, emd_args: dict = None)[source]

Parameters:
  • sumstat – Summary statistics. Returns a ndarray of shape (n, dim), where n is the number of samples and dim the sample dimension.

  • p – Distance exponent, e.g. Manhattan (p=1), Euclidean (p=2). If dist is separately specified, ^(1/p) is still applied at the end.

  • dist – Distance to use. If not specified, the distance is induced by p.

  • emd_args – Further keyword arguments passed on to ot.emd.

initialize(x_0: dict, t: int = None, get_sample: Callable[[], Sample] = None, total_sims: int = None) None[source]

Initialize before the first generation.

Called at the beginning by the inference routine, can be used for calibration to the problem. The default is to do nothing.

Parameters:
  • t – Time point for which to initialize the distance.

  • get_sample – Returns on command the initial sample.

  • x_0 – The observed summary statistics.

  • total_sims – The total number of simulations so far.

update(t: int, get_sample: Callable[[], Sample], total_sims: int) bool[source]

Update for the upcoming generation t.

Similar to initialize, however called for every subsequent iteration. The default is to do nothing.

Parameters:
  • t – Time point for which to update the distance.

  • get_sample – Returns on demand the last generation’s complete sample.

  • total_sims – The total number of simulations so far.

Returns:

is_updated – Whether the distance has changed compared to beforehand. Depending on the result, the population needs to be updated in ABCSMC before preparing the next generation. Defaults to False.

Return type:

bool

class pyabc.distance.ZScoreDistance(measures_to_use='all')[source]

Bases: DistanceWithMeasureList

Calculate the distance as the sum of z-scores over the selected measures. The measured data serve as the reference for the z-score.

Hence

\[d(x, y) = \sum_{i \in \text{measures}} \left| \frac{x_i-y_i}{y_i} \right|\]
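A direct transcription of this formula, with y taken as the measured data, might look as follows. The helper `z_score_distance` and its `measures` argument are hypothetical names for illustration.

```python
from typing import Dict, List, Optional


def z_score_distance(
    x: Dict[str, float],
    y: Dict[str, float],
    measures: Optional[List[str]] = None,
) -> float:
    """Sum over the selected measures of |(x_i - y_i) / y_i|."""
    keys = sorted(y) if measures is None else measures
    # A measured value of exactly zero would raise ZeroDivisionError here.
    return sum(abs((x[key] - y[key]) / y[key]) for key in keys)


simulated = {"a": 12.0, "b": 3.0}
measured = {"a": 10.0, "b": 4.0}
dist = z_score_distance(simulated, measured)
```

Since each term is a relative deviation from the measured value, measures with very different magnitudes contribute on a comparable scale.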

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters:
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns:

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type:

float