Distance functions

Distance functions measure how close simulated data are to the observed data. This module implements commonly used distance functions for ABC, including adaptive and aggregated variants.

For custom distance functions, either pass a plain function to ABCSMC or subclass the pyabc.Distance class.
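As a minimal sketch of the plain-function route, the following defines a distance over dictionaries of summary statistics. The function name and the example keys are made up for illustration, and the block deliberately does not import pyabc, so it only shows the shape such a callable could take.

```python
import math


def euclidean_distance(x: dict, x_0: dict) -> float:
    """Euclidean distance over the summary statistic keys of x_0."""
    return math.sqrt(sum((x[key] - x_0[key]) ** 2 for key in x_0))


# Hypothetical summary statistics of simulated and observed data.
simulated = {"mean": 1.0, "var": 2.0}
observed = {"mean": 4.0, "var": 6.0}
print(euclidean_distance(simulated, observed))  # 5.0
```

A callable of this shape is what would be handed to ABCSMC in place of a Distance subclass.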

class pyabc.distance.AcceptAllDistance[source]

Bases: pyabc.distance.base.Distance

A mock distance function that always returns -1, so any sample is accepted by any sane epsilon object.

Can be used for testing.

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type

float

class pyabc.distance.AdaptiveAggregatedDistance(distances: List[pyabc.distance.base.Distance], initial_weights: List = None, factors: Union[List, dict] = None, adaptive: bool = True, scale_function: Callable = None, log_file: str = None)[source]

Bases: pyabc.distance.distance.AggregatedDistance

Adapt the weights of AggregatedDistances automatically over time.

Parameters
  • distances – As in AggregatedDistance.

  • initial_weights – Weights to be used in the initial iteration. List with a weight for each distance function.

  • factors – As in AggregatedDistance.

  • adaptive – True: Adapt weights after each iteration. False: Adapt weights only once at the beginning in initialize(). This corresponds to a pre-calibration.

  • scale_function – Function that takes a list of floats, namely the values obtained by applying one of the distances passed to a set of samples, and returns a single float, namely the weight to apply to this distance function. Default: scale_span.

  • log_file – A log file to store weights for each time point in. Weights are currently not stored in the database. The data are saved in json format and can be retrieved via pyabc.storage.load_dict_from_json.

__init__(distances: List[pyabc.distance.base.Distance], initial_weights: List = None, factors: Union[List, dict] = None, adaptive: bool = True, scale_function: Callable = None, log_file: str = None)[source]
Parameters
  • distances (List) – The distance functions to apply.

  • weights (Union[List, dict], optional (default = [1,..])) – The weights to apply to the distances when taking the sum. Can be a list with entries in the same order as the distances, or a dictionary of lists, with the keys being the single time points (if the weights should be iteration-specific).

  • factors (Union[List, dict], optional (default = [1,..])) – Scaling factors that the weights are multiplied with. The same structure applies as to weights. If None is passed, a factor of 1 is considered for every summary statistic. Note that in this class, factors are superfluous as everything can be achieved with weights alone; however, in subclasses the factors can remain static while weights adapt over time, allowing for greater flexibility.

initialize(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]

Initialize weights.

update(t: int, get_all_sum_stats: Callable[[], List[dict]])[source]

Update weights based on all simulations.

class pyabc.distance.AdaptivePNormDistance(p: float = 2, initial_weights: dict = None, factors: dict = None, adaptive: bool = True, scale_function: Callable = None, normalize_weights: bool = True, max_weight_ratio: float = None, log_file: str = None)[source]

Bases: pyabc.distance.distance.PNormDistance

In the p-norm distance, adapt the weights for each generation, based on the previous simulations. This class is motivated by [1].

Parameters
  • p – p for p-norm. Required p >= 1, p = np.inf allowed (infinity-norm). Default: p=2.

  • initial_weights – Weights to be used in the initial iteration. Dictionary with observables as keys and weights as values.

  • factors – As in PNormDistance.

  • adaptive – True: Adapt distance after each iteration. False: Adapt distance only once at the beginning in initialize(). This corresponds to a pre-calibration.

  • scale_function – (data: list, x_0: float) -> scale: float. Computes the scale (i.e. inverse weight s = 1 / w) for a given summary statistic. Here, data denotes the list of simulated summary statistics, and x_0 the observed summary statistic. Implemented are absolute_median_deviation, standard_deviation (default), centered_absolute_median_deviation, centered_standard_deviation.

  • normalize_weights – Whether to normalize the weights to have mean 1. This may smooth the decrease of epsilon and aid numeric stability, but is not strictly necessary.

  • max_weight_ratio – If not None, large weights will be bounded by the ratio times the smallest non-zero absolute weight. In practice this is usually not necessary, but it is theoretically required to ensure convergence.

  • log_file – A log file to store weights for each time point in. Weights are currently not stored in the database. The data are saved in json format and can be retrieved via pyabc.storage.load_dict_from_json.

[1]

Prangle, Dennis. “Adapting the ABC Distance Function”. Bayesian Analysis, 2017. doi:10.1214/16-BA1002.
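The interplay of scale_function, normalize_weights and max_weight_ratio described above can be sketched as follows. This is a standalone illustration, not the library's code: the function name is hypothetical, the population standard deviation stands in for the scale function, and the weight cap is applied before normalization as one plausible ordering.

```python
import statistics


def adaptive_weights(sum_stats, normalize=True, max_weight_ratio=None):
    """Compute weights w = 1 / scale for each summary statistic key."""
    weights = {}
    for key in sorted(sum_stats[0]):
        scale = statistics.pstdev([s[key] for s in sum_stats])
        # A statistic that never varies carries no scale information.
        weights[key] = 0.0 if scale == 0 else 1.0 / scale
    if max_weight_ratio is not None:
        nonzero = [w for w in weights.values() if w > 0]
        if nonzero:
            # Bound large weights by ratio times the smallest non-zero weight.
            cap = max_weight_ratio * min(nonzero)
            weights = {k: min(w, cap) for k, w in weights.items()}
    if normalize:
        mean_w = sum(weights.values()) / len(weights)
        if mean_w > 0:
            weights = {k: w / mean_w for k, w in weights.items()}
    return weights


# Hypothetical simulated summary statistics on very different scales.
sims = [{"a": 0.0, "b": 0.0}, {"a": 2.0, "b": 20.0}]
print(adaptive_weights(sims))
print(adaptive_weights(sims, max_weight_ratio=5))
```

Without the cap, statistic "a" receives ten times the weight of "b"; with max_weight_ratio=5 the ratio is limited to five.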

__init__(p: float = 2, initial_weights: dict = None, factors: dict = None, adaptive: bool = True, scale_function: Callable = None, normalize_weights: bool = True, max_weight_ratio: float = None, log_file: str = None)[source]

Initialize self. See help(type(self)) for accurate signature.

configure_sampler(sampler: pyabc.sampler.base.Sampler)[source]

Make the sampler return also rejected particles, because these are needed to get a better estimate of the summary statistic variabilities, avoiding a bias to accepted ones only.

Parameters

sampler (Sampler) – The sampler employed.

get_config() → dict[source]

Return configuration of the distance.

Returns

config – Dictionary describing the distance.

Return type

dict

initialize(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]

Initialize weights.

update(t: int, get_all_sum_stats: Callable[[], List[dict]])[source]

Update weights.

class pyabc.distance.AggregatedDistance(distances: List[pyabc.distance.base.Distance], weights: Union[List, dict] = None, factors: Union[List, dict] = None)[source]

Bases: pyabc.distance.base.Distance

Aggregates a list of distance functions, all of which may work on subparts of the summary statistics. Then computes and returns the weighted sum of the distance values generated by the various distance functions.

All class functions are propagated to the children and the obtained results aggregated appropriately.
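The weighted sum can be illustrated with a small standalone sketch. The helper name, the two component distances and the example values are hypothetical; the factors are multiplied onto the weights as described for the factors parameter.

```python
def aggregated_distance(distances, weights, factors, x, x_0):
    """Weighted sum of the values returned by several distance functions."""
    return sum(
        w * f * dist(x, x_0)
        for dist, w, f in zip(distances, weights, factors)
    )


# Two hypothetical distances, each working on a subpart of the statistics.
def dist_a(x, x_0):
    return abs(x["a"] - x_0["a"])


def dist_b(x, x_0):
    return (x["b"] - x_0["b"]) ** 2


x = {"a": 1.0, "b": 3.0}
x_0 = {"a": 2.0, "b": 1.0}
print(aggregated_distance([dist_a, dist_b], [1.0, 0.5], [1.0, 1.0], x, x_0))  # 3.0
```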

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]

Applies all distance functions and computes the weighted sum of all obtained values.

__init__(distances: List[pyabc.distance.base.Distance], weights: Union[List, dict] = None, factors: Union[List, dict] = None)[source]
Parameters
  • distances (List) – The distance functions to apply.

  • weights (Union[List, dict], optional (default = [1,..])) – The weights to apply to the distances when taking the sum. Can be a list with entries in the same order as the distances, or a dictionary of lists, with the keys being the single time points (if the weights should be iteration-specific).

  • factors (Union[List, dict], optional (default = [1,..])) – Scaling factors that the weights are multiplied with. The same structure applies as to weights. If None is passed, a factor of 1 is considered for every summary statistic. Note that in this class, factors are superfluous as everything can be achieved with weights alone; however, in subclasses the factors can remain static while weights adapt over time, allowing for greater flexibility.

configure_sampler(sampler: pyabc.sampler.base.Sampler)[source]

Note: configure_sampler is applied by all distances sequentially, so care must be taken that they perform no contradictory operations on the sampler.

static format_dict(w, t, n_distances, default_val=1.0)[source]

Normalize weight or factor dictionary to the employed format.

get_config() → dict[source]

Return configuration of the distance.

Returns

config – Dictionary describing the distance.

Return type

dict

static get_for_t_or_latest(w, t)[source]

Extract values from dict for given time point.

initialize(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]

This method is called by the ABCSMC framework before the first use of the distance (at the beginning of ABCSMC.run()), and can be used to calibrate it to the statistics of the samples.

The default is to do nothing.

Parameters
  • t (int) – Time point for which to initialize the distance.

  • get_all_sum_stats (Callable[[], List[dict]]) – Returns on command the initial summary statistics.

  • x_0 (dict, optional) – The observed summary statistics.

update(t: int, get_all_sum_stats: Callable[[], List[dict]]) → bool[source]

The sum_stats are passed on to all distance functions, each of which may then update using these. If any update occurred, a value of True is returned indicating that e.g. the distance may need to be recalculated since the underlying distances changed.

class pyabc.distance.BinomialKernel(p: Union[float, Callable], ret_scale: str = 'SCALE_LOG', keys: List[str] = None, pdf_max: float = None)[source]

Bases: pyabc.distance.kernel.StochasticKernel

A kernel with a binomial probability mass function.

Parameters
  • p (Union[float, Callable]) – The success probability.

  • ret_scale, keys, pdf_max –
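As a rough illustration of a binomial kernel evaluated on the log scale, the sketch below sums binomial log-probabilities over the summary statistic keys. It is an assumption of this sketch, not something stated above, that the simulated value plays the role of the number of trials and the observed value the number of successes; the function names are hypothetical.

```python
import math


def binomial_logpmf(k: int, n: int, p: float) -> float:
    """Log of the binomial pmf for k successes in n trials, 0 < p < 1."""
    if not 0 <= k <= n:
        return -math.inf
    return (
        math.log(math.comb(n, k))
        + k * math.log(p)
        + (n - k) * math.log(1 - p)
    )


def binomial_kernel_log(x: dict, x_0: dict, p: float) -> float:
    """Sum of log-probabilities over all keys (assumed roles: n = x, k = x_0)."""
    return sum(binomial_logpmf(x_0[key], x[key], p) for key in sorted(x_0))


print(binomial_kernel_log({"count": 2}, {"count": 1}, p=0.5))  # log(0.5)
```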

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type

float

__init__(p: Union[float, Callable], ret_scale: str = 'SCALE_LOG', keys: List[str] = None, pdf_max: float = None)[source]

Initialize self. See help(type(self)) for accurate signature.

initialize(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]

Remember the summary statistic keys in sorted order, if not set in __init__ already.

class pyabc.distance.Distance[source]

Bases: abc.ABC

Abstract base class for distance objects.

Any object that computes the similarity between observed and simulated data should inherit from this class.

abstract __call__(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type

float

configure_sampler(sampler: pyabc.sampler.base.Sampler)[source]

This is called by the ABCSMC class and gives the distance the opportunity to configure the sampler. For example, the distance might request the sampler to also return rejected particles in order to adapt the distance to the statistics of the sample. The method is called by the ABCSMC framework before the first use of the distance (at the beginning of ABCSMC.run()), after initialize().

The default is to do nothing.

Parameters

sampler (Sampler) – The sampler used in ABCSMC.

get_config() → dict[source]

Return configuration of the distance.

Returns

config – Dictionary describing the distance.

Return type

dict

initialize(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]

This method is called by the ABCSMC framework before the first use of the distance (at the beginning of ABCSMC.run()), and can be used to calibrate it to the statistics of the samples.

The default is to do nothing.

Parameters
  • t (int) – Time point for which to initialize the distance.

  • get_all_sum_stats (Callable[[], List[dict]]) – Returns on command the initial summary statistics.

  • x_0 (dict, optional) – The observed summary statistics.

to_json() → str[source]

Return JSON encoded configuration of the distance.

Returns

json_str – JSON encoded string describing the distance. The default implementation is to try to convert the dictionary returned by get_config.

Return type

str

update(t: int, get_all_sum_stats: Callable[[], List[dict]]) → bool[source]

Update the distance for the upcoming generation t.

The default is to do nothing.

Parameters
  • t (int) – Time point for which to update the distance.

  • get_all_sum_stats (Callable[[], List[dict]]) – Returns on demand a list of all summary statistics from the finished generation that should be used to update the distance.

Returns

is_updated – Whether the distance has changed compared to beforehand. Depending on the result, the population needs to be updated in ABCSMC before preparing the next generation. Defaults to False.

Return type

bool
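The initialize/update/__call__ life cycle can be sketched with a duck-typed stand-in. The class below is hypothetical and intentionally does not inherit from pyabc.Distance, so that it runs without the library; a real custom distance would subclass it and keep the same method signatures. The scaling by the per-statistic spread is an arbitrary choice for illustration.

```python
class SpreadScaledDistance:
    """Absolute differences scaled by the spread of recent simulations."""

    def __init__(self):
        self.scale = {}

    def _calibrate(self, sum_stats) -> bool:
        new_scale = {}
        for key in sum_stats[0]:
            values = [s[key] for s in sum_stats]
            spread = max(values) - min(values)
            new_scale[key] = spread if spread > 0 else 1.0
        changed = new_scale != self.scale
        self.scale = new_scale
        return changed

    def initialize(self, t, get_all_sum_stats, x_0=None):
        # Calibrate once before the first use of the distance.
        self._calibrate(get_all_sum_stats())

    def update(self, t, get_all_sum_stats) -> bool:
        # Report whether the distance changed for the upcoming generation.
        return self._calibrate(get_all_sum_stats())

    def __call__(self, x, x_0, t=None, par=None) -> float:
        return sum(abs(x[k] - x_0[k]) / self.scale.get(k, 1.0) for k in x_0)


dist = SpreadScaledDistance()
dist.initialize(0, lambda: [{"a": 0.0}, {"a": 4.0}])
print(dist({"a": 2.0}, {"a": 0.0}))                    # 0.5
print(dist.update(1, lambda: [{"a": 0.0}, {"a": 2.0}]))  # True
```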

class pyabc.distance.DistanceWithMeasureList(measures_to_use='all')[source]

Bases: pyabc.distance.base.Distance

Base class for distance functions with measure list. This class is not functional on its own.

Parameters

measures_to_use (Union[str, List[str]]) –

  • If set to “all”, all measures are used. This is the default.

  • If a list is provided, the measures in the list are used.

  • measures refers to the summary statistics.

__init__(measures_to_use='all')[source]

Initialize self. See help(type(self)) for accurate signature.

get_config()[source]

Return configuration of the distance.

Returns

config – Dictionary describing the distance.

Return type

dict

initialize(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]

This method is called by the ABCSMC framework before the first use of the distance (at the beginning of ABCSMC.run()), and can be used to calibrate it to the statistics of the samples.

The default is to do nothing.

Parameters
  • t (int) – Time point for which to initialize the distance.

  • get_all_sum_stats (Callable[[], List[dict]]) – Returns on command the initial summary statistics.

  • x_0 (dict, optional) – The observed summary statistics.

class pyabc.distance.IdentityFakeDistance[source]

Bases: pyabc.distance.base.Distance

A fake distance function, which just passes the summary statistics on. This class assumes that the model already returns the distance. This can be useful in cases where simulating can be stopped early, when during the simulation some condition is reached which makes it impossible to accept the particle.

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type

float

class pyabc.distance.IndependentLaplaceKernel(scale: Union[Callable, List[float], float] = None, keys: List[str] = None, pdf_max: float = None)[source]

Bases: pyabc.distance.kernel.StochasticKernel

This kernel can be used for efficient computations of large-scale independent Laplace distributions, performing computations directly on a log-scale to avoid numeric issues. In each coordinate, a 1-dim Laplace distribution

\[p(x) = \frac{1}{2b}\exp\left(-\frac{1}{b}|x-a|\right)\]

is assumed.
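A minimal sketch of the log-scale evaluation, using the standard Laplace log-density -log(2b) - |x - a| / b per coordinate and summing over independent coordinates. The function name is hypothetical and a single scalar scale b is assumed for all coordinates.

```python
import math


def independent_laplace_logpdf(x: dict, x_0: dict, scale: float = 1.0) -> float:
    """Sum of 1-dim Laplace log-densities, one per summary statistic key."""
    return sum(
        -math.log(2 * scale) - abs(x[key] - x_0[key]) / scale
        for key in sorted(x_0)
    )


# With identical simulated and observed statistics only the normalization remains.
stats = {"a": 1.0, "b": -3.0}
print(independent_laplace_logpdf(stats, stats))  # -2 * log(2)
```

Summing log-densities instead of multiplying densities avoids underflow when many coordinates are involved.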

Parameters
  • scale (Union[array_like, float, Callable], optional (default = ones vector)) – Scale terms b of the distribution. Can also be a Callable taking as arguments the parameters. In that case, pdf_max should also be given if it is supposed to be used. Usually, it will then be given as the density at the observed statistics assuming the minimum allowed variance.

  • keys, pdf_max –

__call__(x: dict, x_0: dict, t: int = None, par: dict = None)[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type

float

__init__(scale: Union[Callable, List[float], float] = None, keys: List[str] = None, pdf_max: float = None)[source]

Initialize self. See help(type(self)) for accurate signature.

initialize(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]

Remember the summary statistic keys in sorted order, if not set in __init__ already.

class pyabc.distance.IndependentNormalKernel(var: Union[Callable, List[float], float] = None, keys: List[str] = None, pdf_max: float = None)[source]

Bases: pyabc.distance.kernel.StochasticKernel

This kernel can be used for efficient computations of large-scale independent normal distributions, circumventing the covariance matrix, and performing computations directly on a log-scale to avoid numeric issues.

Parameters
  • var (Union[array_like, float, Callable], optional (default = ones vector)) – Variances of the distribution (assuming zeros in the off-diagonal of the covariance matrix). Can also be a Callable taking as arguments the parameters. In that case, pdf_max should also be given if it is supposed to be used. Usually, it will then be given as the density at the observed statistics assuming the minimum allowed variance.

  • keys, pdf_max –
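How an independent normal density can be evaluated without a covariance matrix is sketched below: with a diagonal covariance, the joint log-density is just a sum of one-dimensional terms. The function name is hypothetical and a single scalar variance is assumed for all coordinates.

```python
import math


def independent_normal_logpdf(x: dict, x_0: dict, var: float = 1.0) -> float:
    """Sum of 1-dim normal log-densities with mean x_0[key] and variance var."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x[key] - x_0[key]) ** 2 / (2 * var)
        for key in sorted(x_0)
    )


print(independent_normal_logpdf({"a": 1.0}, {"a": 1.0}))  # -0.5 * log(2 * pi)
```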

__call__(x: dict, x_0: dict, t: int = None, par: dict = None)[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type

float

__init__(var: Union[Callable, List[float], float] = None, keys: List[str] = None, pdf_max: float = None)[source]

Initialize self. See help(type(self)) for accurate signature.

initialize(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]

Remember the summary statistic keys in sorted order, if not set in __init__ already.

class pyabc.distance.MinMaxDistance(measures_to_use='all')[source]

Bases: pyabc.distance.distance.RangeEstimatorDistance

Calculate upper and lower margins as max and min of the parameters. This works surprisingly well for normalization in simple cases.

static lower(parameter_list)[source]

Calculate the lower margin from a list of parameter values.

Parameters

parameter_list (List[float]) – List of values of a parameter.

Returns

lower_margin – The lower margin of the range calculated from these parameters.

Return type

float

static upper(parameter_list)[source]

Calculate the upper margin from a list of parameter values.

Parameters

parameter_list (List[float]) – List of values of a parameter.

Returns

upper_margin – The upper margin of the range calculated from these parameters.

Return type

float
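A sketch of min/max range normalization, under the assumption that each per-statistic difference is divided by the width (upper - lower) estimated from an initial sample. Both helper names are hypothetical, and skipping statistics with zero width is a choice made here to avoid division by zero.

```python
def min_max_margins(sum_stats):
    """Lower and upper margins per key as min and max over the sample."""
    return {
        key: (min(s[key] for s in sum_stats), max(s[key] for s in sum_stats))
        for key in sum_stats[0]
    }


def range_normalized_distance(x, x_0, margins):
    """Sum of absolute differences, each divided by its range width."""
    total = 0.0
    for key, (lower, upper) in margins.items():
        width = upper - lower
        if width > 0:  # skip statistics that did not vary in the sample
            total += abs(x[key] - x_0[key]) / width
    return total


sample = [{"a": 0.0, "b": 10.0}, {"a": 2.0, "b": 30.0}, {"a": 1.0, "b": 20.0}]
margins = min_max_margins(sample)
print(range_normalized_distance({"a": 1.0, "b": 15.0}, {"a": 0.0, "b": 25.0}, margins))  # 1.0
```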

class pyabc.distance.NegativeBinomialKernel(p: float, ret_scale: str = 'SCALE_LOG', keys: List[str] = None, pdf_max: float = None)[source]

Bases: pyabc.distance.kernel.StochasticKernel

A kernel with a negative binomial probability mass function.

Parameters
  • p (Union[float, Callable]) – The success probability.

  • ret_scale, keys, pdf_max –

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type

float

__init__(p: float, ret_scale: str = 'SCALE_LOG', keys: List[str] = None, pdf_max: float = None)[source]

Initialize self. See help(type(self)) for accurate signature.

initialize(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]

Remember the summary statistic keys in sorted order, if not set in __init__ already.

class pyabc.distance.NoDistance[source]

Bases: pyabc.distance.base.Distance

Implements a kind of null object as distance function. This can be used as a dummy distance function if e.g. integrated modeling is used.

Note

This distance function cannot be evaluated, so currently it is in particular not possible to use an epsilon threshold which requires initialization, because during initialization the distance function is invoked directly and not via the acceptor as usual. Conceptually, this would be possible and can be implemented on request.

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type

float

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

class pyabc.distance.NormalKernel(cov: numpy.ndarray = None, ret_scale: str = 'SCALE_LOG', keys: List[str] = None, pdf_max: float = None)[source]

Bases: pyabc.distance.kernel.StochasticKernel

A kernel with a normal, i.e. Gaussian, probability density. This is just a wrapper around scipy.stats.multivariate_normal.

Parameters
  • cov (array_like, optional (default = identity matrix)) – Covariance matrix of the distribution.

  • ret_scale, keys, pdf_max –

Note

The order of the entries in the mean and cov vectors is assumed to be the same as the one in keys. If keys is None, it is assumed to be the same as the one obtained via sorted(x.keys()) for summary statistics x.

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]

Return the value of the normal distribution at x - x_0, or its logarithm.

__init__(cov: numpy.ndarray = None, ret_scale: str = 'SCALE_LOG', keys: List[str] = None, pdf_max: float = None)[source]

Initialize self. See help(type(self)) for accurate signature.

initialize(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]

Remember the summary statistic keys in sorted order, if not set in __init__ already.

class pyabc.distance.PCADistance(measures_to_use='all')[source]

Bases: pyabc.distance.distance.DistanceWithMeasureList

Calculate distance in whitened coordinates.

A whitening transformation \(W\) is calculated from an initial sample. The distance is measured as the Euclidean distance in the transformed space, i.e.

\[d(x,y) = \| Wx - Wy \|\]

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type

float

__init__(measures_to_use='all')[source]

Initialize self. See help(type(self)) for accurate signature.

initialize(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]

This method is called by the ABCSMC framework before the first use of the distance (at the beginning of ABCSMC.run()), and can be used to calibrate it to the statistics of the samples.

The default is to do nothing.

Parameters
  • t (int) – Time point for which to initialize the distance.

  • get_all_sum_stats (Callable[[], List[dict]]) – Returns on command the initial summary statistics.

  • x_0 (dict, optional) – The observed summary statistics.

class pyabc.distance.PNormDistance(p: float = 2, weights: dict = None, factors: dict = None)[source]

Bases: pyabc.distance.base.Distance

Use a weighted p-norm

\[d(x, y) = \left [\sum_{i} \left| w_i ( x_i-y_i ) \right|^{p} \right ]^{1/p}\]

to compute distances between sets of summary statistics. E.g. set p=2 to get a Euclidean distance.

Parameters
  • p (float, optional (default = 2)) – p for p-norm. Required p >= 1, p = np.inf allowed (infinity-norm).

  • weights (dict, optional (default = 1)) – Weights. Dictionary indexed by time points. Each entry contains a dictionary of numeric weights, indexed by summary statistics labels. If None is passed, a weight of 1 is considered for every summary statistic. If no entry is available in weights for a given time point, the maximum available time point is selected. It is also possible to pass a single dictionary indexed by summary statistics labels, if weights do not change in time.

  • factors (dict, optional (default = 1)) – Scaling factors that the weights are multiplied with. The same structure applies as to weights. If None is passed, a factor of 1 is considered for every summary statistic. Note that in this class, factors are superfluous as everything can be achieved with weights alone, however in subclasses the factors can remain static while weights adapt over time, allowing for greater flexibility.
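The weighted p-norm can be written out as a short standalone sketch. The function name is hypothetical; missing weights default to 1, and p = inf is handled as the maximum of the weighted absolute differences.

```python
import math


def weighted_p_norm(x: dict, x_0: dict, weights: dict = None, p: float = 2.0) -> float:
    """Weighted p-norm of the differences between two sets of statistics."""
    if p < 1:
        raise ValueError("p must satisfy p >= 1")
    weights = weights or {}
    terms = [abs(weights.get(key, 1.0) * (x[key] - x_0[key])) for key in x_0]
    if math.isinf(p):
        return max(terms, default=0.0)
    return sum(term ** p for term in terms) ** (1.0 / p)


x = {"a": 3.0, "b": 0.0}
x_0 = {"a": 0.0, "b": 4.0}
print(weighted_p_norm(x, x_0, p=2))         # 5.0 (Euclidean)
print(weighted_p_norm(x, x_0, p=1))         # 7.0
print(weighted_p_norm(x, x_0, p=math.inf))  # 4.0
```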

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type

float

__init__(p: float = 2, weights: dict = None, factors: dict = None)[source]

Initialize self. See help(type(self)) for accurate signature.

static format_dict(w, t, sum_stat_keys, default_val=1.0)[source]

Normalize weight or factor dictionary to the employed format.

get_config() → dict[source]

Return configuration of the distance.

Returns

config – Dictionary describing the distance.

Return type

dict

static get_for_t_or_latest(w, t)[source]

Extract values from dict for given time point.
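The time-point lookup described for the weights parameter (fall back to the maximum available time point when t has no entry) can be sketched as follows. The helper name is hypothetical and the block is a standalone illustration rather than the method's actual code.

```python
def values_for_time(values_by_time: dict, t: int):
    """Return the entry for time point t, else the one for the latest time."""
    if t in values_by_time:
        return values_by_time[t]
    return values_by_time[max(values_by_time)]


weights = {0: {"a": 1.0}, 3: {"a": 0.5}}
print(values_for_time(weights, 0))  # {'a': 1.0}
print(values_for_time(weights, 5))  # {'a': 0.5}
```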

initialize(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]

This method is called by the ABCSMC framework before the first use of the distance (at the beginning of ABCSMC.run()), and can be used to calibrate it to the statistics of the samples.

The default is to do nothing.

Parameters
  • t (int) – Time point for which to initialize the distance.

  • get_all_sum_stats (Callable[[], List[dict]]) – Returns on command the initial summary statistics.

  • x_0 (dict, optional) – The observed summary statistics.
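
To illustrate the calibration hook, here is a minimal sketch of a distance-like class whose initialize derives per-statistic scales from the initial samples. The class ScaledAbsDistance is a hypothetical stand-in that only mimics the call signatures documented above; it does not inherit from pyabc.Distance, and the choice of the standard deviation as scale is an assumption.

```python
import statistics
from typing import Callable, Dict, List


class ScaledAbsDistance:
    """Hypothetical stand-in mimicking the initialize/__call__ protocol."""

    def __init__(self):
        self.scales: Dict[str, float] = {}

    def initialize(self, t: int,
                   get_all_sum_stats: Callable[[], List[dict]],
                   x_0: dict = None):
        # The callable is only evaluated here, when the samples are needed.
        sum_stats = get_all_sum_stats()
        keys = x_0 if x_0 is not None else sum_stats[0]
        for key in keys:
            spread = statistics.pstdev(s[key] for s in sum_stats)
            # Guard against a zero spread to avoid division by zero.
            self.scales[key] = spread if spread > 0 else 1.0

    def __call__(self, x: dict, x_0: dict, t: int = None,
                 par: dict = None) -> float:
        return sum(abs(x[k] - x_0[k]) / self.scales.get(k, 1.0) for k in x_0)


samples = [{"a": 0.0}, {"a": 2.0}, {"a": 4.0}]
dist = ScaledAbsDistance()
dist.initialize(0, lambda: samples, x_0={"a": 1.0})
value = dist({"a": 3.0}, {"a": 1.0})
```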

class pyabc.distance.PercentileDistance(measures_to_use='all')[source]

Bases: pyabc.distance.distance.RangeEstimatorDistance

Calculate the normalization from the 20% and 80% percentiles, which serve as lower and upper margins.

PERCENTILE = 20

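The percentile used for the lower margin. In line with the class description, the upper margin then corresponds to the (100 - PERCENTILE)-th percentile, i.e. 80%.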

get_config()[source]

Return configuration of the distance.

Returns

config – Dictionary describing the distance.

Return type

dict

static lower(parameter_list)[source]

Calculate the lower margin from a list of parameter values.

Parameters

parameter_list (List[float]) – List of values of a parameter.

Returns

lower_margin – The lower margin of the range calculated from these parameters

Return type

float

static upper(parameter_list)[source]

Calculate the upper margin from a list of parameter values.

Parameters

parameter_list (List[float]) – List of values of a parameter.

Returns

upper_margin – The upper margin of the range calculated from these parameters

Return type

float
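
A minimal sketch of percentile-based margins, assuming the lower margin is the PERCENTILE-th percentile and the upper margin the (100 - PERCENTILE)-th, consistent with the 20% and 80% stated above. The percentile helper is hypothetical and uses linear interpolation between order statistics; the library may compute percentiles differently.

```python
PERCENTILE = 20


def percentile(values, q):
    """Hypothetical helper: q-th percentile with linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    low = int(pos)
    high = min(low + 1, len(ordered) - 1)
    frac = pos - low
    return ordered[low] + frac * (ordered[high] - ordered[low])


def lower(parameter_list):
    """Lower margin: the PERCENTILE-th percentile."""
    return percentile(parameter_list, PERCENTILE)


def upper(parameter_list):
    """Upper margin: the (100 - PERCENTILE)-th percentile."""
    return percentile(parameter_list, 100 - PERCENTILE)


values = list(range(11))  # 0, 1, ..., 10
margins = (lower(values), upper(values))
```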

class pyabc.distance.PoissonKernel(ret_scale: str = 'SCALE_LOG', keys: List[str] = None, pdf_max: float = None)[source]

Bases: pyabc.distance.kernel.StochasticKernel

A kernel with a Poisson probability mass function.

Parameters

ret_scale, keys, pdf_max – As in StochasticKernel.

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type

float

__init__(ret_scale: str = 'SCALE_LOG', keys: List[str] = None, pdf_max: float = None)[source]

Initialize self. See help(type(self)) for accurate signature.

initialize(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]

Remember the summary statistic keys in sorted order, if not set in __init__ already.
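
For intuition, a Poisson kernel on the log scale can be sketched as the sum of Poisson log-probabilities over the summary statistics. The sketch assumes that the simulated statistics x act as Poisson means and the observed statistics x_0 as counts; the function poisson_log_kernel is hypothetical and not the library implementation.

```python
import math


def poisson_log_pmf(k, mu):
    """log P(K = k) for K ~ Poisson(mu), with mu > 0 and integer k >= 0."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def poisson_log_kernel(x, x_0, keys=None):
    """Hypothetical sketch: sum of Poisson log-pmf values over the keys."""
    # Sorted keys give a reproducible order if none are specified.
    keys = sorted(x_0) if keys is None else keys
    return sum(poisson_log_pmf(x_0[key], x[key]) for key in keys)


x = {"n": 2.0}  # simulated mean (assumed role)
x_0 = {"n": 3}  # observed count (assumed role)
log_density = poisson_log_kernel(x, x_0)
```

Working on the log scale avoids numerical underflow when many statistics are multiplied together, which may be one reason the default ret_scale here is SCALE_LOG.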

class pyabc.distance.RangeEstimatorDistance(measures_to_use='all')[source]

Bases: pyabc.distance.distance.DistanceWithMeasureList

Abstract base class for distance functions whose estimate is based on a range.

It defines the two template methods lower and upper.

Hence

\[d(x, y) = \sum_{i \in \text{measures}} \left | \frac{x_i - y_i}{u_i - l_i} \right |\]

where \(l_i\) and \(u_i\) are the lower and upper margin for measure \(i\).
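
The formula can be sketched in a few lines of plain Python. The function range_distance is hypothetical, and estimating the margins as the minimum and maximum of some initial samples is only one possible choice of lower and upper, assumed here for illustration.

```python
def range_distance(x, x_0, lower, upper, measures=None):
    """Sum of |x_i - y_i| / (u_i - l_i) over the selected measures."""
    measures = list(x_0) if measures is None else measures
    return sum(
        abs((x[m] - x_0[m]) / (upper[m] - lower[m]))
        for m in measures
    )


# Margins estimated from hypothetical initial samples via min and max.
samples = {"a": [0.0, 5.0, 10.0], "b": [1.0, 2.0, 3.0]}
lower = {m: min(v) for m, v in samples.items()}
upper = {m: max(v) for m, v in samples.items()}

d = range_distance({"a": 4.0, "b": 2.5}, {"a": 2.0, "b": 2.0}, lower, upper)
```

Dividing by the range puts measures with very different magnitudes on a comparable footing before they are summed.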

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type

float

__init__(measures_to_use='all')[source]

Initialize self. See help(type(self)) for accurate signature.

get_config()[source]

Return configuration of the distance.

Returns

config – Dictionary describing the distance.

Return type

dict

initialize(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]

This method is called by the ABCSMC framework before the first use of the distance (at the beginning of ABCSMC.run()), and can be used to calibrate it to the statistics of the samples.

The default is to do nothing.

Parameters
  • t (int) – Time point for which to initialize the distance.

  • get_all_sum_stats (Callable[[], List[dict]]) – Returns on command the initial summary statistics.

  • x_0 (dict, optional) – The observed summary statistics.

static lower(parameter_list: List[float])[source]

Calculate the lower margin from a list of parameter values.

Parameters

parameter_list (List[float]) – List of values of a parameter.

Returns

lower_margin – The lower margin of the range calculated from these parameters

Return type

float

static upper(parameter_list: List[float])[source]

Calculate the upper margin from a list of parameter values.

Parameters

parameter_list (List[float]) – List of values of a parameter.

Returns

upper_margin – The upper margin of the range calculated from these parameters

Return type

float

class pyabc.distance.SimpleFunctionDistance(fun)[source]

Bases: pyabc.distance.base.Distance

This is a wrapper around a simple function which calculates the distance. If a function/callable is passed to the ABCSMC class, which is not subclassed from pyabc.Distance, then it is converted to an instance of the SimpleFunctionDistance class.

Parameters

fun (Callable[[dict, dict], float]) – A Callable accepting as parameters (a subset of) the arguments of the pyabc.Distance.__call__ function. Usually at least the summary statistics x and x_0. Returns the distance between both.
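
A plain function of this kind might look as follows. The function name and the summary statistic keys are hypothetical; the function relies only on the two-argument form (x, x_0) described above.

```python
def abs_distance(x: dict, x_0: dict) -> float:
    """Sum of absolute differences over the observed summary statistics."""
    return sum(abs(x[key] - x_0[key]) for key in x_0)


simulated = {"mean": 1.5, "var": 0.5}
observed = {"mean": 1.0, "var": 1.0}
d = abs_distance(simulated, observed)
```

Such a function could be passed to ABCSMC in place of a Distance object, where it would be wrapped as described above.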

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type

float

__init__(fun)[source]

Initialize self. See help(type(self)) for accurate signature.

get_config()[source]

Return configuration of the distance.

Returns

config – Dictionary describing the distance.

Return type

dict

class pyabc.distance.SimpleFunctionKernel(fun: Callable, ret_scale: str = 'SCALE_LIN', keys: List[str] = None, pdf_max: float = None)[source]

Bases: pyabc.distance.kernel.StochasticKernel

This is a wrapper around a simple function which calculates the probability density.

Parameters
  • fun (Callable) – A Callable accepting __call__’s parameters. The function should be a pdf or pmf.

  • ret_scale, keys, pdf_max – As in StochasticKernel.

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type

float

__init__(fun: Callable, ret_scale: str = 'SCALE_LIN', keys: List[str] = None, pdf_max: float = None)[source]

Initialize self. See help(type(self)) for accurate signature.
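
As an example of a function that SimpleFunctionKernel could wrap, the sketch below evaluates a product of independent normal densities. The function normal_density, the unit variance, and the treatment of x as the mean are all assumptions for illustration; the signature mirrors the __call__ parameters listed above.

```python
import math


def normal_density(x: dict, x_0: dict, t: int = None,
                   par: dict = None) -> float:
    """Hypothetical pdf: independent unit-variance normals centred at x."""
    density = 1.0
    for key in x_0:
        diff = x_0[key] - x[key]
        density *= math.exp(-0.5 * diff ** 2) / math.sqrt(2 * math.pi)
    return density


p_close = normal_density({"y": 1.0}, {"y": 1.0})
p_far = normal_density({"y": 4.0}, {"y": 1.0})
```

Because the function returns the density itself rather than its logarithm, it matches the default ret_scale of SCALE_LIN.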

class pyabc.distance.StochasticKernel(ret_scale: str = 'SCALE_LIN', keys: List[str] = None, pdf_max: float = None)[source]

Bases: pyabc.distance.base.Distance

A stochastic kernel assesses the similarity between observed and simulated summary statistics or data via a probability measure.

Note

The returned value cannot be interpreted as a distance function, but rather as an inverse distance, as it increases as the similarity between observed and simulated summary statistics increases. Thus, a StochasticKernel should only be used together with a StochasticAcceptor.

Parameters
  • ret_scale (str, optional (default = SCALE_LIN)) – The scale of the value returned in __call__: Given a probability density p(x,x_0), the returned value can be either p(x,x_0) or log(p(x,x_0)).

  • keys (List[str], optional) – The keys of the summary statistics, specifying the order to be used.

  • pdf_max (float, optional) – The maximum possible probability density function value. Defaults to None and is then computed as the density at (x_0, x_0), where x_0 denotes the observed summary statistics. Must be overridden if pdf_max is to be used in the analysis by the acceptor and the default is not applicable. This value should be in the scale specified by ret_scale already.
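
The relation between the two return scales and the default pdf_max can be sketched as follows. The normal density and the helper kernel_value are hypothetical stand-ins; only the scale names SCALE_LIN and SCALE_LOG and the rule that pdf_max defaults to the density at (x_0, x_0) are taken from the description above.

```python
import math

SCALE_LIN = "SCALE_LIN"
SCALE_LOG = "SCALE_LOG"


def normal_pdf(x: float, x_0: float, sigma: float = 1.0) -> float:
    """Density of x_0 under a normal distribution centred at x."""
    z = (x_0 - x) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def kernel_value(x: float, x_0: float, ret_scale: str = SCALE_LIN) -> float:
    """Return p(x, x_0) or log(p(x, x_0)) depending on ret_scale."""
    p = normal_pdf(x, x_0)
    return math.log(p) if ret_scale == SCALE_LOG else p


x_0 = 1.0
# Default pdf_max: the density evaluated at (x_0, x_0), in the chosen scale.
pdf_max_lin = kernel_value(x_0, x_0, SCALE_LIN)
pdf_max_log = kernel_value(x_0, x_0, SCALE_LOG)

value_lin = kernel_value(2.0, x_0, SCALE_LIN)
value_log = kernel_value(2.0, x_0, SCALE_LOG)
```

For this unimodal density, the value at (x_0, x_0) is indeed the maximum. For kernels where the density does not peak at the observation, the default would not be applicable and pdf_max would need to be supplied explicitly.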

__init__(ret_scale: str = 'SCALE_LIN', keys: List[str] = None, pdf_max: float = None)[source]

Initialize self. See help(type(self)) for accurate signature.

initialize(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]

Remember the summary statistic keys in sorted order, if not set in __init__ already.

class pyabc.distance.ZScoreDistance(measures_to_use='all')[source]

Bases: pyabc.distance.distance.DistanceWithMeasureList

Calculate the distance as the sum of z-scores over the selected measures. The measured data is the reference for the z-score.

Hence

\[d(x, y) = \sum_{i \in \text{measures}} \left| \frac{x_i-y_i}{y_i} \right|\]

__call__(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]

Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.

Abstract method. This method has to be overwritten by all concrete implementations.

Parameters
  • x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.

  • x_0 (dict) – Summary statistics of the observed data.

  • t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.

  • par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.

Returns

distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.

Return type

float
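
The ZScoreDistance formula can be sketched in plain Python as below. The function z_score_distance is hypothetical, and the handling of a zero reference value is an assumption of this sketch, since the formula alone leaves that case undefined.

```python
def z_score_distance(x, x_0, measures=None):
    """Sum of |x_i - y_i| / |y_i| with y taken as the observed data."""
    measures = list(x_0) if measures is None else measures
    total = 0.0
    for m in measures:
        if x_0[m] == 0:
            # Assumption of this sketch: a zero reference with a nonzero
            # difference yields an infinite distance.
            total += 0.0 if x[m] == 0 else float("inf")
        else:
            total += abs((x[m] - x_0[m]) / x_0[m])
    return total


d = z_score_distance({"a": 12.0, "b": 1.0}, {"a": 10.0, "b": 2.0})
```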

pyabc.distance.bias(data, x_0, **kwargs)[source]

Bias of sample to observed value.

pyabc.distance.combined_mean_absolute_deviation(data, x_0, **kwargs)[source]

Compute the sum of the mean absolute deviations to the mean of the samples and to the observed value.

pyabc.distance.combined_median_absolute_deviation(data, x_0, **kwargs)[source]

Compute the sum of the median absolute deviations to the median of the samples and to the observed value.

pyabc.distance.mean(data, **kwargs)[source]

Compute the mean.

pyabc.distance.mean_absolute_deviation(data, **kwargs)[source]

Calculate the mean absolute deviation from the mean.

pyabc.distance.mean_absolute_deviation_to_observation(data, x_0, **kwargs)[source]

Mean absolute deviation of data w.r.t. the observation x_0.

pyabc.distance.median(data, **kwargs)[source]

Compute the median.

pyabc.distance.median_absolute_deviation(data, **kwargs)[source]

Calculate the sample median absolute deviation (MAD) from the median, defined as median(abs(data - median(data))).

pyabc.distance.median_absolute_deviation_to_observation(data, x_0, **kwargs)[source]

Median absolute deviation of data w.r.t. the observation x_0.

pyabc.distance.root_mean_square_deviation(data, x_0, **kwargs)[source]

Square root of the mean squared error, i.e. of the bias squared plus the variance.

pyabc.distance.span(data, **kwargs)[source]

Compute the difference of largest and smallest data point.

pyabc.distance.standard_deviation(data, **kwargs)[source]

Calculate the sample standard deviation (SD).

pyabc.distance.standard_deviation_to_observation(data, x_0, **kwargs)[source]

Standard deviation of absolute deviations of the data w.r.t. the observation x_0.
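
Several of the scale functions above can be mirrored with the standard library for a single summary statistic. The sketch assumes that data is a flat list of floats, that the bias is reported as an absolute value, and that the variance is the population variance; the library functions may differ in these details and in how they handle arrays or missing values.

```python
import math
import statistics


def median_absolute_deviation(data):
    """median(abs(data - median(data)))"""
    center = statistics.median(data)
    return statistics.median(abs(v - center) for v in data)


def median_absolute_deviation_to_observation(data, x_0):
    """median(abs(data - x_0))"""
    return statistics.median(abs(v - x_0) for v in data)


def combined_median_absolute_deviation(data, x_0):
    """Sum of the MAD to the median and the MAD to the observation."""
    return (median_absolute_deviation(data)
            + median_absolute_deviation_to_observation(data, x_0))


def bias(data, x_0):
    """Absolute difference between sample mean and observation (assumed)."""
    return abs(statistics.mean(data) - x_0)


def root_mean_square_deviation(data, x_0):
    """sqrt(bias^2 + variance), using the population variance (assumed)."""
    return math.sqrt(bias(data, x_0) ** 2 + statistics.pvariance(data))


data = [1.0, 2.0, 3.0, 4.0, 100.0]
x_0 = 3.0
cmad = combined_median_absolute_deviation(data, x_0)
rmsd = root_mean_square_deviation(data, x_0)
```

The outlier 100.0 barely affects the median-based measures but dominates the root mean square deviation, which illustrates why the median variants are the more robust choice for heavy-tailed samples.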

pyabc.distance.to_distance(maybe_distance)[source]

Parameters

maybe_distance – Either a Callable as in SimpleFunctionDistance, or a pyabc.Distance object.

Returns

A Distance instance.
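
The conversion can be pictured with the following sketch. The classes Distance and SimpleFunctionDistance here are hypothetical stand-ins defined only for this example, and the branching logic is an assumption inferred from the parameter description, not the library source.

```python
class Distance:
    """Hypothetical stand-in for pyabc.Distance."""

    def __call__(self, x, x_0, t=None, par=None):
        raise NotImplementedError


class SimpleFunctionDistance(Distance):
    """Hypothetical stand-in: wraps a plain function of (x, x_0)."""

    def __init__(self, fun):
        self.fun = fun

    def __call__(self, x, x_0, t=None, par=None):
        return self.fun(x, x_0)


def to_distance(maybe_distance):
    """Return Distance objects unchanged and wrap plain callables."""
    if isinstance(maybe_distance, Distance):
        return maybe_distance
    return SimpleFunctionDistance(maybe_distance)


wrapped = to_distance(lambda x, x_0: abs(x["a"] - x_0["a"]))
same = to_distance(wrapped)
```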