Transitions (Perturbation kernels)

Perturbation strategies. The classes defined here transition the current population to the next one. pyABC implements global and local transitions. Proposals for the subsequent generation are generated from density estimates of the current generation. This is equivalent to perturbing randomly chosen particles.

These can be passed to pyabc.smc.ABCSMC via the transitions keyword argument.
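The equivalence between sampling from a density estimate and perturbing randomly chosen particles can be illustrated in one dimension. The following is a minimal sketch in plain Python, not pyABC's implementation; the function name sample_from_weighted_kde and the fixed Gaussian bandwidth are assumptions made for this example.

```python
import random


def sample_from_weighted_kde(particles, weights, bandwidth, rng):
    """Draw one proposal from a weighted Gaussian KDE (1D illustration)."""
    # Step 1: pick a particle with probability proportional to its weight.
    chosen = rng.choices(particles, weights=weights, k=1)[0]
    # Step 2: perturb it with Gaussian kernel noise centred on that particle.
    return chosen + rng.gauss(0.0, bandwidth)


rng = random.Random(0)
particles = [0.0, 1.0, 5.0]   # hypothetical current generation
weights = [0.2, 0.3, 0.5]     # hypothetical importance weights
proposals = [
    sample_from_weighted_kde(particles, weights, 0.1, rng)
    for _ in range(1000)
]
```

Every proposal lands close to one of the current particles, which is exactly what drawing from the kernel density estimate of the current generation means.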

class pyabc.transition.DiscreteRandomWalkTransition(n_steps: int = 1, p_l: float = 0.3333333333333333, p_r: float = 0.3333333333333333, p_c: float = 0.3333333333333333)[source]

Bases: pyabc.transition.base.DiscreteTransition

This transition is based on a discrete random walk. This may be useful for discrete ordinal parameter distributions that can be described as lying on the grid of integers.

Note

This transition does not adapt to the problem structure and thus has potentially slow convergence. Further, the transition does not satisfy proposal >> prior, that is, the proposal does not cover the whole support of the prior, so it is not valid as an importance sampling distribution. This can be overcome by selecting the number of steps as a random variable.

Parameters

n_steps (int, optional (default = 1)) – Number of random walk steps to take.
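A discrete random walk of this kind can be sketched in a few lines. The example below is an illustration only and not the pyABC source: it assumes that p_l, p_r and p_c are the probabilities of stepping left (-1), stepping right (+1) and staying in place (0), which the parameter list above does not state explicitly. The helper names are hypothetical.

```python
import random


def random_walk_offset(n_steps, p_l, p_r, p_c, rng):
    """Sum of n_steps increments drawn from {-1, +1, 0}."""
    # Assumed meaning: p_l -> step -1, p_r -> step +1, p_c -> stay.
    steps = rng.choices([-1, 1, 0], weights=[p_l, p_r, p_c], k=n_steps)
    return sum(steps)


def perturb(value, n_steps=1, p_l=1 / 3, p_r=1 / 3, p_c=1 / 3, rng=None):
    """Perturb an integer-valued parameter by an n_steps random walk."""
    rng = rng or random.Random()
    return value + random_walk_offset(n_steps, p_l, p_r, p_c, rng)


rng = random.Random(42)
samples = [perturb(10, n_steps=2, rng=rng) for _ in range(500)]
```

With a fixed n_steps, a perturbed value can never move further than n_steps from its starting point. This bounded reach is the reason the note above says the proposal does not dominate the prior.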

__init__(n_steps: int = 1, p_l: float = 0.3333333333333333, p_r: float = 0.3333333333333333, p_c: float = 0.3333333333333333)[source]

Initialize self. See help(type(self)) for accurate signature.

fit(X: pandas.core.frame.DataFrame, w: numpy.ndarray)[source]

Fit the density estimator (perturber) to the sampled data. Concrete implementations might do something like fitting a KDE.

The parameters given as X and w are automatically stored in self.X and self.w.

Parameters
  • X (pd.DataFrame) – The parameters.

  • w (array) – The corresponding weights

pdf(x: Union[pandas.core.series.Series, pandas.core.frame.DataFrame]) → Union[float, numpy.ndarray][source]

Evaluate the probability mass function (PMF) at x.

rvs(size: int = None) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Sample from the density.

Parameters

size (int, optional) – Number of independent samples to draw. If omitted (None), a single sample is drawn, which is equivalent to calling rvs_single.

Returns

samples – The samples as a pandas DataFrame.

Return type

pd.Series or pd.DataFrame

Note

This method can be overridden for efficient implementations. The default is to call rvs_single repeatedly (which might not be the most efficient way).

rvs_single() → pandas.core.series.Series[source]

Random variable sample (rvs).

Sample from the fitted distribution.

Returns

sample – A sample from the fitted model.

Return type

pd.Series

class pyabc.transition.DiscreteTransition[source]

Bases: pyabc.transition.base.Transition

This is a base class for discrete transition kernels.

abstract fit(X: pandas.core.frame.DataFrame, w: numpy.ndarray) → None

Fit the density estimator (perturber) to the sampled data. Concrete implementations might do something like fitting a KDE.

The parameters given as X and w are automatically stored in self.X and self.w.

Parameters
  • X (pd.DataFrame) – The parameters.

  • w (array) – The corresponding weights

abstract pdf(x: Union[pandas.core.series.Series, pandas.core.frame.DataFrame]) → Union[float, numpy.ndarray]

Evaluate the probability density function (PDF) at x.

Parameters

x (pd.Series, pd.DataFrame) – Parameter. If x is a Series, then x should have the columns of the X passed to the fit method as indices. If x is a DataFrame, then x should have the same columns as the X passed to the fit method. The order of the columns is not important.

Returns

density – Probability density at x.

Return type

float

rvs(size: int = None) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Sample from the density.

Parameters

size (int, optional) – Number of independent samples to draw. If omitted (None), a single sample is drawn, which is equivalent to calling rvs_single.

Returns

samples – The samples as a pandas DataFrame.

Return type

pd.Series or pd.DataFrame

Note

This method can be overridden for efficient implementations. The default is to call rvs_single repeatedly (which might not be the most efficient way).

abstract rvs_single() → pandas.core.series.Series

Random variable sample (rvs).

Sample from the fitted distribution.

Returns

sample – A sample from the fitted model.

Return type

pd.Series

class pyabc.transition.GridSearchCV(estimator=None, param_grid=None, scoring=None, n_jobs=1, iid=True, refit=True, cv=5, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score=True)[source]

Bases: sklearn.model_selection._search.GridSearchCV

Do a grid search to automatically select the best parameters for transition classes such as the pyabc.transition.MultivariateNormalTransition.

This is essentially a thin wrapper around sklearn.model_selection.GridSearchCV. It translates the scikit-learn interface to the interface used in pyABC and thus acts as a thin adapter.

The parameters are just as for sklearn.model_selection.GridSearchCV. Major default values:

  • estimator = MultivariateNormalTransition()

  • param_grid = {'scaling': np.linspace(0.05, 1.0, 5)}

  • cv = 5

__init__(estimator=None, param_grid=None, scoring=None, n_jobs=1, iid=True, refit=True, cv=5, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score=True)[source]

Initialize self. See help(type(self)) for accurate signature.

fit(X, y=None, groups=None)[source]

Fit the density estimator (perturber) to the sampled data.

class pyabc.transition.LocalTransition(k=None, k_fraction=0.25, scaling=1)[source]

Bases: pyabc.transition.base.Transition

Local KDE fit. Takes into account only the k nearest neighbors, similar to [Filippi].

Parameters
  • k (int) – Number of nearest neighbors for local covariance calculation.

  • scaling (float) – Scaling factor for the local covariance matrices.

  • k_fraction (float, optional) – Fraction of the population used to determine the number of nearest neighbors: k = k_fraction * population_size, rounded to an integer.
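The idea of a local fit can be illustrated with a one-dimensional sketch in which each particle gets its own kernel variance, computed from its k nearest neighbors. This is a simplified illustration and not pyABC's code: it ignores the particle weights, includes the particle itself in its neighborhood, and uses Python's round to derive k, all of which are assumptions made for the example.

```python
import statistics


def local_variances(points, k, scaling=1.0):
    """Scaled variance of each point's k nearest neighbours (1D sketch)."""
    variances = []
    for center in points:
        # Sort by distance to the centre; the centre itself comes first.
        by_distance = sorted(points, key=lambda p: abs(p - center))
        neighbourhood = by_distance[: k + 1]
        variances.append(scaling * statistics.pvariance(neighbourhood))
    return variances


points = [0.0, 0.1, 0.2, 5.0, 5.5, 6.0, 6.5, 7.0]  # hypothetical particles
k_fraction = 0.25
k = max(1, round(k_fraction * len(points)))  # 0.25 * 8 -> k = 2
local_var = local_variances(points, k)
```

Particles in the tight cluster near 0 receive a much smaller variance than those in the spread-out cluster near 6, so the perturbation adapts to the local shape of the population instead of using a single global covariance.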

EPS

Scaling of the identity matrix to be added to the covariance in case the covariances are not invertible.

Type

float

Filippi

Filippi, Sarah, Chris P. Barnes, Julien Cornebise, and Michael P.H. Stumpf. “On Optimality of Kernels for Approximate Bayesian Computation Using Sequential Monte Carlo.” Statistical Applications in Genetics and Molecular Biology 12, no. 1 (2013): 87–107. doi:10.1515/sagmb-2012-0069.

__init__(k=None, k_fraction=0.25, scaling=1)[source]

Initialize self. See help(type(self)) for accurate signature.

fit(X, w)[source]

Fit the density estimator (perturber) to the sampled data. Concrete implementations might do something like fitting a KDE.

The parameters given as X and w are automatically stored in self.X and self.w.

Parameters
  • X (pd.DataFrame) – The parameters.

  • w (array) – The corresponding weights

pdf(x)[source]

Evaluate the probability density function (PDF) at x.

Parameters

x (pd.Series, pd.DataFrame) – Parameter. If x is a Series, then x should have the columns of the X passed to the fit method as indices. If x is a DataFrame, then x should have the same columns as the X passed to the fit method. The order of the columns is not important.

Returns

density – Probability density at x.

Return type

float

rvs(size: int = None) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Sample from the density.

Parameters

size (int, optional) – Number of independent samples to draw. If omitted (None), a single sample is drawn, which is equivalent to calling rvs_single.

Returns

samples – The samples as a pandas DataFrame.

Return type

pd.Series or pd.DataFrame

Note

This method can be overridden for efficient implementations. The default is to call rvs_single repeatedly (which might not be the most efficient way).

rvs_single()[source]

Random variable sample (rvs).

Sample from the fitted distribution.

Returns

sample – A sample from the fitted model.

Return type

pd.Series

class pyabc.transition.MultivariateNormalTransition(scaling: float = 1, bandwidth_selector: Callable[[int, int], float] = <function silverman_rule_of_thumb>)[source]

Bases: pyabc.transition.base.Transition

Transition via a multivariate Gaussian KDE estimate.

Parameters
  • scaling (float) – Factor by which the covariance is additionally multiplied. Since the Silverman and Scott rules usually yield bandwidths that are too large, 0 < scaling <= 1 is usually the most sensible choice.

  • bandwidth_selector (optional) – Defaults to silverman_rule_of_thumb. The bandwidth selector is a function of the form f(n_samples: float, dimension: int), where n_samples denotes the (effective) sample size (and is therefore a float) and dimension is the parameter dimension.
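The reason n_samples is a float can be seen from a small sketch. It assumes the effective sample size is computed with the common Kish formula 1 / sum(w_i^2) for normalized weights, and that the kernel covariance is the sample covariance multiplied by scaling * bandwidth^2. Both are assumptions made for illustration, and the helper names are hypothetical; the actual pyABC computation may differ.

```python
def effective_sample_size(weights):
    """Kish effective sample size; weights are normalized internally."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalized)


def silverman(n_samples, dimension):
    """Silverman's rule of thumb, as given later in this reference."""
    return (4 / (n_samples * (dimension + 2))) ** (1 / (dimension + 4))


weights = [0.7, 0.1, 0.1, 0.1]            # hypothetical particle weights
n_eff = effective_sample_size(weights)    # a float, smaller than 4
bandwidth = silverman(n_eff, dimension=1)
scaling = 0.5
# Assumed factor applied to the sample covariance to obtain the kernel covariance.
covariance_factor = scaling * bandwidth ** 2
```

Uneven weights reduce the effective sample size below the particle count, which in turn widens the bandwidth returned by the selector.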

__init__(scaling: float = 1, bandwidth_selector: Callable[[int, int], float] = <function silverman_rule_of_thumb>)[source]

Initialize self. See help(type(self)) for accurate signature.

fit(X: pandas.core.frame.DataFrame, w: numpy.ndarray) → None[source]

Fit the density estimator (perturber) to the sampled data. Concrete implementations might do something like fitting a KDE.

The parameters given as X and w are automatically stored in self.X and self.w.

Parameters
  • X (pd.DataFrame) – The parameters.

  • w (array) – The corresponding weights

pdf(x: Union[pandas.core.series.Series, pandas.core.frame.DataFrame]) → Union[float, numpy.ndarray][source]

Evaluate the probability density function (PDF) at x.

Parameters

x (pd.Series, pd.DataFrame) – Parameter. If x is a Series, then x should have the columns of the X passed to the fit method as indices. If x is a DataFrame, then x should have the same columns as the X passed to the fit method. The order of the columns is not important.

Returns

density – Probability density at x.

Return type

float

rvs(size: int = None) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame][source]

Sample from the density.

Parameters

size (int, optional) – Number of independent samples to draw. If omitted (None), a single sample is drawn, which is equivalent to calling rvs_single.

Returns

samples – The samples as a pandas DataFrame.

Return type

pd.Series or pd.DataFrame

Note

This method can be overridden for efficient implementations. The default is to call rvs_single repeatedly (which might not be the most efficient way).

rvs_single()[source]

Random variable sample (rvs).

Sample from the fitted distribution.

Returns

sample – A sample from the fitted model.

Return type

pd.Series

exception pyabc.transition.NotEnoughParticles[source]

Bases: Exception

class pyabc.transition.Transition[source]

Bases: sklearn.base.BaseEstimator

Abstract Transition base class. Derive all Transitions from this class.

Note

This class does a little bit of meta-programming.

The fit, pdf and rvs methods are automatically wrapped to handle the special case of no parameters.

Hence, you can safely assume that you encounter at least one parameter. All the defined transitions will then automatically generalize to the case of no parameter.

abstract fit(X: pandas.core.frame.DataFrame, w: numpy.ndarray) → None[source]

Fit the density estimator (perturber) to the sampled data. Concrete implementations might do something like fitting a KDE.

The parameters given as X and w are automatically stored in self.X and self.w.

Parameters
  • X (pd.DataFrame) – The parameters.

  • w (array) – The corresponding weights

mean_cv(n_samples: Union[None, int] = None) → float[source]

Estimate the uncertainty on the KDE.

Parameters

n_samples (int, optional) – Estimate the CV for n_samples samples. If this parameter is not given, the sample size of the last fit is used.

Returns

mean_cv – The estimated average coefficient of variation.

Return type

float

Note

A call to this method, as a side effect, also sets the attributes test_points_, test_weights_ and variation_at_test_points_. These are the individual points, weights and variations used to calculate the mean.

abstract pdf(x: Union[pandas.core.series.Series, pandas.core.frame.DataFrame]) → Union[float, numpy.ndarray][source]

Evaluate the probability density function (PDF) at x.

Parameters

x (pd.Series, pd.DataFrame) – Parameter. If x is a Series, then x should have the columns of the X passed to the fit method as indices. If x is a DataFrame, then x should have the same columns as the X passed to the fit method. The order of the columns is not important.

Returns

density – Probability density at x.

Return type

float

rvs(size: int = None) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame][source]

Sample from the density.

Parameters

size (int, optional) – Number of independent samples to draw. If omitted (None), a single sample is drawn, which is equivalent to calling rvs_single.

Returns

samples – The samples as a pandas DataFrame.

Return type

pd.Series or pd.DataFrame

Note

This method can be overridden for efficient implementations. The default is to call rvs_single repeatedly (which might not be the most efficient way).
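The relationship between rvs and rvs_single described in this note can be sketched with a stand-in base class. The classes below are hypothetical and written in plain Python; they are not pyabc.transition.Transition, and they return floats and lists where the real class returns pandas objects.

```python
import random
from abc import ABC, abstractmethod


class SketchTransition(ABC):
    """Hypothetical stand-in for a transition base class (not pyABC code)."""

    @abstractmethod
    def rvs_single(self):
        """Draw one sample."""

    def rvs(self, size=None):
        # No size given: behave like a single call to rvs_single.
        if size is None:
            return self.rvs_single()
        # Otherwise call rvs_single once per requested sample.
        return [self.rvs_single() for _ in range(size)]


class UniformJitter(SketchTransition):
    """Toy subclass: perturbs a fixed centre by uniform noise."""

    def __init__(self, center, width, seed=0):
        self.center = center
        self.width = width
        self.rng = random.Random(seed)

    def rvs_single(self):
        return self.center + self.rng.uniform(-self.width, self.width)


kernel = UniformJitter(center=2.0, width=0.5)
single = kernel.rvs()        # one sample
batch = kernel.rvs(size=3)   # three samples via repeated rvs_single calls
```

A subclass only has to provide rvs_single to get a working rvs. Overriding rvs as well is worthwhile when samples can be drawn in a vectorized way.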

abstract rvs_single() → pandas.core.series.Series[source]

Random variable sample (rvs).

Sample from the fitted distribution.

Returns

sample – A sample from the fitted model.

Return type

pd.Series

pyabc.transition.scott_rule_of_thumb(n_samples, dimension)[source]

Scott’s rule of thumb.

\[\left ( \frac{1}{n} \right ) ^{\frac{1}{d+4}}\]

(see also scipy.stats.kde.gaussian_kde.scotts_factor)

pyabc.transition.silverman_rule_of_thumb(n_samples, dimension)[source]

Silverman’s rule of thumb.

\[\left ( \frac{4}{n (d+2)} \right ) ^ {\frac{1}{d + 4}}\]

(see also scipy.stats.kde.gaussian_kde.silverman_factor)
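The two rules of thumb can be compared numerically by implementing the formulas above directly, with n as n_samples and d as dimension. The function names in this sketch are chosen for the example and are not the pyABC functions themselves.

```python
def scott_factor(n, d):
    """Scott's rule of thumb: (1 / n) ** (1 / (d + 4))."""
    return (1 / n) ** (1 / (d + 4))


def silverman_factor(n, d):
    """Silverman's rule of thumb: (4 / (n * (d + 2))) ** (1 / (d + 4))."""
    return (4 / (n * (d + 2))) ** (1 / (d + 4))


n = 100
for d in (1, 2, 3):
    print(d, scott_factor(n, d), silverman_factor(n, d))
```

For d = 2 the factor 4 / (d + 2) equals 1, so both rules coincide. For d = 1 Silverman's rule gives the larger bandwidth, and for d > 2 it gives the smaller one.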