Transitions (Perturbation Kernels)

Perturbation strategies. The classes defined here transition the current population to the next one. pyABC implements global and local transitions. Proposals for the subsequent generation are drawn from a density estimate fitted to the current generation, which is equivalent to perturbing randomly chosen particles.

These can be passed to pyabc.smc.ABCSMC via the transitions keyword argument.

class pyabc.transition.Transition

Bases: sklearn.base.BaseEstimator

Abstract Transition base class. Derive all transitions from this class.

Note

This class does a little bit of meta-programming.

The fit, pdf and rvs methods are automatically wrapped to handle the special case of no parameters.

Hence, implementations can safely assume that at least one parameter is present. All defined transitions then automatically generalize to the case of no parameters.

fit(X: pandas.core.frame.DataFrame, w: numpy.ndarray)

Fit the density estimator (perturber) to the sampled data. Concrete implementations might do something like fitting a KDE.

The parameters given as X and w are automatically stored in self.X and self.w.

Parameters:
  • X (pd.DataFrame) – The parameters.
  • w (array) – The corresponding weights
mean_cv(n_samples: Union[NoneType, int] = None) → float

Estimate the uncertainty on the KDE.

Parameters:n_samples (int, optional) – Estimate the CV for n_samples samples. If this parameter is not given, the sample size of the last fit is used.
Returns:mean_cv – The estimated average coefficient of variation.
Return type:float

Note

A call to this method, as a side effect, also sets the attributes test_points_, test_weights_ and variation_at_test_points_. These are the individual points, weights and variations used to calculate the mean.

pdf(x: Union[pandas.core.series.Series, pandas.core.frame.DataFrame]) → Union[float, numpy.ndarray]

Evaluate the probability density function (PDF) at x.

Parameters:x (pd.Series, pd.DataFrame) – Parameter. If x is a series, it should have the columns of the X passed to the fit method as its index. If x is a DataFrame, it should have the same columns as the X passed to the fit method. The order of the columns is not important.
Returns:density – Probability density at x.
Return type:float
rvs(size=None)

Sample from the density.

Parameters:size (int, optional) – Number of independent samples to draw. Defaults to 1, in which case this is equivalent to calling rvs_single.
Returns:samples
Return type:The samples as pandas DataFrame

Note

This method can be overridden for efficient implementations. The default is to call rvs_single repeatedly (which might not be the most efficient way).
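That default can be pictured as stacking repeated single draws into a DataFrame. The following minimal standalone sketch uses a hypothetical stand-in for rvs_single, not pyABC's fitted transition:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def rvs_single() -> pd.Series:
    # Hypothetical stand-in for a fitted transition's single-sample draw.
    return pd.Series({"theta1": rng.normal(), "theta2": rng.normal()})

def rvs(size=None) -> pd.DataFrame:
    # Default behavior: fall back to repeated single draws.
    if size is None:
        size = 1
    return pd.DataFrame([rvs_single() for _ in range(size)])

samples = rvs(size=5)
print(samples.shape)  # (5, 2)
```

An overriding implementation could instead draw all rows in one vectorized call, which is why the base class leaves rvs open for redefinition.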

rvs_single() → pandas.core.series.Series

Random variable sample (rvs).

Sample from the fitted distribution.

Returns:sample – A sample from the fitted model.
Return type:pd.Series
class pyabc.transition.MultivariateNormalTransition(scaling=1, bandwidth_selector=<function silverman_rule_of_thumb>)

Bases: pyabc.transition.base.Transition

Transition via a multivariate Gaussian KDE estimate.

Parameters:
  • scaling (float) – A factor by which the covariance is additionally multiplied. Since the Silverman and Scott rules usually yield too-large bandwidths, 0 < scaling <= 1 should make most sense.
  • bandwidth_selector (optional) – Defaults to silverman_rule_of_thumb. The bandwidth selector is a function of the form f(n_samples: float, dimension: int), where n_samples denotes the (effective) sample size (and is therefore a float) and dimension is the parameter dimension.
fit(X: pandas.core.frame.DataFrame, w: numpy.ndarray)

Fit the density estimator (perturber) to the sampled data. Concrete implementations might do something like fitting a KDE.

The parameters given as X and w are automatically stored in self.X and self.w.

Parameters:
  • X (pd.DataFrame) – The parameters.
  • w (array) – The corresponding weights
pdf(x: Union[pandas.core.series.Series, pandas.core.frame.DataFrame])

Evaluate the probability density function (PDF) at x.

Parameters:x (pd.Series, pd.DataFrame) – Parameter. If x is a series, it should have the columns of the X passed to the fit method as its index. If x is a DataFrame, it should have the same columns as the X passed to the fit method. The order of the columns is not important.
Returns:density – Probability density at x.
Return type:float
rvs_single()

Random variable sample (rvs).

Sample from the fitted distribution.

Returns:sample – A sample from the fitted model.
Return type:pd.Series
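To illustrate how such a KDE transition fits together, here is a self-contained sketch of a weighted Gaussian-KDE perturber following the same fit/rvs_single/pdf interface. SimpleMVNTransition and its internals are illustrative assumptions, not pyABC's implementation:

```python
import numpy as np
import pandas as pd
from scipy.stats import multivariate_normal

def silverman_rule_of_thumb(n_samples, dimension):
    # Bandwidth factor per the module's Silverman formula.
    return (4.0 / (n_samples * (dimension + 2))) ** (1.0 / (dimension + 4))

class SimpleMVNTransition:
    """Illustrative weighted Gaussian-KDE perturber (not pyABC's class)."""

    def __init__(self, scaling=1.0):
        self.scaling = scaling

    def fit(self, X: pd.DataFrame, w: np.ndarray):
        self.X, self.w = X, w
        dim = X.shape[1]
        # Weighted covariance of the particles, scaled by the squared
        # bandwidth factor and the user-supplied scaling.
        cov = np.atleast_2d(np.cov(X.values.T, aweights=w, ddof=0))
        bw = silverman_rule_of_thumb(len(X), dim)
        self.cov = cov * bw**2 * self.scaling

    def rvs_single(self) -> pd.Series:
        # Perturb a particle chosen proportionally to its weight.
        idx = np.random.choice(len(self.X), p=self.w)
        draw = np.random.multivariate_normal(self.X.iloc[idx].values, self.cov)
        return pd.Series(draw, index=self.X.columns)

    def pdf(self, x: pd.Series) -> float:
        # Weighted mixture of Gaussians centered at the particles.
        dens = [multivariate_normal.pdf(x[self.X.columns].values, mean=m, cov=self.cov)
                for m in self.X.values]
        return float(np.dot(self.w, dens))

X = pd.DataFrame({"a": [0.0, 1.0, 2.0], "b": [0.0, -1.0, 1.0]})
w = np.array([0.2, 0.5, 0.3])
t = SimpleMVNTransition()
t.fit(X, w)
sample = t.rvs_single()
density = t.pdf(sample)
```

The sketch shows the division of labor: fit turns particles and weights into a covariance, rvs_single perturbs a weight-sampled particle, and pdf evaluates the resulting mixture density.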
class pyabc.transition.GridSearchCV(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=5, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score=True)

Bases: sklearn.model_selection._search.GridSearchCV

Do a grid search to automatically select the best parameters for transition classes such as the pyabc.transition.MultivariateNormalTransition.

This is essentially a thin wrapper around sklearn.model_selection.GridSearchCV. It translates the scikit-learn interface to the one used in pyABC, hence implementing a thin adapter pattern.

fit(X, y=None, groups=None)

Run fit with all sets of parameters.

Parameters:
  • X (array-like, shape = [n_samples, n_features]) – Training vector, where n_samples is the number of samples and n_features is the number of features.
  • y (array-like, shape = [n_samples] or [n_samples, n_output], optional) – Target relative to X for classification or regression; None for unsupervised learning.
  • groups (array-like, with shape (n_samples,), optional) – Group labels for the samples used while splitting the dataset into train/test set.
  • **fit_params (dict of string -> object) – Parameters passed to the fit method of the estimator
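The underlying scikit-learn pattern can be sketched without pyABC: cross-validated likelihood selects a KDE bandwidth from a grid. This uses plain sklearn.neighbors.KernelDensity as a stand-in estimator; pyabc.transition.GridSearchCV applies the same idea to transition hyperparameters such as scaling:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))

# Cross-validated search over the KDE bandwidth; KernelDensity.score
# (the data log-likelihood) serves as the selection criterion.
search = GridSearchCV(KernelDensity(),
                      param_grid={"bandwidth": [0.1, 0.5, 1.0]},
                      cv=5)
search.fit(X)
best_bw = search.best_params_["bandwidth"]
```

After fitting, best_estimator_ holds the refit winner, mirroring how the pyABC adapter hands back a tuned transition.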
exception pyabc.transition.NotEnoughParticles

Bases: Exception

class pyabc.transition.LocalTransition(k=None, k_fraction=0.25, scaling=1)

Bases: pyabc.transition.base.Transition

Local KDE fit. Takes into account only the k nearest neighbors, similar to [Filippi].

Parameters:
  • k (int) – Number of nearest neighbors for local covariance calculation.
  • scaling (float) – Scaling factor for the local covariance matrices.
  • k_fraction (float, optional) – If given, the number of nearest neighbors is computed as k = k_fraction * population_size (rounded).
EPS

float – Scaling of the identity matrix to be added to the covariance in case the covariances are not invertible.

[Filippi]Filippi, Sarah, Chris P. Barnes, Julien Cornebise, and Michael P.H. Stumpf. “On Optimality of Kernels for Approximate Bayesian Computation Using Sequential Monte Carlo.” Statistical Applications in Genetics and Molecular Biology 12, no. 1 (2013): 87–107. doi:10.1515/sagmb-2012-0069.
fit(X, w)

Fit the density estimator (perturber) to the sampled data. Concrete implementations might do something like fitting a KDE.

The parameters given as X and w are automatically stored in self.X and self.w.

Parameters:
  • X (pd.DataFrame) – The parameters.
  • w (array) – The corresponding weights
pdf(x)

Evaluate the probability density function (PDF) at x.

Parameters:x (pd.Series, pd.DataFrame) – Parameter. If x is a series, it should have the columns of the X passed to the fit method as its index. If x is a DataFrame, it should have the same columns as the X passed to the fit method. The order of the columns is not important.
Returns:density – Probability density at x.
Return type:float
rvs_single()

Random variable sample (rvs).

Sample from the fitted distribution.

Returns:sample – A sample from the fitted model.
Return type:pd.Series
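The key step of the local KDE — estimating a covariance from only the k nearest neighbors of each particle, regularized by an EPS-scaled identity — can be sketched as follows. The function name local_covariance and the eps default are illustrative assumptions, not pyABC's internals:

```python
import numpy as np

def local_covariance(X: np.ndarray, i: int, k: int, eps: float = 1e-3) -> np.ndarray:
    """Covariance of the k nearest neighbors of particle i (itself included)."""
    # Euclidean distances from particle i to every particle.
    d = np.linalg.norm(X - X[i], axis=1)
    neighbors = X[np.argsort(d)[:k]]
    cov = np.atleast_2d(np.cov(neighbors.T, ddof=0))
    # Regularize: add a scaled identity so the matrix stays invertible.
    return cov + eps * np.eye(cov.shape[0])

X = np.random.default_rng(1).normal(size=(50, 2))
cov0 = local_covariance(X, i=0, k=12)
```

Each particle thus gets its own perturbation covariance adapted to the local geometry of the population, rather than one global covariance shared by all particles.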
pyabc.transition.scott_rule_of_thumb(n_samples, dimension)

Scott’s rule of thumb.

\[\left ( \frac{1}{n} \right ) ^{\frac{1}{d+4}}\]

(see also scipy.stats.kde.gaussian_kde.scotts_factor)

pyabc.transition.silverman_rule_of_thumb(n_samples, dimension)

Silverman’s rule of thumb.

\[\left ( \frac{4}{n (d+2)} \right ) ^ {\frac{1}{d + 4}}\]

(see also scipy.stats.kde.gaussian_kde.silverman_factor)
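Both factors can be written down directly from the formulas above; this is a plain transcription for illustration, whereas pyABC ships its own functions under the same names:

```python
def scott_rule_of_thumb(n_samples: float, dimension: int) -> float:
    # (1/n)^(1/(d+4))
    return (1.0 / n_samples) ** (1.0 / (dimension + 4))

def silverman_rule_of_thumb(n_samples: float, dimension: int) -> float:
    # (4/(n(d+2)))^(1/(d+4))
    return (4.0 / (n_samples * (dimension + 2))) ** (1.0 / (dimension + 4))
```

Note that n_samples may be a non-integer effective sample size, which is why both formulas accept a float.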