pyabc.predictor
Predictor
Predictor models are used in pyABC to regress parameters from data.
pyabc.predictor.Predictor defines the abstract base class, and pyabc.predictor.SimplePredictor provides an interface to external predictor implementations.
Further, various specific implementations including linear regression, Lasso,
Gaussian processes, and neural networks are provided.
- class pyabc.predictor.GPKernelHandle(kernels: List[str] = None, kernel_kwargs: List[dict] = None, ard: bool = True)[source]
Bases:
object
Convenience class for Gaussian process kernel construction.
Allows creating kernels depending on problem dimensions.
- __call__(n_in: int) Kernel [source]
- Parameters:
n_in – Input (feature) dimension.
- Returns:
kernel – Kernel created from inputs.
- __init__(kernels: List[str] = None, kernel_kwargs: List[dict] = None, ard: bool = True)[source]
- Parameters:
kernels – Names of sklearn.gaussian_process.kernels covariance kernels. Defaults to a radial basis function (a.k.a. squared exponential) kernel “RBF” and a “WhiteKernel” to explain noise in the data. The resulting kernel is the sum of all kernels.
kernel_kwargs – Optional arguments passed to the kernel constructors.
ard – Automatic relevance determination by assigning a separate length scale per input variable. Only supported by some kernels, currently “RBF” and “Matern”. If set to True, the capable kernels are automatically informed. If the underlying scikit-learn toolbox extends support, this list needs to be updated.
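To make the roles of the summed kernels and of automatic relevance determination concrete, the following plain-Python sketch evaluates an RBF kernel with one length scale per input dimension plus a white-noise term. It is an illustration only and not the pyABC or scikit-learn implementation; the helper make_kernel and its arguments length_scales and noise_level are hypothetical names chosen for this example.

```python
import math


def make_kernel(n_in, length_scales=None, noise_level=1.0):
    """Return k(a, b) = ARD-RBF(a, b) + white noise (hypothetical helper)."""
    # ARD: one length scale per input dimension instead of a single shared one.
    if length_scales is None:
        length_scales = [1.0] * n_in
    if len(length_scales) != n_in:
        raise ValueError("need one length scale per input dimension")

    def kernel(a, b):
        # Squared exponential (RBF) term with per-dimension scaling.
        sq_dist = sum(
            ((ai - bi) / ls) ** 2 for ai, bi, ls in zip(a, b, length_scales)
        )
        rbf = math.exp(-0.5 * sq_dist)
        # White-noise term: contributes only when both inputs coincide.
        white = noise_level if list(a) == list(b) else 0.0
        # The resulting kernel is the sum of the component kernels.
        return rbf + white

    return kernel


kernel = make_kernel(n_in=2, length_scales=[1.0, 10.0], noise_level=0.1)
same = kernel([0.0, 0.0], [0.0, 0.0])
shift_first = kernel([0.0, 0.0], [1.0, 0.0])
shift_second = kernel([0.0, 0.0], [0.0, 1.0])
```

With a long length scale on the second input, a unit shift in that input changes the kernel value far less than the same shift in the first input, which is how per-dimension length scales express the relevance of each input.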
- class pyabc.predictor.GPPredictor(kernel: Callable | Kernel = None, normalize_features: bool = True, normalize_labels: bool = True, joint: bool = True, log_pearson: bool = True, **kwargs)[source]
Bases:
SimplePredictor
Gaussian process model.
Similar to [1].
- class pyabc.predictor.HiddenLayerHandle(method: str | List[str] = 'mean', n_layer: int = 1, max_size: int = inf, alpha: float = 1.0)[source]
Bases:
object
Convenience class for various layer size strategies.
Allows defining layer sizes depending on problem dimensions.
- __call__(n_in: int, n_out: int, n_sample: int) Tuple[int, ...] [source]
- Parameters:
n_in – Input (feature) dimension.
n_out – Output (target) dimension.
n_sample – Number of samples.
- Returns:
hidden_layer_sizes – Tuple of hidden layer sizes.
- __init__(method: str | List[str] = 'mean', n_layer: int = 1, max_size: int = inf, alpha: float = 1.0)[source]
- Parameters:
method – Method to use. Can be any of:
“heuristic” bases the number of neurons on the number of samples to avoid overfitting. See https://stats.stackexchange.com/questions/181.
“mean” takes the mean of input and output dimension.
“max” takes the maximum of input and output dimension.
Additionally, a list of methods can be passed, in which case the minimum over all is used.
n_layer – Number of layers.
max_size – Maximum layer size. Applied to all strategies.
alpha – Factor used in “heuristic”. The higher, the fewer neurons. A value in the range 2-10 is recommended.
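The strategies described above can be sketched in plain Python as follows. This is not the pyABC implementation: the function name hidden_layer_sizes is hypothetical, the “heuristic” formula n_sample / (alpha * (n_in + n_out)) is an assumed rule of thumb, and rounding down with a floor of one neuron is likewise an assumption.

```python
import math


def hidden_layer_sizes(n_in, n_out, n_sample, method="mean", n_layer=1,
                       max_size=math.inf, alpha=1.0):
    """Illustrative layer-size strategies (not the pyABC implementation)."""
    methods = [method] if isinstance(method, str) else list(method)
    candidates = []
    for m in methods:
        if m == "mean":
            size = (n_in + n_out) / 2
        elif m == "max":
            size = max(n_in, n_out)
        elif m == "heuristic":
            # Assumed rule of thumb: fewer neurons for larger alpha.
            size = n_sample / (alpha * (n_in + n_out))
        else:
            raise ValueError(f"unknown method: {m}")
        candidates.append(size)
    # Several methods: take the minimum, then apply the global cap.
    size = min(min(candidates), max_size)
    # Assumption: round down, but keep at least one neuron per layer.
    size = max(1, int(size))
    return tuple([size] * n_layer)


sizes = hidden_layer_sizes(
    n_in=10, n_out=2, n_sample=1000,
    method=["max", "heuristic"], n_layer=2, alpha=5.0,
)
```

In this call, “max” yields 10 and the assumed heuristic yields about 16.7, so the minimum of 10 is used for both layers.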
- class pyabc.predictor.LassoPredictor(normalize_features: bool = True, normalize_labels: bool = True, joint: bool = True, log_pearson: bool = True, **kwargs)[source]
Bases:
SimplePredictor
Lasso (least absolute shrinkage and selection operator) model.
Linear model with l1 regularization.
- class pyabc.predictor.LinearPredictor(normalize_features: bool = True, normalize_labels: bool = True, joint: bool = True, weight_samples: bool = False, log_pearson: bool = True, **kwargs)[source]
Bases:
SimplePredictor
Linear predictor model.
Based on [2].
[2] Fearnhead, Paul, and Dennis Prangle. “Constructing summary statistics for approximate Bayesian computation: Semi‐automatic approximate Bayesian computation.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 74.3 (2012): 419-474.
- __init__(normalize_features: bool = True, normalize_labels: bool = True, joint: bool = True, weight_samples: bool = False, log_pearson: bool = True, **kwargs)[source]
- Parameters:
predictor – Predictor model to use, fulfilling the predictor contract.
normalize_features – Whether to apply z-score normalization to the input data.
normalize_labels – Whether to apply z-score normalization to the parameters.
joint – Whether the predictor learns one model for all targets, or separate models per target.
weight_samples – Whether to use importance sampling weights. Note that not all predictors may support weighted samples.
log_pearson – Whether to log Pearson correlation coefficients after fitting.
- class pyabc.predictor.MLPPredictor(normalize_features: bool = True, normalize_labels: bool = True, joint: bool = True, hidden_layer_sizes: Tuple[int, ...] | Callable = None, log_pearson: bool = True, **kwargs)[source]
Bases:
SimplePredictor
Multi-layer perceptron regressor predictor.
See e.g. [3].
[3] Jiang, Bai, et al. “Learning summary statistic for approximate Bayesian computation via deep neural network.” Statistica Sinica (2017): 1595-1618.
- __init__(normalize_features: bool = True, normalize_labels: bool = True, joint: bool = True, hidden_layer_sizes: Tuple[int, ...] | Callable = None, log_pearson: bool = True, **kwargs)[source]
Additional keyword arguments are passed on to the model.
- Parameters:
hidden_layer_sizes – Network hidden layer sizes. Can be either a tuple of ints, or a callable taking input dimension, output dimension, and number of samples and returning a tuple of ints. The HiddenLayerHandle class provides some useful defaults.
- class pyabc.predictor.ModelSelectionPredictor(predictors: List[Predictor], split_method: str = 'train_test_split', n_splits: int = 5, test_size: float = 0.2, f_score: Callable = None)[source]
Bases:
Predictor
Model selection over a set of predictors.
Picks the model with the minimum k-fold cross-validation score and retrains it on the full data set.
- __init__(predictors: List[Predictor], split_method: str = 'train_test_split', n_splits: int = 5, test_size: float = 0.2, f_score: Callable = None)[source]
- Parameters:
predictors – Set of predictors over which to perform model selection.
split_method – How to split the data set into training and test data. Can be “cross_validation” for a full n_splits-fold cross-validation, or “train_test_split” for a single split with a test set of size test_size.
n_splits – Number of splits to use in k-fold cross validation.
test_size – Fraction of samples to randomly pick as test set, when using a single training and test set.
f_score – Score function to assess prediction quality. Defaults to root mean square error normalized by target standard deviation. Takes arguments y1, y2, std for prediction, ground truth, and standard deviation, and returns the score as a float.
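The “train_test_split” selection procedure can be illustrated with a small plain-Python sketch: fit every candidate on the training part, score it on the held-out part, pick the minimum score, and retrain the winner on all data. Everything here is hypothetical and simplified for illustration (toy predictors ConstantPredictor and SlopePredictor, a single scalar feature and target, an unnormalized RMSE score, no sample weights); it is not the pyABC implementation.

```python
import random


class ConstantPredictor:
    """Toy predictor: always predicts the training mean of y."""

    def fit(self, x, y):
        self.value = sum(y) / len(y)

    def predict(self, x):
        return [self.value for _ in x]


class SlopePredictor:
    """Toy predictor: least-squares line through the origin."""

    def fit(self, x, y):
        self.slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

    def predict(self, x):
        return [self.slope * a for a in x]


def rmse(y_pred, y_true):
    """Plain root mean square error between two lists."""
    return (sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)) ** 0.5


def select_predictor(predictors, x, y, test_size=0.2, seed=0):
    # Single random split into test and training indices.
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(round(test_size * len(x))))
    test, train = idx[:n_test], idx[n_test:]
    scores = []
    for p in predictors:
        p.fit([x[i] for i in train], [y[i] for i in train])
        y_pred = p.predict([x[i] for i in test])
        scores.append(rmse(y_pred, [y[i] for i in test]))
    # Lower score is better; retrain the winner on the full data set.
    best = predictors[scores.index(min(scores))]
    best.fit(x, y)
    return best, scores


x = [float(i) for i in range(1, 21)]
y = [2.0 * xi for xi in x]
best, scores = select_predictor([ConstantPredictor(), SlopePredictor()], x, y)
```

Because the toy data follow y = 2x exactly, the line through the origin attains a test score of zero and is selected over the constant model.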
- fit(x: ndarray, y: ndarray, w: ndarray = None) None [source]
Fit the predictor to labeled data.
- Parameters:
x – Samples, shape (n_sample, n_feature).
y – Targets, shape (n_sample, n_out).
w – Weights, shape (n_sample,).
- predict(x: ndarray, normalize: bool = False) ndarray [source]
Predict outputs using the model.
- Parameters:
x – Samples, shape (n_sample, n_feature) or (n_feature,).
normalize – Whether outputs should be normalized, or on the original scale.
- Returns:
y – Predicted targets, shape (n_sample, n_out).
- class pyabc.predictor.Predictor[source]
Bases:
ABC
Generic predictor model class.
A predictor should define:
fit(x, y, w=None) to fit the model on samples x and targets y, where x has shape (n_sample, n_feature) and y has shape (n_sample, n_out). If weight_samples is set, the sample weights w are passed as a third argument. Not all predictors support this.
predict(x) to predict outputs of shape (n_sample, n_out), where x has shape (n_sample, n_feature).
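The contract above can be sketched with a stand-alone abstract base class and a trivial implementation. This example does not import pyABC: PredictorSketch and WeightedMeanPredictor are hypothetical names, and nested Python lists stand in for NumPy arrays.

```python
from abc import ABC, abstractmethod


class PredictorSketch(ABC):
    """Stand-in for the abstract interface described above."""

    @abstractmethod
    def fit(self, x, y, w=None):
        """Fit on x of shape (n_sample, n_feature), y of shape (n_sample, n_out)."""

    @abstractmethod
    def predict(self, x, normalize=False):
        """Return predictions of shape (n_sample, n_out)."""


class WeightedMeanPredictor(PredictorSketch):
    """Ignores the features and predicts the (weighted) mean of each target."""

    def fit(self, x, y, w=None):
        if w is None:
            w = [1.0] * len(y)
        total = sum(w)
        n_out = len(y[0])
        self.mean = [
            sum(wi * row[j] for wi, row in zip(w, y)) / total
            for j in range(n_out)
        ]

    def predict(self, x, normalize=False):
        # `normalize` is accepted for interface compatibility but unused here.
        # A single sample of shape (n_feature,) is promoted to one row.
        if x and not isinstance(x[0], (list, tuple)):
            x = [x]
        return [list(self.mean) for _ in x]


x = [[0.0], [1.0], [2.0]]
y = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
model = WeightedMeanPredictor()
model.fit(x, y)
pred = model.predict(x)
```

Passing weights such as w=[0.0, 0.0, 1.0] to fit would shift the prediction entirely to the last sample, which shows where sample weights enter for predictors that support them.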
- abstract fit(x: ndarray, y: ndarray, w: ndarray = None) None [source]
Fit the predictor to labeled data.
- Parameters:
x – Samples, shape (n_sample, n_feature).
y – Targets, shape (n_sample, n_out).
w – Weights, shape (n_sample,).
- abstract predict(x: ndarray, normalize: bool = False) ndarray [source]
Predict outputs using the model.
- Parameters:
x – Samples, shape (n_sample, n_feature) or (n_feature,).
normalize – Whether outputs should be normalized, or on the original scale.
- Returns:
y – Predicted targets, shape (n_sample, n_out).
- class pyabc.predictor.SimplePredictor(predictor, normalize_features: bool = True, normalize_labels: bool = True, joint: bool = True, weight_samples: bool = False, log_pearson: bool = True)[source]
Bases:
Predictor
Wrapper around generic predictor routines.
- __init__(predictor, normalize_features: bool = True, normalize_labels: bool = True, joint: bool = True, weight_samples: bool = False, log_pearson: bool = True)[source]
- Parameters:
predictor – Predictor model to use, fulfilling the predictor contract.
normalize_features – Whether to apply z-score normalization to the input data.
normalize_labels – Whether to apply z-score normalization to the parameters.
joint – Whether the predictor learns one model for all targets, or separate models per target.
weight_samples – Whether to use importance sampling weights. Note that not all predictors may support weighted samples.
log_pearson – Whether to log Pearson correlation coefficients after fitting.
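The z-score normalization toggled by normalize_features and normalize_labels, and the distinction between normalized and original-scale outputs, can be illustrated in plain Python. This is a sketch and not pyABC code: the helpers zscore_params, normalize, and denormalize are hypothetical, and the use of the population standard deviation (with a fallback of 1.0 for constant columns) is an assumption.

```python
import statistics


def zscore_params(rows):
    """Per-column mean and standard deviation of a list of rows."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for constant columns.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, stds


def normalize(rows, means, stds):
    """Map each column to zero mean and unit standard deviation."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def denormalize(rows, means, stds):
    """Map normalized values back to the original scale."""
    return [[v * s + m for v, m, s in zip(row, means, stds)] for row in rows]


x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
means, stds = zscore_params(x)
x_norm = normalize(x, means, stds)
x_back = denormalize(x_norm, means, stds)
```

After normalization both columns carry the same values despite differing by a factor of ten on the original scale, so features and targets of different magnitude contribute comparably to the fit.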
- fit(x: ndarray, y: ndarray, w: ndarray)[source]
Fit the predictor to labeled data.
- Parameters:
x – Samples, shape (n_sample, n_feature).
y – Targets, shape (n_sample, n_out).
w – Weights, shape (n_sample,).
- predict(x: ndarray, normalize: bool = False) ndarray [source]
Predict outputs using the model.
- Parameters:
x – Samples, shape (n_sample, n_feature) or (n_feature,).
normalize – Whether outputs should be normalized, or on the original scale.
- Returns:
y – Predicted targets, shape (n_sample, n_out).
- pyabc.predictor.root_mean_square_error(y1: ndarray, y2: ndarray, sigma: ndarray | float) float [source]
Root mean square error of (y1 - y2) / sigma.
- Parameters:
y1 – Model simulations, shape (n_sample, n_par).
y2 – Ground truth values, shape (n_sample, n_par).
sigma – Normalizations, shape (n_sample,) or (1,).
- Returns:
val – The normalized root mean square error value.
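As a worked illustration of the normalized root mean square error, the following plain-Python sketch computes it for nested lists with a scalar sigma. The function name rmse_sketch is hypothetical, and averaging the squared normalized residuals over all entries is an assumption about the reduction, not a statement about the pyABC source.

```python
def rmse_sketch(y1, y2, sigma):
    """Root mean square of (y1 - y2) / sigma over all entries (scalar sigma)."""
    squared = [
        ((a - b) / sigma) ** 2
        for row1, row2 in zip(y1, y2)
        for a, b in zip(row1, row2)
    ]
    return (sum(squared) / len(squared)) ** 0.5


y1 = [[1.0, 2.0], [3.0, 4.0]]  # predictions
y2 = [[0.0, 2.0], [3.0, 2.0]]  # ground truth
val = rmse_sketch(y1, y2, sigma=1.0)
val_scaled = rmse_sketch(y1, y2, sigma=2.0)
```

The residuals are 1, 0, 0, and 2, so the unnormalized value is the square root of 5/4; doubling sigma halves the score.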
- pyabc.predictor.root_mean_square_relative_error(y1: ndarray, y2: ndarray) float [source]
Root mean square relative error of (y1 - y2) / y2.
Note that this may behave badly for ground truth parameters close to 0.
- Parameters:
y1 – Model simulations.
y2 – Ground truth values.
- Returns:
val – The normalized root mean square relative error value.
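The sensitivity to ground truth values near zero can be seen in a small plain-Python sketch. The function name rmsre_sketch is hypothetical, flat lists stand in for arrays, and the mean over all entries is an assumed reduction.

```python
def rmsre_sketch(y1, y2):
    """Root mean square of the relative errors (y1 - y2) / y2."""
    rel_sq = [((a - b) / b) ** 2 for a, b in zip(y1, y2)]
    return (sum(rel_sq) / len(rel_sq)) ** 0.5


# Predictions that are 10 % too high give a relative error of about 0.1.
moderate = rmsre_sketch([1.1, 2.2], [1.0, 2.0])

# A small absolute error on a ground truth value close to 0 dominates the score.
near_zero = rmsre_sketch([1e-3], [1e-6])
```

Here an absolute error of roughly 0.001 produces a relative error on the order of 1000, which is why an absolute or sigma-normalized error may be preferable when parameters can be close to zero.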