pyabc.predictor

Predictor

Predictor models are used in pyABC to regress parameters from data. pyabc.predictor.Predictor defines the abstract base class, and pyabc.predictor.SimplePredictor provides an interface to external predictor implementations. Further, various concrete implementations are provided, including linear regression, Lasso, Gaussian processes, and neural networks.

class pyabc.predictor.GPKernelHandle(kernels: List[str] = None, kernel_kwargs: List[dict] = None, ard: bool = True)[source]

Bases: object

Convenience class for Gaussian process kernel construction.

Allows to create kernels depending on problem dimensions.

__call__(n_in: int) Kernel[source]
Parameters:

n_in – Input (feature) dimension.

Returns:

kernel – Kernel created from inputs.

Return type:

Kernel

__init__(kernels: List[str] = None, kernel_kwargs: List[dict] = None, ard: bool = True)[source]
Parameters:
  • kernels – Names of scikit-learn covariance kernels (sklearn.gaussian_process.kernels). Defaults to a radial basis function (a.k.a. squared exponential) kernel “RBF” and a “WhiteKernel” to explain noise in the data. The resulting kernel is the sum of all kernels.

  • kernel_kwargs – Optional arguments passed to the kernel constructors.

  • ard – Automatic relevance determination by assigning a separate length scale to each input variable. Only supported by some kernels, currently “RBF” and “Matern”. If set to True, the capable kernels are informed automatically. If the underlying scikit-learn toolbox extends support, this list needs to be updated.
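
A minimal usage sketch based on the documented signature; the feature dimension is hypothetical:

    from pyabc.predictor import GPKernelHandle

    # build a handle with the documented default kernels and a separate
    # length scale per input variable (ard=True)
    handle = GPKernelHandle(kernels=["RBF", "WhiteKernel"], ard=True)
    # construct a concrete scikit-learn kernel for a 4-dimensional feature space
    kernel = handle(n_in=4)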

class pyabc.predictor.GPPredictor(kernel: Callable | Kernel = None, normalize_features: bool = True, normalize_labels: bool = True, joint: bool = True, log_pearson: bool = True, **kwargs)[source]

Bases: SimplePredictor

Gaussian process model.

Similar to [1].

__init__(kernel: Callable | Kernel = None, normalize_features: bool = True, normalize_labels: bool = True, joint: bool = True, log_pearson: bool = True, **kwargs)[source]
Parameters:

kernel – Covariance kernel. Can be either a kernel, or a callable taking the input (feature) dimension and returning a kernel, such as GPKernelHandle.

fit(x: ndarray, y: ndarray, w: ndarray = None) None[source]

Fit the predictor to labeled data.

Parameters:
  • x – Samples, shape (n_sample, n_feature).

  • y – Targets, shape (n_sample, n_out).

  • w – Weights, shape (n_sample,).
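
A minimal fitting sketch on synthetic data; the shapes and values are hypothetical:

    import numpy as np

    from pyabc.predictor import GPPredictor

    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 3))  # 100 samples, 3 features (summary statistics)
    y = rng.normal(size=(100, 2))  # 2 target parameters

    predictor = GPPredictor()  # kernel constructed internally
    predictor.fit(x, y)
    y_pred = predictor.predict(x)  # shape (100, 2)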

class pyabc.predictor.HiddenLayerHandle(method: str | List[str] = 'mean', n_layer: int = 1, max_size: int = inf, alpha: float = 1.0)[source]

Bases: object

Convenience class for various layer size strategies.

Allows to define sizes depending on problem dimensions.

__call__(n_in: int, n_out: int, n_sample: int) Tuple[int, ...][source]
Parameters:
  • n_in – Input (feature) dimension.

  • n_out – Output (target) dimension.

  • n_sample – Number of samples.

Returns:

hidden_layer_sizes – Tuple of hidden layer sizes.

Return type:

Tuple[int, ...]

__init__(method: str | List[str] = 'mean', n_layer: int = 1, max_size: int = inf, alpha: float = 1.0)[source]
Parameters:
  • method

    Method to use. Can be any of:

    • “heuristic” bases the number of neurons on the number of samples to avoid overfitting. See https://stats.stackexchange.com/questions/181.

    • “mean” takes the mean of input and output dimension.

    • “max” takes the maximum of input and output dimension.

    Additionally, a list of methods can be passed, in which case the minimum over all is used.

  • n_layer – Number of layers.

  • max_size – Maximum layer size. Applied to all strategies.

  • alpha – Factor used in “heuristic”. The higher, the fewer neurons. A value in the range 2-10 is recommended.
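
For illustration, a sketch of how the handle maps problem dimensions to layer sizes; all dimensions are hypothetical:

    from pyabc.predictor import HiddenLayerHandle

    # two hidden layers, each sized via the mean of input and output dimension,
    # capped at 100 neurons per layer
    handle = HiddenLayerHandle(method="mean", n_layer=2, max_size=100)
    sizes = handle(n_in=10, n_out=3, n_sample=500)  # tuple of two layer sizes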

class pyabc.predictor.LassoPredictor(normalize_features: bool = True, normalize_labels: bool = True, joint: bool = True, log_pearson: bool = True, **kwargs)[source]

Bases: SimplePredictor

Lasso (least absolute shrinkage and selection operator) model.

Linear model with l1 regularization.

__init__(normalize_features: bool = True, normalize_labels: bool = True, joint: bool = True, log_pearson: bool = True, **kwargs)[source]

Additional keyword arguments are passed on to the model.
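
For example, assuming the underlying model is scikit-learn's Lasso regressor, a regularization strength could be forwarded as follows; alpha is an assumed pass-through argument, not part of the documented signature:

    from pyabc.predictor import LassoPredictor

    # alpha is assumed to be forwarded to the underlying Lasso model
    predictor = LassoPredictor(alpha=0.1)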

class pyabc.predictor.LinearPredictor(normalize_features: bool = True, normalize_labels: bool = True, joint: bool = True, weight_samples: bool = False, log_pearson: bool = True, **kwargs)[source]

Bases: SimplePredictor

Linear predictor model.

Based on [2].

__init__(normalize_features: bool = True, normalize_labels: bool = True, joint: bool = True, weight_samples: bool = False, log_pearson: bool = True, **kwargs)[source]
Parameters:
  • normalize_features – Whether to apply z-score normalization to the input data.

  • normalize_labels – Whether to apply z-score normalization to the parameters.

  • joint – Whether the predictor learns one model for all targets, or separate models per target.

  • weight_samples – Whether to use importance sampling weights. Note that not all predictors support weighted samples.

  • log_pearson – Whether to log Pearson correlation coefficients after fitting.

fit(x: ndarray, y: ndarray, w: ndarray = None) None[source]

Fit the predictor to labeled data.

Parameters:
  • x – Samples, shape (n_sample, n_feature).

  • y – Targets, shape (n_sample, n_out).

  • w – Weights, shape (n_sample,).
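
A sketch of fitting with importance sampling weights on synthetic data; note that weight_samples=True must be set for the weights to be used:

    import numpy as np

    from pyabc.predictor import LinearPredictor

    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 5))
    y = rng.normal(size=(200, 2))
    w = rng.random(200)
    w /= w.sum()  # normalized importance sampling weights

    predictor = LinearPredictor(weight_samples=True)
    predictor.fit(x, y, w=w)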

class pyabc.predictor.MLPPredictor(normalize_features: bool = True, normalize_labels: bool = True, joint: bool = True, hidden_layer_sizes: Tuple[int, ...] | Callable = None, log_pearson: bool = True, **kwargs)[source]

Bases: SimplePredictor

Multi-layer perceptron regressor predictor.

See e.g. [3].

__init__(normalize_features: bool = True, normalize_labels: bool = True, joint: bool = True, hidden_layer_sizes: Tuple[int, ...] | Callable = None, log_pearson: bool = True, **kwargs)[source]

Additional keyword arguments are passed on to the model.

Parameters:

hidden_layer_sizes – Network hidden layer sizes. Can be either a tuple of ints, or a callable taking input dimension, output dimension, and number of samples, and returning a tuple of ints. The HiddenLayerHandle class provides some useful defaults.
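
Both variants, sketched with hypothetical sizes:

    from pyabc.predictor import HiddenLayerHandle, MLPPredictor

    # fixed architecture: two hidden layers of 50 neurons each
    mlp_fixed = MLPPredictor(hidden_layer_sizes=(50, 50))
    # adaptive architecture: sizes derived from problem dimensions at fit time
    mlp_adaptive = MLPPredictor(hidden_layer_sizes=HiddenLayerHandle(method="heuristic"))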

fit(x: ndarray, y: ndarray, w: ndarray = None) None[source]

Fit the predictor to labeled data.

Parameters:
  • x – Samples, shape (n_sample, n_feature).

  • y – Targets, shape (n_sample, n_out).

  • w – Weights, shape (n_sample,).

class pyabc.predictor.ModelSelectionPredictor(predictors: List[Predictor], split_method: str = 'train_test_split', n_splits: int = 5, test_size: float = 0.2, f_score: Callable = None)[source]

Bases: Predictor

Model selection over a set of predictors.

Picks the model with the minimum k-fold cross validation score and retrains it on the full data set.

__init__(predictors: List[Predictor], split_method: str = 'train_test_split', n_splits: int = 5, test_size: float = 0.2, f_score: Callable = None)[source]
Parameters:
  • predictors – Set of predictors over which to perform model selection.

  • split_method – Method used to split the data set into training and test data; can be “cross_validation” for a full n_splits-fold cross validation, or “train_test_split” for a single separation of a test set of size test_size.

  • n_splits – Number of splits to use in k-fold cross validation.

  • test_size – Fraction of samples to randomly pick as test set, when using a single training and test set.

  • f_score – Score function to assess prediction quality. Defaults to the root mean square error normalized by the target standard deviation. Takes arguments y1, y2, std for prediction, ground truth, and standard deviation, and returns the score as a float.
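
A sketch combining several of the predictors above, on synthetic data:

    import numpy as np

    from pyabc.predictor import (
        GPPredictor,
        LinearPredictor,
        ModelSelectionPredictor,
    )

    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 3))
    y = rng.normal(size=(100, 2))

    predictor = ModelSelectionPredictor(
        predictors=[LinearPredictor(), GPPredictor()],
        split_method="cross_validation",
        n_splits=5,
    )
    predictor.fit(x, y)  # selects the best predictor, then refits on the full data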

fit(x: ndarray, y: ndarray, w: ndarray = None) None[source]

Fit the predictor to labeled data.

Parameters:
  • x – Samples, shape (n_sample, n_feature).

  • y – Targets, shape (n_sample, n_out).

  • w – Weights, shape (n_sample,).

predict(x: ndarray, normalize: bool = False) ndarray[source]

Predict outputs using the model.

Parameters:
  • x – Samples, shape (n_sample, n_feature) or (n_feature,).

  • normalize – Whether outputs should be normalized, or on the original scale.

Returns:

y – Predicted targets, shape (n_sample, n_out).

Return type:

ndarray

class pyabc.predictor.Predictor[source]

Bases: ABC

Generic predictor model class.

A predictor should define:

  • fit(x, y, w=None) to fit the model on a sample of data x and outputs y, where x has shape (n_sample, n_feature) and y has shape (n_sample, n_out). If weight_samples is set, the sample weights are passed as the third argument; not all predictors support this.

  • predict(x) to predict outputs of shape (n_sample, n_out), where x has shape (n_sample, n_feature).
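
A minimal sketch of a custom subclass satisfying this contract, assuming fit and predict are the only abstract methods; the nearest-neighbor rule is purely illustrative:

    import numpy as np

    from pyabc.predictor import Predictor

    class NearestNeighborPredictor(Predictor):
        """Illustrative predictor: returns the target of the closest training sample."""

        def fit(self, x: np.ndarray, y: np.ndarray, w: np.ndarray = None) -> None:
            # memorize the training data; weights w are ignored here
            self.x, self.y = x, y

        def predict(self, x: np.ndarray, normalize: bool = False) -> np.ndarray:
            x = np.atleast_2d(x)
            # squared distances of each query point to each training sample
            dists = ((x[:, None, :] - self.x[None, :, :]) ** 2).sum(axis=2)
            return self.y[np.argmin(dists, axis=1)]  # shape (n_sample, n_out)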

abstract fit(x: ndarray, y: ndarray, w: ndarray = None) None[source]

Fit the predictor to labeled data.

Parameters:
  • x – Samples, shape (n_sample, n_feature).

  • y – Targets, shape (n_sample, n_out).

  • w – Weights, shape (n_sample,).

abstract predict(x: ndarray, normalize: bool = False) ndarray[source]

Predict outputs using the model.

Parameters:
  • x – Samples, shape (n_sample, n_feature) or (n_feature,).

  • normalize – Whether outputs should be normalized, or on the original scale.

Returns:

y – Predicted targets, shape (n_sample, n_out).

Return type:

ndarray

class pyabc.predictor.SimplePredictor(predictor, normalize_features: bool = True, normalize_labels: bool = True, joint: bool = True, weight_samples: bool = False, log_pearson: bool = True)[source]

Bases: Predictor

Wrapper around generic predictor routines.

__init__(predictor, normalize_features: bool = True, normalize_labels: bool = True, joint: bool = True, weight_samples: bool = False, log_pearson: bool = True)[source]
Parameters:
  • predictor – Predictor model to use, fulfilling the predictor contract.

  • normalize_features – Whether to apply z-score normalization to the input data.

  • normalize_labels – Whether to apply z-score normalization to the parameters.

  • joint – Whether the predictor learns one model for all targets, or separate models per target.

  • weight_samples – Whether to use importance sampling weights. Note that not all predictors support weighted samples.

  • log_pearson – Whether to log Pearson correlation coefficients after fitting.
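
For example, assuming the “predictor contract” is the usual scikit-learn style fit(x, y)/predict(x) estimator interface, an external regressor could be wrapped as follows:

    from sklearn.neighbors import KNeighborsRegressor

    from pyabc.predictor import SimplePredictor

    # wrap an external regressor; feature/label normalization and joint vs.
    # per-target fitting are handled by SimplePredictor
    predictor = SimplePredictor(predictor=KNeighborsRegressor())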

fit(x: ndarray, y: ndarray, w: ndarray)[source]

Fit the predictor to labeled data.

Parameters:
  • x – Samples, shape (n_sample, n_feature).

  • y – Targets, shape (n_sample, n_out).

  • w – Weights, shape (n_sample,).

predict(x: ndarray, normalize: bool = False) ndarray[source]

Predict outputs using the model.

Parameters:
  • x – Samples, shape (n_sample, n_feature) or (n_feature,).

  • normalize – Whether outputs should be normalized, or on the original scale.

Returns:

y – Predicted targets, shape (n_sample, n_out).

Return type:

ndarray

set_use_ixs(x: ndarray, log: bool = True) None[source]

Set feature indices to use.

Parameters:
  • x – Feature matrix, shape (n_sample, n_feature).

  • log – Whether to log.

pyabc.predictor.root_mean_square_error(y1: ndarray, y2: ndarray, sigma: ndarray | float) float[source]

Root mean square error of (y1 - y2) / sigma.

Parameters:
  • y1 – Model simulations, shape (n_sample, n_par).

  • y2 – Ground truth values, shape (n_sample, n_par).

  • sigma – Normalizations, shape (n_sample,) or (1,).

Returns:

val – The normalized root mean square error value.

Return type:

float
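
The documented formula corresponds to the following NumPy sketch (not necessarily the actual implementation):

    import numpy as np

    def rmse_sketch(y1: np.ndarray, y2: np.ndarray, sigma) -> float:
        # square root of the mean squared sigma-normalized residuals
        return float(np.sqrt(np.mean(((y1 - y2) / sigma) ** 2)))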

pyabc.predictor.root_mean_square_relative_error(y1: ndarray, y2: ndarray) float[source]

Root mean square relative error of (y1 - y2) / y2.

Note that this may behave badly for ground truth parameters close to 0.

Parameters:
  • y1 – Model simulations.

  • y2 – Ground truth values.

Returns:

val – The normalized root mean square relative error value.

Return type:

float