Optimal acceptance thresholds

Note: The SilkOptimalEpsilon threshold schedule is work in progress and may or may not work. Contact us if you are interested.

[1]:
# install if not done yet
#!pip install pyabc --quiet
[2]:
import logging

import matplotlib.pyplot as plt
import numpy as np

import pyabc

pyabc.settings.set_figure_params("pyabc")  # for beautified plots

# debug
for logger in ["ABC.Distance", "ABC.Epsilon"]:
    logging.getLogger(logger).setLevel(logging.DEBUG)

In this notebook, we discuss issues of quantile-based acceptance thresholds and an optimal threshold strategy based on predicting the acceptance rate.

Issues of quantile-based acceptance thresholds

Approximate Bayesian Computation with a Sequential Monte Carlo scheme (ABC-SMC) generates a sequence of populations constituting sequentially better posterior approximations

\[\pi_{\text{ABC},\varepsilon}(\theta|y_\text{obs}) \propto \int I[d(y,y_\text{obs})\leq\varepsilon]\,\pi(y|\theta)\,\pi(\theta)\,\mathrm{d}y,\]

corresponding to reduced acceptance thresholds \(\varepsilon\). Under mild conditions, the ABC posterior converges to the true Bayesian posterior for \(\varepsilon\rightarrow 0\), i.e.

\[\pi_{\text{ABC},\varepsilon}(\theta|y_\text{obs})\rightarrow\pi(\theta|y_\text{obs}).\]

(Note that the use of insufficient summary statistics adds another layer of approximation, which we abstract from here.)

However, in practice \(\varepsilon\rightarrow 0\) is not attainable, or rather sufficiently small levels are not known beforehand, raising the question of how to determine the sequence of thresholds \(\{\varepsilon_t\}\). Often, values are pre-defined (ListEpsilon), which is however overall impractical, as it requires manual tuning; already a poorly chosen initial threshold can yield excessively low or high acceptance rates. A common strategy is to set the threshold of generation \(t\) as a quantile of the (weighted) distances \(\{d_i=d(y_i,y_\text{obs})\}_{i\leq N}\) accepted in generation \(t-1\) (a minimal sketch of this update follows the model plot below). However, such an approach may fail to focus on small domains of attraction. Consider the following (deterministic) example, discussed in Silk et al. 2012, which possesses a wide local optimum and a narrow global optimum:

[3]:
def model(p):
    theta = p["theta"]
    return {"y": (theta - 10) ** 2 - 100 * np.exp(-100 * (theta - 3) ** 2)}


p_true = {"theta": 3}
y_obs = model(p_true)

bounds = {"theta": (0, 20)}
prior = pyabc.Distribution(
    **{
        key: pyabc.RV("uniform", lb, ub - lb)
        for key, (lb, ub) in bounds.items()
    }
)

# plot observables over parameters
_, ax = plt.subplots()
xs = np.linspace(0, 20, 1000)
ys = model({"theta": xs})["y"]
ax.plot(xs, ys, color="grey")
ax.set_xlabel(r"$\theta$")
ax.set_ylabel("y");
[Figure: model output \(y\) over \(\theta\), with a wide local minimum at \(\theta=10\) and a narrow global minimum at \(\theta=3\).]
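
As referenced above, a quantile-based schedule sets the next threshold to a (weighted) quantile of the previous generation's accepted distances. The following is a minimal sketch of such an update, not pyabc's internal implementation (which QuantileEpsilon provides):

# minimal sketch (not pyabc's internals) of a quantile-based threshold
# update: eps_t is the alpha-quantile of the weighted distances accepted
# in generation t-1
def next_epsilon(distances, weights, alpha=0.5):
    """Weighted alpha-quantile of the previous generation's distances."""
    order = np.argsort(distances)
    d = np.asarray(distances)[order]
    w = np.asarray(weights)[order]
    cdf = np.cumsum(w) / np.sum(w)
    return d[np.searchsorted(cdf, alpha)]


# with uniform weights, this reduces to the plain empirical quantile
next_epsilon([3.0, 1.0, 2.0, 4.0], np.ones(4), alpha=0.5)  # -> 2.0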

Some more utilities:

[4]:
pop_size = 200
n_samples = pop_size * 20
db_path = pyabc.create_sqlite_db_id()


def plot_posterior(history, show_history=False, ax=None):
    """Plot 1d posteriors."""
    if ax is None:
        _, ax = plt.subplots(figsize=(10, 6))
    t0 = 0
    if not show_history:
        t0 = history.max_t
    for t in range(t0, history.max_t + 1):
        df, w = history.get_distribution(t=t)
        pyabc.visualization.plot_kde_1d(
            df,
            w,
            xmin=bounds["theta"][0],
            xmax=bounds["theta"][1],
            numx=1000,
            x="theta",
            ax=ax,
            label=f"t={t}" if show_history else None,
            refval=p_true,
            refval_color="grey",
            kde=pyabc.GridSearchCV(),
        )


def plot_pars_over_data(history):
    """Plot parameters over data background."""
    _, ax = plt.subplots(figsize=(10, 6))
    ax.plot(xs, ys - y_obs["y"], color="grey")
    for t in range(history.max_t + 1):
        df, _ = history.get_distribution(t=t)
        eps = history.get_all_populations().set_index("t").loc[t, "epsilon"]
        ax.plot(
            df["theta"],
            eps * np.ones_like(df["theta"]),
            ".",
            markersize=1,
            color=pyabc.visualization.colors.RED900,
        )
    ax.set_xlabel(r"$\theta$")
    ax.set_ylabel(r"$d(y,y_ {obs})$")

Let us first try a quantile-based threshold strategy:

[5]:
abc = pyabc.ABCSMC(
    model,
    prior,
    pyabc.PNormDistance(p=2),
    population_size=pop_size,
    eps=pyabc.QuantileEpsilon(alpha=0.8),
)

abc.new(db_path, y_obs)

h = abc.run(max_total_nr_simulations=n_samples)
ABC.Sampler INFO: Parallelize sampling on 4 processes.
ABC.History INFO: Start <ABCSMC id=26, start_time=2022-03-01 21:24:45>
ABC INFO: Calibration sample t = -1.
ABC INFO: t: 0, eps: 1.16338545e+02.
ABC INFO: Accepted: 200 / 255 = 7.8431e-01, ESS: 2.0000e+02.
ABC INFO: t: 1, eps: 9.08304741e+01.
ABC INFO: Accepted: 200 / 266 = 7.5188e-01, ESS: 1.9438e+02.
ABC INFO: t: 2, eps: 7.24305556e+01.
ABC INFO: Accepted: 200 / 249 = 8.0321e-01, ESS: 1.8418e+02.
ABC INFO: t: 3, eps: 6.48220804e+01.
ABC INFO: Accepted: 200 / 303 = 6.6007e-01, ESS: 1.7112e+02.
ABC INFO: t: 4, eps: 5.96357401e+01.
ABC INFO: Accepted: 200 / 272 = 7.3529e-01, ESS: 1.9748e+02.
ABC INFO: t: 5, eps: 5.69131917e+01.
ABC INFO: Accepted: 200 / 271 = 7.3801e-01, ESS: 1.9841e+02.
ABC INFO: t: 6, eps: 5.51356743e+01.
ABC INFO: Accepted: 200 / 261 = 7.6628e-01, ESS: 1.9802e+02.
ABC INFO: t: 7, eps: 5.36904752e+01.
ABC INFO: Accepted: 200 / 262 = 7.6336e-01, ESS: 1.9887e+02.
ABC INFO: t: 8, eps: 5.27431393e+01.
ABC INFO: Accepted: 200 / 256 = 7.8125e-01, ESS: 1.9775e+02.
ABC INFO: t: 9, eps: 5.20129556e+01.
ABC INFO: Accepted: 200 / 251 = 7.9681e-01, ESS: 1.9930e+02.
ABC INFO: t: 10, eps: 5.16910309e+01.
ABC INFO: Accepted: 200 / 275 = 7.2727e-01, ESS: 1.9856e+02.
ABC INFO: t: 11, eps: 5.14267838e+01.
ABC INFO: Accepted: 200 / 261 = 7.6628e-01, ESS: 1.9883e+02.
ABC INFO: t: 12, eps: 5.12757551e+01.
ABC INFO: Accepted: 200 / 270 = 7.4074e-01, ESS: 1.9832e+02.
ABC INFO: t: 13, eps: 5.11674471e+01.
ABC INFO: Accepted: 200 / 270 = 7.4074e-01, ESS: 1.9940e+02.
ABC INFO: t: 14, eps: 5.11068315e+01.
ABC INFO: Accepted: 200 / 252 = 7.9365e-01, ESS: 1.9894e+02.
ABC INFO: Stop: Total simulations budget.
ABC.History INFO: Done <ABCSMC id=26, duration=0:00:08.042182, end_time=2022-03-01 21:24:53>

The posterior should converge to a point estimate at \(\theta=3\); however, when repeating the above analysis, in many cases it does not do so, but converges to a bimodal distribution, or to one with a single mode at \(\theta=10\). This is because the majority of samples comes from the local optimum, such that a focus on a small subset of low-distance samples is required to push the population towards the global mode.

[6]:
plot_posterior(h)
plot_pars_over_data(h);
ABC.Transition INFO: Best params: {'scaling': 0.2875}
[Figures: final-generation posterior KDE of \(\theta\), and accepted parameters plotted over the distance landscape.]

As observed in Silk et al., for large quantile values the chance of completely missing the global optimum increases: more populations are generated until a comparably low threshold is reached, and each provides an opportunity to lose the global optimum (and, once lost, it is with high probability not recovered). This can be circumvented by a lower quantile, which increases the focus on the global optimum. Upon repetition, this analysis is less likely to solely focus on the local optimum, and more likely to correctly sample only from the global one.

[7]:
# smaller alpha

abc = pyabc.ABCSMC(
    model,
    prior,
    pyabc.PNormDistance(p=2),
    population_size=pop_size,
    eps=pyabc.QuantileEpsilon(alpha=0.4),
)

abc.new(db_path, y_obs)

h = abc.run(max_total_nr_simulations=n_samples)

plot_posterior(h)
plot_pars_over_data(h);
ABC.Sampler INFO: Parallelize sampling on 4 processes.
ABC.History INFO: Start <ABCSMC id=27, start_time=2022-03-01 21:24:54>
ABC INFO: Calibration sample t = -1.
ABC INFO: t: 0, eps: 6.98156111e+01.
ABC INFO: Accepted: 200 / 444 = 4.5045e-01, ESS: 2.0000e+02.
ABC INFO: t: 1, eps: 5.44899082e+01.
ABC INFO: Accepted: 200 / 522 = 3.8314e-01, ESS: 1.9843e+02.
ABC INFO: t: 2, eps: 5.16041676e+01.
ABC INFO: Accepted: 200 / 488 = 4.0984e-01, ESS: 1.9887e+02.
ABC INFO: t: 3, eps: 5.11285054e+01.
ABC INFO: Accepted: 200 / 501 = 3.9920e-01, ESS: 1.9861e+02.
ABC INFO: t: 4, eps: 5.10206082e+01.
ABC INFO: Accepted: 200 / 526 = 3.8023e-01, ESS: 1.9906e+02.
ABC INFO: t: 5, eps: 5.10030070e+01.
ABC INFO: Accepted: 200 / 508 = 3.9370e-01, ESS: 1.9819e+02.
ABC INFO: t: 6, eps: 5.10006094e+01.
ABC INFO: Accepted: 200 / 520 = 3.8462e-01, ESS: 1.9871e+02.
ABC INFO: t: 7, eps: 5.10000988e+01.
ABC INFO: Accepted: 200 / 513 = 3.8986e-01, ESS: 1.9733e+02.
ABC INFO: Stop: Total simulations budget.
ABC.History INFO: Done <ABCSMC id=27, duration=0:00:04.692389, end_time=2022-03-01 21:24:59>
ABC.Transition INFO: Best params: {'scaling': 0.05}
[Figures: posterior KDE and accepted parameters over the distance landscape for \(\alpha=0.4\).]
[8]:
# even smaller alpha

abc = pyabc.ABCSMC(
    model,
    prior,
    pyabc.PNormDistance(p=2),
    population_size=pop_size,
    eps=pyabc.QuantileEpsilon(alpha=0.2),
)

abc.new(db_path, y_obs)

h = abc.run(max_total_nr_simulations=n_samples)

plot_posterior(h)
plot_pars_over_data(h);
ABC.Sampler INFO: Parallelize sampling on 4 processes.
ABC.History INFO: Start <ABCSMC id=28, start_time=2022-03-01 21:25:00>
ABC INFO: Calibration sample t = -1.
ABC INFO: t: 0, eps: 5.53074395e+01.
ABC INFO: Accepted: 200 / 1065 = 1.8779e-01, ESS: 2.0000e+02.
ABC INFO: t: 1, eps: 5.11276404e+01.
ABC INFO: Accepted: 200 / 1164 = 1.7182e-01, ESS: 1.0078e+02.
ABC INFO: t: 2, eps: 5.10006655e+01.
ABC INFO: Accepted: 200 / 6825 = 2.9304e-02, ESS: 1.1171e+02.
ABC INFO: Stop: Total simulations budget.
ABC.History INFO: Done <ABCSMC id=28, duration=0:00:03.955953, end_time=2022-03-01 21:25:04>
ABC.Transition INFO: Best params: {'scaling': 0.05}
[Figures: posterior KDE and accepted parameters over the distance landscape for \(\alpha=0.2\).]

Optimal threshold based on predicting the acceptance rate

In general, an appropriate quantile in situations as sketched above, as well as a sufficiently large population size, are problem-dependent. Silk et al. thus propose a strategy that tries to detect the problem of local optima and to propose appropriate threshold values automatically. It does so by predicting the acceptance rate in the next generation as a function of the threshold. If that function exhibits convex regions, indicative of local optima, the threshold is chosen to maximize its curvature (second derivative). If no such potential problems are found (i.e. the function is mostly concave), a threshold is instead chosen as a trade-off of acceptance rate and threshold decrease, minimizing a combined objective. See the paper or the API documentation for further details.
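
To illustrate the idea on this example, here is a toy sketch only; pyabc's actual implementation predicts the rate from the recorded accepted and rejected particles of the current population (see the "Recording also rejected particles" log line below), not from fresh prior samples:

# toy sketch (not pyabc's implementation): estimate the acceptance rate
# as a function of the threshold from prior samples, and locate the point
# of maximal curvature of that curve
thetas = np.random.default_rng(0).uniform(*bounds["theta"], size=10_000)
distances = np.abs(model({"theta": thetas})["y"] - y_obs["y"])

eps_grid = np.linspace(distances.min(), distances.max(), 200)

# empirical acceptance rate: fraction of distances below each threshold
rates = np.array([(distances <= eps).mean() for eps in eps_grid])

# discrete second derivative of the rate curve; a pronounced maximum
# (convex region) marks a threshold below which acceptance collapses,
# expected here near d = 51, the distance level of the wide local optimum
curvature = np.gradient(np.gradient(rates, eps_grid), eps_grid)
eps_candidate = eps_grid[np.argmax(curvature)]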

In pyABC, the SilkOptimalEpsilon threshold schedule implements the above approach:

[9]:
abc = pyabc.ABCSMC(
    model,
    prior,
    pyabc.PNormDistance(p=2),
    population_size=pop_size,
    eps=pyabc.SilkOptimalEpsilon(k=10),
)

abc.new(db_path, y_obs)

h = abc.run(max_total_nr_simulations=n_samples)

plot_posterior(h)
plot_pars_over_data(h);
ABC.Sampler INFO: Parallelize sampling on 4 processes.
ABC.History INFO: Start <ABCSMC id=29, start_time=2022-03-01 21:25:04>
ABC INFO: Calibration sample t = -1.
ABC.Epsilon INFO: Optimal threshold for t=0: eps=1.1162e+02, estimated rate=7.6606e-01 (discontinuous=7.6000e-01)
ABC.Population INFO: Recording also rejected particles: True
ABC INFO: t: 0, eps: 1.11620488e+02.
ABC INFO: Accepted: 200 / 261 = 7.6628e-01, ESS: 2.0000e+02.
ABC.Epsilon INFO: Optimal threshold for t=1: eps=4.5116e+01, estimated rate=5.6816e-02 (discontinuous=7.0895e-03)
ABC INFO: t: 1, eps: 4.51162085e+01.
ABC INFO: Accepted: 200 / 29393 = 6.8043e-03, ESS: 1.9995e+02.
ABC.Epsilon INFO: Optimal threshold for t=2: eps=1.6576e+01, estimated rate=5.2353e-01 (discontinuous=5.1976e-01)
ABC INFO: Stop: Total simulations budget.
ABC.History INFO: Done <ABCSMC id=29, duration=0:00:25.216346, end_time=2022-03-01 21:25:30>
ABC.Transition INFO: Best params: {'scaling': 0.05}
[Figures: posterior KDE and accepted parameters over the distance landscape for SilkOptimalEpsilon.]

This approach more or less reliably converges to the global minimum.

Note that this threshold method may suffer from deteriorating acceptance rates and may thus perform worse than quantile-based thresholds on problems where the latter are applicable. SilkOptimalEpsilon can also be combined with adaptive distances.
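
For example, such a combination could look as follows (a sketch reusing the setup from above; AdaptivePNormDistance re-weights summary statistics each generation, which is trivial with the single statistic here, but the pattern carries over):

# sketch: Silk et al. threshold schedule combined with an adaptive distance
abc = pyabc.ABCSMC(
    model,
    prior,
    pyabc.AdaptivePNormDistance(p=2),
    population_size=pop_size,
    eps=pyabc.SilkOptimalEpsilon(k=10),
)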

So, when to use such advanced threshold strategies? For most low-dimensional problems, a simple quantile-based strategy may be sufficient. However, as seen above, especially in the presence of local optima, approaches such as the above may be necessary. At the least, carefully checking the analysis, e.g. by examining low-distance particles, is generally a good idea.
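
For instance, the lowest-distance particles of the final generation can be inspected via the history (a minimal sketch; get_weighted_distances returns the accepted particles' distances and importance weights):

# inspect the smallest accepted distances of the last generation: if none
# of them approach zero, the global optimum may have been missed
df_dist = h.get_weighted_distances()
print(df_dist.sort_values("distance").head())

# thresholds and total samples per generation
print(h.get_all_populations()[["t", "epsilon", "samples"]])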