Model selection
Here is a small example on how to do Bayesian model selection.
There are more examples in the examples section of the documentation, such as a parameter inference example with a single model only.
The notebook can be downloaded here:
Model selection
.
The following classes from the pyABC package are used for this example:
[ ]:
# install if not done yet
!pip install pyabc --quiet
Defining the model
To do model selection, we first need some models. A model, in the simplest case, is just a callable which takes a single dict
as input and returns a single dict
as output. The keys of the input dictionary are the parameters of the model, the output keys denote the summary statistics. Here, the dict
is passed as parameters
and has the entry x
, which denotes the mean of a Gaussian. It returns the observed summary statistics y
, which is just the sampled value.
[1]:
%matplotlib inline
import os
import tempfile
import scipy.stats as st
import pyabc
pyabc.settings.set_figure_params('pyabc') # for beautified plots
# Define a gaussian model
sigma = 0.5
def model(parameters):
# sample from a gaussian
y = st.norm(parameters.x, sigma).rvs()
# return the sample as dictionary
return {"y": y}
For model selection we usually have more than one model. These are assembled in a list. We require a Bayesian prior over the models. The default is to have a uniform prior over the model classes. This concludes the model definition.
[2]:
# We define two models, but they are identical so far
models = [model, model]
# However, our models' priors are not the same.
# Their mean differs.
mu_x_1, mu_x_2 = 0, 1
parameter_priors = [
pyabc.Distribution(x=pyabc.RV("norm", mu_x_1, sigma)),
pyabc.Distribution(x=pyabc.RV("norm", mu_x_2, sigma)),
]
Configuring the ABCSMC run
Having the models defined, we can plug together the ABCSMC
class. We need a distance function, to measure the distance of obtained samples.
[3]:
# We plug all the ABC options together
abc = pyabc.ABCSMC(
models, parameter_priors, pyabc.PercentileDistance(measures_to_use=["y"])
)
INFO:Sampler:Parallelizing the sampling on 4 cores.
Setting the observed data
Actually measured data can now be passed to the ABCSMC. This is set via the new
method, indicating that we start a new run as opposed to resuming a stored run (see the “resume stored run” example). Moreover, we have to set the output database where the ABC-SMC run is logged.
[4]:
# y_observed is the important piece here: our actual observation.
y_observed = 1
# and we define where to store the results
db_path = "sqlite:///" + os.path.join(tempfile.gettempdir(), "test.db")
history = abc.new(db_path, {"y": y_observed})
INFO:History:Start <ABCSMC id=1, start_time=2021-02-21 20:27:46.032578>
The new
method returns a history object, whose id identifies the ABC-SMC run in the database. We’re not using this id for now. But it might be important when you load the stored data or want to continue an ABC-SMC run in the case of having more than one ABC-SMC run stored in a single database.
[5]:
print("ABC-SMC run ID:", history.id)
ABC-SMC run ID: 1
Running the ABC
We run the ABCSMC
specifying the epsilon value at which to terminate. The default epsilon strategy is the pyabc.epsilon.MedianEpsilon
. Whatever is reached first, the epsilon or the maximum number allowed populations, terminates the ABC run. The method returns a pyabc.storage.History
object, which can, for example, be queried for the posterior probabilities.
[6]:
# We run the ABC until either criterion is met
history = abc.run(minimum_epsilon=0.2, max_nr_populations=5)
INFO:ABC:Calibration sample t=-1.
INFO:Epsilon:initial epsilon is 0.4513005593964347
INFO:ABC:t: 0, eps: 0.4513005593964347.
INFO:ABC:Acceptance rate: 100 / 221 = 4.5249e-01, ESS=1.0000e+02.
INFO:ABC:t: 1, eps: 0.18718437297658125.
INFO:ABC:Acceptance rate: 100 / 353 = 2.8329e-01, ESS=8.8666e+01.
INFO:pyabc.util:Stopping: minimum epsilon.
INFO:History:Done <ABCSMC id=1, duration=0:00:02.979904, end_time=2021-02-21 20:27:49.012482>
Note that the history object is also always accessible from the ABCSMC
object:
[7]:
history is abc.history
[7]:
True
Visualization and analysis of results
The pyabc.storage.History>
object can, for example, be queried for the posterior probabilities in the populations:
[8]:
# Evaluate the model probabililties
model_probabilities = history.get_model_probabilities()
model_probabilities
[8]:
m | 0 | 1 |
---|---|---|
t | ||
0 | 0.33000 | 0.67000 |
1 | 0.25232 | 0.74768 |
And now, let’s visualize the results:
[9]:
pyabc.visualization.plot_model_probabilities(history)
[9]:
<AxesSubplot:title={'center':'Model probabilities'}, xlabel='Population index', ylabel='Probability'>

So model 1 is the more probable one. Which is expected as it was centered at 1 and the observed data was also 1, whereas model 0 was centered at 0, which is farther away from the observed data.