# Quickstart¶

## Model selection¶

Here is a small example on how to do Bayesian model selection. There are more examples in the examples section of the documentation, such as a parameter inference example with a single model only.

This notebook can be downloaded here:
`Quickstart`

.

The following classes from the pyABC package are used for this example:

## Step by step explanation¶

### Defining a model¶

To do model selection, we need first some models. A model, in the simplest case,
is just a callable which takes a single `dict`

as input
and returns a single `dict`

as output. The keys of the input dictionary are the parameters of the model, the output
keys denote the summary statistics.
Here, the `dict`

is passed as `parameters`

and has the parameter `x`

which denotes the mean of a Gaussian.
It returns the observed summary statistics `y`

, which
is just the sampled value.

```
In [1]:
```

```
%matplotlib inline
import os
import tempfile
import scipy.stats as st
from pyabc import (ABCSMC, RV, Distribution,
PercentileDistanceFunction)
# Define a gaussian model
sigma = .5
def model(parameters):
# sample from a gaussian
y = st.norm(parameters.x, sigma).rvs()
# return the sample as dictionary
return {"y": y}
```

For model selection we usually have more than one model. These are assembled in a list. We require a Bayesian prior over the models. The default is to have a uniform prior over the model classes. This concludes the model definition.

```
In [2]:
```

```
# We define two models, but they are identical so far
models = [model, model]
# However, our models' priors are not the same.
# Their mean differs.
mu_x_1, mu_x_2 = 0, 1
parameter_priors = [
Distribution(x=RV("norm", mu_x_1, sigma)),
Distribution(x=RV("norm", mu_x_2, sigma))
]
```

### Configuring the ABCSMC run¶

Having the models defined, we can plug together the `ABCSMC`

class.
We need a distance function,
to measure the distance of obtained samples.

```
In [3]:
```

```
# We plug all the ABC options together
abc = ABCSMC(
models, parameter_priors,
PercentileDistanceFunction(measures_to_use=["y"]))
```

### Setting the observed data¶

Actually measured data can now be passed to the ABCSMC.
This is set via the `new`

method, indicating that we start
a new run as opposed to resuming a stored run.
Moreover we have to set the output database where the ABC-SMC run
is logged.

```
In [4]:
```

```
# y_observed is the important piece here: our actual observation.
y_observed = 1
# and we define where to store the results
db_path = ("sqlite:///" +
os.path.join(tempfile.gettempdir(), "test.db"))
abc_id = abc.new(db_path, {"y": y_observed})
```

```
INFO:Epsilon:initial epsilon is 0.4287073303146517
INFO:History:Start <ABCSMC(id=2, start_time=2018-02-14 10:38:10.396366, end_time=None)>
```

The `new`

method returns an id, which is the id of the
ABC-SMC run in the database.
We’re not usint this id for now.
But it might be important when you load the stored data or want
to continue and ABC-SMC run in the case of having more than one
ABC-SMC run stored in a single database.

```
In [5]:
```

```
print("ABC-SMC run ID:", abc_id)
```

```
ABC-SMC run ID: 2
```

### Running the ABC¶

We run the `ABCSMC`

specifying the epsilon value at which to terminate.
The default epsilon strategy is the `pyabc.epsilon.MedianEpsilon`

.
Whatever is reached first, the epsilon or the maximum number allowed populations,
terminates the ABC run. The method returns a `History`

object. which
can, for example be queried for the posterior probabilities.

```
In [6]:
```

```
# We run the ABC until either criterion is met
history = abc.run(minimum_epsilon=0.05, max_nr_populations=2)
```

```
INFO:ABC:t:0 eps:0.4287073303146517
INFO:ABC:t:1 eps:0.2061058996742635
INFO:History:Done <ABCSMC(id=2, start_time=2018-02-14 10:38:10.396366, end_time=2018-02-14 10:38:12.101931)>
```

Note that the history object is also always accesible from the abcsmc object:

```
In [7]:
```

```
history is abc.history
```

```
Out[7]:
```

```
True
```

The `History`

object can, for example,
be queried for the posterior probabilities in the populations.

```
In [8]:
```

```
# Evaluate the model probabililties
model_probabilities = history.get_model_probabilities()
model_probabilities
```

```
Out[8]:
```

m | 0 | 1 |
---|---|---|

t | ||

0 | 0.380000 | 0.620000 |

1 | 0.306158 | 0.693842 |

And now, let’s visualize the results:

```
In [9]:
```

```
model_probabilities.plot.bar();
```

So model 1 is the more probable one. Which is expected as it was centered at 1 and the observed data was also 1, whereas model 0 was centered af 0, which is farther away from the observed data.