# Quickstart

Here is a small example of how to do Bayesian model selection.

There are more examples in the examples section of the documentation, such as a parameter inference example with a single model only.

The notebook can be downloaded here: `Quickstart`.

The following classes from the pyABC package are used for this example: `ABCSMC`, `RV`, `Distribution`, and `PercentileDistanceFunction`.

## Step by step explanation

### Defining a model

To do model selection, we first need some models. A model, in the simplest case, is just a callable which takes a single `dict` as input and returns a single `dict` as output. The keys of the input dictionary are the parameters of the model; the output keys denote the summary statistics. Here, the `dict` is passed as `parameters` and has the entry `x`, which denotes the mean of a Gaussian. It returns the observed summary statistics `y`, which is just the sampled value.

```
In [1]:
```

```
%matplotlib inline
import os
import tempfile

import scipy.stats as st

from pyabc import (ABCSMC, RV, Distribution,
                   PercentileDistanceFunction)

# Define a Gaussian model
sigma = 0.5


def model(parameters):
    # sample from a Gaussian
    y = st.norm(parameters.x, sigma).rvs()
    # return the sample as dictionary
    return {"y": y}
```
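The input/output contract of such a model can be illustrated without pyABC at all. Below is a minimal stand-in using only the standard library; the names `toy_model` and `SIGMA` are ours, and plain `dict` indexing replaces the attribute access used above:

```python
import random

SIGMA = 0.5  # same noise level as in the example


def toy_model(parameters):
    # Draw one sample from a Gaussian centered at the parameter "x"
    y = random.gauss(parameters["x"], SIGMA)
    # Return the summary statistic as a dictionary
    return {"y": y}


sample = toy_model({"x": 0.0})
```

Any callable obeying this dict-in, dict-out contract can serve as a model.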

For model selection we usually have more than one model. These are assembled in a list. We require a Bayesian prior over the models. The default is to have a uniform prior over the model classes. This concludes the model definition.

```
In [2]:
```

```
# We define two models, but they are identical so far
models = [model, model]

# However, our models' priors are not the same.
# Their mean differs.
mu_x_1, mu_x_2 = 0, 1
parameter_priors = [
    Distribution(x=RV("norm", mu_x_1, sigma)),
    Distribution(x=RV("norm", mu_x_2, sigma))
]
```
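Conceptually, sampling from this setup is a two-stage draw: first a model index from the uniform model prior, then the parameter `x` from that model's Gaussian prior. A minimal sketch with the standard library (the function name `sample_model_and_parameter` is ours, not a pyABC API):

```python
import random

sigma = 0.5
prior_means = [0, 1]  # mu_x_1 and mu_x_2 from above


def sample_model_and_parameter():
    # Uniform prior over the model indices (pyABC's default)
    m = random.randrange(len(prior_means))
    # Draw the parameter "x" from the chosen model's Gaussian prior
    x = random.gauss(prior_means[m], sigma)
    return m, {"x": x}


m, params = sample_model_and_parameter()
```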

### Configuring the ABCSMC run

Having the models defined, we can plug together the `ABCSMC` class. We need a distance function to measure the discrepancy between simulated and observed summary statistics.

```
In [3]:
```

```
# We plug all the ABC options together
abc = ABCSMC(
    models, parameter_priors,
    PercentileDistanceFunction(measures_to_use=["y"]))
```
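With a single summary statistic, the core idea of the distance comparison reduces to an absolute difference (the actual `PercentileDistanceFunction` additionally normalizes each statistic by a percentile range of its sampled values; `abs_distance` below is our simplified stand-in, not the pyABC implementation):

```python
def abs_distance(simulated, observed):
    # Compare the single summary statistic "y" by absolute difference
    return abs(simulated["y"] - observed["y"])


d = abs_distance({"y": 1.2}, {"y": 1.0})
```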

### Setting the observed data

Actually measured data can now be passed to the ABCSMC. This is set via the `new` method, indicating that we start a new run as opposed to resuming a stored run (see the "resume stored run" example). Moreover, we have to set the output database where the ABC-SMC run is logged.

```
In [4]:
```

```
# y_observed is the important piece here: our actual observation.
y_observed = 1
# and we define where to store the results
db_path = ("sqlite:///" +
           os.path.join(tempfile.gettempdir(), "test.db"))
abc_id = abc.new(db_path, {"y": y_observed})
```

```
INFO:Epsilon:initial epsilon is 0.4949724470321585
INFO:History:Start <ABCSMC(id=3, start_time=2018-04-08 22:10:37.202603, end_time=None)>
```

The `new` method returns an id, which is the id of the ABC-SMC run in the database. We're not using this id for now, but it can be important when you load the stored data or want to continue an ABC-SMC run in the case of having more than one ABC-SMC run stored in a single database.

```
In [5]:
```

```
print("ABC-SMC run ID:", abc_id)
```

```
ABC-SMC run ID: 3
```

### Running the ABC

We run the `ABCSMC`, specifying the epsilon value at which to terminate. The default epsilon strategy is `pyabc.epsilon.MedianEpsilon`. The ABC run terminates as soon as either the minimum epsilon or the maximum allowed number of populations is reached. The method returns a `pyabc.storage.History` object, which can, for example, be queried for the posterior probabilities.

```
In [6]:
```

```
# We run the ABC until either criterion is met
history = abc.run(minimum_epsilon=0.05, max_nr_populations=2)
```

```
INFO:ABC:t:0 eps:0.4949724470321585
INFO:ABC:t:1 eps:0.27733174471374944
INFO:History:Done <ABCSMC(id=3, start_time=2018-04-08 22:10:37.202603, end_time=2018-04-08 22:10:45.112760)>
```

Note that the history object is also always accessible from the `ABCSMC` object:

```
In [7]:
```

```
history is abc.history
```

```
Out[7]:
```

```
True
```

The `pyabc.storage.History` object can, for example, be queried for the posterior probabilities in the populations:

```
In [8]:
```

```
# Evaluate the model probabilities
model_probabilities = history.get_model_probabilities()
model_probabilities
```

```
Out[8]:
```

| t \ m | 0 | 1 |
|---|---|---|
| 0 | 0.350000 | 0.650000 |
| 1 | 0.320771 | 0.679229 |

And now, let’s visualize the results:

```
In [9]:
```

```
model_probabilities.plot.bar();
```

So model 1 is the more probable one. This is expected, as it was centered at 1 and the observed data was also 1, whereas model 0 was centered at 0, farther away from the observed data.
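For this conjugate Gaussian setup the exact model posterior can also be computed analytically, which is a useful sanity check on the ABC estimate. Since `y = x + noise` with `x ~ N(mu_m, sigma^2)` and noise `~ N(0, sigma^2)`, the marginal of `y` under model m is `N(mu_m, 2 * sigma^2)`. A sketch assuming a uniform model prior (`evidence` and `p_model_1` are our names):

```python
import math

sigma = 0.5
# Marginal variance of y under each model: prior variance + noise variance
var = 2 * sigma ** 2


def evidence(y, mu):
    # Gaussian marginal likelihood N(y; mu, var)
    return math.exp(-(y - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


y_obs = 1.0
e0, e1 = evidence(y_obs, 0.0), evidence(y_obs, 1.0)
# With a uniform model prior, the posterior is the normalized evidence
p_model_1 = e1 / (e0 + e1)
```

This gives a posterior probability of about 0.73 for model 1; with only two populations, the ABC estimate above (about 0.68) is still somewhat short of the exact value, and more populations would bring it closer.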