Parameter inference

This example illustrates parameter inference for a single model. (If you’re interested in comparing multiple models, see also the model selection example.)

This notebook can be downloaded here: Parameter Inference.

We’re going to use the following classes from the pyABC package:

  • ABCSMC, our entry point to parameter inference,
  • RV, to define the prior over a single parameter,
  • Distribution, to define the prior over a possibly higher dimensional parameter space,
  • MultivariateNormalTransition, to do a kernel density estimate (KDE) for visualization purposes.

Let’s start by importing the necessary classes. We also set up matplotlib, and we’ll use pandas as well.

In [1]:
from pyabc import (ABCSMC, Distribution, RV,
                   MultivariateNormalTransition)
import numpy as np
import tempfile
import os
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

Our model is about as simple as it gets. We assume a Gaussian model \(\mathcal{N}(\mathrm{mean}, 1)\) with the single parameter \(\mathrm{mean}\) and fixed variance 1. In this case, the parameter dictionary that is passed to the model has only the single key mean. We name the sampled data just data. Returning a whole dictionary might look like overcomplicating things, but as soon as we return heterogeneous data, this starts to make a lot of sense.

In [2]:
def model(parameter):
    # Sample one data point from N(mean, 1).
    return {"data": parameter["mean"] + np.random.randn()}

We then define the prior for the mean to be uniform over the interval \([0, 5]\):

In [3]:
prior = Distribution(mean=RV("uniform", 0, 5))

(This has to be read as \([0, 0+5]\): scipy parameterizes the uniform distribution by a location and a scale, so, for example, RV("uniform", 1, 5) is uniform over the interval \([1, 6]\). Check the scipy.stats package for details of this parameterization.)
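
As a quick sanity check of this (loc, scale) parameterization, using plain scipy.stats (nothing pyABC-specific):

from scipy import stats
# "uniform" takes loc and scale: uniform(loc=1, scale=5) is
# uniform on [1, 1 + 5] = [1, 6].
samples = stats.uniform(1, 5).rvs(size=10000)
print(samples.min(), samples.max())  # both lie within [1, 6]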

We also need to specify when we consider data to be close, in the form of a distance function. Here, we simply take the absolute value of the difference.

In [4]:
def distance(x, y):
    return abs(x["data"] - y["data"])
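
If the model returned several statistics, as sketched above, the distance could combine them per key. A hypothetical example with made-up key names and weights:

# Hypothetical: combine per-key distances with manually chosen weights.
def combined_distance(x, y):
    return (abs(x["final_value"] - y["final_value"])
            + 0.1 * np.linalg.norm(np.asarray(x["time_series"])
                                   - np.asarray(y["time_series"])))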

Now we create the ABCSMC object, passing the model, the prior and the distance to it.

In [5]:
abc = ABCSMC(model, prior, distance)
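
ABCSMC accepts further optional arguments. For example, the number of particles per population can be set explicitly (at the time of writing, 100 is the default):

# Same as above, but with an explicit population size.
abc = ABCSMC(model, prior, distance, population_size=100)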

To get going, we have to specify where to log the ABC-SMC runs. We can later query the database with the help of the History class.

Usually, you would now have some measured data for which you want to know the posterior. Here, we just assume that the measured data was 2.5.

In [6]:
db_path = ("sqlite:///" +
           os.path.join(tempfile.gettempdir(), "test.db"))
observation = 2.5
abc.new(db_path, {"data": observation})
INFO:Epsilon:initial epsilon is 1.2393604085164727
INFO:History:Start <ABCSMC(id=5, start_time=2018-04-11 08:15:04.293824, end_time=None)>
Out[6]:
5

The new method returned an integer: the id of the ABC-SMC run. This id is only important if more than one ABC-SMC run is stored in the same database.
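
For example, results can later be loaded from the database by creating a History object pointing to it. A minimal sketch; by default, History refers to the most recent run in the database:

from pyabc import History
# Reconnect to the stored results.
h = History(db_path)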

Let’s start the sampling now. We’ll sample until the acceptance threshold epsilon drops below 0.2. We also specify a maximum of 10 populations, so whichever is reached first, minimum_epsilon or max_nr_populations, stops further sampling.

For the simple model we defined above, this should only take a couple of seconds:

In [7]:
history = abc.run(minimum_epsilon=.2, max_nr_populations=10)
INFO:ABC:t:0 eps:1.2393604085164727
INFO:ABC:t:1 eps:0.5937826556325588
INFO:ABC:t:2 eps:0.3472986615093911
INFO:ABC:t:3 eps:0.18389361913444088
INFO:History:Done <ABCSMC(id=5, start_time=2018-04-11 08:15:04.293824, end_time=2018-04-11 08:15:42.121007)>

The History object returned by ABCSMC.run can be used to query the database. This object is also available via abc.history.

In [8]:
history is abc.history
Out[8]:
True
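
For example, the weighted parameter samples of a population can be retrieved as a pandas DataFrame together with their weights. This is exactly what the plotting below builds on:

# df is a pandas DataFrame with one column per parameter ("mean"),
# w the array of corresponding importance weights.
df, w = history.get_distribution(m=0, t=history.max_t)
print(df.head())
print(w[:5])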

Now we visualize the probability density functions. The vertical line indicates the location of the observation. Given our model, we expect the mean to be close to the observed data.

In [9]:
from pyabc.visualization import plot_kde_1d
fig, ax = plt.subplots()
for t in range(history.max_t+1):
    df, w = history.get_distribution(m=0, t=t)
    plot_kde_1d(df, w,
                xmin=0, xmax=5,
                x="mean", ax=ax,
                label="PDF t={}".format(t))
ax.axvline(observation, color="k", linestyle="dashed");
ax.legend();
[Figure: 1d kernel density estimates of the posterior for each population t, with a dashed vertical line at the observation.]

That’s it. Now you can go ahead and try more sophisticated models.