Parameter inference

This example illustrates parameter inference for a single model. (Check also the model selection example if you’re interested in comparing multiple models.)

We’re going to use the following classes from the pyABC package:

  • ABCSMC, our entry point to parameter inference,
  • RV, to define the prior over a single parameter,
  • Distribution, to define the prior over a possibly higher dimensional parameter space,
  • MultivariateNormalTransition, to do a kernel density estimate (KDE) for visualization purposes.

Let’s start by importing the necessary classes. We also set up matplotlib, and we’re going to use numpy and pandas as well.

In [1]:
from pyabc import (ABCSMC, Distribution, RV,
                   MultivariateNormalTransition)
import numpy as np
import scipy.stats as st
import tempfile
import os
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

Our model is about as simple as it gets: a Gaussian model \(\mathcal{N}(\mathrm{mean}, 1)\) with the single parameter mean and fixed variance 1. The parameter dictionary passed to the model therefore has only the single key mean, and we name the sampled data just data. Returning a whole dictionary might seem like overcomplicating things, but as soon as we return heterogeneous data, this starts to make a lot of sense.

In [2]:
def model(parameter):
    # draw a single sample from N(mean, 1)
    return {"data": parameter["mean"] + np.random.randn()}

We then define the prior over the mean to be uniform over the interval \([0, 5]\).

In [3]:
prior = Distribution(mean=RV("uniform", 0, 5))

(pyABC hands these arguments over to scipy.stats, where "uniform" is parametrized by loc and scale, so the support is [loc, loc + scale]. Our prior therefore has to be read as uniform over [0, 0+5]; for example, RV("uniform", 1, 5) would be uniform over the interval [1, 6]. Check the scipy.stats package for details of the definition.)
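If in doubt, we can verify the convention directly with scipy.stats. A quick sketch (support() on a frozen distribution is available in recent scipy versions):

# scipy's "uniform" takes (loc, scale); the support is [loc, loc + scale]
rv = st.uniform(1, 5)
print(rv.support())  # (1.0, 6.0)

We also need to express when we consider data to be close, in the form of a distance function. Here, we just take the absolute value of the difference.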

In [4]:
def distance(x, y):
    # absolute difference between simulated and observed data
    return abs(x["data"] - y["data"])

Now we create the ABCSMC object, passing the model, the prior and the distance to it.

In [5]:
abc = ABCSMC(model, prior, distance)
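pyABC picks a default population size here. Depending on your pyABC version, it can usually be set explicitly via the population_size keyword; a hedged sketch (check the ABCSMC signature of your installed version):

abc = ABCSMC(model, prior, distance, population_size=100)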

To get going, we have to specify where to log the ABC-SMC runs; we can later query the database with the help of the History class. Usually you would now have some measured data whose posterior you want to infer. Here, we just assume that the measured data was 2.5.

In [6]:
db_path = ("sqlite:///" +
           os.path.join(tempfile.gettempdir(), "test.db"))
observation = 2.5
abc.new(db_path, {"data": observation})
INFO:Epsilon:initial epsilon is 1.4445875309564158
INFO:History:Start <ABCSMC(id=1, start_time=2018-02-14 10:37:58.225691, end_time=None)>
Out[6]:
1

The new method returned an integer: the id of this ABC-SMC run. The id only matters if more than one ABC-SMC run is stored in the same database.
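It also means you can return to the results later. A sketch, assuming the History constructor accepts the same database URL (by default it should attach to the most recent run; check the History documentation):

from pyabc import History
old_history = History(db_path)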

Let’s start the sampling now. We’ll sample until the acceptance threshold epsilon drops below 0.2. We also specify a maximum of 10 populations, so whichever is reached first, minimum_epsilon or max_nr_populations, stops further sampling. For the simple model defined above, this should take only a couple of seconds.

In [7]:
history = abc.run(minimum_epsilon=.2, max_nr_populations=10)
INFO:ABC:t:0 eps:1.4445875309564158
INFO:ABC:t:1 eps:0.6338713934612004
INFO:ABC:t:2 eps:0.31452414559694164
INFO:ABC:t:3 eps:0.16611608563822114
INFO:History:Done <ABCSMC(id=1, start_time=2018-02-14 10:37:58.225691, end_time=2018-02-14 10:38:03.835733)>

The History object returned by ABCSMC.run can be used to query the database. The same object is also available via abc.history.

In [8]:
history is abc.history
Out[8]:
True
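The History object offers more than identity checks. For instance, an overview of the generations and their epsilon values can be obtained like this (assuming the get_all_populations method of pyABC's History API):

history.get_all_populations()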

Now we visualize the probability density functions. The vertical line indicates the location of the observation. Given our model, we expect the mean to be close to the observed data.

In [9]:
from pyabc.visualization import plot_kde_1d
fig, ax = plt.subplots()
for t in range(history.max_t+1):
    df, w = history.get_distribution(m=0, t=t)
    plot_kde_1d(df, w,
                xmin=0, xmax=5,
                x="mean", ax=ax,
                label="PDF t={}".format(t))
ax.axvline(observation, color="k", linestyle="dashed");
ax.legend();
[Figure: 1d KDE of the posterior approximation for each population t=0..3; the dashed vertical line marks the observation at 2.5.]
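If you want KDE values rather than a plot, the MultivariateNormalTransition imported at the top can be used directly. A sketch, assuming the fit/pdf interface used in other pyABC examples (a DataFrame of weighted samples):

kde = MultivariateNormalTransition()
kde.fit(df, w)  # df, w: last population from the loop above
x = pd.DataFrame({"mean": np.linspace(0, 5, 50)})
pdf_values = kde.pdf(x)  # KDE evaluated on a grid of mean values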

That’s it. Now you can go ahead and try more sophisticated models.