Resuming stored ABC runs

In this example, we illustrate how stored ABC runs can be loaded and continued later. This can be useful if, for example, you decide afterwards to run a few more populations for increased accuracy.

The models used in this example are similar to the ones from the parameter inference tutorial.

This notebook can be downloaded here: Resuming stored ABC runs.

In this example, we’re going to use the following classes:

  • ABCSMC, our entry point to parameter inference,

  • RV, to define the prior over a single parameter,

  • Distribution, to define the prior over a possibly higher-dimensional parameter space.

[ ]:
# install if not done yet
!pip install pyabc --quiet
[1]:
import os
from tempfile import gettempdir

import numpy as np

from pyabc import ABCSMC, RV, Distribution

As usual, we start by defining the model, the prior, and the distance function.

[2]:
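def model(parameter):
    # Toy model: the parameter "mean" plus standard normal noise.
    return {"data": parameter["mean"] + np.random.randn()}


# Uniform prior on [0, 5] for the single parameter "mean".
prior = Distribution(mean=RV("uniform", 0, 5))


def distance(x, y):
    # Absolute difference between simulated and observed data.
    return abs(x["data"] - y["data"])


# SQLite database file in the system's temporary directory; the run is stored here.
db = "sqlite:///" + os.path.join(gettempdir(), "test.db")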
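To see how model, prior, and distance interact, here is a minimal sketch of plain rejection ABC using only the standard library. This is not what ABCSMC does internally: ABC-SMC runs a sequence of populations with shrinking thresholds and importance weights. The sketch only shows the basic accept/reject step that each population builds on. The names toy_model, toy_distance, sample_prior, and rejection_abc, as well as the threshold eps=0.5, are illustrative choices and not part of pyabc. The prior mirrors RV("uniform", 0, 5): RV follows the scipy.stats (loc, scale) convention, so this is a uniform distribution on [0, 5].

```python
import random


def toy_model(parameter):
    # Stdlib version of the toy model above: the mean plus standard normal noise.
    return {"data": parameter["mean"] + random.gauss(0.0, 1.0)}


def toy_distance(x, y):
    # Absolute difference between simulated and observed data.
    return abs(x["data"] - y["data"])


def sample_prior():
    # Uniform on [0, 5], mirroring RV("uniform", 0, 5) (loc=0, scale=5).
    return {"mean": random.uniform(0.0, 5.0)}


def rejection_abc(observed, eps, n_accept, seed=0):
    """Keep prior draws whose simulated data lie within eps of the observation."""
    random.seed(seed)
    accepted, n_tried = [], 0
    while len(accepted) < n_accept:
        theta = sample_prior()
        n_tried += 1
        if toy_distance(toy_model(theta), observed) <= eps:
            accepted.append(theta["mean"])
    return accepted, n_tried


samples, n_tried = rejection_abc({"data": 2.5}, eps=0.5, n_accept=100)
print(f"acceptance rate: {len(samples)} / {n_tried}")
print(f"posterior mean estimate: {sum(samples) / len(samples):.2f}")
```

The acceptance rate printed here is the same kind of quantity that ABCSMC reports per population in its log.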

Next, we start a new ABC-SMC run and print its ID. We’ll use the ID later on to resume the run.

[3]:
abc = ABCSMC(model, prior, distance)
history = abc.new(db, {"data": 2.5})
run_id = history.id
print("Run ID:", run_id)
INFO:History:Start <ABCSMC(id=1, start_time=2020-01-10 19:58:36.207963, end_time=None)>
Run ID: 1

We then run up to 3 generations, or until the acceptance threshold epsilon reaches 0.1, whichever comes first.

[4]:
history = abc.run(minimum_epsilon=0.1, max_nr_populations=3)
INFO:ABC:Calibration sample before t=0.
INFO:Epsilon:initial epsilon is 1.281948779424301
INFO:ABC:t: 0, eps: 1.281948779424301.
INFO:ABC:Acceptance rate: 100 / 193 = 5.1813e-01, ESS=1.0000e+02.
INFO:ABC:t: 1, eps: 0.593462311078578.
INFO:ABC:Acceptance rate: 100 / 338 = 2.9586e-01, ESS=8.2825e+01.
INFO:ABC:t: 2, eps: 0.3285232421992942.
INFO:ABC:Acceptance rate: 100 / 506 = 1.9763e-01, ESS=7.8478e+01.
INFO:History:Done <ABCSMC(id=1, start_time=2020-01-10 19:58:36.207963, end_time=2020-01-10 19:58:41.387478)>
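In the log above, epsilon shrinks from about 1.28 to 0.33 without reaching 0.1, so the run ends because of the population cap of 3. The sketch below replays that decision using the epsilon values copied from the log. The stopping rule in the hypothetical helper populations_run is a simplified assumption, not pyabc's code; ABCSMC.run also accepts further criteria such as min_acceptance_rate.

```python
def populations_run(epsilons, minimum_epsilon, max_nr_populations):
    """Return how many populations are generated before a stop criterion fires.

    Simplified rule (an assumption, not pyabc's code): after each population,
    stop if its epsilon is at or below minimum_epsilon or if the cap is reached.
    """
    n_done = 0
    for eps in epsilons:
        n_done += 1
        if eps <= minimum_epsilon or n_done >= max_nr_populations:
            break
    return n_done


# Epsilon values copied from the log of the first run (t = 0, 1, 2).
logged_eps = [1.281948779424301, 0.593462311078578, 0.3285232421992942]

print(populations_run(logged_eps, minimum_epsilon=0.1, max_nr_populations=3))  # 3
print(populations_run(logged_eps, minimum_epsilon=0.6, max_nr_populations=3))  # 2
```

Under this simplified rule, and assuming the same epsilon schedule, a hypothetical minimum_epsilon of 0.6 would already stop the run after 2 populations, because the epsilon at t=1 (about 0.593) falls below it.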

Let’s verify that we have 3 populations.

[5]:
history.n_populations
[5]:
3

We now create a completely new ABCSMC object, passing the same model, prior, and distance function as before.

[6]:
abc_continued = ABCSMC(model, prior, distance)

Note

You could actually pass different models, priors and distance functions here. This might make sense if, for example, in the meantime you came up with a more efficient model implementation or distance function.

For experts: under certain circumstances, it can even be mathematically correct to change the prior after a few populations.

To resume a run, we use the load method, which loads the necessary data from the database. We pass it the database and the ID of the run we want to continue.

[7]:
abc_continued.load(db, run_id)
[7]:
<pyabc.storage.history.History at 0x7fe45e76b9e8>
[8]:
abc_continued.run(minimum_epsilon=0.1, max_nr_populations=1)
INFO:Epsilon:initial epsilon is 0.19946300333077085
INFO:ABC:t: 3, eps: 0.19946300333077085.
INFO:ABC:Acceptance rate: 100 / 931 = 1.0741e-01, ESS=9.0195e+01.
INFO:History:Done <ABCSMC(id=1, start_time=2020-01-10 19:58:36.207963, end_time=2020-01-10 19:58:48.110429)>
[8]:
<pyabc.storage.history.History at 0x7fe45e76b9e8>
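Note that the log of the resumed run starts at t: 3: the new ABCSMC object picked up the population counter from the database instead of starting again at 0. The sketch below mimics that bookkeeping with Python's built-in sqlite3 module. The two-table schema and the helper functions (connect, new_run, next_t, add_population) are hypothetical and much simpler than pyabc's actual storage layout; they only show why storing each population under a run ID lets a later session continue where the previous one stopped. The epsilon values are the rounded ones from the logs above.

```python
import os
import sqlite3
import tempfile


def connect(path):
    con = sqlite3.connect(path)
    # Hypothetical minimal schema: one row per run, one row per population.
    con.execute("CREATE TABLE IF NOT EXISTS runs (id INTEGER PRIMARY KEY)")
    con.execute(
        "CREATE TABLE IF NOT EXISTS populations "
        "(run_id INTEGER, t INTEGER, eps REAL, PRIMARY KEY (run_id, t))"
    )
    return con


def new_run(con):
    # Insert a run row and return its auto-assigned ID.
    cur = con.execute("INSERT INTO runs DEFAULT VALUES")
    con.commit()
    return cur.lastrowid


def next_t(con, run_id):
    # The next population index is one past the largest stored t (0 if none).
    (t_max,) = con.execute(
        "SELECT MAX(t) FROM populations WHERE run_id = ?", (run_id,)
    ).fetchone()
    return 0 if t_max is None else t_max + 1


def add_population(con, run_id, eps):
    t = next_t(con, run_id)
    con.execute("INSERT INTO populations VALUES (?, ?, ?)", (run_id, t, eps))
    con.commit()
    return t


path = os.path.join(tempfile.mkdtemp(), "sketch.db")

# First session: create a run and store three populations.
con = connect(path)
sketch_run_id = new_run(con)
for eps in (1.28, 0.59, 0.33):
    add_population(con, sketch_run_id, eps)
con.close()

# Later session: reconnect, look the run up by ID, and append one population.
con = connect(path)
t_resumed = add_population(con, sketch_run_id, 0.20)
(n_populations,) = con.execute(
    "SELECT COUNT(*) FROM populations WHERE run_id = ?", (sketch_run_id,)
).fetchone()
con.close()
print(sketch_run_id, t_resumed, n_populations)  # 1 3 4
```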

Let’s check the number of populations of the resumed run. It should be 4: the 3 populations from the first run plus the one we just added.

[9]:
abc_continued.history.n_populations
[9]:
4

That’s it. This was a basic tutorial on how to continue stored ABC-SMC runs.

Note

For advanced users:

In situations where the distance function or epsilon require initialization, resuming a run via load() may lose information, because not everything can be stored in the database. This concerns hyper-parameters of individual objects specified by the user.

If that is the case, the user can instead store, e.g., the distance function object used in the first run and pass this very object to abc_continued. Ideally, this object is then already fully initialized, so that, after setting distance_function.require_initialize = False, it is just as if the first run had not been interrupted.

However, even if information was lost, the process usually re-adjusts itself within 1 or 2 iterations after load(), so this is not much of a problem in practice.
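The idea of keeping the initialized distance object can be illustrated without pyabc. The class ScaledDistance below is hypothetical: an adaptive distance whose scale is estimated once in initialize() from a calibration sample. The attribute name require_initialize is borrowed from the note above, but the class is a sketch and not pyabc's distance API. Using pickle to store the object is one possible choice, assumed here for illustration: after the first run the object is serialized, and in the new session it is restored with its learned scale, so re-initialization can be skipped.

```python
import pickle
import statistics


class ScaledDistance:
    """Hypothetical distance that divides by a scale learned from calibration data."""

    def __init__(self):
        self.scale = None
        self.require_initialize = True

    def initialize(self, calibration_data):
        # Learn the scale (here: the sample standard deviation) only if required.
        if self.require_initialize:
            self.scale = statistics.stdev(calibration_data)

    def __call__(self, x, y):
        return abs(x["data"] - y["data"]) / self.scale


# First session: initialize from a calibration sample, then persist the object.
dist = ScaledDistance()
dist.initialize([1.0, 2.0, 3.0, 4.0])
blob = pickle.dumps(dist)

# Later session: restore the object and skip re-initialization.
restored = pickle.loads(blob)
restored.require_initialize = False
restored.initialize([10.0, 50.0, 90.0])  # ignored: the stored scale is kept
print(round(restored.scale, 4))  # 1.291
print(round(restored({"data": 3.0}, {"data": 2.5}), 4))  # 0.3873
```

In a genuinely separate session, the class definition must be importable for pickle.loads to work, and pickled files should only be loaded from trusted sources.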