Welcome to pyABC’s documentation!¶
Release: 0.10.4
Source code
pyABC is a framework for distributed, likelihood-free inference. That means: if you have a model and some data and want to know the posterior distribution over the model parameters, i.e. with which probability which parameters explain the observed data, then pyABC might be for you.
All you need is some way to numerically draw samples from the model, given the model parameters. pyABC “inverts” the model for you and tells you which parameters match the data well and which do not. You do not need to analytically calculate the likelihood function.
pyABC runs efficiently on multicore machines and distributed cluster setups. It is easy to use and flexibly extensible.
If you use it in your work, you can cite the paper:
Emmanuel Klinger, Dennis Rickert, Jan Hasenauer; pyABC: distributed, likelihood-free inference; Bioinformatics 2018; https://doi.org/10.1093/bioinformatics/bty361
What is pyABC about?¶
pyABC helps to solve the problem of parameter inference in settings where all you can do is simulate from the (black-box) model, and no further analysis is possible. Put differently: if you have a forward simulator, then pyABC does the backwards parameter inference step for you.
What you need¶
some kind of experimentally observed or synthetically generated data
a parametrized stochastic simulator which supposedly explains the data (e.g. a function which possibly uses a random number generator)
What you don’t need¶
the likelihood function: p(data|parameter) is not required.
When better not to use pyABC¶
ABC-SMC likelihood-free inference is not a hammer for every nail. If you’re actually able to write down the likelihood, then applying a different inference technique which exploits it might be the better approach. This package helps to solve the much harder problem of likelihood-free inference.
Why to use pyABC?¶
This is a package for Approximate Bayesian Computation, using a Sequential Monte Carlo scheme. This provides a particularly efficient technique for Bayesian posterior estimation in cases where it is very hard to calculate the likelihood function efficiently.
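The idea behind ABC can be illustrated without pyABC: draw parameters from the prior, simulate, and keep those parameters whose simulated data land close to the observation. Below is a minimal rejection-ABC sketch in plain NumPy, for illustration only; the model, prior, and names are hypothetical, and pyABC’s actual ABC-SMC scheme is far more efficient than this.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(mean):
    # hypothetical stochastic "black-box" model:
    # one draw from a Gaussian N(mean, 0.5^2)
    return mean + 0.5 * rng.standard_normal()

observed = 2.5   # the measured data
eps = 0.1        # fixed acceptance threshold

# Draw parameters from the prior U(0, 5) and keep those whose
# simulated data lands within eps of the observation.
accepted = []
while len(accepted) < 1000:
    theta = rng.uniform(0, 5)
    if abs(simulate(theta) - observed) < eps:
        accepted.append(theta)

# The accepted parameters approximate the posterior.
posterior_mean = np.mean(accepted)
```

An SMC scheme replaces the fixed eps with a decreasing sequence of thresholds and propagates a weighted population of particles across generations, which drastically reduces the number of required simulations.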
pyABC was designed to perform well on
multicore and
distributed environments.
pyABC integrates well with SGE-like environments, as commonly found in scientific settings, but can also be deployed to the cloud. Amongst other backend modes, Dask.distributed can be used under the hood. A Redis-based broker and IPython Parallel are also supported.
It may sound like a contradiction, but pyABC is on the one hand easy to use for standard applications, and on the other hand allows for flexible experimentation, exploring all aspects of new ABC-SMC schemes. Apart from a rich set of default choices, it is easy to parametrize aspects of your algorithm through the implementation of custom classes.
Installation and Upgrading¶
Preparation¶
This package requires Python 3.6 or later. The package is tested on Linux (using Travis continuous integration).
Not all of the package’s functionality is available for Microsoft Windows. As some of the multicore parallelizations rely on forking, these won’t work on Windows. However, most other parts of the package should work on Windows as well.
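Whether fork-based process creation is available on your platform can be checked with the standard library, independently of pyABC:

```python
import multiprocessing

# "fork" on Linux, "spawn" on Windows (and on recent macOS Pythons)
method = multiprocessing.get_start_method()
print(method)
```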
My system’s Python distribution is outdated, what now?¶
Several Python distributions can coexist on a single system. If you don’t have access to a recent Python version via your system’s package manager (this might be the case for old Debian or Ubuntu operating systems), it is recommended to install the latest version of the Anaconda Python 3 distribution. See also: Installing Anaconda on a Cluster environment.
PIP Installation¶
Install with root rights into your system’s Python distribution¶
The package can be installed via pip:
pip install pyabc
This installs it into your system’s Python distribution, which usually requires root access.
Install as user into your home directory (recommended)¶
Installing pyABC into your system’s Python distribution can be problematic as you might not want to change your system’s Python installation or you don’t have root rights. The recommended alternative is to install pyABC into your home directory with:
pip install --user pyabc
GIT Installation¶
If you want the bleeding-edge version, install directly from GitHub:
pip install git+https://github.com/icb-dcm/pyabc.git
Upgrading¶
If you want to upgrade from a previous pyABC version, use
pip install --upgrade pyabc
instead of plain pip install. You can also consult the pip documentation on how to manage packages. If you installed pyABC into your home directory with pip install --user pyabc, then upgrade also with the --user flag:
pip install --upgrade --user pyabc
Installing Anaconda on a Cluster environment¶
We’re assuming you’re on a Linux environment. Use the most recent Anaconda Python 3 distribution. As of writing this documentation, this is the Anaconda Python 3.6 distribution. To install it, run:
wget https://repo.continuum.io/archive/Anaconda3-4.4.0-Linux-x86_64.sh
to download the installer. To execute the installer run:
bash Anaconda3-4.4.0-Linux-x86_64.sh
and follow the guided installation process (i.e. approve the license and tell the installer where to install it to). You might want to replace “4.4.0” with the most recent version of Anaconda. Find out on the Anaconda Download page which one it is.
Note
The Anaconda installer asks you at the end of the installation whether you want to use Anaconda Python as your default Python:
Do you wish the installer to prepend the Anaconda3 install location to PATH in your /home/username/.bashrc ? [yes|no] [no] >>>
If you answer yes, the path to the Anaconda installation is prepended to your PATH environment variable, and subsequent calls to pip (see below) use the Anaconda Python pip (check with the command which pip).
If you answer no, you need to manually ensure that the correct Python installation is used.
Just saying “yes” here might save you from some difficulties later on.
Optional dependencies¶
pyABC has an optional interface to the R language. To enable it, install pyABC via pip install pyabc[R]. All Python-based features will work just fine if R is not installed. See also pyABC’s external API.
pyABC optionally uses git to store commit hashes in its database, leveraging the gitpython package. This feature can be installed via pip install pyabc[git].
Examples¶
The following examples should help to get a better idea of how to use pyABC.
Quickstart¶
Here is a small example on how to do Bayesian model selection.
There are more examples in the examples section of the documentation, such as a parameter inference example with a single model only.
The notebook can be downloaded here: Quickstart.
The following class from the pyABC package is used explicitly in this example:
PercentileDistance
Step by step explanation¶
Defining a model¶
To do model selection, we first need some models. A model, in the simplest case, is just a callable which takes a single dict as input and returns a single dict as output. The keys of the input dictionary are the parameters of the model; the output keys denote the summary statistics. Here, the dict is passed as parameters and has the entry x, which denotes the mean of a Gaussian. The model returns the observed summary statistic y, which is just the sampled value.
[1]:
%matplotlib inline
import os
import tempfile
import scipy.stats as st
import pyabc
# Define a gaussian model
sigma = .5

def model(parameters):
    # sample from a gaussian
    y = st.norm(parameters.x, sigma).rvs()
    # return the sample as dictionary
    return {"y": y}
For model selection we usually have more than one model. These are assembled in a list. We require a Bayesian prior over the models. The default is to have a uniform prior over the model classes. This concludes the model definition.
[2]:
# We define two models, but they are identical so far
models = [model, model]

# However, our models' priors are not the same.
# Their mean differs.
mu_x_1, mu_x_2 = 0, 1
parameter_priors = [
    pyabc.Distribution(x=pyabc.RV("norm", mu_x_1, sigma)),
    pyabc.Distribution(x=pyabc.RV("norm", mu_x_2, sigma))
]
Configuring the ABC-SMC run¶
Having the models defined, we can plug together the ABCSMC class. We need a distance function to measure the distance between obtained samples.
[3]:
# We plug all the ABC options together
abc = pyabc.ABCSMC(
    models, parameter_priors,
    pyabc.PercentileDistance(measures_to_use=["y"]))
Setting the observed data¶
Actually measured data can now be passed to the ABCSMC. This is set via the new
method, indicating that we start a new run as opposed to resuming a stored run (see the “resume stored run” example). Moreover, we have to set the output database where the ABCSMC run is logged.
[4]:
# y_observed is the important piece here: our actual observation.
y_observed = 1
# and we define where to store the results
db_path = ("sqlite:///" +
           os.path.join(tempfile.gettempdir(), "test.db"))
history = abc.new(db_path, {"y": y_observed})
INFO:History:Start <ABCSMC(id=42, start_time=2019-11-10 23:35:30.553987, end_time=None)>
The new method returns a history object, whose id identifies the ABC-SMC run in the database. We’re not using this id for now, but it becomes important when you load stored data or want to continue an ABC-SMC run in case more than one run is stored in a single database.
[5]:
print("ABCSMC run ID:", history.id)
ABCSMC run ID: 42
Running the ABC¶
We run the ABC-SMC, specifying the epsilon value at which to terminate. The default epsilon strategy is pyabc.epsilon.MedianEpsilon. The run terminates as soon as either the epsilon or the maximum allowed number of populations is reached, whichever happens first. The method returns a pyabc.storage.History object, which can, for example, be queried for the posterior probabilities.
[6]:
# We run the ABC until either criterion is met
history = abc.run(minimum_epsilon=0.2, max_nr_populations=5)
INFO:ABC:Calibration sample before t=0.
INFO:Epsilon:initial epsilon is 0.6363337188196165
INFO:ABC:t: 0, eps: 0.6363337188196165.
INFO:ABC:Acceptance rate: 100 / 184 = 5.4348e-01.
INFO:ABC:t: 1, eps: 0.26980369986431585.
INFO:ABC:Acceptance rate: 100 / 314 = 3.1847e-01.
INFO:ABC:t: 2, eps: 0.14036759656554718.
INFO:ABC:Acceptance rate: 100 / 608 = 1.6447e-01.
INFO:History:Done <ABCSMC(id=42, start_time=2019-11-10 23:35:30.553987, end_time=2019-11-10 23:35:33.314035)>
Note that the history object is also always accessible from the ABCSMC object:
[7]:
history is abc.history
[7]:
True
The pyabc.storage.History object can, for example, be queried for the posterior probabilities in the populations:
[8]:
# Evaluate the model probabilities
model_probabilities = history.get_model_probabilities()
model_probabilities
[8]:
m         0         1
t
0  0.300000  0.700000
1  0.275637  0.724363
2  0.273168  0.726832
And now, let’s visualize the results:
[9]:
pyabc.visualization.plot_model_probabilities(history)
[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f0e6fe47b38>
So model 1 is the more probable one, which is expected: it was centered at 1 and the observed data was also 1, whereas model 0 was centered at 0, farther away from the observed data.
Parameter inference¶
This example illustrates parameter inference for a single model. (Check also the model selection example if you’re interested in comparing multiple models.)
This notebook can be downloaded here:
Parameter Inference
.
We’re going to use the following classes from the pyABC package:
ABCSMC, our entry point to parameter inference,
RV, to define the prior over a single parameter,
Distribution, to define the prior over a possibly higher-dimensional parameter space,
MultivariateNormalTransition, to do a kernel density estimate (KDE) for visualization purposes.
Let’s start by importing the necessary classes. We also set up matplotlib, and we’re going to use pandas as well.
[1]:
import pyabc
import numpy as np
import scipy.stats as st
import tempfile
import os
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
Our model is about as simple as it gets. We assume a Gaussian model \(\mathcal{N}(\mathrm{mean}, 0.5^2)\) with the single parameter \(\mathrm{mean}\) and fixed variance \(0.5^2\). In this case, the parameter dictionary passed to the model has only the single key mean. We name the sampled data just data. It might look like overcomplicating things to return a whole dictionary, but as soon as we return heterogeneous data this starts to make a lot of sense.
[2]:
def model(parameter):
    return {"data": parameter["mean"] + 0.5 * np.random.randn()}
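To see why returning a dictionary pays off, here is a hypothetical variant of the model that returns heterogeneous data; the keys keep the differently-shaped pieces apart. This sketch is not part of the tutorial’s model, and the names heterogeneous_model and trajectory are made up.

```python
import numpy as np

def heterogeneous_model(parameter):
    # hypothetical model returning heterogeneous data:
    # a scalar summary plus a whole simulated series
    series = parameter["mean"] + 0.5 * np.random.randn(10)
    return {"data": series.mean(),     # scalar summary statistic
            "trajectory": series}      # array-valued output

result = heterogeneous_model({"mean": 2.0})
```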
We then define the prior for the mean
to be uniform over the interval \([0, 5]\):
[3]:
prior = pyabc.Distribution(mean=pyabc.RV("uniform", 0, 5))
(Actually, this has to be read as \([0, 0+5]\): the second argument is the width of the interval, not its upper end. For example, RV("uniform", 1, 5) is uniform over the interval \([1, 6]\). Check the scipy.stats package for details of this definition.)
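This loc/scale convention can be checked directly against scipy.stats, on which pyABC’s RV is based: the support of a uniform distribution with parameters (loc, scale) runs from loc to loc + scale.

```python
import scipy.stats as st

# RV("uniform", 1, 5) corresponds to st.uniform(loc=1, scale=5),
# whose support is [loc, loc + scale] = [1, 6]
dist = st.uniform(loc=1, scale=5)
lower, upper = dist.ppf(0), dist.ppf(1)
print(lower, upper)  # 1.0 6.0
```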
We also need to specify when we consider data to be close, in the form of a distance function. We just take the absolute value of the difference here.
[4]:
def distance(x, y):
    return abs(x["data"] - y["data"])
Now we create the ABCSMC
object, passing the model, the prior and the distance to it.
[5]:
abc = pyabc.ABCSMC(model, prior, distance)
To get going, we have to specify where to log the ABC-SMC runs.
We can later query the database with the help of the History class.
Usually you would now have some measured data of which you want to know the posterior. Here, we just assume that the measured data was 2.5.
[6]:
db_path = ("sqlite:///" +
           os.path.join(tempfile.gettempdir(), "test.db"))
observation = 2.5
abc.new(db_path, {"data": observation})
INFO:History:Start <ABCSMC(id=1, start_time=2020-01-13 17:20:35.575163, end_time=None)>
[6]:
<pyabc.storage.history.History at 0x7f9b70fceba8>
The new method returned a History object; its id identifies the ABC-SMC run. This id is only important if more than one ABC-SMC run is stored in the same database.
Let’s start the sampling now. We’ll sample until the acceptance threshold epsilon drops below 0.1. We also specify that we want a maximum number of 10 populations. So whatever is reached first, minimum_epsilon or max_nr_populations, will stop further sampling.
For the simple model we defined above, this should only take a couple of seconds:
[7]:
history = abc.run(minimum_epsilon=.1, max_nr_populations=10)
INFO:ABC:Calibration sample before t=0.
INFO:Epsilon:initial epsilon is 1.319732537446763
INFO:ABC:t: 0, eps: 1.319732537446763.
INFO:ABC:Acceptance rate: 100 / 183 = 5.4645e-01, ESS=1.0000e+02.
INFO:ABC:t: 1, eps: 0.6266780218276696.
INFO:ABC:Acceptance rate: 100 / 212 = 4.7170e-01, ESS=9.5391e+01.
INFO:ABC:t: 2, eps: 0.37028667452653585.
INFO:ABC:Acceptance rate: 100 / 293 = 3.4130e-01, ESS=7.0363e+01.
INFO:ABC:t: 3, eps: 0.14892234681938504.
INFO:ABC:Acceptance rate: 100 / 713 = 1.4025e-01, ESS=7.6078e+01.
INFO:ABC:t: 4, eps: 0.07515483935316057.
INFO:ABC:Acceptance rate: 100 / 1255 = 7.9681e-02, ESS=9.2198e+01.
INFO:ABC:Stopping: minimum epsilon.
INFO:History:Done <ABCSMC(id=1, start_time=2020-01-13 17:20:35.575163, end_time=2020-01-13 17:20:41.845830)>
The History
object returned by ABCSMC.run can be used to query the database.
This object is also available via abc.history.
[8]:
history is abc.history
[8]:
True
Now we visualize the probability density functions. The vertical line indicates the location of the observation. Given our model, we expect the mean to be close to the observed data.
[9]:
fig, ax = plt.subplots()
for t in range(history.max_t + 1):
    df, w = history.get_distribution(m=0, t=t)
    pyabc.visualization.plot_kde_1d(
        df, w,
        xmin=0, xmax=5,
        x="mean", ax=ax,
        label="PDF t={}".format(t))
ax.axvline(observation, color="k", linestyle="dashed");
ax.legend();
pyABC also offers various other visualization routines in order to analyze the parameter estimation run:
[10]:
_, arr_ax = plt.subplots(2, 2)

pyabc.visualization.plot_sample_numbers(history, ax=arr_ax[0][0])
pyabc.visualization.plot_epsilons(history, ax=arr_ax[0][1])
pyabc.visualization.plot_credible_intervals(
    history, levels=[0.95, 0.9, 0.5], ts=[0, 1, 2, 3, 4],
    show_mean=True, show_kde_max_1d=True,
    refval={'mean': 2.5}, arr_ax=arr_ax[1][0])
pyabc.visualization.plot_effective_sample_sizes(history, ax=arr_ax[1][1])

plt.gcf().set_size_inches((12, 8))
plt.gcf().tight_layout()
That’s it. Now you can go ahead and try more sophisticated models.
Early stopping of model simulations¶
This notebook can be downloaded here:
Early Stopping
.
For certain distance functions and certain models it is possible to calculate the distance on the fly while the model is running. This is possible, for example, if the distance is calculated as a cumulative sum and the model is a stochastic process; Markov jump processes belong to this class. However, we want to keep things simple here and only demonstrate how to use the pyABC interface in such cases. So don’t expect a sophisticated (or even useful) model implementation here.
In this example we’ll use in particular the IntegratedModel and ModelResult classes for integrated simulation and accepting/rejecting a parameter.
Let’s start with the necessary imports:
[1]:
%matplotlib inline
from pyabc import (ABCSMC,
                   RV, Distribution,
                   IntegratedModel, ModelResult,
                   MedianEpsilon,
                   LocalTransition)
from pyabc.sampler import SingleCoreSampler
import matplotlib.pyplot as plt
import os
import tempfile
import pandas as pd
import numpy as np

db_path = ("sqlite:///" +
           os.path.join(tempfile.gettempdir(), "test.db"))
We define here a (very) simple stochastic process, purely for demonstrative reasons. First, we fix the number of steps n_steps to 30.
[2]:
n_steps = 30
We then define our process as follows:
\(x(t+1) = x(t) + s \xi, \quad x(0) = 0,\)
in which \(\xi \sim U(0, 1)\) denotes a uniformly distributed random variable on \([0, 1]\), and \(s = \) step_size is the step size.
The function simulate
implements this stochastic process:
[3]:
def simulate(step_size):
    trajectory = np.zeros(n_steps)
    for t in range(1, n_steps):
        xi = np.random.uniform()
        trajectory[t] = trajectory[t-1] + xi * step_size
    return trajectory
We take as distance function between two such generated trajectories the sum of the absolute values of the pointwise differences.
[4]:
def distance(trajectory_1, trajectory_2):
    return np.absolute(trajectory_1 - trajectory_2).sum()
Let’s run the simulation and plot the trajectories to get a better idea of the data thus generated. We set the ground truth step size gt_step_size to
[5]:
gt_step_size = 5
This will be used to generate the data which will be subject to inference later on.
[6]:
gt_trajectory = simulate(gt_step_size)
trajectory_2 = simulate(2)
dist_1_2 = distance(gt_trajectory, trajectory_2)

plt.plot(gt_trajectory,
         label="Step size = {} (Ground Truth)".format(gt_step_size))
plt.plot(trajectory_2,
         label="Step size = 2")
plt.legend();
plt.title("Distance={:.2f}".format(dist_1_2));
As you might have noted already, we could calculate the distance on the fly: after each step of the stochastic process, we could increment the cumulative sum. This should save time in the ABC-SMC run later on.
To implement this in pyABC, we use the IntegratedModel class.
Let’s start with the code first and explain it afterwards.
[7]:
class MyStochasticProcess(IntegratedModel):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.n_early_stopped = 0

    def integrated_simulate(self, pars, eps):
        cumsum = 0
        trajectory = np.zeros(n_steps)
        for t in range(1, n_steps):
            xi = np.random.uniform()
            next_val = trajectory[t-1] + xi * pars["step_size"]
            cumsum += abs(next_val - gt_trajectory[t])
            trajectory[t] = next_val
            if cumsum > eps:
                self.n_early_stopped += 1
                return ModelResult(accepted=False)
        return ModelResult(accepted=True,
                           distance=cumsum,
                           sum_stats={"trajectory": trajectory})
Our MyStochasticProcess class is a subclass of IntegratedModel.
The __init__ method is not strictly necessary. Here, we just want to keep track of how often early stopping has actually happened.
More interesting is the integrated_simulate method: this is where the real work happens. As already said, we calculate the cumulative sum on the fly, updating it in each simulation step. Note that gt_trajectory is actually a global variable here. If cumsum > eps at some step of the simulation, we return immediately, indicating that the parameter was not accepted, by returning ModelResult(accepted=False). If cumsum never exceeds eps, the parameter is accepted. In this case we return an accepted result together with the calculated distance and the trajectory. Note that, while it is mandatory to return the distance, returning the trajectory is optional; if it is returned, it is stored in the database.
We define a uniform prior over the interval \([0, 10]\) for the step size
[8]:
prior = Distribution(step_size=RV("uniform", 0, 10))
and create an instance of our integrated model MyStochasticProcess
[9]:
model = MyStochasticProcess()
We then configure the ABC-SMC run. As the distance function is calculated within MyStochasticProcess, we just pass None to the distance_function parameter. As sampler, we use the SingleCoreSampler here, to correctly keep track of MyStochasticProcess.n_early_stopped; otherwise, the counter would get incremented in subprocesses and we wouldn’t see anything here. Of course, you could also use the MyStochasticProcess model in a multicore or distributed setting.
Importantly, we prespecify the initial acceptance threshold to a given value, here to 300. Otherwise, pyABC will try to automatically determine it by drawing samples from the prior and evaluating the distance function. However, we do not have a distance function here, so this approach would break down.
[10]:
abc = ABCSMC(models=model,
             parameter_priors=prior,
             distance_function=None,
             sampler=SingleCoreSampler(),
             population_size=30,
             transitions=LocalTransition(k_fraction=.2),
             eps=MedianEpsilon(300, median_multiplier=0.7))
We then indicate that we want to start a new ABC-SMC run:
[11]:
abc.new(db_path)
INFO:History:Start <ABCSMC(id=2, start_time=2020-05-17 19:18:46.241180, end_time=None)>
[11]:
<pyabc.storage.history.History at 0x7ff6c8c06c50>
We do not need to pass any data here. However, we could still pass a dictionary {"trajectory": gt_trajectory} to the new method, for storage purposes only; the data would be ignored during the ABC-SMC run.
Then, let’s start the sampling
[12]:
h = abc.run(minimum_epsilon=40, max_nr_populations=3)
INFO:ABC:t: 0, eps: 300.
INFO:ABC:Acceptance rate: 30 / 138 = 2.1739e-01, ESS=3.0000e+01.
INFO:ABC:t: 1, eps: 106.00572890481364.
INFO:ABC:Acceptance rate: 30 / 159 = 1.8868e-01, ESS=1.9388e+01.
INFO:ABC:t: 2, eps: 61.485885933264285.
INFO:ABC:Acceptance rate: 30 / 1786 = 1.6797e-02, ESS=2.4338e+01.
INFO:History:Done <ABCSMC(id=2, start_time=2020-05-17 19:18:46.241180, end_time=2020-05-17 19:18:48.515938)>
and check how often the early stopping was used:
[13]:
model.n_early_stopped
[13]:
1993
Quite a lot actually.
Lastly, we estimate KDEs of the different populations to inspect our results and plot everything (the vertical dashed line indicates the ground truth step size).
[14]:
from pyabc.visualization import plot_kde_1d

fig, ax = plt.subplots()
for t in range(h.max_t + 1):
    particles = h.get_distribution(m=0, t=t)
    plot_kde_1d(*particles, "step_size",
                label="t={}".format(t), ax=ax,
                xmin=0, xmax=10, numx=300)
ax.axvline(gt_step_size, color="k", linestyle="dashed");
That’s it. You should be able to see how the distribution contracts around the true parameter.
Resuming stored ABC runs¶
This example illustrates how stored ABC runs can be loaded and continued later on. This might make sense if you decide later to run a couple more populations for increased accuracy.
The models used in this example are similar to the ones from the parameter inference tutorial.
This notebook can be downloaded here:
Resuming stored ABC runs
.
In this example, we’re going to use the following classes:
ABCSMC, our entry point to parameter inference,
RV, to define the prior over a single parameter,
Distribution, to define the prior over a possibly higher-dimensional parameter space.
Let’s start with the imports.
[1]:
from pyabc import ABCSMC, Distribution, RV
import numpy as np
from tempfile import gettempdir
import os
As usual, we start with the definition of the model, the prior, and the distance function.
[2]:
def model(parameter):
    return {"data": parameter["mean"] + np.random.randn()}

prior = Distribution(mean=RV("uniform", 0, 5))

def distance(x, y):
    return abs(x["data"] - y["data"])

db = "sqlite:///" + os.path.join(gettempdir(), "test.db")
We next make a new ABCSMC run and also print the id of this run. We’ll use the id later on to resume the run.
[3]:
abc = ABCSMC(model, prior, distance)
history = abc.new(db, {"data": 2.5})
run_id = history.id
print("Run ID:", run_id)
INFO:History:Start <ABCSMC(id=1, start_time=2020-01-10 19:58:36.207963, end_time=None)>
Run ID: 1
We then run up to 3 generations, or until the acceptance threshold 0.1 is reached – whatever happens first.
[4]:
history = abc.run(minimum_epsilon=.1, max_nr_populations=3)
INFO:ABC:Calibration sample before t=0.
INFO:Epsilon:initial epsilon is 1.281948779424301
INFO:ABC:t: 0, eps: 1.281948779424301.
INFO:ABC:Acceptance rate: 100 / 193 = 5.1813e-01, ESS=1.0000e+02.
INFO:ABC:t: 1, eps: 0.593462311078578.
INFO:ABC:Acceptance rate: 100 / 338 = 2.9586e-01, ESS=8.2825e+01.
INFO:ABC:t: 2, eps: 0.3285232421992942.
INFO:ABC:Acceptance rate: 100 / 506 = 1.9763e-01, ESS=7.8478e+01.
INFO:History:Done <ABCSMC(id=1, start_time=2020-01-10 19:58:36.207963, end_time=2020-01-10 19:58:41.387478)>
Let’s verify that we have 3 populations.
[5]:
history.n_populations
[5]:
3
We now create a completely new ABCSMC object, passing the same model, prior and distance as before.
[6]:
abc_continued = ABCSMC(model, prior, distance)
Note
You could actually pass different models, priors and distance functions here. This might make sense if, for example, in the meantime you came up with a more efficient model implementation or distance function.
For the experts: under certain circumstances it can even be mathematically correct to change the prior after a couple of populations.
To resume a run, we use the load
method. This loads the necessary data. We pass to this method the id of the run we want to continue.
[7]:
abc_continued.load(db, run_id)
[7]:
<pyabc.storage.history.History at 0x7fe45e76b9e8>
[8]:
abc_continued.run(minimum_epsilon=.1, max_nr_populations=1)
INFO:Epsilon:initial epsilon is 0.19946300333077085
INFO:ABC:t: 3, eps: 0.19946300333077085.
INFO:ABC:Acceptance rate: 100 / 931 = 1.0741e-01, ESS=9.0195e+01.
INFO:History:Done <ABCSMC(id=1, start_time=2020-01-10 19:58:36.207963, end_time=2020-01-10 19:58:48.110429)>
[8]:
<pyabc.storage.history.History at 0x7fe45e76b9e8>
Let’s check the number of populations of the resumed run. It should be 4, as we did 3 populations before and added another one.
[9]:
abc_continued.history.n_populations
[9]:
4
That’s it. This was a basic tutorial on how to continue stored ABCSMC runs.
Note
For advanced users: in situations where the distance function or epsilon require initialization, it is possible that, when resuming a run via load(), information is lost because not everything can be stored in the database. This concerns hyperparameters in individual objects specified by the user.
If that is the case, the user can store e.g. the distance function used in the first run and pass this very object to abc_continued. Then it is ideally fully initialized, so that after setting distance_function.require_initialize = False, it is just as if the first run had not been interrupted.
However, even if information was lost, after load() the process usually readjusts itself within 1 or 2 iterations, so this is not much of a problem.
Using R with pyABC¶
This example illustrates how to use models, summary statistics and distance functions defined in R. We’re assuming you’re already familiar with the basic workings of pyABC. If not, consult the other tutorial examples.
Download this notebook Using R with pyABC
.
In this example, we’re introducing the new class R, which is our interface with R. We use this class to load an external R script.
[1]:
%matplotlib inline
from pyabc.external import R

r = R("myRModel.R")
/home/yannik/pyabc/pyabc/external/r_rpy2.py:76: UserWarning: The support of the R language for ABCSMC is considered experimental. The API might change in the future.
warnings.warn("The support of the R language for ABCSMC is "
Note
R("myRModel.R") does, amongst other things, the equivalent of R’s source("myRModel.R"). That is, the entire script is executed, with all the possible side effects this might have.
You can download the file here: myRModel.R
.
But now, let’s have a look at it.
[2]:
r.display_source_ipython()
[2]:
# Blue print for pyABC parameter inference runs.
# A proposal of how to structure an R file for usage with pyABC
#' The model to be simulated.
#' In this example, it is just a multivariate normal
#' distribution. The number of parameters depends on the
#' model and can be arbitrary. However, the parameters
#' should be real numbers.
#' The return type is arbitrary as well.
myModel <- function(pars){
    x <- rnorm(1) + pars$meanX
    y <- rnorm(1) + pars$meanY
    c(x,y) # It is not important that it is a vector.
}
#' Calculates summary statistics from whatever the model returns
#'
#' It is important that the summary statistics have names
#' to store them correctly in pyABC's database.
#' In many cases, the summary statistics function might just
#' pass through the result of the model function if the
#' summary statistics calculation is already
#' done there. Splitting summary statistics and the model
#' makes most sense in a model selection scenario.
#'
#' @param modelResult The data simulated by the model
#' @return Named list of summary statistics.
mySummaryStatistics <- function(modelResult){
    list(x=modelResult[1],
         y=modelResult[2],
         mtcars=mtcars, # Can also pass data frames
         cars=cars,
         arbitraryKey="Some random text")
}
#' Calculate distance between summary statistics
#'
#' @param sumStatSample The summary statistics of the sample
#' proposed by pyABC
#' @param sumStatData The summary statistics of the observed
#' data for which we want to calculate the posterior.
#' @return A single float
myDistance <- function(sumStatSample, sumStatData){
    sqrt((sumStatSample$x - sumStatData$x)^2
         + abs(sumStatSample$y - sumStatData$y)^2)
}
# We store the observed data as a named list
# in a variable.
# This is passed by pyABC to myDistance
# as the sumStatData argument.
mySumStatData <- list(x=4, y=8, mtcars=mtcars, cars=cars)
# The functions for the model, the summary
# statistics and distance
# have to be constructed in
# such a way that the following always makes sense:
#
# myDistance(
#     mySummaryStatistics(myModel(list(meanX=1, meanY=2))),
#     mySummaryStatistics(myModel(list(meanX=2, meanY=2))))
We see that four relevant objects are defined in the file:
myModel
mySummaryStatistics (optional)
myDistance
mySumStatData
The names of these do not matter. mySummaryStatistics is actually optional and can be omitted in case the model calculates the summary statistics directly. We load the defined functions using the r object:
[3]:
model = r.model("myModel")
distance = r.distance("myDistance")
sum_stat = r.summary_statistics("mySummaryStatistics")
From there on, we can use them (almost) as if they were ordinary Python functions.
[4]:
from pyabc import Distribution, RV, ABCSMC

prior = Distribution(meanX=RV("uniform", 0, 10),
                     meanY=RV("uniform", 0, 10))

abc = ABCSMC(model, prior, distance,
             summary_statistics=sum_stat)
We also load the observation with r.observation
and pass it to a new ABC run.
[5]:
import os
from tempfile import gettempdir
db = "sqlite:///" + os.path.join(gettempdir(), "test.db")
abc.new(db, r.observation("mySumStatData"))
INFO:History:Start <ABCSMC(id=13, start_time=2019-11-05 16:48:13.133784, end_time=None)>
[5]:
13
We start a run which terminates as soon as an acceptance threshold of 0.9 or less is reached or the maximum number of 4 populations is sampled.
[6]:
history = abc.run(minimum_epsilon=0.9, max_nr_populations=4)
INFO:ABC:Calibration sample before t=0.
INFO:Epsilon:initial epsilon is 4.814333033994599
INFO:ABC:t: 0, eps: 4.814333033994599.
INFO:ABC:Acceptance rate: 100 / 228 = 4.3860e-01.
INFO:ABC:t: 1, eps: 3.1168154206606684.
INFO:ABC:Acceptance rate: 100 / 198 = 5.0505e-01.
INFO:ABC:t: 2, eps: 2.1660393377118674.
INFO:ABC:Acceptance rate: 100 / 257 = 3.8911e-01.
INFO:ABC:t: 3, eps: 1.4566595534582736.
INFO:ABC:Acceptance rate: 100 / 348 = 2.8736e-01.
INFO:History:Done <ABCSMC(id=13, start_time=2019-11-05 16:48:13.133784, end_time=2019-11-05 16:48:21.052689)>
Lastly, we plot the results and observe how the generations slowly contract around the observed value. (Note that the contraction around the observed value is a particular property of the chosen example and not always the case.)
[7]:
from pyabc.visualization import plot_kde_2d

for t in range(history.n_populations):
    df, w = abc.history.get_distribution(0, t)
    ax = plot_kde_2d(df, w, "meanX", "meanY",
                     xmin=0, xmax=10,
                     ymin=0, ymax=10,
                     numx=100, numy=100)
    ax.scatter([4], [8],
               edgecolor="black",
               facecolor="white",
               label="Observation")
    ax.legend()
    ax.set_title("PDF t={}".format(t))
We can also retrieve stored summary statistics, such as a DataFrame, even though the DataFrame was actually defined in R.
[8]:
history.get_weighted_sum_stats_for_model(m=0, t=1)[1][0]["cars"].head()
[8]:
   speed  dist
1    4.0   2.0
2    4.0  10.0
3    7.0   4.0
4    7.0  22.0
5    8.0  16.0
Dumping the results to a file format R can read¶
Although you could query pyABC’s database directly from R, since the database
is just a SQL database (e.g. SQLite), pyABC ships with a utility to facilitate
export of the database.
Use the abc-export
utility provided by pyABC to dump results to
file formats such as csv, feather, html, json and others.
These can be easily read in by R. See Exporting pyABC’s database for how to use this
utility.
Assume you dumped to the feather format:
abc-export --db results.db --out exported.feather --format feather
You could read the results in with the following R snippet:
install.packages("feather")
install.packages("jsonlite")
library("feather")
library("jsonlite")
loadedDf <- data.frame(feather("exported.feather"))
jsonStr <- loadedDf$sumstat_ss_df[1]
sumStatDf <- fromJSON(jsonStr)
If you prefer CSV over the feather format, you can also do that.
Ordinary Differential Equations: Conversion Reaction¶
This example was kindly contributed by Lukas Sandmeir and Elba Raimundez. It can be downloaded here:
Ordinary Differential Equations
.
This example provides a model for the interconversion of two species (\(X_1\) and \(X_2\)) following first-order mass action kinetics with the parameters \(\Theta_1\) and \(\Theta_2\) respectively:

\[\frac{d[X_1]}{dt} = -\Theta_1 [X_1] + \Theta_2 [X_2], \qquad \frac{d[X_2]}{dt} = \Theta_1 [X_1] - \Theta_2 [X_2].\]

The measurement of \([X_2]\) is provided as \(Y = [X_2]\).
We will show how to estimate \(\Theta_1\) and \(\Theta_2\) using pyABC.
[1]:
%matplotlib inline
from pyabc import (ABCSMC,
                   RV, Distribution,
                   MedianEpsilon,
                   LocalTransition)
from pyabc.visualization import plot_kde_2d, plot_data_callback
import matplotlib.pyplot as plt
import os
import tempfile
import numpy as np
import scipy as sp
import scipy.integrate  # ensure the integrate submodule is loaded

db_path = ("sqlite:///" +
           os.path.join(tempfile.gettempdir(), "test.db"))
Data¶
We use an artificial data set which consists of a vector of time points \(t\) and a measurement vector \(Y\). This data was created using the parameter values which are assigned to \(\Theta_{\text{true}}\), and by adding normally distributed measurement noise with variance \(\sigma^2 = 0.015^2\).
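A minimal sketch of how such a data set could be generated is shown below. It assumes the ODE model defined in the next section; the seed and thus the concrete noise realization are arbitrary, so the values will differ from the measurement vector used in this example:

```python
import numpy as np
import scipy.integrate

# True parameters (as used in this example) and initial conditions
theta1, theta2 = np.exp([-2.5, -2])
init = np.array([1.0, 0.0])
times = np.arange(11)

def f(y, t0, th1, th2):
    # First-order mass action interconversion of X_1 and X_2
    x1, x2 = y
    return [-th1 * x1 + th2 * x2, th1 * x1 - th2 * x2]

sol = scipy.integrate.odeint(f, init, times, args=(theta1, theta2))

# Add normally distributed measurement noise with sigma = 0.015
rng = np.random.default_rng(0)
data = sol[:, 1] + rng.normal(0, 0.015, size=times.size)
```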
ODE model¶
Define the true parameters
[2]:
theta1_true, theta2_true = np.exp([-2.5, -2])
and the measurement data
[3]:
measurement_data = np.array([0.0244, 0.0842, 0.1208,
                             0.1724, 0.2315, 0.2634,
                             0.2831, 0.3084, 0.3079,
                             0.3097, 0.3324])
as well as the time points at which to evaluate
[4]:
measurement_times = np.arange(len(measurement_data))
and the initial conditions for \(X_1\) and \(X_2\)
[5]:
init = np.array([1, 0])
Define the ODE model
[6]:
def f(y, t0, theta1, theta2):
    x1, x2 = y
    dx1 = -theta1 * x1 + theta2 * x2
    dx2 = theta1 * x1 - theta2 * x2
    return dx1, dx2

def model(pars):
    sol = sp.integrate.odeint(
        f, init, measurement_times,
        args=(pars["theta1"], pars["theta2"]))
    return {"X_2": sol[:, 1]}
Integration of the ODE model for the true parameter values
[7]:
true_trajectory = model({"theta1": theta1_true,
                         "theta2": theta2_true})["X_2"]
Let’s visualize the results
[8]:
plt.plot(true_trajectory, color="C0", label='Simulation')
plt.scatter(measurement_times, measurement_data,
            color="C1", label='Data')
plt.xlabel('Time $t$')
plt.ylabel('Measurement $Y$')
plt.title('Conversion reaction: True parameters fit')
plt.legend()
plt.show()
[9]:
def distance(simulation, data):
    return np.absolute(data["X_2"] - simulation["X_2"]).sum()
Define the prior for \(\Theta_1\) and \(\Theta_2\)
[10]:
parameter_prior = Distribution(theta1=RV("uniform", 0, 1),
                               theta2=RV("uniform", 0, 1))
parameter_prior.get_parameter_names()
[10]:
['theta1', 'theta2']
[11]:
abc = ABCSMC(models=model,
             parameter_priors=parameter_prior,
             distance_function=distance,
             population_size=50,
             transitions=LocalTransition(k_fraction=.3),
             eps=MedianEpsilon(500, median_multiplier=0.7))
INFO:Sampler:Parallelizing the sampling on 4 cores.
[12]:
abc.new(db_path, {"X_2": measurement_data});
INFO:History:Start <ABCSMC(id=1, start_time=2020-05-17 19:15:07.639385, end_time=None)>
[13]:
h = abc.run(minimum_epsilon=0.1, max_nr_populations=5)
INFO:ABC:t: 0, eps: 500.
INFO:ABC:Acceptance rate: 50 / 53 = 9.4340e-01, ESS=5.0000e+01.
INFO:ABC:t: 1, eps: 1.5042107753358946.
INFO:ABC:Acceptance rate: 50 / 137 = 3.6496e-01, ESS=3.4582e+01.
INFO:ABC:t: 2, eps: 0.6016110594343894.
INFO:ABC:Acceptance rate: 50 / 206 = 2.4272e-01, ESS=3.2066e+01.
INFO:ABC:t: 3, eps: 0.36455456951279086.
INFO:ABC:Acceptance rate: 50 / 390 = 1.2821e-01, ESS=4.8250e+01.
INFO:ABC:t: 4, eps: 0.19779626599870773.
INFO:ABC:Acceptance rate: 50 / 251 = 1.9920e-01, ESS=4.2094e+01.
INFO:History:Done <ABCSMC(id=1, start_time=2020-05-17 19:15:07.639385, end_time=2020-05-17 19:15:09.697238)>
Visualization of the probability density functions for \(\Theta_1\) and \(\Theta_2\)
[14]:
fig = plt.figure(figsize=(10, 8))
for t in range(h.max_t + 1):
    ax = fig.add_subplot(3, int(np.ceil(h.max_t / 3)), t + 1)
    ax = plot_kde_2d(
        *h.get_distribution(m=0, t=t), "theta1", "theta2",
        xmin=0, xmax=1, numx=200, ymin=0, ymax=1, numy=200, ax=ax)
    ax.scatter([theta1_true], [theta2_true], color="C1",
               label=r'$\Theta$ true = {:.3f}, {:.3f}'.format(
                   theta1_true, theta2_true))
    ax.set_title("Posterior t={}".format(t))
    ax.legend()
fig.tight_layout()
We can also plot the simulated trajectories:
[15]:
_, ax = plt.subplots()

def plot_data(sum_stat, weight, ax, **kwargs):
    """Plot a single trajectory"""
    ax.plot(measurement_times, sum_stat['X_2'], color='grey', alpha=0.1)

def plot_mean(sum_stats, weights, ax, **kwargs):
    """Plot mean over all samples"""
    weights = np.array(weights)
    weights /= weights.sum()
    data = np.array([sum_stat['X_2'] for sum_stat in sum_stats])
    mean = (data * weights.reshape((-1, 1))).sum(axis=0)
    ax.plot(measurement_times, mean, color='C2', label='Sample mean')

ax = plot_data_callback(h, plot_data, plot_mean, ax=ax)
plt.plot(true_trajectory, color="C0", label='Simulation')
plt.scatter(measurement_times, measurement_data,
            color="C1", label='Data')
plt.xlabel('Time $t$')
plt.ylabel('Measurement $Y$')
plt.title('Conversion reaction: Simulated data fit')
plt.legend()
plt.show()
Markov Jump Process: Reaction Network¶
In the following, we fit stochastic chemical reaction kinetics with pyABC and show how to perform model selection between two competing models.
This notebook can be downloaded here: Markov Jump Process: Reaction Network
.
We consider the two Markov jump process models \(m_1\) and \(m_2\) for conversion of (chemical) species \(X\) to species \(Y\):

\[m_1: \quad X + Y \xrightarrow{k} 2Y,\]
\[m_2: \quad X \xrightarrow{k} Y.\]

Each model is equipped with a single rate parameter \(k\). To simulate these models, we define a simple Gillespie simulator:
[1]:
import numpy as np

def h(x, pre, c):
    return (x**pre).prod(1) * c

def gillespie(x, c, pre, post, max_t):
    """
    Gillespie simulation

    Parameters
    ----------
    x: 1D array of size n_species
        The initial numbers.
    c: 1D array of size n_reactions
        The reaction rates.
    pre: array of size n_reactions x n_species
        What is to be consumed.
    post: array of size n_reactions x n_species
        What is to be produced.
    max_t: int
        Simulate up to time max_t.

    Returns
    -------
    t, X: 1d array, 2d array
        t: The time points.
        X: The history of the species.
           ``X.shape == (t.size, x.size)``
    """
    t = 0
    t_store = [t]
    x_store = [x.copy()]
    S = post - pre

    while t < max_t:
        h_vec = h(x, pre, c)
        h0 = h_vec.sum()
        if h0 == 0:
            break
        delta_t = np.random.exponential(1 / h0)
        # no reaction can occur any more
        if not np.isfinite(delta_t):
            t_store.append(max_t)
            x_store.append(x)
            break
        reaction = np.random.choice(c.size, p=h_vec/h0)
        t = t + delta_t
        x = x + S[reaction]
        t_store.append(t)
        x_store.append(x)

    return np.array(t_store), np.array(x_store)
Next, we define the models in terms of their initial molecule numbers \(x_0\), an array pre
which determines what is to be consumed (the left-hand side of the reaction equations), and an array post
which determines what is to be produced (the right-hand side of the reaction equations). Moreover, we define that the simulation time should not exceed MAX_T
seconds.
Model 1 starts with initial concentrations \(X=40\) and \(Y=3\). The reaction \(X + Y \rightarrow 2Y\) is encoded in pre = [[1, 1]]
and post = [[0, 2]]
.
[2]:
MAX_T = 0.1

class Model1:
    __name__ = "Model 1"
    x0 = np.array([40, 3])  # Initial molecule numbers
    pre = np.array([[1, 1]], dtype=int)
    post = np.array([[0, 2]])

    def __call__(self, par):
        t, X = gillespie(self.x0,
                         np.array([float(par["rate"])]),
                         self.pre, self.post,
                         MAX_T)
        return {"t": t, "X": X}
Model 2 inherits the initial concentrations from model 1. The reaction \(X \rightarrow Y\) is encoded in pre = [[1, 0]]
and post = [[0, 1]]
.
[3]:
class Model2(Model1):
    __name__ = "Model 2"
    pre = np.array([[1, 0]], dtype=int)
    post = np.array([[0, 1]])
We draw one stochastic simulation from model 1 (the “Observation”) and one from model 2 (the “Competition”), and visualize both:
[4]:
%matplotlib inline
import matplotlib.pyplot as plt
true_rate = 2.3
observations = [Model1()({"rate": true_rate}),
                Model2()({"rate": 30})]

fig, axes = plt.subplots(ncols=2)
fig.set_size_inches((12, 4))
for ax, title, obs in zip(axes, ["Observation", "Competition"],
                          observations):
    ax.step(obs["t"], obs["X"])
    ax.legend(["Species X", "Species Y"])
    ax.set_xlabel("Time")
    ax.set_ylabel("Concentration")
    ax.set_title(title)
We observe that species \(X\) is converted into species \(Y\) in both cases. The difference of the concentrations over time can be quite subtle.
We define the distance as the \(L_1\) norm of two trajectories, evaluated at 20 equidistant test times \(t_1, \ldots, t_{20}\):

\[d(x, y) = \frac{1}{20} \sum_{i=1}^{20} \left| x_Y(t_i) - y_Y(t_i) \right|.\]

Note that we only consider the concentration of species \(Y\) for the distance calculation. In code:
[5]:
N_TEST_TIMES = 20

t_test_times = np.linspace(0, MAX_T, N_TEST_TIMES)

def distance(x, y):
    xt_ind = np.searchsorted(x["t"], t_test_times) - 1
    yt_ind = np.searchsorted(y["t"], t_test_times) - 1
    error = (np.absolute(x["X"][:, 1][xt_ind]
                         - y["X"][:, 1][yt_ind]).sum()
             / t_test_times.size)
    return error
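The `searchsorted` indexing implements a zero-order hold: for each test time, the trajectory value at the last event time not exceeding it is used. The toy event times below are made up for illustration (they are not Gillespie output), and we use the `side="right"` variant so that a query exactly at the first event time maps to the first state:

```python
import numpy as np

# Toy event times of a piecewise-constant trajectory and its values
t_events = np.array([0.0, 0.3, 0.7, 1.2])
values = np.array([5, 6, 7, 8])

# Times at which we want to evaluate the trajectory
t_query = np.array([0.0, 0.5, 1.0, 2.0])

# Index of the last event at or before each query time
ind = np.searchsorted(t_events, t_query, side="right") - 1
held = values[ind]  # zero-order-hold evaluation
```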
For ABC, we choose for both models a uniform prior over the interval \([0, 100]\) for their single rate parameters:
[6]:
from pyabc import Distribution, RV
prior = Distribution(rate=RV("uniform", 0, 100))
We initialize the ABCSMC class passing the two models, their priors and the distance function.
[7]:
from pyabc import ABCSMC
from pyabc.populationstrategy import AdaptivePopulationSize
abc = ABCSMC([Model1(), Model2()],
             [prior, prior],
             distance,
             population_size=AdaptivePopulationSize(500, 0.15))
INFO:Sampler:Parallelizing the sampling on 4 cores.
We initialize a new ABC run, taking as observed data the one generated by model 1. The ABC run is to be stored in the sqlite database located at /tmp/mjp.db
.
[8]:
abc_id = abc.new("sqlite:////tmp/mjp.db", observations[0])
INFO:History:Start <ABCSMC(id=1, start_time=2020-05-17 19:12:20.087395, end_time=None)>
We start pyABC which automatically parallelizes across all available cores.
[9]:
history = abc.run(minimum_epsilon=0.7, max_nr_populations=15)
INFO:ABC:Calibration sample before t=0.
INFO:Epsilon:initial epsilon is 10.65
INFO:ABC:t: 0, eps: 10.65.
INFO:ABC:Acceptance rate: 500 / 1023 = 4.8876e-01, ESS=5.0000e+02.
INFO:Adaptation:Change nr particles 500 -> 115
INFO:ABC:t: 1, eps: 6.75.
INFO:ABC:Acceptance rate: 115 / 272 = 4.2279e-01, ESS=1.0816e+02.
INFO:Adaptation:Change nr particles 115 -> 89
INFO:ABC:t: 2, eps: 5.316484503773134.
INFO:ABC:Acceptance rate: 89 / 167 = 5.3293e-01, ESS=6.2536e+01.
INFO:Adaptation:Change nr particles 89 -> 82
INFO:ABC:t: 3, eps: 3.9215865820412024.
INFO:ABC:Acceptance rate: 82 / 220 = 3.7273e-01, ESS=5.8737e+01.
INFO:Adaptation:Change nr particles 82 -> 92
INFO:ABC:t: 4, eps: 3.2.
INFO:ABC:Acceptance rate: 92 / 330 = 2.7879e-01, ESS=2.1439e+01.
INFO:Adaptation:Change nr particles 92 -> 75
INFO:ABC:t: 5, eps: 2.45.
INFO:ABC:Acceptance rate: 75 / 401 = 1.8703e-01, ESS=2.1633e+01.
INFO:Adaptation:Change nr particles 75 -> 96
INFO:ABC:t: 6, eps: 2.145736215699017.
INFO:ABC:Acceptance rate: 96 / 549 = 1.7486e-01, ESS=2.9474e+01.
INFO:Adaptation:Change nr particles 96 -> 102
INFO:ABC:t: 7, eps: 1.75.
INFO:ABC:Acceptance rate: 102 / 792 = 1.2879e-01, ESS=7.8885e+01.
INFO:Adaptation:Change nr particles 102 -> 58
INFO:ABC:t: 8, eps: 1.4.
INFO:ABC:Acceptance rate: 58 / 357 = 1.6246e-01, ESS=4.9798e+01.
INFO:Adaptation:Change nr particles 58 -> 57
INFO:ABC:t: 9, eps: 1.25.
INFO:ABC:Acceptance rate: 57 / 583 = 9.7770e-02, ESS=4.1186e+01.
INFO:Adaptation:Change nr particles 57 -> 61
INFO:ABC:t: 10, eps: 1.0627609318694404.
INFO:ABC:Acceptance rate: 61 / 1185 = 5.1477e-02, ESS=5.7597e+01.
INFO:Adaptation:Change nr particles 61 -> 46
INFO:ABC:t: 11, eps: 0.95.
INFO:ABC:Acceptance rate: 46 / 1461 = 3.1485e-02, ESS=4.2846e+01.
INFO:Adaptation:Change nr particles 46 -> 52
INFO:ABC:t: 12, eps: 0.8.
INFO:ABC:Acceptance rate: 52 / 4317 = 1.2045e-02, ESS=3.9330e+01.
INFO:Adaptation:Change nr particles 52 -> 53
INFO:ABC:t: 13, eps: 0.75.
INFO:ABC:Acceptance rate: 53 / 4867 = 1.0890e-02, ESS=4.8614e+01.
INFO:Adaptation:Change nr particles 53 -> 51
INFO:ABC:t: 14, eps: 0.6526353965182403.
INFO:ABC:Acceptance rate: 51 / 15746 = 3.2389e-03, ESS=4.4594e+01.
INFO:Adaptation:Change nr particles 51 -> 55
INFO:ABC:Stopping: minimum epsilon.
INFO:History:Done <ABCSMC(id=1, start_time=2020-05-17 19:12:20.087395, end_time=2020-05-17 19:14:05.753541)>
We first inspect the model probabilities.
[10]:
ax = history.get_model_probabilities().plot.bar()
ax.set_ylabel("Probability")
ax.set_xlabel("Generation")
ax.legend([1, 2], title="Model", ncol=2,
          loc="lower center", bbox_to_anchor=(.5, 1));
The mass at model 2 decreased while the mass at model 1 slowly increased; the correct model 1 is identified in the later generations. We then inspect the distribution of the rate parameters:
[11]:
from pyabc.visualization import plot_kde_1d

fig, axes = plt.subplots(2)
fig.set_size_inches((6, 6))
axes = axes.flatten()
axes[0].axvline(true_rate, color="black", linestyle="dotted")
for m, ax in enumerate(axes):
    for t in range(0, history.n_populations, 2):
        df, w = history.get_distribution(m=m, t=t)
        if len(w) > 0:  # Particles in a model might die out
            plot_kde_1d(df, w, "rate", ax=ax, label=f"t={t}",
                        xmin=0, xmax=20 if m == 0 else 100,
                        numx=200)
    ax.set_title(f"Model {m+1}")
axes[0].legend(title="Generation",
               loc="upper left", bbox_to_anchor=(1, 1));
fig.tight_layout()
The true rate is closely approximated by the posterior over the rate of model 1. It is a little harder to interpret the posterior over model 2. Apparently a rate between 20 and 40 yields data most similar to the observed data.
Lastly, we visualize the evolution of the population sizes. The population sizes were automatically selected by pyABC and varied over the course of the generations. (We do not plot the size of the first generation, which was set to 500.)
[12]:
populations = history.get_all_populations()
ax = populations[populations.t >= 1].plot("t", "particles",
                                          style="o")
ax.set_xlabel("Generation");
The initially chosen population size was adapted to the desired target accuracy. A larger population size was automatically selected by pyABC while both models were still alive. The population size decreased during the later generations, thereby saving computational time.
Multiscale model: Tumor spheroid growth¶
Download this notebook here: Multiscale model: Tumor spheroid growth
.
Installation prerequisites¶
To run this example, install the tumor2d package available here: Agent-based 2D Tumor growth model. For details on the model, see Jagiella, Rickert, Theis, Hasenauer (2017). (Note that the tumor2d package is not part of the paper by Jagiella, Rickert, Theis, Hasenauer (2017), but was specifically developed and adapted for pyABC, based on the original source code.)
Model description¶
Briefly, the model is a multiscale model describing the growth of a 2D tumor spheroid (see animations below). It is a hybrid discrete-continuum model which employs an agent-based description for individual cells and a PDE-based description for the extracellular matrix. The intracellular mechanism of cell division is described by continuous-time Markov chains and decision rules. In particular, the initial phase of growth is substantially stochastic. During later phases, with higher cell numbers, averaging effects occur.
The proliferating cells are depicted in orange
Extracellular matrix intensity is encoded in the intensity of the green color
To simulate a growing spheroid, we employ the simulate function provided by the tumor2d package.
[1]:
from time import time
from tumor2d import simulate

start_time = time()
observation = simulate(division_rate=4.17e-2,
                       initial_spheroid_radius=1.2e1,
                       initial_quiescent_cell_fraction=7.5e-1,
                       division_depth=100,
                       ecm_production_rate=5e-3,
                       ecm_degradation_rate=8e-4,
                       ecm_division_threshold=1e-2)
print(f"Simulation took {time() - start_time:.2f}s")
Simulation took 49.06s
The simulation took a little time. The simulate function does not return each individual cell. Instead, we “measure” the radial density of proliferating cells, the radial extracellular matrix density and the spheroid radius over time. These statistics summarize the underlying data.
[2]:
%matplotlib inline
import matplotlib.pyplot as plt
from string import capwords
fig, axes = plt.subplots(ncols=3)
fig.set_size_inches((16, 5))
color = {"growth_curve": "k",
         "extra_cellular_matrix_profile": "green",
         "proliferation_profile": "orange"}

x_label = {"growth_curve": "Time (d)",
           "extra_cellular_matrix_profile": "Distance to rim (μm)",
           "proliferation_profile": "Distance to rim (μm)"}

y_label = {"growth_curve": "Radius (μm)",
           "extra_cellular_matrix_profile": "Extracellular matrix intensity",
           "proliferation_profile": "Fraction proliferating cells"}

for ax, (key, val) in zip(axes, observation.items()):
    ax.plot(val, color=color[key])
    ax.set_title(capwords(key.replace("_", " ")))
    ax.set_xlabel(x_label[key])
    ax.set_ylabel(y_label[key])
    if key.endswith("profile"):
        ax.set_xlim(0, 600)
Exemplary pyABC inference run with the model¶
To perform parameter inference with pyABC, we first define a prior over the seven parameters. We do this on the log domain.
[3]:
from pyabc import Distribution, RV
limits = dict(log_division_rate=(-3, -1),
              log_division_depth=(1, 3),
              log_initial_spheroid_radius=(0, 1.2),
              log_initial_quiescent_cell_fraction=(-5, 0),
              log_ecm_production_rate=(-5, 0),
              log_ecm_degradation_rate=(-5, 0),
              log_ecm_division_threshold=(-5, 0))

prior = Distribution(**{key: RV("uniform", a, b - a)
                        for key, (a, b) in limits.items()})
Note that we define the limits
dictionary for later visualization purposes, and then initialize the prior from it so as not to repeat ourselves.
Each spheroid simulation already runs in its own subprocess, which is forked off from the main process by the simulate function. We therefore do not need to parallelize using multiprocessing, but will instead use multithreading. pyABC ships with a flexible sampler which accepts any backend implementing the concurrent.futures.Executor
interface, for instance the ThreadPoolExecutor
. For distributed execution, the IPython parallel client would be an alternative, but the
ThreadPoolExecutor
is good enough for illustrative purposes.
Since we’ve defined the priors on the log domain, we also use a model accepting logtransformed parameters. This model is defined as tumor2d.log_model
. It is a thin wrapper around the tumor2d.simulate
function and transforms the parameters accordingly before executing the simulate function.
The tumor2d package also provides a distance function and default data. These data were obtained from 100 samples at the so-called “reference parameters”, which we also used above to simulate one growing spheroid. We take the mean of these 100 simulations as the observed data. The distance function calculates \(L_2\) norms for each of the three summary statistics and adds them up. (These \(L_2\) norms are normalized by the pointwise variances.)
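The summation of variance-normalized \(L_2\) norms can be sketched independently of tumor2d. The function below is an illustrative stand-in, not the actual tumor2d.distance implementation, and the data values are made up:

```python
import numpy as np

# Illustrative stand-in for a variance-normalized L2 distance
# summed over several summary statistics.
def normalized_l2(simulation, data, variances):
    total = 0.0
    for key in data:
        diff = np.asarray(simulation[key]) - np.asarray(data[key])
        # Normalize each squared deviation by the pointwise variance
        total += np.sum(diff ** 2 / variances[key])
    return total

# Toy "observed" statistic, simulation, and pointwise variances
data = {"growth_curve": np.array([1.0, 2.0, 3.0])}
sim = {"growth_curve": np.array([1.0, 2.5, 3.0])}
var = {"growth_curve": np.array([0.25, 0.25, 0.25])}
d = normalized_l2(sim, data, var)
```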
[4]:
from tumor2d import log_model, distance, load_default
from pyabc import ABCSMC
from pyabc.sampler import ConcurrentFutureSampler
from concurrent.futures import ThreadPoolExecutor
data_mean = load_default()[1]  # (raw, mean, var)

pool = ThreadPoolExecutor(max_workers=2)
sampler = ConcurrentFutureSampler(pool)

abc = ABCSMC(log_model, prior, distance,
             population_size=3,
             sampler=sampler)
As usual, we initialize a new ABC inference run with the new
method.
[5]:
abc.new("sqlite:////tmp/test.db", data_mean)
INFO:Epsilon:initial epsilon is 22450777.905396923
INFO:History:Start <ABCSMC(id=6, start_time=2018-04-11 08:43:27.549224, end_time=None)>
[5]:
6
Since the inference would run very long on a single local machine, we sample only a single population for illustration purposes. Note that we’ve also set the number of particles to a very small number. This is much too low for a real inference run, but serves here for illustration purposes and saves sampling time.
[6]:
history = abc.run(max_nr_populations=1, minimum_epsilon=0)
INFO:ABC:t:0 eps:22450777.905396923
INFO:History:Done <ABCSMC(id=6, start_time=2018-04-11 08:43:27.549224, end_time=2018-04-11 08:46:15.690193)>
We visualize the first population with pyABC’s plot_kde_matrix
function.
[7]:
from pyabc.visualization import plot_kde_matrix
df, w = history.get_distribution(m=0)
plot_kde_matrix(df, w, limits=limits);
We see kernel density estimates on the lower diagonal and scatter plots of the weighted particles on the upper diagonal. Note that the weight of the particles is not encoded in the scatter plots. On the diagonal we see the marginal distributions.
Results of a stored, distributedly executed pyABC inference run¶
We ran the inference described above in a distributed fashion on one of our clusters, on about 800 cores simultaneously. The stored data is provided as part of the tumor2d package. (Note that the summary statistics were originally also stored in the database. However, to avoid distributing a 200MB database, we deleted them afterwards, so they are not contained in the databases distributed with the tumor2d package. If you’re interested in the full database, don’t hesitate to contact us.)
We load the stored database and visualize generation 5.
[8]:
from pyabc import History
from tumor2d import stored_data_db
h_loaded = History("sqlite:///" + stored_data_db)
df, w = h_loaded.get_distribution(m=0, t=5)
plot_kde_matrix(df, w, limits=limits);
The posterior is still quite broad. At the last generation the posterior is notably sharper:
[9]:
df, w = h_loaded.get_distribution(m=0)
plot_kde_matrix(df, w, limits=limits);
Adaptive population sizes¶
For this run, we employed one of pyABC’s unique features: Adaptive population sizes. We initialized with population size 500. pyABC tuned the necessary population size automatically.
[10]:
populations = h_loaded.get_all_populations()
(populations[populations.t >= 0]
.plot("t", "particles", marker="o"));
In this example, the population size was roughly constant, however, this is not always the case.
The full execution of this notebook takes a while due to the tumor growth simulations performed:
[11]:
print(f"Execution time: {(time() - start_time)/60:.1f}m")
Execution time: 8.8m
The complete stored inference¶
We also compiled an animated gif from the complete course of the generations. Observe how the posterior slowly contracts:
Stochastic Differential Equation: Ion channel noise in HodgkinHuxley neurons¶
Download this notebook here: Stochastic Differential Equation: Ion channel noise in HodgkinHuxley neurons
In the following, we estimate parameters of the stochastic differential equation model of ion channel noise in HodgkinHuxley neurons presented in:
Goldwyn, Joshua H., Nikita S. Imennov, Michael Famulare, and Eric SheaBrown. “Stochastic Differential Equation Models for Ion Channel Noise in HodgkinHuxley Neurons.” Physical Review E 83, no. 4 (2011): 041908. doi:10.1103/PhysRevE.83.041908.
The code was implemented in Fortran 95 and made available in ModelDB: ModelDB.
(The code is not included in pyABC and neither developed nor maintained by the pyABC developers.)
Download and compilation of the Fortran model¶
We start by downloading the code from ModelDB. For this, the requests
package is needed.
[1]:
import requests

URL = ("https://senselab.med.yale.edu/modeldb/"
       "eavBinDown.cshtml?o=128502&a=23&mime=application/zip")
req = requests.request("GET", URL)
The zip file to which URL
points is stored in memory. The code is then extracted into a temporary directory and compiled using make HH_run
provided as part of the download from ModelDB. The Fortran compiler gfortran
is required for compilation.
[2]:
import os
from zipfile import ZipFile
import subprocess
import tempfile
from io import BytesIO
tempdir = tempfile.mkdtemp()
archive = ZipFile(BytesIO(req.content))
archive.extractall(tempdir)
ret = subprocess.run(
    ["make", "HH_run"],
    cwd=os.path.join(tempdir, "ModelDBFolder"))
EXEC = os.path.join(tempdir, "ModelDBFolder", "HH_run")
print(f"The executable location is {EXEC}")
The executable location is /tmp/tmpujmwnetn/ModelDBFolder/HH_run
The variable EXEC
points to the executable.
A simulate function is defined which uses the subprocess.run
function to execute the external binary. The external binary writes to stdout. The output is captured and stored in a pandas dataframe. This dataframe is returned by the simulate
function.
[3]:
import pandas as pd
import numpy as np

def simulate(model=2, membrane_dim=10, time_steps=1e4,
             time_step_size=0.01, isi=100, dc=20, noise=0,
             sine_amplitude=0, sine_frequency=0,
             voltage_clamp=0, data_to_print=1, rng_seed=None):
    """
    Simulate the SDE Ion Channel Model defined
    in an external Fortran simulator.

    Returns: pandas.DataFrame
        Index: t, Time
        Columns: V, Na, K
        V: Voltage
        Na, K: Proportion of open channels
    """
    if rng_seed is None:
        rng_seed = np.random.randint(2**32 - 2) + 1
    membrane_area = membrane_dim**2
    re = subprocess.run(
        [EXEC, str(model),
         # the binary cannot handle very long floats
         f"{membrane_area:.5f}", str(time_steps),
         str(time_step_size), str(isi), f"{dc:.5f}",
         str(noise), str(sine_amplitude), str(sine_frequency),
         str(voltage_clamp), str(data_to_print),
         str(rng_seed)],
        stdout=subprocess.PIPE)
    df = pd.read_table(BytesIO(re.stdout),
                       delim_whitespace=True,
                       header=None, index_col=0,
                       names=["t", "V", "Na", "K"])
    return df
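The stdout-parsing step can be exercised in isolation with fabricated bytes in the same whitespace-delimited, four-column layout the binary prints. The numbers below are made up, and `sep=r"\s+"` is used as the equivalent of `delim_whitespace=True`:

```python
from io import BytesIO
import pandas as pd

# Fabricated stdout in the layout: t  V  Na  K
fake_stdout = b"0.00 -65.0 0.05 0.32\n0.01 -64.8 0.06 0.31\n"

# Parse the byte stream into a DataFrame indexed by time
df = pd.read_csv(BytesIO(fake_stdout),
                 sep=r"\s+",
                 header=None, index_col=0,
                 names=["t", "V", "Na", "K"])
```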
Generating the observed data¶
We run a few simulations and plot the fraction of open “K” channels and open “Na” channels:
[4]:
import matplotlib.pyplot as plt
%matplotlib inline
gt = {"dc": 20, "membrane_dim": 10}
fig, axes = plt.subplots(nrows=2, sharex=True)
fig.set_size_inches((12, 8))
for _ in range(10):
    observation = simulate(**gt)
    observation.plot(y="K", color="C1", ax=axes[0])
    observation.plot(y="Na", color="C0", ax=axes[1])
for ax in axes:
    ax.legend().set_visible(False)
axes[0].set_title("K")
axes[0].set_ylabel("K")
axes[1].set_title("Na")
axes[1].set_ylabel("Na");
We observe how the channels open and close, and that the individual trajectories differ from realization to realization, even though we simulate with the exact same parameter set. We take the last simulation as observed data.
Defining distance and prior¶
We’ll now demonstrate how to use pyABC to estimate parameters of the model. Here, we focus on the dc
and membrane_dim
parameters. The dc
parameter describes the input current, and membrane_dim
is the square root of the membrane area. We choose uniform priors:
[5]:
from pyabc import Distribution, RV, ABCSMC
dcmin, dcmax = 2, 30
memmin, memmax = 1, 12
prior = Distribution(
    dc=RV("uniform", dcmin, dcmax - dcmin),
    membrane_dim=RV("uniform", memmin, memmax - memmin))
The distance function is defined as \(L_2\) norm between the fractions of open “K” channels.
[6]:
def distance(x, y):
    diff = x["data"]["K"] - y["data"]["K"]
    dist = np.sqrt(np.sum(diff**2))
    return dist
We also define a small simulate_pyabc
wrapper function, which wraps the simulate
function. This is needed to comply with the interface expected by pyABC
.
[7]:
def simulate_pyabc(parameter):
    res = simulate(**parameter)
    return {"data": res}
Performing parameter inference with pyABC¶
We are now ready to start pyABC. As usual, we first initialize the ABCSMC object, then pass the observed data and the database location in which to store the logged parameters and summary statistics, and finally run the inference until the maximum number of allowed populations max_nr_populations
or the final acceptance threshold minimum_epsilon
is reached.
[8]:
abc = ABCSMC(simulate_pyabc, prior, distance,
             population_size=35)
abc_id = abc.new("sqlite:///"
                 + os.path.join(tempdir, "test.db"),
                 {"data": observation})
history = abc.run(max_nr_populations=10, minimum_epsilon=6)
INFO:Epsilon:initial epsilon is 12.418323389587156
INFO:History:Start <ABCSMC(id=1, start_time=2017-09-20 11:42:03.976982, end_time=None)>
INFO:ABC:t:0 eps:12.4183233896
INFO:ABC:t:1 eps:11.537720558193579
INFO:ABC:t:2 eps:10.331234246232354
INFO:ABC:t:3 eps:9.138290891195336
INFO:ABC:t:4 eps:8.27383912256212
INFO:ABC:t:5 eps:7.693082673497323
INFO:ABC:t:6 eps:6.781590876632613
INFO:ABC:t:7 eps:6.073019960463141
INFO:ABC:t:8 eps:5.570268139592767
INFO:History:Done <ABCSMC(id=1, start_time=2017-09-20 11:42:03.976982, end_time=2017-09-20 12:37:07.851393)>
Visualization of the estimated parameters¶
We plot the posterior distribution after a few generations together with the parameters generating the observed data (the dotted line and the orange dot).
[9]:
from pyabc.visualization import plot_kde_matrix
dfw = history.get_distribution(m=0)
grid = plot_kde_matrix(*dfw,
                       limits={"dc": (dcmin, dcmax),
                               "membrane_dim": (memmin, memmax)})
grid.map_diag(lambda x, **kwargs: plt.gca().axvline(
    gt[x.name], color="k", linestyle="dotted"))
grid.map_lower(lambda x, y, **kwargs: plt.gca().scatter(
    [gt[x.name]], [gt[y.name]], color="orange"))
plt.gcf().set_size_inches(8, 8)
The dc
parameter is very well detected. (Don’t get confused by the y-axis: it applies to the scatter plot, not to the marginal distribution.) The distribution of membrane_dim
is broader. (Note that even the exact posterior is not necessarily peaked at the ground-truth parameters.)
Evaluation of the fit¶
We compare four types of data:
samples from the prior distribution,
samples from the posterior distribution,
the data generated by the accepted parameters, and
the observation.
[10]:
from pyabc.transition import MultivariateNormalTransition
fig, axes = plt.subplots(nrows=3, sharex=True)
fig.set_size_inches(8, 12)
n = 5 # Number of samples to plot from each category
# Plot samples from the prior
alpha = .5
for _ in range(n):
prior_sample = simulate(**prior.rvs())
prior_sample.plot(y="K", ax=axes[0],
color="C1", alpha=alpha)
# Fit a posterior KDE and plot samples form it
posterior = MultivariateNormalTransition()
posterior.fit(*history.get_distribution(m=0))
for _ in range(n):
    posterior_sample = simulate(**posterior.rvs())
    posterior_sample.plot(y="K", ax=axes[1],
                          color="C0", alpha=alpha)
# Plot the stored summary statistics
sum_stats = history.get_weighted_sum_stats_for_model(m=0, t=history.max_t)
for stored in sum_stats[1][:n]:
    stored["data"].plot(y="K", ax=axes[2],
                        color="C2", alpha=alpha)
# Plot the observation
for ax in axes:
    observation.plot(y="K", ax=ax, color="k", linewidth=1.5)
    ax.legend().set_visible(False)
    ax.set_ylabel("K")
# Add a legend with pseudo artists to first plot
axes[0].legend([plt.plot([0], color="C1")[0],
plt.plot([0], color="C0")[0],
plt.plot([0], color="C2")[0],
plt.plot([0], color="k")[0]],
["Prior", "Posterior",
"Stored, accepted", "Observation"],
bbox_to_anchor=(.5, 1),
loc="lower center",
ncol=4);
We observe that the samples from the prior exhibit the largest variation and do not resemble the observation well. The samples from the posterior are much closer to the observed data. Even a little bit closer are the samples generated by the accepted parameters. This has at least two reasons: First, the posterior KDE fit smooths the particle populations. Second, a sample generated by an accepted parameter is biased towards being more similar to the observed data than a random sample from that same parameter.
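The second effect can be sketched with a small stand-alone simulation (illustrative numbers only, not taken from the model above): among repeated draws from one fixed parameter, the draws that pass an acceptance threshold are on average closer to the observation than an unconditioned draw.

```python
import numpy as np

rng = np.random.default_rng(0)
observed = 0.0
eps = 0.5  # illustrative acceptance threshold

# repeated simulations from one fixed parameter value
distances = np.abs(rng.normal(loc=0.0, scale=1.0, size=100_000) - observed)

# ABC keeps only simulations with distance below eps
accepted = distances[distances < eps]

# the accepted simulations are biased towards the observation
print(distances.mean())  # mean distance of all simulations
print(accepted.mean())   # noticeably smaller
```

This is why the stored accepted samples look slightly closer to the data than fresh draws from the fitted posterior.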
Adaptive Distances¶
In this example, we show how and when to use the adaptive distances feature of pyABC. "Adaptive distances" means that the distance function is not predefined (e.g. after preprocessing), but evolves over time during the ABC run, depending on the sampled summary statistics. This can be useful if different summary statistics vary on different scales, but it is not immediately clear how to weight them. In that case, adaptive distances adjust the weights in each iteration so as to balance the impact of all summary statistics on the computed distance.
Currently, adaptively weighted p-norm distances (e.g. Euclidean) are implemented in pyABC, but it is easily possible to define arbitrary adaptive distances.
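The underlying weighting idea can be sketched independently of pyABC's classes: estimate a scale for each summary statistic over the sampled particles, and use its inverse as that statistic's weight in the p-norm. This is a minimal numpy sketch (using the median absolute deviation as the scale), not pyABC's actual implementation:

```python
import numpy as np

def adaptive_weights(sampled_sum_stats):
    """Inverse-scale weights from a list of summary-statistics dicts."""
    weights = {}
    for key in sampled_sum_stats[0]:
        values = np.array([s[key] for s in sampled_sum_stats])
        # median absolute deviation: a robust scale estimate
        scale = np.median(np.abs(values - np.median(values)))
        weights[key] = 1.0 / scale if scale > 0 else 1.0
    return weights

def weighted_distance(x, x_0, weights, p=2):
    # weighted p-norm over the summary statistics
    return sum(weights[k] * abs(x[k] - x_0[k]) ** p
               for k in weights) ** (1 / p)

# ss2 varies ~100 times more than ss1, so it receives a ~100 times
# smaller weight, balancing the contributions of both statistics
rng = np.random.default_rng(1)
stats = [{'ss1': 0.1 * rng.normal(), 'ss2': 10 * rng.normal()}
         for _ in range(2000)]
w = adaptive_weights(stats)
```

Statistics that fluctuate strongly are down-weighted, so each one contributes comparably to the distance.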
The notebook can be downloaded here.
For illustration, we consider a simple Gaussian model:
[1]:
import numpy as np
import matplotlib.pyplot as pyplot
import pyabc.visualization
import logging
# for debugging
df_logger = logging.getLogger('Distance')
df_logger.setLevel(logging.DEBUG)
# model definition
def model(p):
    return {'ss1': p['theta'] + 1 + 0.1 * np.random.normal(),
            'ss2': 2 + 10 * np.random.normal()}
# true model parameter
theta_true = 3
# observed summary statistics
observation = {'ss1': theta_true + 1, 'ss2': 2}
# prior distribution
prior = pyabc.Distribution(theta=pyabc.RV('uniform', 0, 10))
# database
db_path = pyabc.create_sqlite_db_id(file_="adaptive_distance.db")
Summary statistic ss2 has a high variance compared to summary statistic ss1. In addition, ss1 is informative about the model parameter \(\theta\), while ss2 is not. We expect the proposal distribution for \(\theta\) to iteratively center around the true value \(\theta=3\). Thus, the variability of the sampled ss1 decreases from iteration to iteration, while the variability of the sampled ss2 stays approximately constant. If both summary statistics are weighted similarly in the distance between sample and observation, ss2 hence has an undesirably high impact, and convergence can be slowed down. If we instead weight ss1 higher, we may hope that our estimate of \(\theta\) improves.
With these informal expectations stated, let us continue with the implementation. First, we consider a non-adaptive Euclidean distance:
[2]:
distance = pyabc.PNormDistance(p=2)
abc = pyabc.ABCSMC(model, prior, distance)
abc.new(db_path, observation)
history0 = abc.run(minimum_epsilon=.1, max_nr_populations=8)
INFO:Sampler:Parallelizing the sampling on 4 cores.
INFO:History:Start <ABCSMC(id=1, start_time=2020-05-17 19:04:50.510578, end_time=None)>
INFO:ABC:Calibration sample before t=0.
INFO:Epsilon:initial epsilon is 7.8880029032336605
INFO:ABC:t: 0, eps: 7.8880029032336605.
INFO:ABC:Acceptance rate: 100 / 179 = 5.5866e-01, ESS=1.0000e+02.
INFO:ABC:t: 1, eps: 5.523085306293169.
INFO:ABC:Acceptance rate: 100 / 278 = 3.5971e-01, ESS=9.5810e+01.
INFO:ABC:t: 2, eps: 3.604292240991575.
INFO:ABC:Acceptance rate: 100 / 446 = 2.2422e-01, ESS=9.3157e+01.
INFO:ABC:t: 3, eps: 2.6280179368021246.
INFO:ABC:Acceptance rate: 100 / 624 = 1.6026e-01, ESS=9.7895e+01.
INFO:ABC:t: 4, eps: 1.9526078076523994.
INFO:ABC:Acceptance rate: 100 / 866 = 1.1547e-01, ESS=9.9520e+01.
INFO:ABC:t: 5, eps: 1.421630356209448.
INFO:ABC:Acceptance rate: 100 / 1211 = 8.2576e-02, ESS=9.8095e+01.
INFO:ABC:t: 6, eps: 1.0264476750481035.
INFO:ABC:Acceptance rate: 100 / 2163 = 4.6232e-02, ESS=9.9592e+01.
INFO:ABC:t: 7, eps: 0.6954601270195314.
INFO:ABC:Acceptance rate: 100 / 2695 = 3.7106e-02, ESS=9.6657e+01.
INFO:History:Done <ABCSMC(id=1, start_time=2020-05-17 19:04:50.510578, end_time=2020-05-17 19:05:02.773526)>
Let us visualize the results for the non-adaptive distance:
[3]:
# plotting
def plot_history(history):
    fig, ax = pyplot.subplots()
    for t in range(history.max_t + 1):
        df, w = history.get_distribution(m=0, t=t)
        pyabc.visualization.plot_kde_1d(df, w, xmin=0, xmax=10,
                                        x='theta', ax=ax,
                                        label="PDF t={}".format(t))
    ax.axvline(theta_true, color='k', linestyle='dashed', label="True value")
    ax.legend()
plot_history(history0)
Second, we consider an adaptive Euclidean distance:
[4]:
distance_adaptive = pyabc.AdaptivePNormDistance(p=2)
abc = pyabc.ABCSMC(
    model, prior, distance_adaptive,
    acceptor=pyabc.UniformAcceptor(use_complete_history=True))
abc.new(db_path, observation)
history1 = abc.run(minimum_epsilon=.1, max_nr_populations=8)
INFO:Sampler:Parallelizing the sampling on 4 cores.
INFO:History:Start <ABCSMC(id=2, start_time=2020-05-17 19:05:03.383392, end_time=None)>
INFO:ABC:Calibration sample before t=0.
DEBUG:Distance:updated weights[0] = {'ss1': 1.56452520342288, 'ss2': 0.4354747965771201}
INFO:Epsilon:initial epsilon is 6.333984429500564
INFO:ABC:t: 0, eps: 6.333984429500564.
INFO:ABC:Acceptance rate: 100 / 206 = 4.8544e-01, ESS=1.0000e+02.
DEBUG:Distance:updated weights[1] = {'ss1': 1.544531889838737, 'ss2': 0.455468110161263}
INFO:ABC:t: 1, eps: 4.054424144457223.
INFO:ABC:Acceptance rate: 100 / 236 = 4.2373e-01, ESS=9.8630e+01.
DEBUG:Distance:updated weights[2] = {'ss1': 1.7056562135730826, 'ss2': 0.2943437864269172}
INFO:ABC:t: 2, eps: 2.2603152932058155.
INFO:ABC:Acceptance rate: 100 / 350 = 2.8571e-01, ESS=9.9408e+01.
DEBUG:Distance:updated weights[3] = {'ss1': 1.7667287419172013, 'ss2': 0.23327125808279867}
INFO:ABC:t: 3, eps: 1.41871138633234.
INFO:ABC:Acceptance rate: 100 / 363 = 2.7548e-01, ESS=9.4863e+01.
DEBUG:Distance:updated weights[4] = {'ss1': 1.8653024503845181, 'ss2': 0.1346975496154818}
INFO:ABC:t: 4, eps: 0.7834754377542915.
INFO:ABC:Acceptance rate: 100 / 421 = 2.3753e-01, ESS=9.9527e+01.
DEBUG:Distance:updated weights[5] = {'ss1': 1.9117344325305943, 'ss2': 0.08826556746940574}
INFO:ABC:t: 5, eps: 0.46930212392005605.
INFO:ABC:Acceptance rate: 100 / 484 = 2.0661e-01, ESS=9.6169e+01.
DEBUG:Distance:updated weights[6] = {'ss1': 1.9513043703561577, 'ss2': 0.048695629643842143}
INFO:ABC:t: 6, eps: 0.2532391266895982.
INFO:ABC:Acceptance rate: 100 / 729 = 1.3717e-01, ESS=9.4790e+01.
DEBUG:Distance:updated weights[7] = {'ss1': 1.9619806450885102, 'ss2': 0.03801935491148991}
INFO:ABC:t: 7, eps: 0.17029769703549383.
INFO:ABC:Acceptance rate: 100 / 1001 = 9.9900e-02, ESS=8.8512e+01.
DEBUG:Distance:updated weights[8] = {'ss1': 1.9691416799754418, 'ss2': 0.03085832002455836}
INFO:History:Done <ABCSMC(id=2, start_time=2020-05-17 19:05:03.383392, end_time=2020-05-17 19:05:09.456919)>
In the debug output of abc.run above, it can be seen how the weights evolve over time. Note that we set the acceptor to use_complete_history=True in order to get nested acceptance regions: all previous acceptance criteria are then also re-applied. This is optional here, but can sometimes be beneficial. Let us visualize the results for the adaptive distance:
[5]:
plot_history(history1)
We observe differences compared to the non-adaptive setting. In particular, the densities tend to be narrower around the true parameter \(\theta=3\). In addition, despite the better convergence, the total number of required samples is lower, as less time was wasted trying to match an uninformative summary statistic:
[6]:
histories = [history0, history1]
labels = ["Fixed distance", "Adaptive distance"]
pyabc.visualization.plot_sample_numbers(histories, labels)
[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fa57dec8c90>
Indeed, the acceptance rates for the adaptive distance function are consistently higher:
[7]:
pyabc.visualization.plot_acceptance_rates_trajectory(histories, labels)
[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fa57b6805d0>
In detail, the adaptive distance feature works as follows: in each iteration of the ABC-SMC run, after the desired number of accepted particles has been obtained (and once at the beginning, using a sample from the prior), the method DistanceFunction.update() is called. It is given a set of summary statistics, which it can use to e.g. compute weights for the distance measure in the next iteration. In order to avoid bias, the distance function can, via DistanceFunction.configure_sampler(), tell the sampler to record not only the accepted particles, but all particles generated during sampling. So, to define your own adaptive distance function, you will typically only need to overwrite these two methods. For implementation details and an example of how this can look in practice, please inspect the code of AdaptivePNormDistance.
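Schematically, such a distance then has the following shape. This is a plain-Python sketch of the described two-method interface, not a working pyABC subclass; the real base class and method signatures may differ between versions, so consult the pyABC source before adapting it:

```python
import numpy as np

class SketchAdaptiveDistance:
    """Schematic adaptive distance; not pyABC's actual base class."""

    def __init__(self, p=2):
        self.p = p
        self.weights = None

    def configure_sampler(self, sampler):
        # ask the sampler to record all simulated particles, not
        # only the accepted ones, to avoid biasing the scales
        sampler.record_rejected = True  # hypothetical attribute

    def update(self, sum_stats):
        # called once per generation with the recorded summary
        # statistics; recompute inverse-scale weights from them
        self.weights = {}
        for key in sum_stats[0]:
            values = np.array([s[key] for s in sum_stats])
            scale = values.std()
            self.weights[key] = 1.0 / scale if scale > 0 else 1.0

    def __call__(self, x, x_0):
        # weighted p-norm using the current generation's weights
        w = self.weights or {key: 1.0 for key in x}
        return sum(w[key] * abs(x[key] - x_0[key]) ** self.p
                   for key in x) ** (1 / self.p)
```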
Make it robust¶
A problem with the previous adaptive distance arises when the weights do not work properly. For instance, large weights can be assigned to rather uninformative statistics if these happen to vary comparably little. Let us have a look at an example:
[8]:
import numpy as np
import tempfile
import os
import matplotlib.pyplot as pyplot
import pyabc.visualization
import logging
# for debugging
df_logger = logging.getLogger('Distance')
df_logger.setLevel(logging.DEBUG)
# model definition
def model(p):
    return {'ss1': p['theta'] + 1 + 1 * np.random.normal(),
            'ss2': 2 + 0.01 * np.random.normal()}
# true model parameter
theta_true = 3
# observed summary statistics
observation = {'ss1': theta_true + 1, 'ss2': 5}
# prior distribution
prior = pyabc.Distribution(theta=pyabc.RV('uniform', 0, 10))
# database
db_path = pyabc.create_sqlite_db_id(file_="adaptive_distance.db")
# plotting
def plot_history(history):
    fig, ax = pyplot.subplots()
    for t in range(history.max_t + 1):
        df, w = history.get_distribution(m=0, t=t)
        pyabc.visualization.plot_kde_1d(df, w, xmin=0, xmax=10,
                                        x='theta', ax=ax,
                                        label="PDF t={}".format(t))
    ax.axvline(theta_true, color='k', linestyle='dashed', label="True value")
    ax.legend()
Essentially, we changed the variances of the two summary statistics, and in addition we shifted the observed value of ss2 to 5, which can be interpreted as a measurement error. The problem now is that this value is highly unlikely to be sampled under the model, and with adaptive weights this statistic may, due to its small variance, be assigned a high weight, worsening the problem.
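A quick stand-alone calculation (illustrative numbers only) shows why variance-based weights fail here: the nearly constant ss2 receives a huge inverse-variance weight, even though the model can never come close to the observed value 5.

```python
import numpy as np

rng = np.random.default_rng(2)

# statistics as simulated by the model around the true parameter
ss1 = 3 + 1 + 1.0 * rng.normal(size=10_000)   # informative, sd 1
ss2 = 2 + 0.01 * rng.normal(size=10_000)      # uninformative, sd 0.01

# inverse-standard-deviation weights (variance-only scale)
w1 = 1 / ss1.std()
w2 = 1 / ss2.std()
print(w1, w2)  # w2 is about 100 times larger: ss2 dominates
```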
[9]:
distance = pyabc.PNormDistance(p=2)
abc = pyabc.ABCSMC(model, prior, distance)
abc.new(db_path, observation)
history0 = abc.run(minimum_epsilon=.001, max_nr_populations=5)
plot_history(history0)
INFO:Sampler:Parallelizing the sampling on 4 cores.
INFO:History:Start <ABCSMC(id=3, start_time=2020-05-17 19:05:10.866128, end_time=None)>
INFO:ABC:Calibration sample before t=0.
INFO:Epsilon:initial epsilon is 3.706402296069415
INFO:ABC:t: 0, eps: 3.706402296069415.
INFO:ABC:Acceptance rate: 100 / 212 = 4.7170e-01, ESS=1.0000e+02.
INFO:ABC:t: 1, eps: 3.15157877623159.
INFO:ABC:Acceptance rate: 100 / 277 = 3.6101e-01, ESS=8.8141e+01.
INFO:ABC:t: 2, eps: 3.0302449490062813.
INFO:ABC:Acceptance rate: 100 / 591 = 1.6920e-01, ESS=8.6973e+01.
INFO:ABC:t: 3, eps: 3.0073861196608798.
INFO:ABC:Acceptance rate: 100 / 949 = 1.0537e-01, ESS=4.7024e+01.
INFO:ABC:t: 4, eps: 2.99728988999716.
INFO:ABC:Acceptance rate: 100 / 2643 = 3.7836e-02, ESS=9.4997e+01.
INFO:History:Done <ABCSMC(id=3, start_time=2020-05-17 19:05:10.866128, end_time=2020-05-17 19:05:15.525464)>
[10]:
distance_adaptive = pyabc.AdaptivePNormDistance(p=2)
abc = pyabc.ABCSMC(
    model, prior, distance_adaptive,
    acceptor=pyabc.UniformAcceptor(use_complete_history=True))
abc.new(db_path, observation)
history1 = abc.run(minimum_epsilon=.001, max_nr_populations=5)
plot_history(history1)
INFO:Sampler:Parallelizing the sampling on 4 cores.
INFO:History:Start <ABCSMC(id=4, start_time=2020-05-17 19:05:15.911001, end_time=None)>
INFO:ABC:Calibration sample before t=0.
DEBUG:Distance:updated weights[0] = {'ss1': 0.006881759937664396, 'ss2': 1.9931182400623355}
INFO:Epsilon:initial epsilon is 5.978341674392063
INFO:ABC:t: 0, eps: 5.978341674392063.
INFO:ABC:Acceptance rate: 100 / 217 = 4.6083e-01, ESS=1.0000e+02.
DEBUG:Distance:updated weights[1] = {'ss1': 0.0070948287132201185, 'ss2': 1.9929051712867798}
INFO:ABC:t: 1, eps: 5.96192003320892.
INFO:ABC:Acceptance rate: 100 / 453 = 2.2075e-01, ESS=9.8071e+01.
DEBUG:Distance:updated weights[2] = {'ss1': 0.006846740182629466, 'ss2': 1.9931532598173705}
INFO:ABC:t: 2, eps: 5.953640497054222.
INFO:ABC:Acceptance rate: 100 / 1116 = 8.9606e-02, ESS=8.8886e+01.
DEBUG:Distance:updated weights[3] = {'ss1': 0.0078039047055010686, 'ss2': 1.9921960952944988}
INFO:ABC:t: 3, eps: 5.944041103889234.
INFO:ABC:Acceptance rate: 100 / 1675 = 5.9701e-02, ESS=9.4860e+01.
DEBUG:Distance:updated weights[4] = {'ss1': 0.007194747743678035, 'ss2': 1.992805252256322}
INFO:ABC:t: 4, eps: 5.937966249548398.
INFO:ABC:Acceptance rate: 100 / 4048 = 2.4704e-02, ESS=9.2597e+01.
DEBUG:Distance:updated weights[5] = {'ss1': 0.007245644674039862, 'ss2': 1.9927543553259601}
INFO:History:Done <ABCSMC(id=4, start_time=2020-05-17 19:05:15.911001, end_time=2020-05-17 19:05:24.904768)>
These results are as expected: the adaptive weights make the situation much worse. Our solution is to take not only the in-sample variance into account, but also the bias of the samples relative to the observed data, using e.g. root_mean_square_deviation as the scale function.
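The effect of this scale function can be illustrated stand-alone: the root mean square deviation from the observed value absorbs the bias, so the unreachable ss2 no longer receives an outsized weight (illustrative numbers chosen to mirror the example above):

```python
import numpy as np

rng = np.random.default_rng(2)
ss1 = 4 + 1.0 * rng.normal(size=10_000)   # observed ss1 is 4
ss2 = 2 + 0.01 * rng.normal(size=10_000)  # observed ss2 is 5

def rmsd(values, observed):
    # root mean square deviation from the observation:
    # captures sampling variance *and* bias
    return np.sqrt(np.mean((values - observed) ** 2))

w1 = 1 / rmsd(ss1, 4)  # ~ 1 / 1
w2 = 1 / rmsd(ss2, 5)  # ~ 1 / 3, the bias of 3 dominates
print(w1, w2)
```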
[11]:
distance_adaptive = pyabc.AdaptivePNormDistance(
    p=2, scale_function=pyabc.distance.root_mean_square_deviation)
abc = pyabc.ABCSMC(
    model, prior, distance_adaptive,
    acceptor=pyabc.UniformAcceptor(use_complete_history=True))
abc.new(db_path, observation)
history2 = abc.run(minimum_epsilon=.001, max_nr_populations=5)
plot_history(history2)
INFO:Sampler:Parallelizing the sampling on 4 cores.
INFO:History:Start <ABCSMC(id=5, start_time=2020-05-17 19:05:25.289967, end_time=None)>
INFO:ABC:Calibration sample before t=0.
DEBUG:Distance:updated weights[0] = {'ss1': 0.8693530881715638, 'ss2': 1.1306469118284361}
INFO:Epsilon:initial epsilon is 4.1598448905527015
INFO:ABC:t: 0, eps: 4.1598448905527015.
INFO:ABC:Acceptance rate: 100 / 191 = 5.2356e-01, ESS=1.0000e+02.
DEBUG:Distance:updated weights[1] = {'ss1': 0.918230325976682, 'ss2': 1.0817696740233183}
INFO:ABC:t: 1, eps: 3.4786660223500068.
INFO:ABC:Acceptance rate: 100 / 217 = 4.6083e-01, ESS=9.6361e+01.
DEBUG:Distance:updated weights[2] = {'ss1': 1.185968550423523, 'ss2': 0.8140314495764771}
INFO:ABC:t: 2, eps: 2.6024103755968793.
INFO:ABC:Acceptance rate: 100 / 315 = 3.1746e-01, ESS=6.4854e+01.
DEBUG:Distance:updated weights[3] = {'ss1': 1.2741754842631852, 'ss2': 0.7258245157368146}
INFO:ABC:t: 3, eps: 2.226845050651682.
INFO:ABC:Acceptance rate: 100 / 589 = 1.6978e-01, ESS=9.3687e+01.
DEBUG:Distance:updated weights[4] = {'ss1': 1.2944987868600584, 'ss2': 0.7055012131399416}
INFO:ABC:t: 4, eps: 2.134101801007238.
INFO:ABC:Acceptance rate: 100 / 812 = 1.2315e-01, ESS=7.5661e+01.
DEBUG:Distance:updated weights[5] = {'ss1': 1.3615100631960027, 'ss2': 0.6384899368039972}
INFO:History:Done <ABCSMC(id=5, start_time=2020-05-17 19:05:25.289967, end_time=2020-05-17 19:05:29.055446)>
In this setting, the accuracy and sample numbers (see below) are roughly back to the non-weighted case. Applying this method to the first model shows that it is also applicable there, though potentially slightly less efficient. This demonstrates that this method is more robust towards model error, which in practice can easily occur.
[12]:
pyabc.visualization.plot_sample_numbers([history0, history1, history2], ["Fixed distance", "Adaptive", "Robust adaptive"])
[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fa57a5a5fd0>
Weight diverse data with replicates¶
The problem we consider now is that we can separate our summary statistics (= data) into subsets which are informative about different parameters, but the sizes of these subsets are not balanced. This happens frequently in practice, e.g. when one has a time series of 100 measurements alongside a single data point. If these two statistics are informative about different parameters, then the impact of the first kind on the computed distance value can be overly high, and the automatic weighting introduced by Prangle 2017 may not reduce, or may even worsen, the problem.
Thus, what we want to do is add additional “factors” by which we multiply each data point’s weight. These factors take into account the number of summary statistics that are equally informative. At the moment, these factors still need to be defined manually.
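Schematically, such factors rescale each replicated statistic so that every group contributes like a single effective data point. A toy calculation with made-up deviations:

```python
N0, N1 = 100, 1  # replicate counts of the two statistic groups

# each group's statistics get factor 1/N, so the group's total
# contribution to an L1 distance has weight 1
factors = {f'p0_{j}': 1 / N0 for j in range(N0)}
factors.update({f'p1_{j}': 1 / N1 for j in range(N1)})

def weighted_l1(x, x_0, factors):
    return sum(factors[k] * abs(x[k] - x_0[k]) for k in x)

# toy data: every p0 replicate is off by 1, the single p1 value by 2
x = {f'p0_{j}': 1.0 for j in range(N0)}
x['p1_0'] = 2.0
x_0 = {k: 0.0 for k in x}

unweighted = sum(abs(x[k] - x_0[k]) for k in x)  # 102: p0 dominates
balanced = weighted_l1(x, x_0, factors)          # 3: both groups count
```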
Let us consider a toy model where we just copy a single summary statistic N0=100 times. Note that this model is highly artificial: in practice, e.g. a time series might contain more information than a single measurement, but not N0 times as much, such that a good factor would lie somewhere between 1 and N0.
[13]:
import pyabc
import numpy as np
import os
import tempfile
import matplotlib.pyplot as plt
import logging
# for debugging
df_logger = logging.getLogger('Distance')
df_logger.setLevel(logging.INFO)
N0 = 100
N1 = 1
p_true = {'p0': 5, 'p1': 10}
def model(p):
    ss = {}
    s0 = p['p0'] + 2 * np.random.normal()
    for j in range(N0):
        ss['p0_' + str(j)] = s0
    s1 = p['p1'] + 0.01 * np.random.normal()
    for j in range(N1):
        ss['p1_' + str(j)] = s1
    return ss
prior = pyabc.Distribution(p0=pyabc.RV("uniform", 0, 20),
p1=pyabc.RV("uniform", 0, 20))
First, we consider uniform weights of 1:
[14]:
distance = pyabc.PNormDistance(p=1)
abc = pyabc.ABCSMC(model, prior, distance)
observation = model(p_true)
db_path = pyabc.create_sqlite_db_id(file_="adaptive_distance.db")
abc.new(db_path, observation)
history1 = abc.run(max_nr_populations=8)
INFO:Sampler:Parallelizing the sampling on 4 cores.
INFO:History:Start <ABCSMC(id=6, start_time=2020-05-17 19:05:29.823671, end_time=None)>
INFO:ABC:Calibration sample before t=0.
INFO:Epsilon:initial epsilon is 800.0444048639816
INFO:ABC:t: 0, eps: 800.0444048639816.
INFO:ABC:Acceptance rate: 100 / 189 = 5.2910e-01, ESS=1.0000e+02.
INFO:ABC:t: 1, eps: 268.2464180029644.
INFO:ABC:Acceptance rate: 100 / 276 = 3.6232e-01, ESS=8.1118e+01.
INFO:ABC:t: 2, eps: 137.22044383082294.
INFO:ABC:Acceptance rate: 100 / 313 = 3.1949e-01, ESS=8.6443e+01.
INFO:ABC:t: 3, eps: 57.29736590464114.
INFO:ABC:Acceptance rate: 100 / 745 = 1.3423e-01, ESS=9.1310e+01.
INFO:ABC:t: 4, eps: 36.65914919005845.
INFO:ABC:Acceptance rate: 100 / 1027 = 9.7371e-02, ESS=8.5098e+01.
INFO:ABC:t: 5, eps: 22.66796750562429.
INFO:ABC:Acceptance rate: 100 / 1613 = 6.1996e-02, ESS=8.1483e+01.
INFO:ABC:t: 6, eps: 13.607513908634385.
INFO:ABC:Acceptance rate: 100 / 3279 = 3.0497e-02, ESS=7.4191e+01.
INFO:ABC:t: 7, eps: 9.210802088377672.
INFO:ABC:Acceptance rate: 100 / 5807 = 1.7221e-02, ESS=6.8810e+01.
INFO:History:Done <ABCSMC(id=6, start_time=2020-05-17 19:05:29.823671, end_time=2020-05-17 19:06:18.577467)>
[15]:
# plotting
def plot_history(history):
    fig, ax = plt.subplots()
    for t in range(history.max_t + 1):
        df, w = history.get_distribution(m=0, t=t)
        pyabc.visualization.plot_kde_1d(df, w, xmin=0, xmax=20,
                                        x='p0', ax=ax,
                                        label="PDF t={}".format(t),
                                        refval=p_true)
    ax.legend()
    fig, ax = plt.subplots()
    for t in range(history.max_t + 1):
        df, w = history.get_distribution(m=0, t=t)
        pyabc.visualization.plot_kde_1d(df, w, xmin=0, xmax=20,
                                        x='p1', ax=ax,
                                        label="PDF t={}".format(t),
                                        refval=p_true)
    ax.legend()
plot_history(history1)
Next, we use adaptive distances but no factors:
[16]:
distance = pyabc.AdaptivePNormDistance(p=1)
abc = pyabc.ABCSMC(model, prior, distance)
db_path = pyabc.create_sqlite_db_id(file_="adaptive_distance.db")
abc.new(db_path, observation)
history2 = abc.run(max_nr_populations=8)
plot_history(history2)
INFO:Sampler:Parallelizing the sampling on 4 cores.
INFO:History:Start <ABCSMC(id=7, start_time=2020-05-17 19:06:20.031098, end_time=None)>
INFO:ABC:Calibration sample before t=0.
INFO:Epsilon:initial epsilon is 804.2703630289134
INFO:ABC:t: 0, eps: 804.2703630289134.
INFO:ABC:Acceptance rate: 100 / 178 = 5.6180e-01, ESS=1.0000e+02.
INFO:ABC:t: 1, eps: 358.6365746592437.
INFO:ABC:Acceptance rate: 100 / 206 = 4.8544e-01, ESS=8.7650e+01.
INFO:ABC:t: 2, eps: 207.5158094913825.
INFO:ABC:Acceptance rate: 100 / 226 = 4.4248e-01, ESS=8.8427e+01.
INFO:ABC:t: 3, eps: 86.67115523817868.
INFO:ABC:Acceptance rate: 100 / 411 = 2.4331e-01, ESS=7.9603e+01.
INFO:ABC:t: 4, eps: 41.204337714577626.
INFO:ABC:Acceptance rate: 100 / 882 = 1.1338e-01, ESS=7.7426e+01.
INFO:ABC:t: 5, eps: 19.12608852927998.
INFO:ABC:Acceptance rate: 100 / 2463 = 4.0601e-02, ESS=9.2439e+01.
INFO:ABC:t: 6, eps: 11.040758226322744.
INFO:ABC:Acceptance rate: 100 / 3174 = 3.1506e-02, ESS=4.0567e+01.
INFO:ABC:t: 7, eps: 6.9115476681531325.
INFO:ABC:Acceptance rate: 100 / 6178 = 1.6186e-02, ESS=7.7286e+01.
INFO:History:Done <ABCSMC(id=7, start_time=2020-05-17 19:06:20.031098, end_time=2020-05-17 19:07:15.103555)>
Next, we account for the discrepancy in data point counts by using self-defined scaling factors:
[17]:
factors = {}
for j in range(N0):
    factors['p0_' + str(j)] = 1 / N0
for j in range(N1):
    factors['p1_' + str(j)] = 1 / N1
distance = pyabc.PNormDistance(p=1, factors=factors)
abc = pyabc.ABCSMC(model, prior, distance)
db_path = pyabc.create_sqlite_db_id(file_="adaptive_distance.db")
abc.new(db_path, observation)
history3 = abc.run(max_nr_populations=8)
plot_history(history3)
INFO:Sampler:Parallelizing the sampling on 4 cores.
INFO:History:Start <ABCSMC(id=8, start_time=2020-05-17 19:07:16.405688, end_time=None)>
INFO:ABC:Calibration sample before t=0.
INFO:Epsilon:initial epsilon is 12.22980361353965
INFO:ABC:t: 0, eps: 12.22980361353965.
INFO:ABC:Acceptance rate: 100 / 192 = 5.2083e-01, ESS=1.0000e+02.
INFO:ABC:t: 1, eps: 8.600402453730833.
INFO:ABC:Acceptance rate: 100 / 201 = 4.9751e-01, ESS=9.4792e+01.
INFO:ABC:t: 2, eps: 5.3932862874730185.
INFO:ABC:Acceptance rate: 100 / 256 = 3.9062e-01, ESS=9.5373e+01.
INFO:ABC:t: 3, eps: 3.9473374176324265.
INFO:ABC:Acceptance rate: 100 / 261 = 3.8314e-01, ESS=8.8531e+01.
INFO:ABC:t: 4, eps: 3.0014600650079553.
INFO:ABC:Acceptance rate: 100 / 290 = 3.4483e-01, ESS=8.7498e+01.
INFO:ABC:t: 5, eps: 2.2334325618901922.
INFO:ABC:Acceptance rate: 100 / 343 = 2.9155e-01, ESS=8.5720e+01.
INFO:ABC:t: 6, eps: 1.449495261301063.
INFO:ABC:Acceptance rate: 100 / 434 = 2.3041e-01, ESS=8.2761e+01.
INFO:ABC:t: 7, eps: 1.0094444189437735.
INFO:ABC:Acceptance rate: 100 / 669 = 1.4948e-01, ESS=7.9831e+01.
INFO:History:Done <ABCSMC(id=8, start_time=2020-05-17 19:07:16.405688, end_time=2020-05-17 19:08:01.548653)>
Next, we consider automatic weighting and factors:
[18]:
import logging
df_logger = logging.getLogger('Distance')
# df_logger.setLevel(logging.DEBUG)
factors = {}
for j in range(N0):
    factors['p0_' + str(j)] = 1 / N0
for j in range(N1):
    factors['p1_' + str(j)] = 1 / N1
distance = pyabc.AdaptivePNormDistance(p=1, factors=factors)
abc = pyabc.ABCSMC(model, prior, distance)
db_path = pyabc.create_sqlite_db_id(file_="adaptive_distance.db")
abc.new(db_path, observation)
history4 = abc.run(max_nr_populations=8)
plot_history(history4)
INFO:Sampler:Parallelizing the sampling on 4 cores.
INFO:History:Start <ABCSMC(id=9, start_time=2020-05-17 19:08:04.018039, end_time=None)>
INFO:ABC:Calibration sample before t=0.
INFO:Epsilon:initial epsilon is 11.684884345125214
INFO:ABC:t: 0, eps: 11.684884345125214.
INFO:ABC:Acceptance rate: 100 / 237 = 4.2194e-01, ESS=1.0000e+02.
INFO:ABC:t: 1, eps: 7.5145369751746465.
INFO:ABC:Acceptance rate: 100 / 224 = 4.4643e-01, ESS=9.3767e+01.
INFO:ABC:t: 2, eps: 5.082483209496991.
INFO:ABC:Acceptance rate: 100 / 239 = 4.1841e-01, ESS=9.7918e+01.
INFO:ABC:t: 3, eps: 3.580477037132026.
INFO:ABC:Acceptance rate: 100 / 235 = 4.2553e-01, ESS=8.0270e+01.
INFO:ABC:t: 4, eps: 2.8507886265808073.
INFO:ABC:Acceptance rate: 100 / 286 = 3.4965e-01, ESS=7.1282e+01.
INFO:ABC:t: 5, eps: 2.089738549526154.
INFO:ABC:Acceptance rate: 100 / 346 = 2.8902e-01, ESS=8.6558e+01.
INFO:ABC:t: 6, eps: 2.1546560321644264.
INFO:ABC:Acceptance rate: 100 / 373 = 2.6810e-01, ESS=8.1666e+01.
INFO:ABC:t: 7, eps: 1.8136827815872332.
INFO:ABC:Acceptance rate: 100 / 417 = 2.3981e-01, ESS=8.0980e+01.
INFO:History:Done <ABCSMC(id=9, start_time=2020-05-17 19:08:04.018039, end_time=2020-05-17 19:09:05.047786)>
The results for the distances that use factors are best, and in this case fixed and adaptive weights give similar results. In addition to the much better posteriors, the sample numbers are much lower, as the plots below show.
[19]:
histories = [history1, history2, history3, history4]
labels = ["Standard", "Adaptive", "Factors", "Adaptive + Factors"]
pyabc.visualization.plot_sample_numbers(histories, labels)
pyabc.visualization.plot_total_sample_numbers(histories, labels, yscale='log10')
pyabc.visualization.plot_effective_sample_sizes(histories, labels)
[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fa57a2d9c90>
Aggregating and weighting diverse data¶
In this notebook, we illustrate the aggregation of various data, and how to combine that with an adaptive scheme of computing weights.
Aggregating diverse distance functions¶
We want to combine different distance metrics operating on subsets of the data into one distance value. As a toy model, assume we want to combine a Laplace and a Normal distance.
[1]:
import pyabc
import numpy as np
from scipy import stats
import os
import tempfile
import matplotlib.pyplot as plt
p_true = {'p0': 0, 'p1': 0}
def model(p):
    return {'s0': p['p0'] + 0.1 * np.random.normal(),
            's1': p['p1'] + 0.1 * np.random.normal()}
observation = {'s0': 0, 's1': 0}
def distance0(x, x_0):
    return abs(x['s0'] - x_0['s0'])
def distance1(x, x_0):
    return (x['s1'] - x_0['s1'])**2
# prior
prior = pyabc.Distribution(
    p0=pyabc.RV("uniform", -1, 2),
    p1=pyabc.RV("uniform", -1, 2))
The key is now to use pyabc.distance.AggregatedDistance to combine both.
[2]:
distance = pyabc.AggregatedDistance([distance0, distance1])
abc = pyabc.ABCSMC(model, prior, distance)
db_path = "sqlite:///" + os.path.join(tempfile.gettempdir(), "tmp.db")
abc.new(db_path, observation)
history1 = abc.run(max_nr_populations=6)
# plotting
def plot_history(history):
    fig, ax = plt.subplots()
    for t in range(history.max_t + 1):
        df, w = history.get_distribution(m=0, t=t)
        pyabc.visualization.plot_kde_1d(df, w, xmin=-1, xmax=1,
                                        x='p0', ax=ax,
                                        label="PDF t={}".format(t),
                                        refval=p_true)
    ax.legend()
    fig, ax = plt.subplots()
    for t in range(history.max_t + 1):
        df, w = history.get_distribution(m=0, t=t)
        pyabc.visualization.plot_kde_1d(df, w, xmin=-1, xmax=1,
                                        x='p1', ax=ax,
                                        label="PDF t={}".format(t),
                                        refval=p_true)
    ax.legend()
plot_history(history1)
INFO:Sampler:Parallelizing the sampling on 4 cores.
INFO:History:Start <ABCSMC(id=1, start_time=2020-05-17 19:07:50.881259, end_time=None)>
INFO:ABC:Calibration sample before t=0.
INFO:Epsilon:initial epsilon is 1.0385570782028162
INFO:ABC:t: 0, eps: 1.0385570782028162.
INFO:ABC:Acceptance rate: 100 / 161 = 6.2112e-01, ESS=1.0000e+02.
INFO:ABC:t: 1, eps: 0.6266478571969578.
INFO:ABC:Acceptance rate: 100 / 227 = 4.4053e-01, ESS=9.8797e+01.
INFO:ABC:t: 2, eps: 0.3797778758541551.
INFO:ABC:Acceptance rate: 100 / 263 = 3.8023e-01, ESS=9.6432e+01.
INFO:ABC:t: 3, eps: 0.23271514846350852.
INFO:ABC:Acceptance rate: 100 / 249 = 4.0161e-01, ESS=8.2414e+01.
INFO:ABC:t: 4, eps: 0.14867766193042023.
INFO:ABC:Acceptance rate: 100 / 277 = 3.6101e-01, ESS=7.4941e+01.
INFO:ABC:t: 5, eps: 0.09375644596444883.
INFO:ABC:Acceptance rate: 100 / 402 = 2.4876e-01, ESS=8.3770e+01.
INFO:History:Done <ABCSMC(id=1, start_time=2020-05-17 19:07:50.881259, end_time=2020-05-17 19:07:57.400829)>
Weighted aggregation¶
A problem with the previous aggregation of distance functions is that they usually vary on different scales. To account for all of them in a similar manner, one can weight them.
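One simple weighting scheme can be sketched by hand: normalize each component distance by a scale estimated from simulations, so that each contributes comparably to the aggregate. This is a stand-alone sketch of the idea (illustrative model and numbers), not pyABC's implementation:

```python
import numpy as np

rng = np.random.default_rng(3)

def distance0(x, x_0):
    return abs(x['s0'] - x_0['s0'])

def distance1(x, x_0):
    return (x['s1'] - x_0['s1']) ** 2

# estimate each component's typical magnitude from simulations
x_0 = {'s0': 0, 's1': 0}
samples = [{'s0': 0.1 * rng.normal(), 's1': 100 * rng.normal()}
           for _ in range(1000)]
scales = [np.median([d(x, x_0) for x in samples])
          for d in (distance0, distance1)]
weights = [1 / s for s in scales]

def aggregated(x, x_0):
    # each normalized component now contributes on a similar scale
    return sum(w * d(x, x_0)
               for w, d in zip(weights, (distance0, distance1)))
```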
Let us look at a simple example of two summary statistics which vary on very different scales:
[3]:
import pyabc
import numpy as np
from scipy import stats
import os
import tempfile
import matplotlib.pyplot as plt
p_true = {'p0': 0, 'p1': 0}
def model(p):
    return {'s0': p['p0'] + 0.1 * np.random.normal(),
            's1': p['p1'] + 100 * np.random.normal()}
observation = {'s0': 0, 's1': 0}
def distance0(x, x_0):
    return abs(x['s0'] - x_0['s0'])
def distance1(x, x_0):
    return (x['s1'] - x_0['s1'])**2
# prior
prior = pyabc.Distribution(
    p0=pyabc.RV("uniform", -1, 2),
    p1=pyabc.RV("uniform", -1, 2))
distance = pyabc.AggregatedDistance([distance0, distance1])
abc = pyabc.ABCSMC(model, prior, distance)
db_path = "sqlite:///" + os.path.join(tempfile.gettempdir(), "tmp.db")
abc.new(db_path, observation)
history1 = abc.run(max_nr_populations=6)
# plotting
def plot_history(history):
    fig, ax = plt.subplots()
    for t in range(history.max_t + 1):
        df, w = history.get_distribution(m=0, t=t)
        pyabc.visualization.plot_kde_1d(df, w, xmin=-1, xmax=1,
                                        x='p0', ax=ax,
                                        label="PDF t={}".format(t),
                                        refval=p_true)
    ax.legend()
    fig, ax = plt.subplots()
    for t in range(history.max_t + 1):
        df, w = history.get_distribution(m=0, t=t)
        pyabc.visualization.plot_kde_1d(df, w, xmin=-1, xmax=1,
                                        x='p1', ax=ax,
                                        label="PDF t={}".format(t),
                                        refval=p_true)
    ax.legend()
plot_history(history1)
INFO:Sampler:Parallelizing the sampling on 4 cores.
INFO:History:Start <ABCSMC(id=2, start_time=2020-05-17 19:07:58.540981, end_time=None)>
INFO:ABC:Calibration sample before t=0.
INFO:Epsilon:initial epsilon is 4323.119783938766
INFO:ABC:t: 0, eps: 4323.119783938766.
INFO:ABC:Acceptance rate: 100 / 204 = 4.9020e-01, ESS=1.0000e+02.
INFO:ABC:t: 1, eps: 983.687851784918.
INFO:ABC:Acceptance rate: 100 / 380 = 2.6316e-01, ESS=8.6737e+01.
INFO:ABC:t: 2, eps: 212.01991735522245.
INFO:ABC:Acceptance rate: 100 / 970 = 1.0309e-01, ESS=8.3643e+01.
INFO:ABC:t: 3, eps: 49.848612246498334.
INFO:ABC:Acceptance rate: 100 / 1660 = 6.0241e-02, ESS=9.1908e+01.
INFO:ABC:t: 4, eps: 12.207788840598266.
INFO:ABC:Acceptance rate: 100 / 3263 = 3.0647e-02, ESS=8.9227e+01.
INFO:ABC:t: 5, eps: 3.2591185551715833.
INFO:ABC:Acceptance rate: 100 / 7422 = 1.3473e-02, ESS=9.0671e+01.
INFO:History:Done <ABCSMC(id=2, start_time=2020-05-17 19:07:58.540981, end_time=2020-05-17 19:08:20.718662)>
The algorithm has problems extracting information on the first parameter from the first summary statistic, because the second summary statistic is on a much larger scale. Let us instead use pyabc.distance.AdaptiveAggregatedDistance, which tries to find good weights itself (and even adapts these weights over time):
[4]:
# prior
prior = pyabc.Distribution(
    p0=pyabc.RV("uniform", -1, 2),
    p1=pyabc.RV("uniform", -1, 2))
distance = pyabc.AdaptiveAggregatedDistance([distance0, distance1])
abc = pyabc.ABCSMC(model, prior, distance)
db_path = "sqlite:///" + os.path.join(tempfile.gettempdir(), "tmp.db")
abc.new(db_path, observation)
history2 = abc.run(max_nr_populations=6)
plot_history(history2)
INFO:Sampler:Parallelizing the sampling on 4 cores.
INFO:History:Start <ABCSMC(id=3, start_time=2020-05-17 19:08:22.020127, end_time=None)>
INFO:ABC:Calibration sample before t=0.
INFO:Epsilon:initial epsilon is 0.584370579542136
INFO:ABC:t: 0, eps: 0.584370579542136.
INFO:ABC:Acceptance rate: 100 / 194 = 5.1546e-01, ESS=1.0000e+02.
INFO:ABC:t: 1, eps: 0.6947726110213078.
INFO:ABC:Acceptance rate: 100 / 237 = 4.2194e-01, ESS=9.4883e+01.
INFO:ABC:t: 2, eps: 0.7810550392598525.
INFO:ABC:Acceptance rate: 100 / 297 = 3.3670e-01, ESS=8.7866e+01.
INFO:ABC:t: 3, eps: 0.6964029572131732.
INFO:ABC:Acceptance rate: 100 / 340 = 2.9412e-01, ESS=9.3900e+01.
INFO:ABC:t: 4, eps: 0.6563335926638891.
INFO:ABC:Acceptance rate: 100 / 561 = 1.7825e-01, ESS=8.7756e+01.
INFO:ABC:t: 5, eps: 0.7396888829703054.
INFO:ABC:Acceptance rate: 100 / 881 = 1.1351e-01, ESS=8.9301e+01.
INFO:History:Done <ABCSMC(id=3, start_time=2020-05-17 19:08:22.020127, end_time=2020-05-17 19:08:28.121491)>
The result is much better. We can also compute the weights only initially, by setting adaptive=False:
[5]:
# prior
prior = pyabc.Distribution(
    p0=pyabc.RV("uniform", -1, 2),
    p1=pyabc.RV("uniform", -1, 2))
distance = pyabc.AdaptiveAggregatedDistance(
    [distance0, distance1], adaptive=False)
abc = pyabc.ABCSMC(model, prior, distance)
db_path = "sqlite:///" + os.path.join(tempfile.gettempdir(), "tmp.db")
abc.new(db_path, observation)
history3 = abc.run(max_nr_populations=6)
plot_history(history3)
INFO:Sampler:Parallelizing the sampling on 4 cores.
INFO:History:Start <ABCSMC(id=4, start_time=2020-05-17 19:08:29.746638, end_time=None)>
INFO:ABC:Calibration sample before t=0.
INFO:Epsilon:initial epsilon is 0.5422601259707908
INFO:ABC:t: 0, eps: 0.5422601259707908.
INFO:ABC:Acceptance rate: 100 / 206 = 4.8544e-01, ESS=1.0000e+02.
INFO:ABC:t: 1, eps: 0.3171556467305309.
INFO:ABC:Acceptance rate: 100 / 215 = 4.6512e-01, ESS=9.2208e+01.
INFO:ABC:t: 2, eps: 0.19220858620929399.
INFO:ABC:Acceptance rate: 100 / 250 = 4.0000e-01, ESS=9.5807e+01.
INFO:ABC:t: 3, eps: 0.11699645089129365.
INFO:ABC:Acceptance rate: 100 / 443 = 2.2573e-01, ESS=8.5142e+01.
INFO:ABC:t: 4, eps: 0.0745616388302114.
INFO:ABC:Acceptance rate: 100 / 624 = 1.6026e-01, ESS=8.8944e+01.
INFO:ABC:t: 5, eps: 0.044523155316940115.
INFO:ABC:Acceptance rate: 100 / 1267 = 7.8927e-02, ESS=7.9595e+01.
INFO:History:Done <ABCSMC(id=4, start_time=2020-05-17 19:08:29.746638, end_time=2020-05-17 19:08:36.384282)>
Here, pre-calibration performs comparably to adaptation, because the weights do not change much over time. Note that one can also specify other scale functions, by passing AdaptiveAggregatedDistance(distances, scale_function), e.g. pyabc.distance.mean[/median/span].
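How such weights can be derived from the spread of each statistic in the current population can be sketched as follows. This is a simplified sketch of the idea only; pyABC's actual scale functions and bookkeeping differ:

```python
import statistics

def adaptive_weights(samples_per_stat, scale=statistics.pstdev):
    # One weight per summary statistic: the inverse of its scale in the
    # current population, so large-scale statistics do not dominate.
    scales = [scale(samples) for samples in samples_per_stat]
    return [1.0 / s if s > 0 else 1.0 for s in scales]

# statistic 0 varies on a scale of ~1, statistic 1 on a scale of ~1000
w = adaptive_weights([[0.9, 1.1, 1.0], [900.0, 1100.0, 1000.0]])
```

The resulting weight ratio w[0]/w[1] is 1000, exactly compensating the scale difference between the two statistics.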
The following plots demonstrate that, compared to the first run, the second and third run not only yield a much better posterior approximation after the same number of iterations, but also require a much lower number of samples.
[6]:
histories = [history1, history2, history3]
labels = ["Standard", "Adaptive", "Pre-calibrated"]
pyabc.visualization.plot_sample_numbers(histories, labels, rotation=45)
pyabc.visualization.plot_total_sample_numbers(histories, labels, yscale='log10', rotation=45)
pyabc.visualization.plot_effective_sample_sizes(histories, labels)
[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f86e8dce810>
Script based external simulators¶
In this notebook, we demonstrate the usage of the script based external simulators, summary statistics, and distance functions features.
These allow using near-arbitrary programming languages and output formats with pyABC. The main concept is that all communication happens via the file system. This comes at the cost of a considerable overhead, making this feature suboptimal for models with a low simulation time. For more expensive models, the overhead should be negligible.
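The file-system based protocol can be sketched as follows. This is a minimal sketch under the assumption that the external script receives an input and an output location as command-line arguments; pyABC's actual protocol differs in its details:

```python
import os
import subprocess
import sys
import tempfile

def run_external(executable, script, loc_in):
    # All communication happens via the file system: the script reads its
    # input from loc_in and writes its results into a fresh directory,
    # whose location is returned together with the process return code.
    loc_out = tempfile.mkdtemp(prefix="modelsim_")
    ret = subprocess.run([executable, script, loc_in, loc_out])
    return {"loc": loc_out, "returncode": ret.returncode}

# demo: a trivial "model" script that copies its input file to the output
script = os.path.join(tempfile.mkdtemp(), "model.py")
with open(script, "w") as f:
    f.write("import shutil, sys\nshutil.copy(sys.argv[1], sys.argv[2])\n")
loc_in = os.path.join(tempfile.mkdtemp(), "pars.txt")
with open(loc_in, "w") as f:
    f.write("meanX=3\n")
result = run_external(sys.executable, script, loc_in)
```

The returned dictionary mirrors the `{'loc': ..., 'returncode': ...}` shape seen in the outputs below.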
This notebook is similar to the using_R notebook, but circumvents usage of the rpy2 package.
[1]:
import pyabc
import pyabc.external
/home/yannik/anaconda3/lib/python3.7/site-packages/distributed/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  defaults = yaml.load(f)
Here, we define model, summary statistics, and distance. Note that, if possible, alternatively FileIdSumStat can be used to read the summary statistics directly into Python, and then use a Python-based distance function.
[2]:
model = pyabc.external.ExternalModel(executable="Rscript", file="rmodel/model.r")
sumstat = pyabc.external.ExternalSumStat(executable="Rscript", file="rmodel/sumstat.r")
distance = pyabc.external.ExternalDistance(executable="Rscript", file="rmodel/distance.r")
dummy_sumstat = pyabc.external.create_sum_stat() # can also use a real file here
[3]:
pars = {'meanX': 3, 'meanY': 4}
mf = model(pars)
print(mf)
sf = sumstat(mf)
print(sf)
distance(sf, dummy_sumstat)
{'loc': '/tmp/modelsim_n6oahdmc', 'returncode': 0}
{'loc': '/tmp/sumstat_nys4kf_l', 'returncode': 0}
[3]:
2.994722
[4]:
from pyabc import Distribution, RV, ABCSMC

prior = Distribution(meanX=RV("uniform", 0, 10),
                     meanY=RV("uniform", 0, 10))
abc = ABCSMC(model, prior, distance,
             summary_statistics=sumstat,
             population_size=20)
import os
from tempfile import gettempdir
db = "sqlite:///" + os.path.join(gettempdir(), "test.db")
abc.new(db, dummy_sumstat)
history = abc.run(minimum_epsilon=0.9, max_nr_populations=4)
INFO:History:Start <ABCSMC(id=1, start_time=2019-09-23 18:36:16.815211, end_time=None)>
INFO:Epsilon:initial epsilon is 5.276756500000001
INFO:ABC:t:0 eps:5.276756500000001
INFO:ABC:t:1 eps:4.03523
INFO:ABC:t:2 eps:2.3500524869456942
INFO:ABC:t:3 eps:1.857179536835703
INFO:History:Done <ABCSMC(id=1, start_time=2019-09-23 18:36:16.815211, end_time=2019-09-23 18:37:14.892234)>
Note the significantly longer runtime compared to using rpy2. This is because the simulation time of this model is very short, so that repeatedly accessing the file system creates a notable overhead. For more expensive models, this overhead becomes less pronounced. Still, if applicable, more efficient ways of communication between model and simulator are preferable.
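The magnitude of this overhead is easy to check: spawning a process per evaluation costs milliseconds, which dwarfs a microsecond-scale computation. This is a generic timing sketch, not tied to pyABC:

```python
import subprocess
import sys
import time

def timed(fn, n=3):
    # average wall-clock time of n calls to fn
    t0 = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - t0) / n

# a cheap "simulation" executed in-process ...
in_process = timed(lambda: sum(range(1000)))
# ... versus the same computation behind a process boundary
via_process = timed(lambda: subprocess.run(
    [sys.executable, "-c", "print(sum(range(1000)))"],
    capture_output=True))
```

For a cheap computation like this, the process-based variant is typically orders of magnitude slower, which is exactly the regime where the file-based interface is not recommended.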
[5]:
from pyabc.visualization import plot_kde_2d
for t in range(history.n_populations):
df, w = abc.history.get_distribution(0, t)
ax = plot_kde_2d(df, w, "meanX", "meanY",
xmin=0, xmax=10,
ymin=0, ymax=10,
numx=100, numy=100)
ax.scatter([4], [8],
edgecolor="black",
facecolor="white",
label="Observation");
ax.legend();
ax.set_title("PDF t={}".format(t))
Data plots¶
This example illustrates data plotting for summary statistics of observed vs. simulated data. This can be used to easily assess how well we can fit the data.
Let's start by importing the necessary classes. We also set up matplotlib, and we are going to use numpy and pandas as well.
[1]:
from pyabc.visualization import plot_data_default
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Define summary statistics¶
First, we will define some arbitrary summary statistics to be used in this example. We will define several summary statistics with different data types: the value of a summary statistic can be a 1d numpy array, a 2d numpy array, or a pandas data frame.
[2]:
observed = {
    'data 1': np.array([1, 1, 2, 2, 3, 3, 4, 4]),
    'data 2': pd.DataFrame({'measurement': [1, 2, 3, 4, 1, 2, 3, 4]}),
    'data 3': pd.DataFrame({'measurement 1': [1, 2, 3, 4, 1, 2, 3, 4],
                            'measurement 2': [1, 5, 4, 6, 7, 2, 6, 2]}),
    'data 4': np.array([[4, 4, 5, 3, 2, 2, 1, 2], [4, 3, 4, 3, 1, 1, 2, 1]]),
    'data 5': np.array([1, 1, 2, 2, 3, 3, 4, 4]),
}
We do the same for the simulated data:
[3]:
simulated = {
    'data 1': np.array([1, 2, 4, 6, 8, 10, 12, 14]),
    'data 2': pd.DataFrame({'measurement': [1, 2, 4, 6, 1, 2, 4, 6]}),
    'data 3': pd.DataFrame({'measurement 1': [2, 3, 4, 5, 3, 4, 1, 2],
                            'measurement 2': [1, 6, 5, 4, 6, 2, 5, 2]}),
    'data 4': np.array([[13, 13, 9, 7, 8, 3, 2, 1], [14, 12, 10, 9, 6, 4, 3, 2]]),
    'data 5': np.array([1, 2, 4, 6, 8, 10, 12, 14]),
}
Plot¶
Now that we have defined dictionaries for both the observed and the simulated summary statistics, we can call the plotting function from pyabc.visualization:
[4]:
plot_data_default(observed, simulated)
plt.gcf().set_size_inches(9, 6)
plt.show()
Note that there is also a function pyabc.visualization.plot_data_callback, which operates via callback functions and thus allows more flexibility. This function is illustrated in the conversion reaction notebook.
Measurement noise assessment¶
In this notebook, we illustrate how to use pyABC with different noise models. For simplicity, we use a simple ODE model of a conversion reaction with a single parameter:
[1]:
import numpy as np
import scipy as sp
import scipy.integrate  # make sp.integrate available
import matplotlib.pyplot as plt
import pyabc
%matplotlib inline
init = np.array([1, 0])
def f(y, t0, theta1, theta2=np.exp(-2)):
    x1, x2 = y
    dx1 = -theta1 * x1 + theta2 * x2
    dx2 = theta1 * x1 - theta2 * x2
    return dx1, dx2

theta1_true = np.exp(-2.5)
theta_true = {'theta1': theta1_true}
theta_min, theta_max = 0.05, 0.15
prior = pyabc.Distribution(
    theta1=pyabc.RV("uniform", theta_min, theta_max - theta_min))
n_time = 10
measurement_times = np.linspace(0, 10, n_time)
We assume that the underlying dynamics of our observations follow the following model:
[2]:
def model(pars):
    sol = sp.integrate.odeint(
        f, init, measurement_times,
        args=(pars["theta1"],))
    return {'X_2': sol[:, 1]}

true_trajectory = model(theta_true)
true_trajectory = model(theta_true)
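As a side note, this conversion reaction admits a closed-form solution, which a simple forward Euler integration reproduces. This sketch is only for intuition and independent of the scipy-based model above; the parameter values mirror the ones used in this notebook:

```python
import math

def x2_analytic(t, theta1, theta2):
    # x1 + x2 = 1 is conserved, so dx2/dt = theta1 - (theta1 + theta2) * x2,
    # a linear ODE with the explicit solution below
    s = theta1 + theta2
    return theta1 / s * (1.0 - math.exp(-s * t))

def x2_euler(t, theta1, theta2, n_steps=100_000):
    # naive forward Euler integration of the same dynamics
    x1, x2 = 1.0, 0.0
    dt = t / n_steps
    for _ in range(n_steps):
        dx1 = -theta1 * x1 + theta2 * x2
        x1, x2 = x1 + dt * dx1, x2 - dt * dx1
    return x2

theta1, theta2 = math.exp(-2.5), math.exp(-2)
err = abs(x2_euler(10.0, theta1, theta2) - x2_analytic(10.0, theta1, theta2))
```

With 100,000 Euler steps the numerical and analytical solutions agree to well below the measurement noise level used below.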
However, we assume that our measurements are subject to additive Gaussian noise:
[3]:
sigma = 0.02
def model_noisy(pars):
    sim = model(pars)
    return {'X_2': sim['X_2'] + sigma * np.random.randn(n_time)}

measured_data = model_noisy(theta_true)

# plot data
plt.plot(measurement_times, true_trajectory['X_2'], color="C0",
         label='Simulation')
plt.scatter(measurement_times, measured_data['X_2'],
            color="C1", label='Data')
plt.xlabel('Time $t$')
plt.ylabel('Measurement $Y$')
plt.title('Conversion reaction: True parameters fit')
plt.legend()
plt.show()
Ignoring noise¶
In the notebook “Ordinary Differential Equations: Conversion Reaction”, this model is used without accounting for a noise model, which is strictly speaking not correct. In this case, we get the following result:
[4]:
def distance(simulation, data):
    return np.absolute(data["X_2"] - simulation["X_2"]).sum()
abc = pyabc.ABCSMC(model, prior, distance)
abc.new(pyabc.create_sqlite_db_id(), measured_data)
history = abc.run(max_nr_populations=10)
INFO:Sampler:Parallelizing the sampling on 4 cores.
INFO:History:Start <ABCSMC(id=128, start_time=2020-06-15 19:23:37.333605, end_time=None)>
INFO:ABC:Calibration sample before t=0.
INFO:Epsilon:initial epsilon is 0.5624782572935602
INFO:ABC:t: 0, eps: 0.5624782572935602.
INFO:ABC:Acceptance rate: 100 / 229 = 4.3668e-01, ESS=1.0000e+02.
INFO:ABC:t: 1, eps: 0.3394877969449191.
INFO:ABC:Acceptance rate: 100 / 201 = 4.9751e-01, ESS=9.9892e+01.
INFO:ABC:t: 2, eps: 0.19851989584245494.
INFO:ABC:Acceptance rate: 100 / 210 = 4.7619e-01, ESS=9.9249e+01.
INFO:ABC:t: 3, eps: 0.15919232295862715.
INFO:ABC:Acceptance rate: 100 / 200 = 5.0000e-01, ESS=9.9827e+01.
INFO:ABC:t: 4, eps: 0.14438278698726847.
INFO:ABC:Acceptance rate: 100 / 215 = 4.6512e-01, ESS=9.9683e+01.
INFO:ABC:t: 5, eps: 0.1370462087460876.
INFO:ABC:Acceptance rate: 100 / 223 = 4.4843e-01, ESS=9.9604e+01.
INFO:ABC:t: 6, eps: 0.1346182762154195.
INFO:ABC:Acceptance rate: 100 / 198 = 5.0505e-01, ESS=9.8976e+01.
INFO:ABC:t: 7, eps: 0.13449453193467115.
INFO:ABC:Acceptance rate: 100 / 239 = 4.1841e-01, ESS=9.4218e+01.
INFO:ABC:t: 8, eps: 0.13444005696155228.
INFO:ABC:Acceptance rate: 100 / 209 = 4.7847e-01, ESS=9.5063e+01.
INFO:ABC:t: 9, eps: 0.13439720882423833.
INFO:ABC:Acceptance rate: 100 / 256 = 3.9062e-01, ESS=9.8735e+01.
INFO:History:Done <ABCSMC(id=128, start_time=2020-06-15 19:23:37.333605, end_time=2020-06-15 19:23:41.795215)>
As one can see in the plot below, this converges to a point estimate as \(\varepsilon\rightarrow 0\), and does not correctly represent the posterior. In particular, this point estimate will in general not capture the correct parameter value (indicated by the grey line).
[5]:
_, ax = plt.subplots()
for t in range(history.max_t + 1):
    pyabc.visualization.plot_kde_1d_highlevel(
        history, x="theta1", t=t, refval=theta_true, refval_color='grey',
        xmin=theta_min, xmax=theta_max, numx=100, ax=ax, label=f"Iteration {t}")
ax.legend()
plt.show()
Add noise to the model output¶
To correctly account for noise, there are essentially two possibilities. First, we can use the noisified model output:
[6]:
abc = pyabc.ABCSMC(model_noisy, prior, distance)
abc.new(pyabc.create_sqlite_db_id(), measured_data)
history_noisy_output = abc.run(max_nr_populations=10)
INFO:Sampler:Parallelizing the sampling on 4 cores.
INFO:History:Start <ABCSMC(id=129, start_time=2020-06-15 19:23:42.824635, end_time=None)>
INFO:ABC:Calibration sample before t=0.
INFO:Epsilon:initial epsilon is 0.6026361265317445
INFO:ABC:t: 0, eps: 0.6026361265317445.
INFO:ABC:Acceptance rate: 100 / 201 = 4.9751e-01, ESS=1.0000e+02.
INFO:ABC:t: 1, eps: 0.40307937702819413.
INFO:ABC:Acceptance rate: 100 / 183 = 5.4645e-01, ESS=9.9927e+01.
INFO:ABC:t: 2, eps: 0.26804878544751437.
INFO:ABC:Acceptance rate: 100 / 254 = 3.9370e-01, ESS=9.8752e+01.
INFO:ABC:t: 3, eps: 0.23005068461777956.
INFO:ABC:Acceptance rate: 100 / 269 = 3.7175e-01, ESS=8.7397e+01.
INFO:ABC:t: 4, eps: 0.19748032366319818.
INFO:ABC:Acceptance rate: 100 / 535 = 1.8692e-01, ESS=9.5180e+01.
INFO:ABC:t: 5, eps: 0.18275242228026406.
INFO:ABC:Acceptance rate: 100 / 793 = 1.2610e-01, ESS=8.6146e+01.
INFO:ABC:t: 6, eps: 0.16605305697113107.
INFO:ABC:Acceptance rate: 100 / 1622 = 6.1652e-02, ESS=7.6403e+01.
INFO:ABC:t: 7, eps: 0.15351169146934754.
INFO:ABC:Acceptance rate: 100 / 3122 = 3.2031e-02, ESS=8.4115e+01.
INFO:ABC:t: 8, eps: 0.1418635171783479.
INFO:ABC:Acceptance rate: 100 / 5402 = 1.8512e-02, ESS=8.8656e+01.
INFO:ABC:t: 9, eps: 0.12830890205714054.
INFO:ABC:Acceptance rate: 100 / 12580 = 7.9491e-03, ESS=8.4644e+01.
INFO:History:Done <ABCSMC(id=129, start_time=2020-06-15 19:23:42.824635, end_time=2020-06-15 19:23:59.654520)>
[7]:
_, ax = plt.subplots()
for t in range(history_noisy_output.max_t + 1):
    pyabc.visualization.plot_kde_1d_highlevel(
        history_noisy_output, x="theta1", t=t,
        refval=theta_true, refval_color='grey',
        xmin=theta_min, xmax=theta_max, ax=ax, numx=200, label=f"Iteration {t}")
ax.legend()
[7]:
<matplotlib.legend.Legend at 0x7feef67a5ac0>
This curve is much broader and, as one can show, closer to the correct posterior.
Modify the acceptance step¶
Second, we can use the non-noisy model and instead adjust the acceptance step:
[8]:
acceptor = pyabc.StochasticAcceptor()
kernel = pyabc.IndependentNormalKernel(var=sigma**2)
eps = pyabc.Temperature()
abc = pyabc.ABCSMC(model, prior, kernel, eps=eps, acceptor=acceptor)
abc.new(pyabc.create_sqlite_db_id(), measured_data)
history_acceptor = abc.run(max_nr_populations=10)
INFO:Sampler:Parallelizing the sampling on 4 cores.
INFO:History:Start <ABCSMC(id=130, start_time=2020-06-15 19:24:00.496237, end_time=None)>
INFO:ABC:Calibration sample before t=0.
INFO:ABC:t: 0, eps: 18.524729283158507.
INFO:ABC:Acceptance rate: 100 / 383 = 2.6110e-01, ESS=1.0000e+02.
INFO:ABC:t: 1, eps: 2.384675437876244.
INFO:ABC:Acceptance rate: 100 / 349 = 2.8653e-01, ESS=9.8885e+01.
INFO:ABC:t: 2, eps: 1.0.
INFO:ABC:Acceptance rate: 100 / 190 = 5.2632e-01, ESS=9.6103e+01.
INFO:ABC:Stopping: minimum epsilon.
INFO:History:Done <ABCSMC(id=130, start_time=2020-06-15 19:24:00.496237, end_time=2020-06-15 19:24:04.042482)>
We use a pyabc.StochasticAcceptor for the acceptor, replacing the default pyabc.UniformAcceptor, in order to accept a simulated output \(y\) with probability

\[\left(\frac{\pi(D|y,\theta)}{c}\right)^{1/T},\]

where \(\pi(D|y,\theta)\) denotes the distribution of noisy data \(D\) given non-noisy model output \(y\) and parameters \(\theta\), and \(c\) is a normalization constant. Here, we use a pyabc.IndependentNormalKernel in place of a pyabc.Distance to capture the normal noise \(\pi(D|y,\theta) = \mathcal{N}(D; y, \sigma^2)\). Other noise models are possible as well, including Laplace or binomial noise. In place of the pyabc.Epsilon, we employ a pyabc.Temperature, which implements schemes to decrease a temperature \(T\searrow 1\), such that in iteration \(t\) we sample from

\[\pi_t(\theta, y) \propto \pi(D|y,\theta)^{1/T_t}\, p(y|\theta)\, \pi(\theta),\]

where \(p(y|\theta)\) denotes the model output likelihood, and \(\pi(\theta)\) the parameter prior.
Each of acceptor, kernel, and temperature offers various configuration options; however, the default settings have proven quite stable.
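The acceptance step can be sketched in plain Python. This is a conceptual sketch of the scheme, not pyABC's implementation; here the constant c is chosen as the maximal attainable density, so the acceptance probability stays at most one:

```python
import math
import random

def normal_logpdf(data, sim, sigma):
    # log density of the data under independent normal noise around sim
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2)
        - (d - y) ** 2 / (2 * sigma ** 2)
        for d, y in zip(data, sim))

def stochastic_accept(data, sim, sigma, temperature, log_c):
    # accept with probability (pi(D|y) / c)**(1/T), computed in log space
    log_prob = (normal_logpdf(data, sim, sigma) - log_c) / temperature
    return random.random() < math.exp(min(log_prob, 0.0))

data = [0.1, 0.3]
log_c = normal_logpdf(data, data, 0.02)  # maximal density: sim == data
hit = stochastic_accept(data, [0.1, 0.3], 0.02, 1.0, log_c)
miss = stochastic_accept(data, [1.1, 1.3], 0.02, 1.0, log_c)
```

A perfect simulation is always accepted, while one that is many noise standard deviations away is accepted with vanishing probability; raising the temperature flattens this acceptance profile, which is what makes early iterations cheap.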
[9]:
_, ax = plt.subplots()
for t in range(history_acceptor.max_t + 1):
    pyabc.visualization.plot_kde_1d_highlevel(
        history_acceptor, x="theta1", t=t,
        refval=theta_true, refval_color='grey',
        xmin=theta_min, xmax=theta_max, ax=ax, numx=200, label=f"Iteration {t}")
ax.legend()
plt.show()
We see that we get essentially the same posterior distribution as with the noisy output (a bit more peaked, showing that the approximation error with the noisy output was not yet negligible), however at a much lower computational cost, as the plot below shows:
[10]:
histories = [history_noisy_output, history_acceptor]
labels = ["noisy model", "stochastic acceptor"]
pyabc.visualization.plot_sample_numbers(histories, labels)
plt.show()
Thus, the stochastic acceptor is the recommended method in practice. For further details, consult the API documentation.
Estimate noise parameters¶
Our formulation of the modified acceptance step allows the noise model to be parameter-dependent (in theory, so does the noisified model output approach). Thus, one can estimate parameters such as the standard deviation of Gaussian noise on the fly. A parameter-dependent noise model is specified by passing a function to the kernel, which takes the parameters and returns an array of variances corresponding to the data. This is currently implemented for pyabc.IndependentNormalKernel, pyabc.IndependentLaplaceKernel, and pyabc.BinomialKernel.
Parameters are often estimated on a logarithmic scale when fold changes are of interest. We demonstrate this here with the standard deviation of a normal noise kernel:
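The mapping from a parameter sampled on log10 scale to per-datapoint variances can be sketched as follows (plain Python for illustration; the function name is hypothetical):

```python
def var_from_log10_std(log10_std, n_time):
    # 'std' is sampled on log10 scale, so 10**log10_std is the standard
    # deviation, and its square the variance, repeated per time point
    sigma = 10.0 ** log10_std
    return [sigma ** 2] * n_time

# sigma = 10**-1.7 ~ 0.02, matching the noise level used in this notebook
variances = var_from_log10_std(-1.7, 3)
```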
[11]:
theta_true_var = {'theta1': theta1_true, 'std': np.log10(sigma)}
prior = pyabc.Distribution(
    theta1=pyabc.RV("uniform", theta_min, theta_max - theta_min),
    std=pyabc.RV("uniform", -2.5, 2))

def var(p):
    return 10**(2 * p['std']) * np.ones(n_time)
acceptor = pyabc.StochasticAcceptor()
kernel = pyabc.IndependentNormalKernel(var=var)
eps = pyabc.Temperature()
abc = pyabc.ABCSMC(model, prior, kernel, eps=eps, acceptor=acceptor, population_size=100)
abc.new(pyabc.create_sqlite_db_id(), measured_data)
history_acceptor_var = abc.run(max_nr_populations=10)
INFO:Sampler:Parallelizing the sampling on 4 cores.
INFO:History:Start <ABCSMC(id=131, start_time=2020-06-15 19:24:04.966413, end_time=None)>
INFO:ABC:Calibration sample before t=0.
INFO:ABC:t: 0, eps: 18.853333200943837.
INFO:ABC:Acceptance rate: 100 / 315 = 3.1746e-01, ESS=9.9992e+01.
INFO:ABC:t: 1, eps: 11.24653846776521.
INFO:ABC:Acceptance rate: 100 / 313 = 3.1949e-01, ESS=8.5619e+01.
INFO:ABC:t: 2, eps: 7.119315454596579.
INFO:ABC:Acceptance rate: 100 / 323 = 3.0960e-01, ESS=7.6864e+01.
INFO:ABC:t: 3, eps: 4.827923867251667.
INFO:ABC:Acceptance rate: 100 / 423 = 2.3641e-01, ESS=8.3261e+01.
INFO:ABC:t: 4, eps: 3.7136446447402682.
INFO:ABC:Acceptance rate: 100 / 367 = 2.7248e-01, ESS=6.6476e+01.
INFO:ABC:t: 5, eps: 2.8565397729146036.
INFO:ABC:Acceptance rate: 100 / 347 = 2.8818e-01, ESS=6.6422e+01.
INFO:ABC:t: 6, eps: 2.1972537102600755.
INFO:ABC:Acceptance rate: 100 / 377 = 2.6525e-01, ESS=7.4152e+01.
INFO:ABC:t: 7, eps: 1.6901301053216593.
INFO:ABC:Acceptance rate: 100 / 367 = 2.7248e-01, ESS=6.8683e+01.
INFO:ABC:t: 8, eps: 1.1783208490227317.
INFO:ABC:Acceptance rate: 100 / 389 = 2.5707e-01, ESS=6.4552e+01.
INFO:ABC:t: 9, eps: 1.0.
INFO:ABC:Acceptance rate: 100 / 266 = 3.7594e-01, ESS=7.6172e+01.
INFO:ABC:Stopping: minimum epsilon.
INFO:History:Done <ABCSMC(id=131, start_time=2020-06-15 19:24:04.966413, end_time=2020-06-15 19:24:15.057312)>
[12]:
fig, ax = plt.subplots(1, 2)
for t in range(history_acceptor_var.max_t + 1):
    pyabc.visualization.plot_kde_1d_highlevel(
        history_acceptor_var, x="theta1", t=t,
        refval=theta_true_var, refval_color='grey',
        xmin=theta_min, xmax=theta_max, ax=ax[0], numx=200, label=f"Iteration {t}")
    pyabc.visualization.plot_kde_1d_highlevel(
        history_acceptor_var, x="std", t=t,
        refval=theta_true_var, refval_color='grey',
        xmin=-2.5, xmax=-0.5, ax=ax[1], numx=200, label=f"Iteration {t}")
ax[1].set_xlabel("log10(std)")
ax[1].set_ylabel(None)
ax[1].legend()
fig.set_size_inches((8, 4))
fig.tight_layout()
plt.show()
We see that we are able to estimate both parameters quite reasonably (the exact details of course depend on the data and model). For the present model, one could still derive the analytical posterior distribution, which we omit here for computational reasons. For an example, see this notebook.
PEtab import¶
PEtab is a format for specifying parameter estimation problems in systems biology. This notebook illustrates how the PEtab format can be used together with the ODE simulation toolbox AMICI to define ODE-based parameter estimation problems for pyABC. Then, in pyABC, we can perform exact sampling based on the algorithms introduced in this preprint.
To use this functionality, you need to have (at least) PEtab and AMICI installed. You can obtain these by installing pyABC with
pip install pyabc[amicipetab]
or installing them manually via
pip install petab amici
See also the tools’ installation guides for amici and petab.
[1]:
import petab
import pyabc
import amici.petab_import
from pyabc.petab import AmiciPetabImporter
import numpy as np
import os
os.environ["OMP_NUM_THREADS"] = "1"
/home/yannik/anaconda3/lib/python3.7/site-packages/rpy2/robjects/pandas2ri.py:14: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
  from pandas.core.index import Index as PandasIndex
We illustrate the usage of PEtab models using a model taken from the benchmark collection. Uncomment the following cell to clone the git repository.
[2]:
!git clone --depth 1 https://github.com/LeonardSchmiester/Benchmark-Models.git \
    tmp/benchmarkmodels || (cd tmp/benchmarkmodels && git pull)
Cloning into 'tmp/benchmarkmodels'...
remote: Enumerating objects: 3511, done.
remote: Counting objects: 100% (3511/3511), done.
remote: Compressing objects: 100% (922/922), done.
remote: Total 3511 (delta 2695), reused 3170 (delta 2564), pack-reused 0
Receiving objects: 100% (3511/3511), 201.90 MiB | 20.18 MiB/s, done.
Resolving deltas: 100% (2695/2695), done.
Checking out files: 100% (3411/3411), done.
Now we can import a problem, here using the “Boehm_JProteomeRes2014” example, to AMICI and PEtab:
[2]:
# read the petab problem from yaml
petab_problem = petab.Problem.from_yaml(
    "tmp/benchmarkmodels/hackathon_contributions_new_data_format/"
    "Boehm_JProteomeRes2014/Boehm_JProteomeRes2014.yaml")
# compile the petab problem to an AMICI ODE model
model = amici.petab_import.import_petab_problem(petab_problem)
# the solver to numerically solve the ODE
solver = model.getSolver()
# import everything to pyABC
importer = AmiciPetabImporter(petab_problem, model, solver)
# extract what we need from the importer
prior = importer.create_prior()
model = importer.create_model()
kernel = importer.create_kernel()
Once everything has been compiled and imported, we can simply call the model:
[3]:
model(petab_problem.x_nominal_free_scaled)
[3]:
{'llh': 138.22199570334107}
By default, this only returns the log-likelihood value. If simulated data are also to be returned (and stored in the pyABC datastore), pass store_simulations=True to the importer.
That’s all. Now we can run an analysis using pyABC’s exact sequential sampler under the assumption of measurement noise. Note that the following cell takes, depending on the resources, minutes to hours to run. Also, the resulting database is not provided here.
[6]:
# this takes some time
sampler = pyabc.MulticoreEvalParallelSampler(n_procs=20)
temperature = pyabc.Temperature()
acceptor = pyabc.StochasticAcceptor(
    pdf_norm_method=pyabc.ScaledPDFNorm())
abc = pyabc.ABCSMC(model, prior, kernel,
                   eps=temperature,
                   acceptor=acceptor,
                   sampler=sampler,
                   population_size=1000)
abc.new("sqlite:///tmp/petab_amici_boehm.db", {})
abc.run()
INFO:Sampler:Parallelizing the sampling on 20 cores.
INFO:History:Start <ABCSMC(id=1, start_time=2020-02-17 22:35:38.746712, end_time=None)>
INFO:ABC:Calibration sample before t=0.
INFO:ABC:t: 0, eps: 49386005.386943504.
INFO:ABC:Acceptance rate: 1000 / 3348 = 2.9869e-01, ESS=1.0000e+03.
INFO:ABC:t: 1, eps: 3707.524037573809.
INFO:ABC:Acceptance rate: 1000 / 3582 = 2.7917e-01, ESS=3.0162e+02.
INFO:ABC:t: 2, eps: 189.0069260595768.
INFO:ABC:Acceptance rate: 1000 / 3753 = 2.6645e-01, ESS=2.9065e+02.
INFO:ABC:t: 3, eps: 94.5034630297884.
INFO:ABC:Acceptance rate: 1000 / 6376 = 1.5684e-01, ESS=3.2931e+02.
INFO:ABC:t: 4, eps: 47.2517315148942.
INFO:ABC:Acceptance rate: 1000 / 16933 = 5.9056e-02, ESS=2.7417e+02.
INFO:ABC:t: 5, eps: 23.6258657574471.
INFO:ABC:Acceptance rate: 1000 / 6990 = 1.4306e-01, ESS=3.6511e+02.
INFO:ABC:t: 6, eps: 11.81293287872355.
INFO:ABC:Acceptance rate: 1000 / 50479 = 1.9810e-02, ESS=2.3205e+02.
INFO:ABC:t: 7, eps: 5.906466439361775.
INFO:ABC:Acceptance rate: 1000 / 141531 = 7.0656e-03, ESS=3.0510e+01.
INFO:ABC:t: 8, eps: 2.9532332196808877.
INFO:ABC:Acceptance rate: 1000 / 54532 = 1.8338e-02, ESS=2.7007e+01.
INFO:ABC:t: 9, eps: 1.4766166098404439.
INFO:ABC:Acceptance rate: 1000 / 21583 = 4.6333e-02, ESS=3.9807e+00.
INFO:ABC:t: 10, eps: 1.0.
INFO:ABC:Acceptance rate: 1000 / 12382 = 8.0762e-02, ESS=1.3817e+01.
INFO:ABC:Stopping: minimum epsilon.
INFO:History:Done <ABCSMC(id=1, start_time=2020-02-17 22:35:38.746712, end_time=2020-02-17 23:03:26.661874)>
[6]:
<pyabc.storage.history.History at 0x7fa4da04f3a0>
Now we can use pyABC’s standard analysis and visualization routines to analyze the obtained posterior sample. In particular, we can extract boundaries and literature parameter values from the PEtab problem:
[4]:
h = pyabc.History("sqlite:///tmp/petab_amici_boehm.db", _id=1)
refval = {key: val for key, val in zip(
    petab_problem.x_free_ids, petab_problem.x_nominal_free_scaled)}
for i, par in enumerate(petab_problem.x_free_ids):
    pyabc.visualization.plot_kde_1d_highlevel(
        h, x=par,
        xmin=petab_problem.get_lb(scaled=True, fixed=False)[i],
        xmax=petab_problem.get_ub(scaled=True, fixed=False)[i],
        refval=refval, refval_color='k')
Apparently, in this case seven out of the nine parameters can be estimated with high confidence, while two other parameters can only be bounded.
Download the examples as notebooks¶
Warning
Upgrade to the latest pyABC version before running the examples. If you installed pyABC some weeks (or days) ago, some new features might have been added in the meantime. Refer to the Upgrading section on how to upgrade pyABC.
Parallel and Distributed Sampling¶
Strategies¶
The pyABC package offers a variety of parallel and distributed sampling strategies. Single-core, multicore, and distributed execution are supported in a couple of different ways. The ParticleParallel samplers and the MappingSampler implement the “Static Scheduling (STAT)” strategy. The EvalParallel samplers, the DaskDistributedSampler, and the ConcurrentFutureSampler implement the “Dynamic Scheduling (DYN)” strategy. The batchsize argument of the DaskDistributedSampler and the ConcurrentFutureSampler allows interpolating between dynamic and static scheduling.
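The difference between the two strategies can be sketched in plain Python. This is a hypothetical toy scheduler for illustration only, not pyABC's code:

```python
import queue
import threading

def static_schedule(tasks, n_workers):
    # STAT: pre-partition the tasks; each worker receives a fixed chunk
    # up front, regardless of how long individual tasks turn out to take.
    return [tasks[i::n_workers] for i in range(n_workers)]

def dynamic_schedule(tasks, n_workers):
    # DYN: workers pull tasks from a shared queue until it is empty,
    # so faster workers automatically take over work from slower ones.
    q = queue.Queue()
    for task in tasks:
        q.put(task)
    done = [[] for _ in range(n_workers)]

    def worker(i):
        while True:
            try:
                done[i].append(q.get_nowait())
            except queue.Empty:
                return

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done

chunks = static_schedule(list(range(10)), 3)
pulled = dynamic_schedule(list(range(10)), 3)
```

With heterogeneous task durations, dynamic scheduling avoids idle workers at the cost of extra coordination, which is the trade-off the batchsize argument interpolates over.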
Single-core execution¶
For singlecore execution, pyABC offers
the pyabc.sampler.SingleCoreSampler
.
It generates samples one by one, sequentially.
This sampler is mainly intended for debugging purposes, as debugging
parallel code can sometimes be hard.
Multicore-only samplers¶
For multicore execution, pyABC implements two possible parallelization strategies.
First, the
pyabc.sampler.MulticoreParticleParallelSampler
implements the STAT sampling strategy. Next, the
pyabc.sampler.MulticoreEvalParallelSampler
implements the DYN strategy. This is currently the default sampler.
Both samplers are highly specialized to the multicore setting and have very little communication overhead. Even for very small model evaluation times, these samplers are about as fast as the single-core sampler. This is achieved by circumventing object serialization via forking. As Microsoft Windows does not support forking, these samplers might not work as expected on Windows.
Distributed samplers¶
The distributed samplers can be used in a distributed setting, and of course also locally by setting up a local cluster. However, for local execution, the multicore samplers are recommended as they are easier to set up.
The pyabc.sampler.RedisEvalParallelSampler
has very low communication
overhead, and when running workers and redis-server locally, it is actually
competitive with the multicore-only samplers.
The pyabc.sampler.RedisEvalParallelSampler
performs parameter sampling on a per worker basis, and can handle fast
function evaluations efficiently.
The pyabc.sampler.DaskDistributedSampler
has slightly higher
communication overhead, however this can be compensated with the batch
submission mode. As the pyabc.sampler.DaskDistributedSampler
performs parameter sampling locally on the master,
it is unsuitable for simulation functions with a runtime below 100ms,
as network communication becomes prohibitive at this point.
The Redis-based sampler can require slightly more effort to set up than the Dask-based sampler, but has fewer constraints regarding simulation function runtime. The Dask sampler is in turn better suited to handle worker failures and unexpected execution host terminations.
General extensible samplers¶
Moreover, there are two more generic samplers which can be used in a multicore and distributed fashion. These samplers facilitate adaptation of pyABC to new parallel environments.
The pyabc.sampler.MappingSampler
can be used in a multicore context
if the provided map implementation is a multicore one, such as, e.g.
multiprocessing.Pool.map, or distributed if the map is a distributed one, such
as pyabc.sge.SGE.map
.
Similarly, the pyabc.sampler.ConcurrentFutureSampler
can use any
implementation of the python concurrent.futures.Executor interface. Again,
implementations are available for both multicore (e.g.
concurrent.futures.ProcessPoolExecutor) and distributed (e.g. Dask)
environments.
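For illustration, any Executor implementation provides the map-like interface such a sampler needs; here a thread pool stands in for a cluster. This is a generic sketch with a hypothetical simulation function, not pyABC-specific code:

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(parameter):
    # stand-in for an expensive model evaluation
    return parameter ** 2

# Executor.map has the same call shape as the built-in map, which is
# exactly the abstraction the generic samplers build on
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(simulate, [1, 2, 3, 4]))
```

Swapping ThreadPoolExecutor for concurrent.futures.ProcessPoolExecutor or a Dask client changes the execution backend without changing this calling code.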
Check the API documentation for more details.
How to set up a Redis based distributed cluster¶
Step 0: Prepare the redis server¶
To run the redis server, use a machine which is reachable both by the main application and by the workers. If you’re on Linux, you can install redis either via your package manager, or, if you’re using anaconda, via
conda install redis
Windows is currently not officially supported by the redis developers.
It is recommended to run a redis server only with password protection, since
it otherwise accepts any incoming connection. To set up password protection
on the server, you need to modify the redis.conf
file. Usually, such a file
exists under REDIS_INSTALL_DIR/etc/redis.conf
. You can however also set
up your own file. It suffices to add or uncomment the line
requirepass PASSWORD
where PASSWORD should be replaced by a more secure password.
Step 1: Start a redis server¶
In this example, we assume that the IP address of the machine running the
redis server is 111.111.111.111
(the default is localhost
),
and that the server should listen on port 6379
(the redis default).
If password protection is used, start the server via
redis-server /path/to/redis.conf --port 6379
If no password protection is required, instead use
redis-server --port 6379 --protected-mode no
You should get an output looking similar to the one below:
30656:M 23 May 13:19:20.718 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
30656:M 23 May 13:19:20.718 # Server can't set maximum open files to 10032 because of OS error: Operation not permitted.
30656:M 23 May 13:19:20.718 # Current maximum open files is 4096. maxclients has been reduced to 4064 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
[Redis ASCII-art startup banner: Redis 3.2.9 (64 bit), standalone mode, port 6379, PID 30656, http://redis.io]
30656:M 23 May 13:19:20.719 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
30656:M 23 May 13:19:20.719 # Server started, Redis version 3.2.9
30656:M 23 May 13:19:20.719 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
30656:M 23 May 13:19:20.719 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
30656:M 23 May 13:19:20.719 * The server is now ready to accept connections on port 6379
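Before starting pyABC or any workers, it can save debugging time to verify from each machine that the server is reachable and the password is accepted. Redis speaks a plain-text inline protocol, so a raw socket suffices; the following is a minimal sketch using only the standard library (`redis_ping` is a hypothetical helper, not part of pyABC):

```python
import socket


def redis_ping(host, port=6379, password=None, timeout=3.0):
    """Return True if a redis server answers PING at host:port.

    Speaks the plain RESP inline protocol over a raw socket, so only the
    standard library is needed. Hypothetical helper, not part of pyABC.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        if password is not None:
            sock.sendall(b"AUTH " + password.encode() + b"\r\n")
            if not sock.recv(64).startswith(b"+OK"):
                return False  # server rejected the password
        sock.sendall(b"PING\r\n")
        return sock.recv(64).startswith(b"+PONG")
```

Running `redis_ping("111.111.111.111", password="PASSWORD")` from both the main machine and a worker machine confirms connectivity before any sampling starts.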
Step 2 or 3: Start pyABC¶
It does not matter what you do first: starting pyABC or starting the workers. In your main program, assuming the models, priors and the distance function are defined, configure pyABC to use the redis sampler
import pyabc
from pyabc.sampler import RedisEvalParallelSampler

redis_sampler = RedisEvalParallelSampler(host="111.111.111.111", port=6379)
abc = pyabc.ABCSMC(models, priors, distance, sampler=redis_sampler)
If password protection is used, additionally pass the argument password=PASSWORD to the RedisEvalParallelSampler.
Then start the ABCSMC run as usual with
abc.run(...)
passing the stopping conditions.
Step 2 or 3: Start the workers¶
It does not matter what you do first: starting pyABC or starting the workers. You can even dynamically add workers after the sampling has started. Start as many workers as you wish on the machines you wish. Up to 10,000 workers should not pose any problem if the model evaluation times are on the scale of seconds or longer. You start workers on your cluster via
abc-redis-worker --host=111.111.111.111 --port=6379
If password protection is used, you need to append --password=PASSWORD.
You should get an output similar to
INFO:REDIS-WORKER:Start redis worker. Max run time 7200.0s, PID=2731
The abc-redis-worker command has further options (see them via abc-redis-worker --help), in particular to set the maximal runtime of a worker, e.g. --runtime=2h, --runtime=3600s, --runtime=2d, to start a worker running for 2 hours, 3600 seconds, or 2 days (the default is 2 hours). It is OK if a worker stops during the sampling of a generation. You can add new workers during the sampling process at any time.
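The duration strings map to seconds as one would expect. The converter below is purely illustrative (the exact unit letters pyABC accepts should be checked via abc-redis-worker --help; this helper is an assumption, not pyABC code):

```python
def runtime_seconds(spec):
    """Convert a runtime spec like '2h', '3600s' or '2d' into seconds.

    Illustrative only: pyABC's own option parser is authoritative,
    and may accept a different set of unit letters.
    """
    units = {"s": 1, "m": 60, "h": 3600, "d": 24 * 3600}
    return float(spec[:-1]) * units[spec[-1]]


print(runtime_seconds("2h"), runtime_seconds("3600s"), runtime_seconds("2d"))
# 7200.0 3600.0 172800.0
```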
The abc-redis-worker command also has an option --processes, which allows you to start several worker processes in parallel, e.g. --processes=12. This can be handy in situations where you have to use a whole cluster node with several cores.
Optional: Monitoring¶
pyABC ships with a small utility to manage the Redis based sampler setup. To monitor the ongoing sampling, execute
abc-redis-manager info --host=111.111.111.111
If password protection is used, you need to specify the password via --password=PASSWORD.
If no sampling has happened yet, the output should look like
Workers=None Evaluations=None Acceptances=None/None
The keys are to be read as follows:
Workers: Currently sampling workers. This will show None or zero even if workers are connected but not running. This number drops to zero at the end of a generation.
Evaluations: Number of accumulated model evaluations for the current generation. This is a sum across all workers.
Acceptances: In i/n, i particles out of a requested population size of n have been accepted already. It can be i>n at the end of a population due to excess sampling.
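When scripting around the monitor, e.g. polling until a generation is nearly done, the status line is easy to parse. A sketch with a hypothetical helper (`acceptance_progress` is not part of pyABC):

```python
import re


def acceptance_progress(info_line):
    """Extract (accepted, requested) from the 'Acceptances=i/n' field of an
    abc-redis-manager info line; returns None while no sampling has happened
    yet (the field then reads 'None/None'). Hypothetical helper."""
    match = re.search(r"Acceptances=(\d+)/(\d+)", info_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(acceptance_progress("Workers=4 Evaluations=1300 Acceptances=120/500"))
# (120, 500)
print(acceptance_progress("Workers=None Evaluations=None Acceptances=None/None"))
# None
```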
Optional: Stopping workers¶
Use abc-redis-manager stop to send a signal to the workers that they should shut down at the end of the current generation.
You can also stop workers with Ctrl-C, or even send a kill signal when pyABC has finished.
Optional: Something with the workers went wrong in the middle of a run¶
It can happen that workers get unexpectedly killed. If they are not able to communicate to the redis-server that they have finished working on the current population before they are killed, the pyABC master process will wait forever. In such cases, the following can be done:
Terminate all running workers (but not the pyABC master process and also not the redis-server).
Execute
abc-redis-manager reset-workers
to manually reset the number of registered workers to zero.
Start worker processes again.
High-performance infrastructure¶
pyABC has been successfully employed on various high-performance computing (HPC) infrastructures. There are a few things to keep in mind.
Long-running master process¶
While most of the work happens on parallel workers, pyABC requires one long-running master process in the background for all of the analysis (or rather two processes, namely the master process running the execution script, and in addition possibly a task scheduler like the redis server). If the HPC infrastructure does not allow for such long-running processes with low CPU and memory requirements, one has to find a way around this. Eventually, it is planned for pyABC to support loss-free automatic checkpointing and restarting, but presently this is not yet implemented. If possible, the master process can be run on external servers, login nodes, or on execution nodes, taking maximum runtimes and the reliability of servers and connections into consideration.
Job scheduling¶
HPC environments usually employ a job scheduler for distributing work to the execution nodes. Here, we briefly outline how pyABC can be integrated into such a setup. As an example, we use the redis sampler; usage of e.g. the dask sampler is similar.
Let us consider the widely used job scheduler slurm. First, we need a script script_redis_worker.sh that starts the redis worker:
#!/bin/bash
# slurm settings
#SBATCH -p {partition_id}
#SBATCH -c {number_of_cpus}
#SBATCH -t {time_in_minutes}
#SBATCH -o {output_file_name}
#SBATCH -e {error_file_name}
# prepare environment, e.g. set path
# run
abc-redis-worker --host={host_ip} --port={port} --runtime={runtime} \
    --processes={n_processes}
Here, n_processes defines the number of processes started for that batch job via multiprocessing. Some HPC setups prefer larger batch jobs, e.g. on a node level, so here each job can already be given some parallelism. The SBATCH macros define the slurm settings to be used.
The above script would be submitted to the slurm job manager via sbatch. It makes sense to define a script for this as well:
#!/bin/bash
for i in {1..{n_jobs}}
do
sbatch script_redis_worker.sh
done
Here, n_jobs would be the number of jobs submitted. When the job scheduler is based on qsub, e.g. SGE/UGE, instead use a script like
#!/bin/bash
for i in {1..{n_jobs}}
do
    qsub -o {output_file_name} -e {error_file_name} \
        script_redis_worker.sh
done
and adapt the worker script. For both, there exist many more configuration options. For further details see the respective documentation.
Note that when planning the number of overall redis workers, batches, and CPUs per batch, the parallelization on the level of the simulations also has to be taken into account. Also, memory requirements should be checked in advance.
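As a back-of-the-envelope check, the core budget multiplies along the three levels of parallelism (batch jobs, worker processes per job, threads per simulation). The helper below is purely illustrative, not pyABC code:

```python
def total_cores(n_jobs, processes_per_job, threads_per_simulation):
    """Rough number of CPU cores occupied when each redis worker process
    runs one simulation at a time and each simulation is itself
    multi-threaded (e.g. via OpenMP). Hypothetical planning helper."""
    return n_jobs * processes_per_job * threads_per_simulation


# e.g. 10 batch jobs, --processes=12 each, 4 threads per simulation
print(total_cores(10, 12, 4))  # 480
```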
Pickling¶
Note
This section is of interest to developers, or if you encounter memory problems.
For most of the samplers, pyABC uses cloudpickle to serialize objects over the network and run simulations on remote nodes. In particular, this enables us to use lambda functions.
However, care must be taken w.r.t. the size of the serialized object, i.e. to only include what is really required. This is why in the pyabc.ABCSMC class we had to write some functions that prevent the whole ABCSMC object from being serialized. For developers, the following example illustrates the problem:
import cloudpickle as pickle
import numpy as np


class A:
    def __init__(self):
        self.big_arr = np.eye(10000)
        self.small_arr = np.zeros(2)

    def costly_function(self):
        def fun(x):
            print(self.small_arr, x)
        return fun

    def efficient_function(self):
        small_arr = self.small_arr

        def fun(x):
            print(small_arr, x)
        return fun
a = A()
print("The whole class:")
print(len(pickle.dumps(a)))
# 800001025
print("Costly function:")
print(len(pickle.dumps(a.costly_function())))
# 800001087
print("Efficient function:")
print(len(pickle.dumps(a.efficient_function())))
# 522
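The effect is not specific to cloudpickle: whatever object graph a function or value references is shipped along with it. With the standard library alone, one can see that extracting only the needed attribute before serialization keeps the payload small (illustrative sketch with an assumed ~10 MB payload):

```python
import pickle

state = {"big": bytes(10_000_000), "small": b"xy"}  # ~10 MB payload

# Serializing the whole state ships the full payload ...
whole = len(pickle.dumps(state))
# ... while extracting only what the remote function needs stays tiny.
needed = len(pickle.dumps(state["small"]))

print(whole > 10_000_000, needed < 100)  # True True
```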
Parallel job execution on an SGE cluster environment¶
Quick start¶
The pyabc.sge package provides, as its most important class, SGE. Its map method automatically parallelizes across an SGE/UGE cluster. The SGE class can be used in standalone mode or in combination with the ABCSMC class (see the usage notes below).
Usage of the parallel package is fairly easy. For example:
from pyabc.sge import SGE

sge = SGE(priority=200, memory="3G")


def f(x):
    return x * 2


tasks = [1, 2, 3, 4]
result = sge.map(f, tasks)
print(result)
[2, 4, 6, 8]
The job scheduling is either done via an SQLite database or a REDIS instance. REDIS is recommended as it works more robustly, in particular in cases where distributed file systems are rather slow.
Note
A configuration file in ~/.parallel is required. See the SGE API documentation.
The pyabc.sge.sge_available function can be used to check whether an SGE cluster can be used on the machine.
Check the API documentation for more details.
Information about running jobs¶
Use python -m pyabc.sge.job_info_redis to get a nicely formatted output of the current execution state, in case the REDIS mode is used. Check python -m pyabc.sge.job_info_redis --help for more details.
Usage notes¶
The SGE class can be used in standalone mode for convenient parallelization of jobs across a cluster, completely independent of the rest of the pyABC package.
The SGE class can also be combined, for instance, with the pyabc.sampler.MappingSampler class for simple parallelization of ABC-SMC runs across an SGE cluster.
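The only contract MappingSampler relies on is a map-compatible callable. For local development, any standard executor's map can therefore stand in before switching to SGE.map on the cluster; the following illustration uses only the standard library and is not pyABC code:

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(x):
    # stand-in for an expensive model simulation
    return x * 2


with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map has the same call signature as SGE.map / the built-in map
    result = list(pool.map(simulate, [1, 2, 3, 4]))

print(result)  # [2, 4, 6, 8]
```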
Exporting pyABC’s database to other file formats¶
pyABC ships with a small convenience utility which allows exporting pyABC's SQL database to other formats. Supported formats are:
CSV
feather
json
msgpack
html
hdf
stata
An example call might look as follows:
abc-export --db results.db --out exported.feather --format feather
This exports the database “results.db” generated by pyABC to “exported.feather”, which is written in the feather format as indicated by the option --format feather (the file extension of the exported file is not parsed).
Check:
abc-export --help
for further options to customize the export.
Web based visualizations¶
The pyABC package comes with a web server, which displays lots of useful information on the currently running and already completed ABC tasks. You can launch it from the command line with
abc-server <database-name>
By default, it opens a web server on port 5000.
You should see something similar to the following:
Via “Go to ABC Run List”, you can see all running and finished ABC runs, which you can then inspect in more detail.
You can get overviews over the models:
Information about individual model parameters for each model and time point is also displayed:
Type in the command line
abc-server --help
to get more information on available options, such as selecting another port:
abc-server --port=8888 <database-name>
Release Notes¶
0.10 series¶
0.10.4 (2020-06-15)¶
Refactor __all__ imports and docs API build (#312).
Fix json export of aggregated adaptive distances (#316).
Apply additional flake8 checks on code quality (#317).
Assert model input is of type pyabc.Parameter (#318).
Extend noise notebook to estimated noise parameters (#319).
Implement optional pickling for multicore samplers; add MacOS pipeline tests (#320).
0.10.3 (2020-05-17)¶
Speed up multivariate normal multiple sampling (#299).
Set default value for OMP_NUM_THREADS=1, stops warnings (#299).
Base default number of parallel cores on PYABC_NUM_PROCS (#309).
Update all notebooks to the latest numpy/scipy (#310).
0.10.2 (2020-05-09)¶
Update CI test system: latest Ubuntu, python 3.8, simplify R build (#296).
Add weights logging to adaptive distances (#295).
Migrate CI tests to GitHub Actions for speedup, reliability and maintainability (#297, #298).
0.10.1 (2020-03-17)¶
Allow separate calibration population sizes, slightly reformulate PopulationStrategy class (#278).
Allow specifying initial weights for adaptive distances, then without sampling from the prior (#279).
Check PEtab test suite in tests (#281).
0.10.0 (2020-02-20)¶
Exact inference via stochastic acceptor finalized and tested (developed throughout the 0.9 series).
Support basic PEtab functionality using AMICI ODE simulations (#268).
Various error fixes (#265, #267).
Log number of processes used by multiprocessing samplers (#263).
Implement pyabc.acceptor.ScaledPDFNorm (#269).
Implement list population size (#274, #276).
On history loading, automatically find an id of a successful run (#273).
0.9 series¶
0.9.26 (2020-01-24)¶
Add optional check whether database is nonexistent, to detect typos.
Set lower bound in 1-dim KDEs to <= 0 to not wrongly display near-uniform distributions. (both #257)
Implement redis password protection for sampler and manage routine (#256).
Make samplers available in global namespace (#249).
Implement ListTemperature (#248).
Allow plotting the relative ESS (#245).
Allow resampling of weighted particles (#244).
Fix ABCSMC.load with rpy2 (#242).
0.9.25 (2020-01-08)¶
Add summary statistics callback plot function (#231).
Add possibility to log employed norms in StochasticAcceptor (#231) and temperature proposals in Temperature (#232).
Implement optional early stopping in the MulticoreEvalParallelSampler and the SingleCoreSampler, when a maximum simulation number is exceeded (default behavior untouched).
Log stopping reason in ABCSMC.run (all #236).
Implement Poisson (#237) and negative binomial (#239) stochastic kernels.
Enable password protection for Redis sampler (#238).
Fix scipy deprecations (#234, #241).
0.9.24 (2019-11-19)¶
In ABCSMC.run, allow a default infinite number of iterations, and log the ESS in each iteration.
Reformulate exponential temperature decay, allowing for a fixed number of iterations or fixed ratios.
Solve acceptance rate temperature match in log space for numeric stability.
Perform temperation of likelihood ratio in log space for numeric stability (all #221).
Fix wrong maximum density value in binomial kernel.
Allow not fixing the final temperature to 1 (all #223).
Allow passing id to history directly (#225).
Pass additional arguments to Acceptor.update.
Give optional min_rate argument to AcceptanceRateScheme (all #226).
In plot functions, add parameter specifying the reference value color (#227).
0.9.23 (2019-11-10)¶
Fix extras_require directive.
Fix error with histogram plot arguments.
Extend test coverage for visualization (all #215).
ABCSMC.{new,load,run} all return the history with set id for convenience.
Document pickling paradigm of ABCSMC class (see doc/sampler.rst).
Always use lazy evaluation in updates (all #216).
Restructure run function of ABCSMC class (#216, #218).
Run notebooks on travis only on pull requests (#217).
Correct weighting in AcceptanceRateScheme (#219).
0.9.22 (2019-11-05)¶
Fix error that prevented using rpy2 based summary statistics with non rpy2 based models (#213).
0.9.21 (2019-11-05)¶
Introduce acceptor.StochasticAcceptor to encode the stochastic acceptance step generalizing the standard uniform criterion.
Introduce distance.StochasticKernel to encode noise distributions, with several concrete implementations already.
Introduce epsilon.Temperature to capture the temperature replacing the traditional epsilons. In addition, multiple concrete pyabc.epsilon.TemperatureSchemes have been implemented that handle the calculation of the next temperature value (all #197).
0.9.20 (2019-10-30)¶
Add highlevel versions of the kde plotting routines (#204).
Add unit tests for common epsilon schemes (#207).
0.9.19 (2019-10-23)¶
Move to cffi>=1.13.1 after that bug was surprisingly quickly fixed (#195).
Create submodule for epsilon (#189).
Add plots for sample and acceptance rate trajectories (#193).
0.9.18 (2019-10-20)¶
Add create_sqlite_db_id convenience function to create database names.
Temporarily require cffi==1.12.2 for rpy2 on travis (all #185).
Introduce UniformAcceptor and SimpleFunctionAcceptor classes to streamline the traditional acceptance step.
Add AcceptorResult and allow weights in the acceptance step (all #184).
0.9.17 (2019-10-10)¶
Use latest pypi rpy2 version on travis and rtd since now the relevant issues were addressed there (easier build, esp. for users).
Update rtd build to version 2 (all #179).
Render logo text for platform independence.
Prevent stochastic transition test from failing that often.
Remove deprecated pd.convert_objects call in web server.
Allow pandas.Series as summary statistics, by conversion to pandas.DataFrame (all #180).
0.9.16 (2019-10-08)¶
Add AggregatedDistance function, and a basic selftuned version AdaptiveAggregatedDistance.
Add additional factors to PNormDistance and AggregatedDistance for flexibility. Minor API break: argument w renamed to weights.
In the adaptive_distances and the aggregated_distances notebooks, add examples where some methods can fail.
Add plot_total_sample_numbers plot (all #173).
0.9.15 (2019-09-15)¶
Some extensions of external simulators interface (#168).
Add basic plots of summary statistics (#165).
Document highperformance infrastructure usage (#159).
Selfadministrative: Add social preview (#158), and link to zenodo (#157).
Fix external deprecations (#153).
Re-add R related tests (#148).
0.9.14 (2019-08-08)¶
Update to rpy2 3.1.0 (major change) (#140).
pandas data frames saved in database via pyarrow parquet, no longer msgpack (deprecated), with backward compatibility for old databases (#141).
Redis workers no longer stop working when encountering model errors (#133).
Minor edits, esp. color, size, axes options to plotting routines.
0.9.13 (2019-06-25)¶
Fix dependency updates (rpy2, sklearn) and travis build.
Add option to limit number of particles for adaptive distance updates.
Rename confidence -> credible intervals and plots (Bayesian context).
Extract from database and plot reference parameter values.
Allow to plot MAP value approximations in credible interval plots.
Add a general interface to external scripts that allows using pyabc in a simple way, in particular with other programming languages.
0.9.12 (2019-05-02)¶
Reorganize distance module (minor API change: distance_functions -> distance, and some classes shortened accordingly).
Allow to pass parameters to Acceptor and Distance.
Make time and parameter arguments to distance functions optional.
Rewrite lazy evaluation for calibration sample in ABCSMC class.
Give default values for ABCSMC.run arguments, which set no stopping criterion.
Add function and plot for effective sample size.
0.9.11 (2019-04-01)¶
Run some notebooks as part of the tests.
Automatize pypi upload via travis.
0.9.10 (2019-03-27)¶
Save number of samples taken in calibration step in database.
Fix error with reported number of simulations in EpsMixin based samplers.
Fix several warnings.
0.9.9 (2019-03-25)¶
Monitor code quality using codacy and codecov.
Extend visualization routines: Add histogram, sample number, epsilon trajectory, model probability, and credible interval plots.
Test visualization routines on travis.
Fix problem with the History.get_weighted_distances function after update to sqlalchemy>=1.3.0.
Add random walk based transition for discrete parameters.
0.9.8 (2019-02-21)¶
Tidy up returning of rejected samples in Sample (not only summary statistics).
Recreate a population from file in History.get_population().
Speed up loading from database by eager loading.
Document the change of the contribution scheme to master+develop.
0.9.7 (2019-02-20)¶
Allow for the database to save no summary statistics for testing purposes.
Tidy up some pyabc.History methods.
pyabc.History.id set by default to the largest index (previously 0), corresponding to the latest inserted analysis.
0.9.6 (2019-02-01)¶
Fix several errors with the readthedocs (rtd) documentation.
Speedup rtd build by removing unnecessary conda and pip requirements.
Cleanup requirements for travis and rtd.
Change rtd design from alabaster to sphinx_rtd_theme since it implements better navigation.
0.9.5 (2019-01-17)¶
ABCSMC can pass observed summary statistics to distance functions (required for some scale functions, and to make the methods robust to volatile summary statistics).
Implementation of more scale functions (distance_functions.scales), in particular some taking into account the bias to the observed data.
AdaptivePNormDistance accepts a Callable as scaling scheme, allowing for more flexibility.
0.9.4 (2018-12-18)¶
Can specify kde and number of bins for all visualization routines.
Can resubmit observed sum stats to ABCSMC.load() function in case it cannot be read correctly from the db.
0.9.3 (2018-12-01)¶
Fix serious memory problem resulting from pickling more than necessary for parallel sampling.
Update logo, readme.
Make tidying optional in abc-export (default behavior not changed).
0.9.2 (2018-09-10)¶
Minor error and warning fixes due to API changes in pandas, seaborn (not used any more), and change of the R installation on travis.
0.9.1 (2018-06-05)¶
Default visualizations like plot_kde_matrix() can plot reference values, useful for testing purposes.
0.9.0¶
Acceptance transferred to an Acceptor object to allow for more flexibility (i.e. not only on a single comparison as per default).
This acceptor is passed to the ABCSMC object.
Update of distance and epsilon synchronized after each iteration and moved to update() methods.
initialize() for DistanceFunction and Epsilon also called in load() method, given a time point to initialize for, and made optional via a require_initialize flag. This makes sure these objects are always correctly initialized.
PNormDistance and AdaptivePNormDistance (prev. WeightedPNormDistance) improved to allow for more customization.
ABCSMC.set_data() method removed.
API breaks for DistanceFunction, Epsilon, Model.
0.8 series¶
0.8.21¶
Implementation of adaptive distances feature. Distance functions can adapt via an update() method.
In particular add WeightedPNormDistance (special case: WeightedEuclideanDistance). Also add nonweighted versions.
Simplify Sampler.sample_until_n_accepted interface.
Extend Sampler class to allow for customization, e.g. by the distance functions.
Generalize MedianEpsilon to QuantileEpsilon.
Make Viserver work with latest bokeh version.
0.8.20¶
Add batch sampling now also to the REDIS evaluation parallel sampler (dynamic scheduling).
0.8.19¶
Bug fix. Fix a race condition in the redis evaluation parallel sampler (dynamic scheduling). An error occurred if a worker tried to start working on a population after the other workers had already terminated the population.
0.8.18¶
Minor bug fix. Ensure that the multicore samplers raise an Exception if an Exception occurs in the worker processes.
Clarify that weighted distances are not normalized in case of having more than a single simulation per proposed parameter. Also add corresponding tests.
Add n_worker method to the RedisEvalParallelSampler to enable querying of the number of connected workers.
Add inmemory database support. Useful, e.g., for benchmarking on slow filesystems or with rather slow network connections.
0.8.17¶
Make git and gitpython an optional dependency.
0.8.16¶
Add “abc-redis-manager reset-workers” command in case workers were unexpectedly killed.
Adapt web server to changed bkcharts API.
0.8.15¶
Bug fix. Random seed initialization in case of starting multiple workers with --processes in the redis server was not correct.
0.8.14¶
Bug fix in MulticoreEvalParallelSampler. The multiprocessing.Queue could fill up and cause a deadlock on joining the workers. This is now fixed.
Rename population_specification to population_size.
Improve plot_kde_matrix: plot ranges are now handled in a less confusing way.
0.8.13¶
Minor doc fixes
Python 3.5 support dropped. It might still work for a while with Python 3.5 but this is not guaranteed anymore.
Add kde matrix visualization function
Add 2d tumor growth example
Add Gillespie example
Change license
0.8.12¶
Minor bug fix. Visualization server produced error when JSON information was empty.
Adapt to new bkcharts packge.
0.8.11¶
Ensure R source file is reloaded when unpickling R objects.
0.8.10¶
Add --id option to abc-export to handle databases with multiple ABC runs.
0.8.9¶
Ensure that summary statistics have names.
Also add kwargs to plot_kde_2d, which are passed to pcolormesh.
0.8.8¶
Add --processes option to abc-redis-worker to start a number of workers in parallel.
0.8.7¶
Make rpy2 an optional dependency. If rpy2 is installed, then R can be used; if not, the rest will still work.
0.8.6¶
minor bug fixes
0.8.5¶
minor bug fix in plot_kde_2d if the axis is provided
0.8.5¶
minor bug fix. The external.R interface did not display the source code correctly.
minor doc updates
0.8.4¶
support serialization of DataFrames used as summary statistics for storage in the database. This feature is still considered experimental.
Add command line utility to export pyABC’s database to different file formats such as csv, feather, html, json and more.
0.8.3¶
Add (experimental) support for models defined in R.
Add some visualization functions for convenience.
0.8.2¶
Bug fixes for web server.
0.8.1¶
Minor internal refactorings and minor documentation updates. Nothing a user should notice.
0.8.0¶
Deprecate the “set_data” method of the ABCSMC class. Use the “new” method instead.
Add a “load” method to the ABCSMC class for easier resuming stored ABCSMC runs.
Add an example to the documentation how to resume stored ABCSMC runs.
Rename the acceptance_rate parameter from ABCSMC.run to min_acceptance_rate for clarity. Usage of acceptance_rate is deprecated.
Various documentation improvements, correcting typos, clarifications, etc.
0.7 series¶
0.7.2¶
Easier early stopping of models via the IntegratedModel class, which now also has examples.
0.7.1¶
Minor refactoring for better Windows compatibility. However, pyABC still runs in serial on Windows.
0.7.0¶
ABCSMC.run gets a new parameter “acceptance_rate” to stop sampling if the acceptance rate drops too low.
History.get_all_populations returns a DataFrame with columns “t”, “population_end_time”, “samples”, “epsilon”, “particles”. That is “nr_samples” got renamed to “samples” and “particles” is new.
0.6 series¶
0.6.4¶
Performance improvement. Use MulticoreEvalParallelSampler as default. This should bring better performance for machines with many cores and comparatively small population sizes.
0.6.3¶
Bug fix. Ensure numpy.int64 can also be passed to History methods where an integer argument is expected.
0.6.2¶
Bug fix. Forgot to add the new Multicore base class.
0.6.1¶
MulticoreEvalParallelSampler gets an n_procs parameter.
0.5 series¶
0.5.2¶
Minor History API changes:
Remove History.get_results_distribution.
Rename History.get_weighted_particles_dataframe to History.get_distribution.
0.5.1¶
Minor ABCSMC API changes:
Mark the de facto private methods as private by prepending an underscore. This should not cause trouble, as usually no one would ever use these methods.
0.5.0¶
Usability improvements and minor API changes:
ABCSMC now accepts an integer to be passed for a constant population size.
The maximum number of populations specification has moved from the PopulationStrategy classes to the ABCSMC.run method. The ABCSMC.run method is now where the stopping criteria are defined.
0.4 series¶
0.4.4¶
 Improvements to adaptive population size strategy
Use same CV estimation algorithm for Transition and PopulationStrategy
Bootstrapping on full joint space for model selection
0.4.3¶
Fix edge case of models without parameters for population size adaptation
0.4.2¶
 Changes to the experimental adaptive population strategy.
Smarter update for model selection
Better CV estimation
0.4.1¶
Fix minor bug in RVs wrapper. args and keyword args were not passed to the wrapped random variable.
0.4.0¶
Add local transition class which makes a local KDE fit.
Fix corner cases of adaptive population size strategy
Change the default: Do not stop if only a single model is alive.
Also include population 0, i.e. a sample from the prior, in the web server visualization.
 Minor bug fixes
Fix inconsistency in ABC options if db_path given as sole string argument
 Add four evaluation parallel samplers
 Dask based implementation
More communication overhead
 Future executor evaluation parallel sampler
Very similar to the Dask implementation
 Redis based implementation
Less communication overhead
Performs also well for short running simulations
 Multicore evaluation parallel sampler
In most common cases, where the population size is much bigger than the number of cores, this sampler is not going to be faster than the multicore particle parallel sampler.
However, on machines with lots of cores and moderate sized populations this sampler might be faster
0.3 series¶
0.3.3¶
Fix SGE regression. Forgot to update a module path on refactoring.
0.3.2¶
PEP8¶
Comply with PEP8 with a few exceptions where it does not make sense. Flake8 now runs with the tests. The tests do not pass if flake8 complains.
Legacy code cleanup¶
Remove legacy classes such as the MultivariateMultiTypeNormalDistributions and the legacy covariance calculation. Also remove devideas folder.
0.2 series¶
0.2.0¶
Add an efficient multicore sampler¶
The new sampler relies on forking instead of pickling for the sample_one, simulate_one and accept_one functions. This brings a huge performance improvement for single-machine multicore settings compared to multiprocessing.Pool.map-like execution, which repeatedly pickles.
About¶
pyABC version: 0.10.4
Source code: https://github.com/icb-dcm/pyabc
Authors¶
The first main developer of the package was Emmanuel Klinger, with major contributions from Dennis Rickert and Nick Jagiella. Currently, the main developers are Yannik Schaelte, Emad Alamoudi and Elba Raimundez.
Contact¶
Discovered an error? Need help? Not sure if something works as intended? Please contact us!
If you think that your issue could be of general interest, please consider creating an issue on github, which will then also be helpful for other users: https://github.com/icb-dcm/pyabc/issues
If you prefer to contact us via email, please use: Yannik Schaelte yannik.schaelte@helmholtz-muenchen.de
License¶
This package is licensed under a permissive 3-clause BSD license.
Copyright 2017 the pyABC developers
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Contribute documentation¶
Documentation is an essential part of the software development process. We want to provide a useful piece of software, and good documentation is necessary so that users know how to use our package. Contributions to the documentation are as welcome as contributions to the code.
Contribute tests¶
We’re committed to testing our code. Tests that are required to pass are located in the test folder. All files starting with test_ contain tests and are automatically run on Travis CI. To run them manually, type:
python3 -m pytest test
You can also run specific tests only.
We encourage testing whatever is possible. However, it might not always be easy to test code based on random sampling. We still encourage general sanity and integration tests, and we highly encourage a test-driven development (TDD) style.
Contribute code¶
If you start working on a new feature or a fix and have not already done so, please create an issue on GitHub briefly describing your plans, and assign it to yourself. Your starting point should not be the master branch, but the develop branch, which contains the latest updates.
Create your own branch or fork on which to implement your changes. To get your work merged, please:
create a pull request to the develop branch,
check that all tests on Travis pass,
check that the documentation is up to date,
request a code review from the main developers.
Document all your changes in the pull request, and make sure to appropriately resolve issues, and delete stale branches after a successful merge.
Deploy¶
New features and bug fixes are continually added to the develop branch. On every merge to master, the version number in pyabc/version.py should be incremented as described below.
Versioning scheme¶
For version numbers, we use A.B.C, where
C is increased for bug fixes,
B is increased for new features and minor API-breaking changes,
A is increased for major API-breaking changes.
Thus, we roughly follow the versioning scheme suggested by the Python packaging guide.
Create a new release¶
After new commits have been added via pull requests to the develop branch, changes can be merged to master and a new version of pyABC can be released. Every merge to master should coincide with an incremented version number and a git tag on the respective merge commit.
Merge into master¶
create a pull request from develop to master,
check that all tests on Travis pass,
check that the documentation is up to date,
adapt the version number in pyabc/version.py (see above),
update the release notes in doc/releasenotes.rst,
request a code review,
merge into the origin master branch.
To be able to actually perform the merge, sufficient rights may be required. Also, at least one review is required.
Create a release on github¶
After merging into master, create a new release on GitHub. This can be done either directly on the project GitHub website, or via the CLI as described in Git Basics - Tagging. In the release form,
specify a tag with the new version as specified in pyabc/version.py,
include the latest additions to doc/releasenotes.rst in the release description.
Upload to the Python package index PyPI¶
The upload to PyPI has been automated via Travis since pyABC version 0.9.11, and is triggered whenever a new release tag is created on the GitHub master branch.
To manually upload a new version to pypi, proceed as follows:
First, a so-called “wheel” is created via:
python setup.py bdist_wheel
A wheel is essentially a zip archive which contains the source code and the binaries (if any).
This archive is uploaded using twine:
twine upload dist/pyabc-x.y.z-py3-none-any.whl
replacing x.y.z with the appropriate version number.
For a more in-depth discussion, see also the section on distributing packages of the Python packaging guide.
API reference¶
pyABC¶
ABCSMC algorithms for Bayesian parameter inference and model selection.
Note
pyABC allows parallelizing the sampling process via various samplers. If you also want to parallelize single model simulations, be careful that both levels of parallelization work well together. In particular, if the environment variable OMP_NUM_THREADS is not set, pyABC sets it to a default of 1. For multiprocessed sampling (the default, at least on Linux systems), the environment variable PYABC_NUM_PROCS can be used to determine over how many processes to parallelize the sampling.
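As a minimal sketch of configuring both levels (the values are illustrative), the environment variables can be set before starting pyABC:

```python
import os

# Keep each single model simulation single-threaded (pyABC's default) ...
os.environ["OMP_NUM_THREADS"] = "1"
# ... and let the multiprocessed sampler spread sampling over 4 processes.
os.environ["PYABC_NUM_PROCS"] = "4"
```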
ABCSMC¶
Parallel Approximate Bayesian Computation - Sequential Monte Carlo (ABC-SMC).
The ABCSMC class is the most central class of the pyABC package. Most of the other classes serve to configure it (that is, the other classes implement a strategy pattern).

class
pyabc.smc.
ABCSMC
(models: Union[List[pyabc.model.Model], pyabc.model.Model, Callable], parameter_priors: Union[List[pyabc.random_variables.Distribution], pyabc.random_variables.Distribution, Callable], distance_function: Union[pyabc.distance.base.Distance, Callable] = None, population_size: Union[pyabc.populationstrategy.PopulationStrategy, int] = 100, summary_statistics: Callable[[model_output], dict] = <function identity>, model_prior: pyabc.random_variables.RV = None, model_perturbation_kernel: pyabc.random_variables.ModelPerturbationKernel = None, transitions: Union[List[pyabc.transition.base.Transition], pyabc.transition.base.Transition] = None, eps: pyabc.epsilon.base.Epsilon = None, sampler: pyabc.sampler.base.Sampler = None, acceptor: pyabc.acceptor.acceptor.Acceptor = None, stop_if_only_single_model_alive: bool = False, max_nr_recorded_particles: int = inf)[source]¶ Bases:
object
Approximate Bayesian Computation  Sequential Monte Carlo (ABCSMC).
This is an implementation of an ABC-SMC algorithm similar to [1].
 Parameters
models (list of models, single model, list of functions, or single function) –
If models is a function, then the function should have a single parameter, which is of dictionary type, and should return a single dictionary, which contains the simulated data.
If models is a list of functions, then the first point applies to each function.
Models can also be a list of Model instances or a single Model instance.
This model’s output is passed to the summary statistics calculation. Per default, the model is assumed to already return the calculated summary statistics. Accordingly, the default summary_statistics function is just the identity. Note that the sampling and evaluation of particles happens in the model’s methods, so overriding these offers a great deal of flexibility, in particular the freedom to use or ignore the distance_function, summary_statistics, and eps parameters here.
parameter_priors (List[Distribution]) – A list of prior distributions for the models’ parameters. Each list entry is the prior distribution for the corresponding model.
distance_function (Distance, optional) – Measures the distance of the tentatively sampled particle to the measured data.
population_size (int, PopulationStrategy, optional) – Specify the size of the population. If population_size is an int, then the size is constant. Adaptive population sizes are also possible by passing a pyabc.populationstrategy.PopulationStrategy object. The default is 100 particles per population.
summary_statistics (Callable[[model_output], dict]) – A function which takes the raw model output as returned by any of the models and calculates the corresponding summary statistics. Note that the default value is just the identity function, i.e. the model is assumed to already calculate the summary statistics. However, in a model selection setting it can make sense to have the model produce some kind of raw output and then apply the same summary statistics function to all the models.
model_prior (RV, optional) – A random variable giving the prior weights of the model classes. The default is a uniform prior over the model classes, RV("randint", 0, len(models)).
model_perturbation_kernel (ModelPerturbationKernel) – Kernel which governs with which probability to switch from one model to another model for a given sample while generating proposals for the subsequent population from the current population.
transitions (List[Transition], Transition, optional) – A list of pyabc.transition.Transition objects, or a single pyabc.transition.Transition in case of a single model. Defaults to multivariate normal transitions for every model.
eps (Epsilon, optional) – Accepts any pyabc.epsilon.Epsilon subclass. The default is the pyabc.epsilon.MedianEpsilon, which adapts automatically. The object passed here determines how the acceptance threshold scheduling is performed.
sampler (Sampler, optional) – In some cases, a mapper implementation will require initialization to run properly, e.g. database connection, grid setup, etc. The sampler is an object that encapsulates this information. The default sampler pyabc.sampler.MulticoreEvalParallelSampler will parallelize across the cores of a single machine only.
acceptor (Acceptor, optional) – Takes a distance function, summary statistics and an epsilon threshold to decide about acceptance of a particle. Accepts any subclass of pyabc.acceptor.Acceptor, or a function convertible to an acceptor. Defaults to a pyabc.acceptor.UniformAcceptor.
stop_if_only_single_model_alive (bool) – Defaults to False. Set this to True if you want to stop ABC-SMC automatically as soon as only a single model has survived.
max_nr_recorded_particles (int) – Defaults to inf. Set this to the maximum number of accepted and rejected particles that methods like the AdaptivePNormDistance function use to update themselves in each iteration.
 1
Toni, Tina, and Michael P. H. Stumpf. “Simulation-Based Model Selection for Dynamical Systems in Systems and Population Biology”. Bioinformatics 26, no. 1, 104–10, 2010. doi:10.1093/bioinformatics/btp619.
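For illustration, a model in the sense of the models argument above is just a callable mapping a parameter dictionary to a dictionary of simulated data. A toy sketch (the names mu and y are made up):

```python
import random

def model(parameter: dict) -> dict:
    # Toy stochastic model: one noisy observation centered at parameter "mu".
    return {"y": parameter["mu"] + random.gauss(0.0, 0.5)}

sample = model({"mu": 2.5})  # a dict like {"y": ...}
```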

__init__
(models: Union[List[pyabc.model.Model], pyabc.model.Model, Callable], parameter_priors: Union[List[pyabc.random_variables.Distribution], pyabc.random_variables.Distribution, Callable], distance_function: Union[pyabc.distance.base.Distance, Callable] = None, population_size: Union[pyabc.populationstrategy.PopulationStrategy, int] = 100, summary_statistics: Callable[[model_output], dict] = <function identity>, model_prior: pyabc.random_variables.RV = None, model_perturbation_kernel: pyabc.random_variables.ModelPerturbationKernel = None, transitions: Union[List[pyabc.transition.base.Transition], pyabc.transition.base.Transition] = None, eps: pyabc.epsilon.base.Epsilon = None, sampler: pyabc.sampler.base.Sampler = None, acceptor: pyabc.acceptor.acceptor.Acceptor = None, stop_if_only_single_model_alive: bool = False, max_nr_recorded_particles: int = inf)[source]¶ Initialize self. See help(type(self)) for accurate signature.

load
(db: str, abc_id: int = 1, observed_sum_stat: dict = None) → pyabc.storage.history.History[source]¶ Load an ABCSMC run for continuation.
 Parameters
db (str) – A SQLAlchemy database identifier pointing to the database from which to continue a run.
abc_id (int, optional) – The id of the ABC-SMC run in the database which is to be continued. The default is 1. If more than one ABC-SMC run is stored, use the abc_id parameter to indicate which one to continue.
observed_sum_stat (dict, optional) – The observed summary statistics. This field should be used only if the summary statistics cannot be reproduced exactly from the database (in particular when they are not numpy or pandas objects, e.g. when they were generated in R). If None, then the summary statistics are read from the history.

new
(db: str, observed_sum_stat: dict = None, *, gt_model: int = None, gt_par: dict = None, meta_info=None) → pyabc.storage.history.History[source]¶ Make a new ABCSMC run.
 Parameters
db (str) –
Has to be a valid SQLAlchemy database identifier. This indicates the database to be used (and created if necessary and possible) for the ABCSMC run.
To use an in-memory database pass “sqlite://”. Note that in-memory databases are only available on the master node. If workers are started on different nodes they won’t be able to access the database. This should not be a problem in most scenarios. The in-memory option is mainly useful for benchmarking (and maybe for testing).
observed_sum_stat (dict, optional) –
This is the really important parameter here. It is of the form {'statistic_1': val_1, 'statistic_2': val_2, ... }.
The dictionary provided here represents the measured data. Particles generated during ABC-SMC sampling are compared against the summary statistics provided here.
The summary statistics’ values can be integers, floats, strings and anything which is a numpy array or can be converted to one (e.g. lists). In addition, pandas.DataFrames can also be used as summary statistics. Note that storage of pandas DataFrames in pyABC’s database is still considered experimental.
This parameter is optional, as the distance function might implement comparison to the observed data on its own. Not giving this parameter is equivalent to passing an empty dictionary {}.
gt_model (int, optional) – This is only meta data stored in the database, and is not actually used by the ABC-SMC algorithm. If you want to test your ABC-SMC procedure against synthetic samples, you can use this parameter to indicate the ground truth model number. This helps with further analysis. If you use actually measured data (and don’t know the ground truth) you don’t have to set this.
gt_par (dict, optional) – Similar to gt_model, this is only for recording purposes in the database, and is not used in the ABC-SMC algorithm. This stores the parameters of the ground truth model if it was synthetically obtained. Don’t give this parameter if you don’t know the ground truth.
meta_info (dict, optional) – Can contain an arbitrary number of keys, only for recording purposes. Store arbitrary meta information in this dictionary. Can be used for really anything. This dictionary is stored in the database.
 Returns
history – The history, with set history.id, which is the id under which the generated ABCSMC run entry in the database can be identified.
 Return type
History

run
(minimum_epsilon: float = None, max_nr_populations: int = inf, min_acceptance_rate: float = 0.0) → pyabc.storage.history.History[source]¶ Run the ABCSMC model selection until either of the stopping criteria is met.
 Parameters
minimum_epsilon (float, optional) – Stop if epsilon is smaller than minimum epsilon specified here. Defaults in general to 0.0, and to 1.0 for a Temperature epsilon.
max_nr_populations (int, optional (default = np.inf)) – The maximum number of populations. Stop if this number is reached.
min_acceptance_rate (float, optional (default = 0.0)) – Minimal allowed acceptance rate. Sampling stops if a population has a lower rate.
Population after population is sampled and particles which are close enough to the observed data are accepted and added to the next population. If an adaptive Epsilon is specified (this is the default), then the acceptance threshold decreases from population to population automatically in a data dependent way.
Sampling of further populations is stopped when any of the three stopping criteria is met:
the maximum number of populations max_nr_populations is reached,
the acceptance threshold for the last sampled population was smaller than minimum_epsilon,
or the acceptance rate dropped below min_acceptance_rate.
The value of minimum_epsilon determines the quality of the ABC-SMC approximation: the smaller, the better, but sampling time also increases with decreasing minimum_epsilon.
This method can be called repeatedly to sample further populations after sampling was stopped once.
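To make the stopping logic concrete, here is a self-contained toy sketch of such a generation loop with a median-based threshold update (illustrative plain-numpy code under made-up model and prior assumptions, not pyABC's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
observed = 2.5

def simulate(mu):
    # Toy stochastic model: observation with Gaussian noise.
    return mu + 0.1 * rng.normal()

minimum_epsilon, max_nr_populations, min_acceptance_rate = 0.05, 10, 0.01
eps = np.inf
for t in range(max_nr_populations):
    distances, n_tried = [], 0
    while len(distances) < 100:          # population size of 100 particles
        n_tried += 1
        mu = rng.uniform(0, 5)           # draw from the prior
        d = abs(simulate(mu) - observed) # distance to the observed data
        if d <= eps:
            distances.append(d)
    # Threshold update in the spirit of MedianEpsilon: median accepted distance.
    eps = float(np.median(distances))
    if eps <= minimum_epsilon or len(distances) / n_tried < min_acceptance_rate:
        break
```

The threshold thus decreases automatically in a data-dependent way, and the loop ends as soon as one of the three criteria triggers.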
Distance functions¶
Distance functions measure closeness of observed and sampled data. This module implements various commonly used distance functions for ABC, featuring a few advanced concepts.
For custom distance functions, either pass a plain function to ABCSMC or subclass the pyabc.Distance class.
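For example, a plain function of two summary statistics dictionaries can serve as a distance (a sketch; the key names are illustrative):

```python
import math

def euclidean_distance(x: dict, x_0: dict) -> float:
    # Euclidean distance over the summary statistics present in x_0,
    # where x is the simulated and x_0 the observed data.
    return math.sqrt(sum((x[key] - x_0[key]) ** 2 for key in x_0))

# euclidean_distance({"a": 1.0, "b": 2.0}, {"a": 0.0, "b": 0.0}) == 5 ** 0.5
```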

class
pyabc.distance.
AcceptAllDistance
[source]¶ Bases:
pyabc.distance.base.Distance
Just a mock distance function which always returns 1, so any sample should be accepted for any sane epsilon object.
Can be used for testing.

__call__
(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]¶ Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.
Abstract method. This method has to be overwritten by all concrete implementations.
 Parameters
x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.
x_0 (dict) – Summary statistics of the observed data.
t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.
par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.
 Returns
distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.
 Return type
float


class
pyabc.distance.
AdaptiveAggregatedDistance
(distances: List[pyabc.distance.base.Distance], initial_weights: List = None, factors: Union[List, dict] = None, adaptive: bool = True, scale_function: Callable = None, log_file: str = None)[source]¶ Bases:
pyabc.distance.distance.AggregatedDistance
Adapt the weights of AggregatedDistances automatically over time.
 Parameters
distances – As in AggregatedDistance.
initial_weights – Weights to be used in the initial iteration. List with a weight for each distance function.
factors – As in AggregatedDistance.
adaptive – True: Adapt weights after each iteration. False: Adapt weights only once at the beginning in initialize(). This corresponds to a precalibration.
scale_function – Function that takes a list of floats, namely the values obtained by applying one of the distances passed to a set of samples, and returns a single float, namely the weight to apply to this distance function. Default: scale_span.
log_file – A log file to store weights for each time point in. Weights are currently not stored in the database. The data are saved in json format and can be retrieved via pyabc.storage.load_dict_from_json.

__init__
(distances: List[pyabc.distance.base.Distance], initial_weights: List = None, factors: Union[List, dict] = None, adaptive: bool = True, scale_function: Callable = None, log_file: str = None)[source]¶  Parameters
distances (List) – The distance functions to apply.
weights (Union[List, dict], optional (default = [1,..])) – The weights to apply to the distances when taking the sum. Can be a list with entries in the same order as the distances, or a dictionary of lists, with the keys being the single time points (if the weights should be iterationspecific).
factors (Union[List, dict], optional (default = [1,..])) – Scaling factors that the weights are multiplied with. The same structure applies as to weights. If None is passed, a factor of 1 is considered for every summary statistic. Note that in this class, factors are superfluous as everything can be achieved with weights alone; however, in subclasses the factors can remain static while weights adapt over time, allowing for greater flexibility.

class
pyabc.distance.
AdaptivePNormDistance
(p: float = 2, initial_weights: dict = None, factors: dict = None, adaptive: bool = True, scale_function: Callable = None, normalize_weights: bool = True, max_weight_ratio: float = None, log_file: str = None)[source]¶ Bases:
pyabc.distance.distance.PNormDistance
In the p-norm distance, adapt the weights for each generation, based on the previous simulations. This class is motivated by [1].
 Parameters
p – p for the p-norm. Required p >= 1; p = np.inf is allowed (infinity norm). Default: p=2.
initial_weights – Weights to be used in the initial iteration. Dictionary with observables as keys and weights as values.
factors – As in PNormDistance.
adaptive – True: Adapt distance after each iteration. False: Adapt distance only once at the beginning in initialize(). This corresponds to a precalibration.
scale_function – (data: list, x_0: float) -> scale: float. Computes the scale (i.e. inverse weight s = 1 / w) for a given summary statistic. Here, data denotes the list of simulated summary statistics, and x_0 the observed summary statistic. Implemented are absolute_median_deviation, standard_deviation (default), centered_absolute_median_deviation, centered_standard_deviation.
normalize_weights – Whether to normalize the weights to have mean 1. This just possibly smoothes the decrease of epsilon and might aid numeric stability, but is not strictly necessary.
max_weight_ratio – If not None, large weights will be bounded by the ratio times the smallest nonzero absolute weight. In practice usually not necessary, it is theoretically required to ensure convergence.
log_file – A log file to store weights for each time point in. Weights are currently not stored in the database. The data are saved in json format and can be retrieved via pyabc.storage.load_dict_from_json.
 1
Prangle, Dennis. “Adapting the ABC Distance Function”. Bayesian Analysis, 2017. doi:10.1214/16-BA1002.
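The core idea, weighting each statistic by the inverse of its variability, can be sketched as follows (toy code using the standard deviation as the scale function; not pyABC's implementation):

```python
import statistics

def adaptive_weights(samples: list) -> dict:
    # Weight w = 1 / scale, with the scale being the standard deviation
    # of each summary statistic over the simulated samples.
    return {
        key: 1.0 / statistics.stdev(s[key] for s in samples)
        for key in samples[0]
    }

def weighted_pnorm(x: dict, x_0: dict, w: dict, p: float = 2) -> float:
    # Weighted p-norm distance between simulated x and observed x_0.
    return sum((w[key] * abs(x[key] - x_0[key])) ** p for key in x_0) ** (1 / p)

# A statistic with 10x the spread receives 1/10 of the weight:
w = adaptive_weights(
    [{"a": 0.0, "b": 0.0}, {"a": 1.0, "b": 10.0}, {"a": 2.0, "b": 20.0}]
)
```

This way, statistics on very different scales contribute comparably to the distance.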

__init__
(p: float = 2, initial_weights: dict = None, factors: dict = None, adaptive: bool = True, scale_function: Callable = None, normalize_weights: bool = True, max_weight_ratio: float = None, log_file: str = None)[source]¶ Initialize self. See help(type(self)) for accurate signature.

configure_sampler
(sampler: pyabc.sampler.base.Sampler)[source]¶ Make the sampler return also rejected particles, because these are needed to get a better estimate of the summary statistic variabilities, avoiding a bias to accepted ones only.
 Parameters
sampler (Sampler) – The sampler employed.

get_config
() → dict[source]¶ Return configuration of the distance.
 Returns
config – Dictionary describing the distance.
 Return type
dict

class
pyabc.distance.
AggregatedDistance
(distances: List[pyabc.distance.base.Distance], weights: Union[List, dict] = None, factors: Union[List, dict] = None)[source]¶ Bases:
pyabc.distance.base.Distance
Aggregates a list of distance functions, all of which may work on subparts of the summary statistics. Then computes and returns the weighted sum of the distance values generated by the various distance functions.
All class functions are propagated to the children and the obtained results aggregated appropriately.
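Conceptually, the aggregation amounts to the following (a minimal sketch with made-up sub-distances, each acting on part of the statistics):

```python
def aggregate(distances, weights, x, x_0):
    # Weighted sum of the values of the individual distance functions.
    return sum(w * d(x, x_0) for d, w in zip(distances, weights))

d1 = lambda x, x_0: abs(x["a"] - x_0["a"])  # acts on one part of the statistics
d2 = lambda x, x_0: abs(x["b"] - x_0["b"])  # acts on another part

value = aggregate([d1, d2], [1.0, 0.5], {"a": 2.0, "b": 4.0}, {"a": 0.0, "b": 0.0})
# 1.0 * 2.0 + 0.5 * 4.0 == 4.0
```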

__call__
(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]¶ Applies all distance functions and computes the weighted sum of all obtained values.

__init__
(distances: List[pyabc.distance.base.Distance], weights: Union[List, dict] = None, factors: Union[List, dict] = None)[source]¶  Parameters
distances (List) – The distance functions to apply.
weights (Union[List, dict], optional (default = [1,..])) – The weights to apply to the distances when taking the sum. Can be a list with entries in the same order as the distances, or a dictionary of lists, with the keys being the single time points (if the weights should be iterationspecific).
factors (Union[List, dict], optional (default = [1,..])) – Scaling factors that the weights are multiplied with. The same structure applies as to weights. If None is passed, a factor of 1 is considered for every summary statistic. Note that in this class, factors are superfluous as everything can be achieved with weights alone; however, in subclasses the factors can remain static while weights adapt over time, allowing for greater flexibility.

configure_sampler
(sampler: pyabc.sampler.base.Sampler)[source]¶ Note: configure_sampler is applied by all distances sequentially, so care must be taken that they perform no contradictory operations on the sampler.

static
format_dict
(w, t, n_distances, default_val=1.0)[source]¶ Normalize weight or factor dictionary to the employed format.

get_config
() → dict[source]¶ Return configuration of the distance.
 Returns
config – Dictionary describing the distance.
 Return type
dict

initialize
(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]¶ This method is called by the ABCSMC framework before the first use of the distance (at the beginning of ABCSMC.run()), and can be used to calibrate it to the statistics of the samples.
The default is to do nothing.
 Parameters
t (int) – Time point for which to initialize the distance.
get_all_sum_stats (Callable[[], List[dict]]) – Returns on command the initial summary statistics.
x_0 (dict, optional) – The observed summary statistics.

update
(t: int, get_all_sum_stats: Callable[[], List[dict]]) → bool[source]¶ The sum_stats are passed on to all distance functions, each of which may then update using these. If any update occurred, a value of True is returned, indicating that e.g. the distance may need to be recalculated since the underlying distances changed.


class
pyabc.distance.
BinomialKernel
(p: Union[float, Callable], ret_scale: str = 'SCALE_LOG', keys: List[str] = None, pdf_max: float = None)[source]¶ Bases:
pyabc.distance.kernel.StochasticKernel
A kernel with a binomial probability mass function.
 Parameters
p (Union[float, Callable]) – The success probability.
ret_scale, keys, pdf_max – As in StochasticKernel.

__call__
(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]¶ Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.
Abstract method. This method has to be overwritten by all concrete implementations.
 Parameters
x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.
x_0 (dict) – Summary statistics of the observed data.
t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.
par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.
 Returns
distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.
 Return type
float

class
pyabc.distance.
Distance
[source]¶ Bases:
abc.ABC
Abstract base class for distance objects.
Any object that computes the similarity between observed and simulated data should inherit from this class.

abstract
__call__
(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]¶ Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.
Abstract method. This method has to be overwritten by all concrete implementations.
 Parameters
x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.
x_0 (dict) – Summary statistics of the observed data.
t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.
par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.
 Returns
distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.
 Return type
float

configure_sampler
(sampler: pyabc.sampler.base.Sampler)[source]¶ This is called by the ABCSMC class and gives the distance the opportunity to configure the sampler. For example, the distance might request the sampler to also return rejected particles in order to adapt the distance to the statistics of the sample. The method is called by the ABCSMC framework before the first use of the distance (at the beginning of ABCSMC.run()), after initialize().
The default is to do nothing.
 Parameters
sampler (Sampler) – The sampler used in ABCSMC.

get_config
() → dict[source]¶ Return configuration of the distance.
 Returns
config – Dictionary describing the distance.
 Return type
dict

initialize
(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]¶ This method is called by the ABCSMC framework before the first use of the distance (at the beginning of ABCSMC.run()), and can be used to calibrate it to the statistics of the samples.
The default is to do nothing.
 Parameters
t (int) – Time point for which to initialize the distance.
get_all_sum_stats (Callable[[], List[dict]]) – Returns on command the initial summary statistics.
x_0 (dict, optional) – The observed summary statistics.

to_json
() → str[source]¶ Return JSON encoded configuration of the distance.
 Returns
json_str – JSON encoded string describing the distance. The default implementation is to try to convert the dictionary returned by get_config.
 Return type
str

update
(t: int, get_all_sum_stats: Callable[[], List[dict]]) → bool[source]¶ Update the distance for the upcoming generation t.
The default is to do nothing.
 Parameters
t (int) – Time point for which to update the distance.
get_all_sum_stats (Callable[[], List[dict]]) – Returns on demand a list of all summary statistics from the finished generation that should be used to update the distance.
 Returns
is_updated – Whether the distance has changed compared to beforehand. Depending on the result, the population needs to be updated in ABCSMC before preparing the next generation. Defaults to False.
 Return type
bool


class
pyabc.distance.
DistanceWithMeasureList
(measures_to_use='all')[source]¶ Bases:
pyabc.distance.base.Distance
Base class for distance functions with measure list. This class is not functional on its own.
 Parameters
measures_to_use (Union[str, List[str]]) –
If set to “all”, all measures are used. This is the default.
If a list is provided, the measures in the list are used.
Here, “measures” refers to the summary statistics.

__init__
(measures_to_use='all')[source]¶ Initialize self. See help(type(self)) for accurate signature.

get_config
()[source]¶ Return configuration of the distance.
 Returns
config – Dictionary describing the distance.
 Return type
dict

initialize
(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]¶ This method is called by the ABCSMC framework before the first use of the distance (at the beginning of ABCSMC.run()), and can be used to calibrate it to the statistics of the samples.
The default is to do nothing.
 Parameters
t (int) – Time point for which to initialize the distance.
get_all_sum_stats (Callable[[], List[dict]]) – Returns on command the initial summary statistics.
x_0 (dict, optional) – The observed summary statistics.

class
pyabc.distance.
IdentityFakeDistance
[source]¶ Bases:
pyabc.distance.base.Distance
A fake distance function, which just passes the summary statistics on. This class assumes that the model already returns the distance. This can be useful in cases where simulating can be stopped early, when during the simulation some condition is reached which makes it impossible to accept the particle.

__call__
(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]¶ Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.
Abstract method. This method has to be overwritten by all concrete implementations.
 Parameters
x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.
x_0 (dict) – Summary statistics of the observed data.
t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.
par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.
 Returns
distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.
 Return type
float


class
pyabc.distance.
IndependentLaplaceKernel
(scale: Union[Callable, List[float], float] = None, keys: List[str] = None, pdf_max: float = None)[source]¶ Bases:
pyabc.distance.kernel.StochasticKernel
This kernel can be used for efficient computations of large-scale independent Laplace distributions, performing computations directly on a log-scale to avoid numeric issues. In each coordinate, a 1-dim Laplace distribution
\[p(x) = \frac{1}{2b}\exp\left(-\frac{|x-a|}{b}\right)\]is assumed.
 Parameters
scale (Union[array_like, float, Callable], optional (default = ones vector)) – Scale terms b of the distribution. Can also be a Callable taking as arguments the parameters. In that case, pdf_max should also be given if it is supposed to be used. Usually, it will then be given as the density at the observed statistics assuming the minimum allowed variance.
keys, pdf_max – As in pyabc.distance.StochasticKernel.
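The per-coordinate log-density such a kernel accumulates can be sketched in plain Python (an illustrative computation on the dict-based summary statistics format used throughout, not the class's actual implementation):

```python
import math

def independent_laplace_logpdf(x: dict, x_0: dict, scale: dict) -> float:
    """Sum of 1-dim Laplace log-densities with location x_0[k] and scale b = scale[k].

    Working on the log-scale avoids underflow when many coordinates are involved.
    """
    return sum(
        -math.log(2 * scale[k]) - abs(x[k] - x_0[k]) / scale[k]
        for k in x
    )

# At x = x_0 with unit scales, each coordinate contributes -log(2)
val = independent_laplace_logpdf({"s": 1.0}, {"s": 1.0}, {"s": 1.0})
```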

__call__
(x: dict, x_0: dict, t: int = None, par: dict = None)[source]¶ Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.
Abstract method. This method has to be overwritten by all concrete implementations.
 Parameters
x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.
x_0 (dict) – Summary statistics of the observed data.
t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.
par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.
 Returns
distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.
 Return type
float

class
pyabc.distance.
IndependentNormalKernel
(var: Union[Callable, List[float], float] = None, keys: List[str] = None, pdf_max: float = None)[source]¶ Bases:
pyabc.distance.kernel.StochasticKernel
This kernel can be used for efficient computations of large-scale independent normal distributions, circumventing the covariance matrix, and performing computations directly on a log-scale to avoid numeric issues.
 Parameters
var (Union[array_like, float, Callable], optional (default = ones vector)) – Variances of the distribution (assuming zeros in the off-diagonal of the covariance matrix). Can also be a Callable taking as arguments the parameters. In that case, pdf_max should also be given if it is supposed to be used. Usually, it will then be given as the density at the observed statistics assuming the minimum allowed variance.
keys, pdf_max – As in pyabc.distance.StochasticKernel.
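The log-density of independent normals can likewise be sketched without a covariance matrix; a minimal stdlib-only illustration of the computation (not the class's actual code):

```python
import math

def independent_normal_logpdf(x: dict, x_0: dict, var: dict) -> float:
    """Sum of 1-dim normal log-densities N(x_0[k], var[k]) evaluated at x[k].

    Computing on the log-scale sidesteps both the full covariance matrix
    and numeric underflow for high-dimensional summary statistics.
    """
    return sum(
        -0.5 * math.log(2 * math.pi * var[k])
        - (x[k] - x_0[k]) ** 2 / (2 * var[k])
        for k in x
    )
```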

__call__
(x: dict, x_0: dict, t: int = None, par: dict = None)[source]¶ Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.
Abstract method. This method has to be overwritten by all concrete implementations.
 Parameters
x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.
x_0 (dict) – Summary statistics of the observed data.
t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.
par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.
 Returns
distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.
 Return type
float

class
pyabc.distance.
MinMaxDistance
(measures_to_use='all')[source]¶ Bases:
pyabc.distance.distance.RangeEstimatorDistance
Calculate upper and lower margins as the max and min of the parameters. This works surprisingly well for normalization in simple cases.

class
pyabc.distance.
NegativeBinomialKernel
(p: float, ret_scale: str = 'SCALE_LOG', keys: List[str] = None, pdf_max: float = None)[source]¶ Bases:
pyabc.distance.kernel.StochasticKernel
A kernel with a negative binomial probability mass function.
 Parameters
p (Union[float, Callable]) – The success probability.
ret_scale, keys, pdf_max – As in pyabc.distance.StochasticKernel.

__call__
(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]¶ Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.
Abstract method. This method has to be overwritten by all concrete implementations.
 Parameters
x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.
x_0 (dict) – Summary statistics of the observed data.
t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.
par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.
 Returns
distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.
 Return type
float

class
pyabc.distance.
NoDistance
[source]¶ Bases:
pyabc.distance.base.Distance
Implements a kind of null object as distance function. This can be used as a dummy distance function if e.g. integrated modeling is used.
Note
This distance function cannot be evaluated, so currently it is in particular not possible to use an epsilon threshold which requires initialization, because during initialization the distance function is invoked directly and not via the acceptor as usual. Conceptually, this would be possible and can be implemented on request.

__call__
(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]¶ Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.
Abstract method. This method has to be overwritten by all concrete implementations.
 Parameters
x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.
x_0 (dict) – Summary statistics of the observed data.
t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.
par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.
 Returns
distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.
 Return type
float


class
pyabc.distance.
NormalKernel
(cov: numpy.ndarray = None, ret_scale: str = 'SCALE_LOG', keys: List[str] = None, pdf_max: float = None)[source]¶ Bases:
pyabc.distance.kernel.StochasticKernel
A kernel with a normal, i.e. Gaussian, probability density. This is just a wrapper around sp.multivariate_normal.
 Parameters
cov (array_like, optional (default = identity matrix)) – Covariance matrix of the distribution.
ret_scale, keys, pdf_max – As in pyabc.distance.StochasticKernel.
Note
The order of the entries in the mean and cov vectors is assumed to be the same as the one in keys. If keys is None, it is assumed to be the same as the one obtained via sorted(x.keys()) for summary statistics x.

__call__
(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]¶ Return the value of the normal distribution at x - x_0, or its logarithm.

class
pyabc.distance.
PCADistance
(measures_to_use='all')[source]¶ Bases:
pyabc.distance.distance.DistanceWithMeasureList
Calculate distance in whitened coordinates.
A whitening transformation \(W\) is calculated from an initial sample. The distance is measured as the Euclidean distance in the transformed space, i.e.
\[d(x, y) = \| Wx - Wy \|\]
__call__
(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]¶ Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.
Abstract method. This method has to be overwritten by all concrete implementations.
 Parameters
x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.
x_0 (dict) – Summary statistics of the observed data.
t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.
par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.
 Returns
distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.
 Return type
float

__init__
(measures_to_use='all')[source]¶ Initialize self. See help(type(self)) for accurate signature.

initialize
(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]¶ This method is called by the ABCSMC framework before the first use of the distance (at the beginning of ABCSMC.run()), and can be used to calibrate it to the statistics of the samples.
The default is to do nothing.
 Parameters
t (int) – Time point for which to initialize the distance.
get_all_sum_stats (Callable[[], List[dict]]) – Returns on command the initial summary statistics.
x_0 (dict, optional) – The observed summary statistics.


class
pyabc.distance.
PNormDistance
(p: float = 2, weights: dict = None, factors: dict = None)[source]¶ Bases:
pyabc.distance.base.Distance
Use a weighted pnorm
\[d(x, y) = \left[\sum_{i} \left| w_i ( x_i - y_i ) \right|^{p} \right]^{1/p}\]to compute distances between sets of summary statistics. E.g. set p=2 to get a Euclidean distance.
 Parameters
p (float, optional (default = 2)) – p for p-norm. Required p >= 1, p = np.inf allowed (infinity-norm).
weights (dict, optional (default = 1)) – Weights. Dictionary indexed by time points. Each entry contains a dictionary of numeric weights, indexed by summary statistics labels. If None is passed, a weight of 1 is considered for every summary statistic. If no entry is available in weights for a given time point, the maximum available time point is selected. It is also possible to pass a single dictionary indexed by summary statistics labels, if weights do not change in time.
factors (dict, optional (default = 1)) – Scaling factors that the weights are multiplied with. The same structure applies as to weights. If None is passed, a factor of 1 is considered for every summary statistic. Note that in this class, factors are superfluous as everything can be achieved with weights alone, however in subclasses the factors can remain static while weights adapt over time, allowing for greater flexibility.
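A minimal sketch of the weighted p-norm above, computed directly on summary statistics dicts (the key names are hypothetical; this illustrates the formula, not the class's internals):

```python
def weighted_pnorm(x: dict, x_0: dict, weights: dict, p: float = 2) -> float:
    """Weighted p-norm distance between two summary statistics dicts."""
    return sum(abs(weights[k] * (x[k] - x_0[k])) ** p for k in x) ** (1 / p)

# |1.0*(1-0)|^2 + |0.5*(3-1)|^2 = 2, so the distance is sqrt(2)
d = weighted_pnorm({"a": 1.0, "b": 3.0}, {"a": 0.0, "b": 1.0},
                   {"a": 1.0, "b": 0.5})
```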

__call__
(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]¶ Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.
Abstract method. This method has to be overwritten by all concrete implementations.
 Parameters
x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.
x_0 (dict) – Summary statistics of the observed data.
t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.
par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.
 Returns
distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.
 Return type
float

__init__
(p: float = 2, weights: dict = None, factors: dict = None)[source]¶ Initialize self. See help(type(self)) for accurate signature.

static
format_dict
(w, t, sum_stat_keys, default_val=1.0)[source]¶ Normalize weight or factor dictionary to the employed format.

get_config
() → dict[source]¶ Return configuration of the distance.
 Returns
config – Dictionary describing the distance.
 Return type
dict

initialize
(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]¶ This method is called by the ABCSMC framework before the first use of the distance (at the beginning of ABCSMC.run()), and can be used to calibrate it to the statistics of the samples.
The default is to do nothing.
 Parameters
t (int) – Time point for which to initialize the distance.
get_all_sum_stats (Callable[[], List[dict]]) – Returns on command the initial summary statistics.
x_0 (dict, optional) – The observed summary statistics.

class
pyabc.distance.
PercentileDistance
(measures_to_use='all')[source]¶ Bases:
pyabc.distance.distance.RangeEstimatorDistance
Calculate the normalization from the 20% and 80% percentiles as lower and upper margins.

PERCENTILE
= 20¶ The percentiles

get_config
()[source]¶ Return configuration of the distance.
 Returns
config – Dictionary describing the distance.
 Return type
dict


class
pyabc.distance.
PoissonKernel
(ret_scale: str = 'SCALE_LOG', keys: List[str] = None, pdf_max: float = None)[source]¶ Bases:
pyabc.distance.kernel.StochasticKernel
A kernel with a Poisson probability mass function.
 Parameters
keys, pdf_max (ret_scale,) –

__call__
(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]¶ Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.
Abstract method. This method has to be overwritten by all concrete implementations.
 Parameters
x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.
x_0 (dict) – Summary statistics of the observed data.
t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.
par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.
 Returns
distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.
 Return type
float

class
pyabc.distance.
RangeEstimatorDistance
(measures_to_use='all')[source]¶ Bases:
pyabc.distance.distance.DistanceWithMeasureList
Abstract base class for distance functions whose estimates are based on a range.
It defines the two template methods lower and upper. Hence
\[d(x, y) = \sum_{i \in \text{measures}} \left| \frac{x_i - y_i}{u_i - l_i} \right|\]where \(l_i\) and \(u_i\) are the lower and upper margin for measure \(i\).
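The range-normalized sum above can be sketched directly (a hypothetical helper illustrating the formula, with margins supplied as dicts over the measures):

```python
def range_normalized_distance(x: dict, x_0: dict,
                              lower: dict, upper: dict) -> float:
    """Sum of absolute deviations, each normalized by the measure's range u_i - l_i."""
    return sum(abs((x[k] - x_0[k]) / (upper[k] - lower[k])) for k in x)
```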

__call__
(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]¶ Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.
Abstract method. This method has to be overwritten by all concrete implementations.
 Parameters
x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.
x_0 (dict) – Summary statistics of the observed data.
t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.
par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.
 Returns
distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.
 Return type
float

__init__
(measures_to_use='all')[source]¶ Initialize self. See help(type(self)) for accurate signature.

get_config
()[source]¶ Return configuration of the distance.
 Returns
config – Dictionary describing the distance.
 Return type
dict

initialize
(t: int, get_all_sum_stats: Callable[[], List[dict]], x_0: dict = None)[source]¶ This method is called by the ABCSMC framework before the first use of the distance (at the beginning of ABCSMC.run()), and can be used to calibrate it to the statistics of the samples.
The default is to do nothing.
 Parameters
t (int) – Time point for which to initialize the distance.
get_all_sum_stats (Callable[[], List[dict]]) – Returns on command the initial summary statistics.
x_0 (dict, optional) – The observed summary statistics.


class
pyabc.distance.
SimpleFunctionDistance
(fun)[source]¶ Bases:
pyabc.distance.base.Distance
This is a wrapper around a simple function which calculates the distance. If a function/callable is passed to the ABCSMC class, which is not subclassed from pyabc.Distance, then it is converted to an instance of the SimpleFunctionDistance class.
 Parameters
fun (Callable[[dict, dict], float]) – A Callable accepting as parameters (a subset of) the arguments of the pyabc.Distance.__call__ function. Usually at least the summary statistics x and x_0. Returns the distance between both.
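In practice one rarely instantiates this class directly: a plain callable passed to ABCSMC is wrapped automatically. A sketch of such a callable (the summary statistic key "mean" is hypothetical):

```python
def my_distance(x: dict, x_0: dict) -> float:
    """A plain distance callable on summary statistics dicts; pyABC wraps
    such functions in SimpleFunctionDistance when passed to ABCSMC."""
    return abs(x["mean"] - x_0["mean"])
```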

__call__
(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]¶ Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.
Abstract method. This method has to be overwritten by all concrete implementations.
 Parameters
x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.
x_0 (dict) – Summary statistics of the observed data.
t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.
par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.
 Returns
distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.
 Return type
float

class
pyabc.distance.
SimpleFunctionKernel
(fun: Callable, ret_scale: str = 'SCALE_LIN', keys: List[str] = None, pdf_max: float = None)[source]¶ Bases:
pyabc.distance.kernel.StochasticKernel
This is a wrapper around a simple function which calculates the probability density.
 Parameters
fun (Callable) – A Callable accepting __call__’s parameters. The function should be a pdf or pmf.
keys, pdf_max (ret_scale,) –

__call__
(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]¶ Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.
Abstract method. This method has to be overwritten by all concrete implementations.
 Parameters
x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.
x_0 (dict) – Summary statistics of the observed data.
t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.
par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.
 Returns
distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.
 Return type
float

class
pyabc.distance.
StochasticKernel
(ret_scale: str = 'SCALE_LIN', keys: List[str] = None, pdf_max: float = None)[source]¶ Bases:
pyabc.distance.base.Distance
A stochastic kernel assesses the similarity between observed and simulated summary statistics or data via a probability measure.
Note
The returned value cannot be interpreted as a distance function, but rather as an inverse distance, as it increases as the similarity between observed and simulated summary statistics increases. Thus, a StochasticKernel should only be used together with a StochasticAcceptor.
 Parameters
ret_scale (str, optional (default = SCALE_LIN)) – The scale of the value returned in __call__: Given a probability density p(x, x_0), the returned value can be either p(x, x_0) or log(p(x, x_0)).
keys (List[str], optional) – The keys of the summary statistics, specifying the order to be used.
pdf_max (float, optional) – The maximum possible probability density function value. Defaults to None and is then computed as the density at (x_0, x_0), where x_0 denotes the observed summary statistics. Must be overridden if pdf_max is to be used in the analysis by the acceptor and the default is not applicable. This value should be in the scale specified by ret_scale already.

class
pyabc.distance.
ZScoreDistance
(measures_to_use='all')[source]¶ Bases:
pyabc.distance.distance.DistanceWithMeasureList
Calculate the distance as the sum of Z-scores over the selected measures. The measured data are the reference for the Z-score.
Hence
\[d(x, y) = \sum_{i \in \text{measures}} \left| \frac{x_i - y_i}{y_i} \right|\]
__call__
(x: dict, x_0: dict, t: int = None, par: dict = None) → float[source]¶ Evaluate at time point t the distance of the summary statistics of the data simulated for the tentatively sampled particle to those of the observed data.
Abstract method. This method has to be overwritten by all concrete implementations.
 Parameters
x (dict) – Summary statistics of the data simulated for the tentatively sampled parameter.
x_0 (dict) – Summary statistics of the observed data.
t (int) – Time point at which to evaluate the distance. Usually, the distance will not depend on the time.
par (dict) – The parameters used to create the summary statistics x. These can be required by some distance functions. Usually, the distance will not depend on the parameters.
 Returns
distance – Quantifies the distance between the summary statistics of the data simulated for the tentatively sampled particle and of the observed data.
 Return type
float


pyabc.distance.
combined_mean_absolute_deviation
(data, x_0, **kwargs)[source]¶ Compute the sum of the mean absolute deviations to the mean of the samples and to the observed value.

pyabc.distance.
combined_median_absolute_deviation
(data, x_0, **kwargs)[source]¶ Compute the sum of the median absolute deviations to the median of the samples and to the observed value.

pyabc.distance.
mean_absolute_deviation
(data, **kwargs)[source]¶ Calculate the mean absolute deviation from the mean.

pyabc.distance.
mean_absolute_deviation_to_observation
(data, x_0, **kwargs)[source]¶ Mean absolute deviation of data w.r.t. the observation x_0.

pyabc.distance.
median_absolute_deviation
(data, **kwargs)[source]¶ Calculate the sample median absolute deviation (MAD) from the median, defined as median(abs(data - median(data))).

pyabc.distance.
median_absolute_deviation_to_observation
(data, x_0, **kwargs)[source]¶ Median absolute deviation of data w.r.t. the observation x_0.

pyabc.distance.
root_mean_square_deviation
(data, x_0, **kwargs)[source]¶ Square root of the mean squared error, i.e. of the bias squared plus the variance.

pyabc.distance.
span
(data, **kwargs)[source]¶ Compute the difference of largest and smallest data point.

pyabc.distance.
standard_deviation
(data, **kwargs)[source]¶ Calculate the sample standard deviation (SD).
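The median absolute deviation helper above, for instance, follows the formula median(abs(data - median(data))); a self-contained stdlib sketch of that computation:

```python
import statistics

def mad(data) -> float:
    """Median absolute deviation: median(abs(data - median(data)))."""
    med = statistics.median(data)
    return statistics.median(abs(d - med) for d in data)

# For [1, 2, 3, 4, 5]: median 3, deviations [2, 1, 0, 1, 2], MAD = 1
```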
Acceptors¶
Acceptors handle the acceptance step.

class
pyabc.acceptor.
Acceptor
[source]¶ Bases:
object
The acceptor class encodes the acceptance step. This class is abstract and cannot be invoked itself.

__call__
(distance_function: pyabc.distance.base.Distance, eps: pyabc.epsilon.base.Epsilon, x: dict, x_0: dict, t: int, par: pyabc.parameters.Parameter)[source]¶ Compute distance between summary statistics and evaluate whether to accept or reject. All concrete implementations must implement this method.
 Parameters
distance_function (pyabc.Distance) – The distance function. The user is free to use or ignore this function.
eps (pyabc.Epsilon) – The acceptance thresholds. The user is free to use or ignore this object.
x (dict) – Current summary statistics to evaluate.
x_0 (dict) – The observed summary statistics.
t (int) – Time point for which to check.
par (pyabc.Parameter) – The model parameters used to simulate x.
 Returns
An AcceptorResult.
Note: Currently, only one value encoding the distance is returned (and stored in the database), namely that at time t, even if also other distances affect the acceptance decision, e.g. distances from previous iterations, or in general if the distance is not scalar. This is because the last distance is likely to be most informative for the further process, and because in some parts of ABC a scalar distance value is required.

get_epsilon_config
(t: int) → dict[source]¶ Create a configuration object that contains all values of interest for the update of the Epsilon object.
 Parameters
t (int) – The timepoint for which to get the config.
 Returns
config – The relevant information.
 Return type
dict

initialize
(t: int, get_weighted_distances: Callable[[], pandas.core.frame.DataFrame], distance_function: pyabc.distance.base.Distance, x_0: dict)[source]¶ Initialize. This method is called by the ABCSMC framework initially, and can be used to calibrate the acceptor to initial statistics. The default is to do nothing.
 Parameters
t (int) – The timepoint to initialize the acceptor for.
get_weighted_distances (Callable[[], pd.DataFrame]) – Returns on demand the distances for initializing the acceptor.
distance_function (Distance) – Distance object. The acceptor should not modify it, but might extract some meta information.
x_0 (dict) – The observed summary statistics.

update
(t: int, get_weighted_distances: Callable[[], pandas.core.frame.DataFrame], prev_temp: float, acceptance_rate: float)[source]¶ Update the acceptance criterion.
 Parameters
t (int) – The timepoint to initialize the acceptor for.
get_weighted_distances (Callable[[], pd.DataFrame]) – The past generation’s weighted distances.
prev_temp (float) – The past generation’s temperature.
acceptance_rate (float) – The past generation’s acceptance rate.


class
pyabc.acceptor.
AcceptorResult
(distance: float, accept: bool, weight: float = 1.0)[source]¶ Bases:
dict
Result of an acceptance step.
 Parameters
distance (float) – Distance value obtained.
accept (bool) – A flag indicating the recommendation to accept or reject. More specifically: True: The distance is below the acceptance threshold. False: The distance is above the acceptance threshold.
weight (float, optional (default = 1.0)) – Weight associated with the evaluation, which may need to be taken into account via importance sampling when calculating the parameter weight. Defaults to 1.0.

class
pyabc.acceptor.
ScaledPDFNorm
(factor: float = 10, alpha: float = 0.5, min_acceptance_rate: float = 0.1)[source]¶ Bases:
object
Takes the previously found maximum density value, but scales it by a factor factor**T such that the probability of particles getting accepted is increased by a value of up to factor.
Some additional rules are applied to make the scheme stable. The scaling is in particular applied only after a minimum acceptance rate has been violated.
 Parameters
factor – The factor by which to effectively rescale the acceptance step’s normalization constant.
alpha – The ratio by which the subsequent temperature is assumed to be reduced relative to the current one. This is only accurate if a pyabc.ExponentialDecayFixedRatioScheme with corresponding ratio is employed.
min_acceptance_rate – The scaling is applied once the acceptance rates fall below this value for the first time.

class
pyabc.acceptor.
SimpleFunctionAcceptor
(fun: Callable)[source]¶ Bases:
pyabc.acceptor.acceptor.Acceptor
Initialize from function.
 Parameters
fun (Callable, optional) – Callable with the same signature as the __call__ method.

__call__
(distance_function, eps, x, x_0, t, par)[source]¶ Compute distance between summary statistics and evaluate whether to accept or reject. All concrete implementations must implement this method.
 Parameters
distance_function (pyabc.Distance) – The distance function. The user is free to use or ignore this function.
eps (pyabc.Epsilon) – The acceptance thresholds. The user is free to use or ignore this object.
x (dict) – Current summary statistics to evaluate.
x_0 (dict) – The observed summary statistics.
t (int) – Time point for which to check.
par (pyabc.Parameter) – The model parameters used to simulate x.
 Returns
An AcceptorResult.
Note: Currently, only one value encoding the distance is returned (and stored in the database), namely that at time t, even if also other distances affect the acceptance decision, e.g. distances from previous iterations, or in general if the distance is not scalar. This is because the last distance is likely to be most informative for the further process, and because in some parts of ABC a scalar distance value is required.

class
pyabc.acceptor.
StochasticAcceptor
(pdf_norm_method: Callable = None, apply_importance_weighting: bool = True, log_file: str = None)[source]¶ Bases:
pyabc.acceptor.acceptor.Acceptor
This acceptor implements a stochastic acceptance step based on a probability density, generalizing from the uniform acceptance kernel. A particle is accepted if for the simulated summary statistics x, observed summary statistics x_0 and parameters theta holds
\[\frac{\text{pdf}(x_0 \mid x, \theta)}{c} \geq u\]where u ~ U[0,1], and c is a normalizing constant.
The concept is based on [1]. In addition, we introduce acceptance kernel temperation and rejection control importance sampling to permit a more flexible choice and adaptation of c.
 [1]
Wilkinson, Richard David; “Approximate Bayesian computation (ABC) gives exact results under the assumption of model error”; Statistical applications in genetics and molecular biology 12.2 (2013): 129141.
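The acceptance test above can be sketched on the log-scale (a hypothetical helper illustrating only the bare criterion; the actual acceptor additionally handles tempering and importance weights):

```python
import math
import random

def stochastic_accept(log_pdf: float, log_c: float, rng=random) -> bool:
    """Accept iff pdf(x_0|x, theta)/c >= u with u ~ U[0, 1],
    evaluated as log(u) <= log_pdf - log_c for numeric stability."""
    return math.log(rng.random()) <= log_pdf - log_c

# If the density reaches the normalization constant c, acceptance is certain,
# since log(u) < 0 <= log_pdf - log_c for u ~ U(0, 1)
always = stochastic_accept(1.0, 0.0)
```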

__call__
(distance_function: pyabc.distance.kernel.StochasticKernel, eps, x, x_0, t, par)[source]¶ Compute distance between summary statistics and evaluate whether to accept or reject. All concrete implementations must implement this method.
 Parameters
distance_function (pyabc.Distance) – The distance function. The user is free to use or ignore this function.
eps (pyabc.Epsilon) – The acceptance thresholds. The user is free to use or ignore this object.
x (dict) – Current summary statistics to evaluate.
x_0 (dict) – The observed summary statistics.
t (int) – Time point for which to check.
par (pyabc.Parameter) – The model parameters used to simulate x.
 Returns
An AcceptorResult.
Note: Currently, only one value encoding the distance is returned (and stored in the database), namely that at time t, even if also other distances affect the acceptance decision, e.g. distances from previous iterations, or in general if the distance is not scalar. This is because the last distance is likely to be most informative for the further process, and because in some parts of ABC a scalar distance value is required.

__init__
(pdf_norm_method: Callable = None, apply_importance_weighting: bool = True, log_file: str = None)[source]¶  Parameters
pdf_norm_method (Callable, optional) – Function to calculate a pdf normalization (denoted c above). Shipped are pyabc.acceptor.pdf_norm_from_kernel to use the value provided by the StochasticKernel, and pyabc.acceptor.pdf_norm_max_found (default) to always use the maximum value among accepted particles so far. Note that reweighting based on ideas from rejection control importance sampling to handle the normalization constant being insufficient, and thus avoiding an importance sampling bias, is included either way.
apply_importance_weighting (bool, optional) – Whether to apply weights to correct for a bias induced by samples exceeding the density normalization. This may be False usually only for testing purposes.
log_file (str, optional) – A log file for storing data of the acceptor that are currently not saved in the database. The data are saved in json format and can be retrieved via pyabc.storage.load_dict_from_json.

initialize
(t: int, get_weighted_distances: Callable[[], pandas.core.frame.DataFrame], distance_function: pyabc.distance.kernel.StochasticKernel, x_0: dict)[source]¶ Initialize temperature and maximum pdf.

update
(t: int, get_weighted_distances: Callable[[], pandas.core.frame.DataFrame], prev_temp: float, acceptance_rate: float)[source]¶ Update the acceptance criterion.
 Parameters
t (int) – The time point to update the acceptance criterion for.
get_weighted_distances (Callable[[], pd.DataFrame]) – The past generation’s weighted distances.
prev_temp (float) – The past generation’s temperature.
acceptance_rate (float) – The past generation’s acceptance rate.

class
pyabc.acceptor.
UniformAcceptor
(use_complete_history: bool = False)[source]¶ Bases:
pyabc.acceptor.acceptor.Acceptor
Base acceptance on the distance function and a uniform error distribution between -eps and +eps. This is the most common acceptance criterion in ABC.

__call__
(distance_function, eps, x, x_0, t, par)[source]¶ Compute distance between summary statistics and evaluate whether to accept or reject. All concrete implementations must implement this method.
 Parameters
distance_function (pyabc.Distance) – The distance function. The user is free to use or ignore this function.
eps (pyabc.Epsilon) – The acceptance thresholds. The user is free to use or ignore this object.
x (dict) – Current summary statistics to evaluate.
x_0 (dict) – The observed summary statistics.
t (int) – Time point for which to check.
par (pyabc.Parameter) – The model parameters used to simulate x.
 Returns
An AcceptorResult.
Note: Currently, only one value encoding the distance is returned (and stored in the database), namely the one at time t, even if other distances affect the acceptance decision, e.g. distances from previous iterations, or in general if the distance is not scalar. This is because the last distance is likely to be the most informative for the further process, and because some parts of ABC require a scalar distance value.

__init__
(use_complete_history: bool = False)[source]¶  Parameters
use_complete_history (bool, optional) – Whether to compare to all previous distances and epsilons, or only to the current generation’s distance and epsilon (default). This can be of interest with adaptive distances, in order to guarantee nested acceptance regions.
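The decision rule can be sketched in plain Python; the helper below is an illustrative stand-in for pyABC’s actual implementation, with `distance_at`/`eps_at` as hypothetical accessors for the per-generation distances and thresholds:

```python
# Illustrative sketch of the UniformAcceptor decision (not pyABC's code):
# accept iff the distance lies within the current threshold, or within
# all past thresholds as well when use_complete_history=True (this
# yields nested acceptance regions under adaptive distances).

def uniform_accept(distance_at, eps_at, t, use_complete_history=False):
    """distance_at(s) and eps_at(s) return distance/threshold at time s."""
    times = range(t + 1) if use_complete_history else [t]
    return all(distance_at(s) <= eps_at(s) for s in times)

# Toy example: shrinking thresholds, distance re-evaluated per generation.
eps = {0: 5.0, 1: 2.0, 2: 1.0}
dist = {0: 4.0, 1: 2.5, 2: 0.8}
print(uniform_accept(dist.get, eps.get, 2))                             # True
print(uniform_accept(dist.get, eps.get, 2, use_complete_history=True))  # False
```

With the complete history, the particle is rejected because it violates the generation-1 threshold (2.5 > 2.0), even though it satisfies the current one.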

Models¶
A model defines how input parameters relate to output simulated data.

class
pyabc.model.
IntegratedModel
(name: str = 'Model')[source]¶ Bases:
pyabc.model.Model
A model class which integrates simulation, distance calculation and rejection/acceptance.
This can bring performance improvements if the user can calculate the distance function on the fly during model simulation and interrupt the simulation if the current acceptance threshold cannot be satisfied anymore.
Subclass this model and implement
integrated_simulate
to define your own integrated model.
accept
(t: int, pars: pyabc.parameters.Parameter, sum_stats_calculator: Callable, distance_calculator: pyabc.distance.base.Distance, eps_calculator: pyabc.epsilon.base.Epsilon, acceptor: pyabc.acceptor.acceptor.Acceptor, x_0: dict)[source]¶ Sample, calculate summary statistics, calculate distance, and then accept or not accept a parameter.
Called from within ABCSMC in each iteration to evaluate a parameter.
 Parameters
t (int) – Current time point.
pars (Parameter) – The model parameters.
sum_stats_calculator (Callable) – A function which calculates summary statistics. The user is free to use or ignore this function.
distance_calculator (pyabc.Distance) – The distance function. The user is free to use or ignore this function.
eps_calculator (pyabc.Epsilon) – The acceptance thresholds.
acceptor (pyabc.Acceptor) – The acceptor judging whether to accept, based on distance and epsilon.
x_0 (dict) – The observed summary statistics.
 Returns
model_result – The result with filled accepted field.
 Return type

integrated_simulate
(pars: pyabc.parameters.Parameter, eps: float) → pyabc.model.ModelResult[source]¶ Integrate simulation and acceptance/rejection in a single step.
 Parameters
pars (Parameter) – Parameters at which to evaluate the model
eps (float) – Current acceptance threshold. If required, it is also possible to instead use the entire eps_calculator object passed to accept().
 Returns
model_result – If the parameter evaluation is rejected, this method should simply return ModelResult(accepted=False). If the parameter was accepted, it should return either ModelResult(accepted=True, distance=distance) or ModelResult(accepted=True, distance=distance, sum_stats=sum_stats), where distance denotes the achieved distance and sum_stats the summary statistics (e.g. simulated data) of the run. Note that providing the summary statistics is optional; if they are provided, they are also logged in the database.
 Return type


class
pyabc.model.
Model
(name: str = 'Model')[source]¶ Bases:
object
General model. This is the most flexible model class, but also the most complicated one to use. This is an abstract class and not functional on its own. Derive concrete subclasses for actual usage.
The individual steps
sample
summary_statistics
distance
accept
can be overwritten.
To use this class, at least the sample method has to be overridden.
Note
Most likely you do not want to use this class directly, but the
SimpleModel
instead, or even just pass a plain function as a model.
 Parameters
name (str, optional (default = "model")) – A descriptive name of the model. This name can simplify further analysis for the user as it is stored in the database.

__init__
(name: str = 'Model')[source]¶ Initialize self. See help(type(self)) for accurate signature.

accept
(t: int, pars: pyabc.parameters.Parameter, sum_stats_calculator: Callable, distance_calculator: pyabc.distance.base.Distance, eps_calculator: pyabc.epsilon.base.Epsilon, acceptor: pyabc.acceptor.acceptor.Acceptor, x_0: dict)[source]¶ Sample, calculate summary statistics, calculate distance, and then accept or not accept a parameter.
Called from within ABCSMC in each iteration to evaluate a parameter.
 Parameters
t (int) – Current time point.
pars (Parameter) – The model parameters.
sum_stats_calculator (Callable) – A function which calculates summary statistics. The user is free to use or ignore this function.
distance_calculator (pyabc.Distance) – The distance function. The user is free to use or ignore this function.
eps_calculator (pyabc.Epsilon) – The acceptance thresholds.
acceptor (pyabc.Acceptor) – The acceptor judging whether to accept, based on distance and epsilon.
x_0 (dict) – The observed summary statistics.
 Returns
model_result – The result with filled accepted field.
 Return type

distance
(t: int, pars: pyabc.parameters.Parameter, sum_stats_calculator: Callable, distance_calculator: pyabc.distance.base.Distance, x_0: dict) → pyabc.model.ModelResult[source]¶ Sample, calculate summary statistics, and then calculate the distance.
Not required in the current implementation.
 Parameters
t (int) – Current time point.
pars (Parameter) – Model parameters.
sum_stats_calculator (Callable) – A function which calculates summary statistics, as passed to pyabc.smc.ABCSMC. The user is free to use or ignore this function.
distance_calculator (Callable) – A function which calculates the distance, as passed to pyabc.smc.ABCSMC. The user is free to use or ignore this function.
x_0 (dict) – Observed summary statistics.
 Returns
model_result – The result with filled distance.
 Return type

sample
(pars: pyabc.parameters.Parameter)[source]¶ Return a sample from the model evaluated at parameters pars. This can be raw data, or already summarized statistics thereof.
This method has to be implemented by any subclass.
 Parameters
pars (Parameter) – Dictionary of parameters.
 Returns
sample – The sampled data.
 Return type
any

summary_statistics
(t: int, pars: pyabc.parameters.Parameter, sum_stats_calculator: Callable) → pyabc.model.ModelResult[source]¶ Sample, and then calculate the summary statistics.
Called from within ABCSMC during the initialization process.
 Parameters
t (int) – Current time point.
pars (Parameter) – Model parameters.
sum_stats_calculator (Callable) – A function which calculates summary statistics, as passed to
pyabc.smc.ABCSMC
. The user is free to use or ignore this function.
 Returns
model_result – The result with filled summary statistics.
 Return type

class
pyabc.model.
ModelResult
(sum_stats: dict = None, distance: float = None, accepted: bool = None, weight: float = 1.0)[source]¶ Bases:
object
Result of a model evaluation. Allows to flexibly return summary statistics, distances and accepted/rejected.

class
pyabc.model.
SimpleModel
(sample_function: Callable[[pyabc.parameters.Parameter], Any], name: str = None)[source]¶ Bases:
pyabc.model.Model
A model which is initialized with a function which generates the samples. For most cases this class will be adequate. Note that you can also pass a plain function to the ABCSMC class, which then gets automatically converted to a SimpleModel.
 Parameters
sample_function (Callable[[Parameter], Any]) – Returns the sample to be passed to the summary statistics method. This function takes a single argument, a Parameter.
name (str, optional) – The name of the model. If not provided, the name is inferred from the function name of sample_function.

__init__
(sample_function: Callable[[pyabc.parameters.Parameter], Any], name: str = None)[source]¶ Initialize self. See help(type(self)) for accurate signature.

static
assert_model
(model_or_function)[source]¶ Alternative constructor. Accepts either a Model instance or a function and always returns a Model instance.
 Parameters
model_or_function (Model, function) – Constructs a SimpleModel instance if a function is passed. If a Model instance is passed, the Model instance itself is returned.
 Returns
model
 Return type
SimpleModel or Model
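The wrap-or-pass-through behavior can be sketched with minimal stand-in classes (the classes below mimic, but are not, pyABC’s Model/SimpleModel):

```python
# Illustrative sketch of the assert_model convenience: plain functions
# get wrapped into a SimpleModel-like object, Model instances pass
# through unchanged.
class Model:
    pass

class SimpleModel(Model):
    def __init__(self, sample_function, name=None):
        self.sample_function = sample_function
        # If no name is given, infer it from the function name.
        self.name = name or sample_function.__name__

    def sample(self, pars):
        return self.sample_function(pars)

def assert_model(model_or_function):
    if isinstance(model_or_function, Model):
        return model_or_function
    return SimpleModel(model_or_function)

def my_model(pars):
    return {"y": 2 * pars["theta"]}

m = assert_model(my_model)  # wrapped into a SimpleModel
m2 = assert_model(m)        # returned unchanged
```

This is also why a plain function can be passed directly to ABCSMC: it is converted through exactly this kind of constructor.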

sample
(pars: pyabc.parameters.Parameter)[source]¶ Return a sample from the model evaluated at parameters pars. This can be raw data, or already summarized statistics thereof.
This method has to be implemented by any subclass.
 Parameters
pars (Parameter) – Dictionary of parameters.
 Returns
sample – The sampled data.
 Return type
any
Epsilons¶
Epsilon threshold updating strategies.
Acceptance thresholds (= epsilon) can be calculated based on the distances from the observed data, can follow a predefined list, can be constant, or can have a user-defined implementation.
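A user-defined strategy can be sketched without pyABC imports; in pyABC one would subclass pyabc.epsilon.Epsilon and implement at least __call__, while the halving rule and class name below are illustrative assumptions:

```python
# Illustrative sketch of a custom epsilon schedule that halves the
# threshold each generation -- a stand-in for subclassing
# pyabc.epsilon.Epsilon.
class HalvingEpsilon:
    def __init__(self, initial: float):
        self.values = {0: initial}

    def __call__(self, t: int) -> float:
        # Epsilon value used as acceptance threshold in generation t.
        return self.values[t]

    def update(self, t: int):
        # Set the threshold for generation t from the previous one.
        self.values[t] = self.values[t - 1] / 2

eps = HalvingEpsilon(8.0)
eps.update(1)
eps.update(2)
# eps(2) == 2.0
```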

class
pyabc.epsilon.
AcceptanceRateScheme
(target_rate: float = 0.3, min_rate: float = None)[source]¶ Bases:
pyabc.epsilon.temperature.TemperatureScheme
Try to keep the acceptance rate constant at a value of target_rate. Note that this scheme will fail to reduce the temperature sufficiently in later iterations, if the problem’s inherent acceptance rate is lower, but it has been observed to give big feasible temperature leaps in early iterations. In particular, this scheme can be used to propose an initial temperature.
 Parameters
target_rate (float, optional) – The target acceptance rate to match.
min_rate (float, optional) – The minimum rate below which not to apply the acceptance step scheme any more. Setting this to a value of e.g. 0.05 can make sense 1) because it may be unlikely that the acceptance rate scheme will propose a useful temperature at such low acceptance levels, and 2) to avoid unnecessary computations.

__call__
(t: int, get_weighted_distances: Callable[[], pandas.core.frame.DataFrame], get_all_records: Callable[[], List[dict]], max_nr_populations: int, pdf_norm: float, kernel_scale: str, prev_temperature: float, acceptance_rate: float)[source]¶ Call self as a function.

__init__
(target_rate: float = 0.3, min_rate: float = None)[source]¶ Initialize self. See help(type(self)) for accurate signature.

configure_sampler
(sampler: pyabc.sampler.base.Sampler)[source]¶ Modify the sampler. As in, and redirected from,
pyabc.epsilon.Temperature.configure_sampler()
.

class
pyabc.epsilon.
ConstantEpsilon
(constant_epsilon_value: float)[source]¶ Bases:
pyabc.epsilon.base.Epsilon
Keep epsilon constant over all populations. This acceptance threshold scheduling strategy is most likely only interesting for debugging purposes.
 Parameters
constant_epsilon_value (float) – The epsilon value for all populations

class
pyabc.epsilon.
DalyScheme
(alpha: float = 0.5, min_rate: float = 0.0001)[source]¶ Bases:
pyabc.epsilon.temperature.TemperatureScheme
This scheme is loosely based on [1], however note that it does not try to replicate it entirely. In particular, the implementation of pyABC does not allow the sampling to be stopped when encountering too low acceptance rates, such that this can only be done ex posteriori here.
 Parameters
alpha (float, optional) – The ratio by which to decrease the temperature value. More specifically, the next temperature is given as (1 - alpha) * temperature.
min_rate (float, optional) – A minimum acceptance rate. If this rate has been violated in the previous iteration, the alpha value is decreased.
[1]
Daly Aidan C., Cooper Jonathan, Gavaghan David J., and Holmes Chris. “Comparing two sequential Monte Carlo samplers for exact and approximate Bayesian inference on biological models”. Journal of The Royal Society Interface, 2017.

class
pyabc.epsilon.
Epsilon
[source]¶ Bases:
abc.ABC
Abstract epsilon base class.
This class encapsulates a strategy for setting a new epsilon for each new population.

abstract
__call__
(t: int) → float[source]¶ Get epsilon value for generation t.
 Parameters
t (int) – The time point to get the epsilon threshold for.
 Returns
eps – The epsilon for population t.
 Return type
float

configure_sampler
(sampler: pyabc.sampler.base.Sampler)[source]¶ This is called by the ABCSMC class and gives the epsilon the opportunity to configure the sampler. For example, it might request the sampler to also return rejected particles in order to adapt the epsilon to the statistics of the sample. The method is called by the ABCSMC framework before the first use of the epsilon (at the beginning of ABCSMC.run()), after initialize().
The default is to do nothing.
 Parameters
sampler (Sampler) – The sampler used in ABCSMC.

get_config
()[source]¶ Return configuration of the distance function.
 Returns
config – Dictionary describing the distance function.
 Return type
dict

initialize
(t: int, get_weighted_distances: Callable[[], pandas.core.frame.DataFrame], get_all_records: Callable[[], List[dict]], max_nr_populations: int, acceptor_config: dict)[source]¶ This method is called by the ABCSMC framework before the first usage of the epsilon and can be used to calibrate it to the statistics of the samples.
Default: Do nothing.
 Parameters
t (int) – The time point to initialize the epsilon for.
get_weighted_distances (Callable[[], pd.DataFrame]) – Returns on demand the distances for initializing the epsilon.
get_all_records (Callable[[], List[dict]]) – Returns on demand a list of information obtained from all particles sampled in the previous iteration.
max_nr_populations (int) – The maximum number of populations.
acceptor_config (dict) – An object provided by the Acceptor class.

to_json
()[source]¶ Return JSON encoded configuration of the distance function.
 Returns
json_str – JSON encoded string describing the distance function. The default implementation is to try to convert the dictionary returned by get_config.
 Return type
str

update
(t: int, get_weighted_distances: Callable[[], pandas.core.frame.DataFrame], get_all_records: Callable[[], List[dict]], acceptance_rate: float, acceptor_config: dict)[source]¶ Update epsilon value to be used as acceptance criterion for generation t.
Default: Do nothing.
 Parameters
t (int) – The generation index to update / set epsilon for. Counting is zero-based, so the first population has t=0.
get_weighted_distances (Callable[[], pd.DataFrame]) – The distances that should be used to update epsilon, as returned by Population.get_weighted_distances(). These are usually the distances of samples accepted in population t-1. The distances may differ from those used for acceptance in population t-1, if the distance function for population t has been updated.
get_all_records (Callable[[], List[dict]]) – Returns on demand a list of information obtained from all particles.
acceptance_rate (float) – The current generation’s acceptance rate.
acceptor_config (dict) – An object provided by the Acceptor class.


class
pyabc.epsilon.
EssScheme
(target_relative_ess: float = 0.8)[source]¶ Bases:
pyabc.epsilon.temperature.TemperatureScheme
Try to keep the effective sample size (ESS) constant.
 Parameters
target_relative_ess (float) – Target relative effective sample size.

class
pyabc.epsilon.
ExpDecayFixedIterScheme
[source]¶ Bases:
pyabc.epsilon.temperature.TemperatureScheme
The next temperature is set as
\[T_j = T_{max}^{(n-j)/n}\]
where n denotes the number of populations, and j=1,…,n the iteration. This translates to
\[T_j = T_{j-1}^{(n-j)/(n-(j-1))}.\]
This ensures that a temperature of 1.0 is reached after exactly the remaining number of steps.
So, in both cases the sequence of temperatures follows an exponential decay, also known as a geometric progression, or a linear progression in log-space.
Note that the formula is applied anew in each iteration. This is advantageous if other schemes are also used, such that T_{j-1} is smaller than the value given by the formula above.
 Parameters
alpha (float) – Factor by which to reduce the temperature, if max_nr_populations is infinite.
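The formula above can be checked with a small numeric example (the values of T_max and n are illustrative):

```python
# Numeric sketch of the ExpDecayFixedIterScheme formula
# T_j = T_max ** ((n - j) / n): a geometric decay that ends at T_n = 1.
t_max = 16.0
n = 4
temps = [t_max ** ((n - j) / n) for j in range(1, n + 1)]
# temps ≈ [8.0, 4.0, 2.0, 1.0]: each step divides by the same ratio.
```

Each temperature is half the previous one here, i.e. a geometric progression that hits exactly 1.0 at the final population.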

class
pyabc.epsilon.
ExpDecayFixedRatioScheme
(alpha: float = 0.5, min_rate: float = 0.0001, max_rate: float = 0.5)[source]¶ Bases:
pyabc.epsilon.temperature.TemperatureScheme
The next temperature is chosen as
\[T_j = \alpha \cdot T_{j-1}.\]
Like the
pyabc.epsilon.ExpDecayFixedIterScheme
, this yields a geometric progression, however with a fixed ratio, irrespective of the number of iterations. If a finite number of iterations is specified in ABCSMC, there is no influence on the final jump to a temperature of 1.0.
This is quite similar to the
pyabc.epsilon.DalyScheme
, although simpler in implementation. The alpha value here corresponds to a value of 1 - alpha there.
 Parameters
alpha (float, optional) – The ratio of subsequent temperatures.
min_rate (float, optional) – A minimum acceptance rate. If this rate has been violated in the previous iteration, the alpha value is increased.
max_rate (float, optional) – Maximum rate to not be exceeded, otherwise the alpha value is decreased.

class
pyabc.epsilon.
FrielPettittScheme
[source]¶ Bases:
pyabc.epsilon.temperature.TemperatureScheme
Basically takes linear steps in log-space. See [2].
 [2]
Vyshemirsky, Vladislav, and Mark A. Girolami. “Bayesian ranking of biochemical system models.” Bioinformatics 24.6 (2007): 833–839.

class
pyabc.epsilon.
ListEpsilon
(values: List[float])[source]¶ Bases:
pyabc.epsilon.base.Epsilon
Return epsilon values from a predefined list. For every time point enquired later, an epsilon value must exist in the list.
 Parameters
values (List[float]) – List of epsilon values.
values[t]
is the value for population t.

class
pyabc.epsilon.
ListTemperature
(values: List[float])[source]¶ Bases:
pyabc.epsilon.temperature.TemperatureBase
Pass a list of temperature values to use successively.
 Parameters
values – The array of temperatures to use successively. For exact inference, finish with 1.

class
pyabc.epsilon.
MedianEpsilon
(initial_epsilon: Union[str, int, float] = 'from_sample', median_multiplier: float = 1, weighted: bool = True)[source]¶ Bases:
pyabc.epsilon.epsilon.QuantileEpsilon
Calculate epsilon as median of the distances from the last population.

class
pyabc.epsilon.
NoEpsilon
[source]¶ Bases:
pyabc.epsilon.base.Epsilon
Implements a kind of null object as epsilon.
This can be used as a dummy epsilon when the Acceptor integrates the acceptance threshold.

class
pyabc.epsilon.
PolynomialDecayFixedIterScheme
(exponent: float = 3)[source]¶ Bases:
pyabc.epsilon.temperature.TemperatureScheme
Compute the next temperature as the second-to-last entry in
>>> np.linspace(1, temp_base**(1 / temp_decay_exponent),
...             t_to_go + 1) ** temp_decay_exponent
Requires a finite max_nr_populations.
Note that this is similar to the
pyabc.epsilon.ExpDecayFixedIterScheme
, which is indeed the limit for exponent → infinity. For smaller exponents, the sequence makes larger steps at low temperatures. This can be useful in cases where lower temperatures (which are usually more expensive) can be traversed in a few larger steps; however, the opposite may also be true, i.e. more steps at low temperatures may be advantageous.
 Parameters
exponent (float, optional) – The exponent to use in the scheme.
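The grid computation can be sketched without numpy (the linspace is expanded by hand; parameter values are illustrative):

```python
# Sketch of the polynomial-decay temperature grid: the next temperature
# is the second-to-last entry of a grid that is linear in the
# (1/exponent)-th root of the temperature.
def next_temperature(temp_base, t_to_go, exponent=3):
    root = temp_base ** (1 / exponent)
    # Equivalent to np.linspace(1, root, t_to_go + 1) ** exponent.
    grid = [(1 + (root - 1) * i / t_to_go) ** exponent
            for i in range(t_to_go + 1)]
    return grid[-2]

# With temp_base=8 and 3 steps to go, the root grid is [1, 4/3, 5/3, 2];
# cubed: [1, 64/27, 125/27, 8], so the next temperature is 125/27 ≈ 4.63.
```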

class
pyabc.epsilon.
QuantileEpsilon
(initial_epsilon: Union[str, int, float] = 'from_sample', alpha: float = 0.5, quantile_multiplier: float = 1, weighted: bool = True)[source]¶ Bases:
pyabc.epsilon.base.Epsilon
Calculate epsilon as the alpha-quantile of the distances from the last population.
This strategy works even if the posterior is multimodal. Note that the acceptance threshold calculation is based on the distance to the observation, not on the parameters which generated data with that distance.
If completely different parameter sets produce equally good samples, the distances of their samples to the ground truth data should be comparable.
The idea behind weighting is that the probability p_k of obtaining a distance eps_k in the next generation should be proportional to the weight w_k of the respective particle k in the current generation. Both the weighted and the non-weighted median should lead to correct results.
 Parameters
initial_epsilon (Union[str, int, float]) –
If ‘from_sample’, then the initial quantile is calculated from a sample of the current population size from the prior distribution.
If a number is given, this number is used.
alpha (float) – The alpha-quantile to be used, e.g. alpha=0.5 means median.
quantile_multiplier (float) – Multiplies the quantile by this number. This also applies to the initial quantile if it is calculated from samples, but not if it is given as a number.
weighted (bool) – Flag indicating whether the new epsilon should be computed using weighted (True, default) or non-weighted (False) distances.
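A weighted alpha-quantile of distances can be sketched as follows; this mirrors the idea behind QuantileEpsilon, though pyABC’s own computation may differ in details (interpolation, normalization):

```python
# Sketch of a weighted alpha-quantile of distances: walk the sorted
# distances and return the first one whose cumulative weight reaches
# the alpha fraction of the total weight.
def weighted_quantile(distances, weights, alpha=0.5):
    pairs = sorted(zip(distances, weights))
    total = sum(weights)
    cum = 0.0
    for d, w in pairs:
        cum += w
        if cum >= alpha * total:
            return d
    return pairs[-1][0]

# Equally weighted, alpha=0.5 picks (roughly) the median distance:
eps = weighted_quantile([3.0, 1.0, 2.0, 4.0], [1, 1, 1, 1], alpha=0.5)
# eps == 2.0
```

With skewed weights, a heavy particle pulls the quantile toward its distance, which is exactly the intended proportionality between weights and next-generation acceptance probabilities.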

__call__
(t: int) → float[source]¶ Epsilon value for time t, set before via update() method.
 Returns
eps – The epsilon value for time t (throws error if not existent).
 Return type
float

__init__
(initial_epsilon: Union[str, int, float] = 'from_sample', alpha: float = 0.5, quantile_multiplier: float = 1, weighted: bool = True)[source]¶ Constructor.

get_config
()[source]¶ Return configuration of the distance function.
 Returns
config – Dictionary describing the distance function.
 Return type
dict

initialize
(t: int, get_weighted_distances: Callable[[], pandas.core.frame.DataFrame], get_all_records: Callable[[], List[dict]], max_nr_populations: int, acceptor_config: dict)[source]¶ This method is called by the ABCSMC framework before the first usage of the epsilon and can be used to calibrate it to the statistics of the samples.
Default: Do nothing.
 Parameters
t (int) – The time point to initialize the epsilon for.
get_weighted_distances (Callable[[], pd.DataFrame]) – Returns on demand the distances for initializing the epsilon.
get_all_records (Callable[[], List[dict]]) – Returns on demand a list of information obtained from all particles sampled in the previous iteration.
max_nr_populations (int) – The maximum number of populations.
acceptor_config (dict) – An object provided by the Acceptor class.

class
pyabc.epsilon.
Temperature
(schemes: Union[Callable, List[Callable]] = None, aggregate_fun: Callable[[List[float]], float] = None, initial_temperature: float = None, enforce_exact_final_temperature: bool = True, log_file: str = None)[source]¶ Bases:
pyabc.epsilon.temperature.TemperatureBase
This class implements a highly adaptive and configurable temperature scheme. Via the argument schemes, arbitrary temperature schemes can be passed to calculate the next generation’s temperature; via aggregate_fun one can define how to combine multiple proposals; via initial_temperature the initial temperature can be set.
 Parameters
schemes (Union[Callable, List[Callable]], optional) – Temperature schemes returning proposed temperatures for the next time point, e.g. instances of pyabc.epsilon.TemperatureScheme.
aggregate_fun (Callable[[List[float]], float], optional) – The function to aggregate the schemes by. Defaults to taking the minimum.
initial_temperature (float, optional) – The initial temperature. If None is provided, an AcceptanceRateScheme is used.
enforce_exact_final_temperature (bool, optional) – Whether to force the final temperature (if max_nr_populations < inf) to be 1.0, giving exact inference.
log_file (str, optional) – A log file for storing data of the temperature that are currently not saved in the database. The data are saved in json format.
 Properties
max_nr_populations (int) – The maximum number of iterations as passed to ABCSMC. May be inf, but not all schemes can handle that (and will complain).
temperatures (Dict[int, float]) – Times as keys and temperatures as values.
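The aggregation step can be sketched in a few lines; the two toy schemes below are illustrative stand-ins, not pyABC scheme classes:

```python
# Sketch of how the Temperature class combines multiple scheme
# proposals: each scheme suggests a next temperature, and the
# aggregation function (minimum by default) picks the value to use.
def aggregate_temperature(schemes, aggregate_fun=min, **kwargs):
    proposals = [scheme(**kwargs) for scheme in schemes]
    return aggregate_fun(proposals)

# Two toy schemes (stand-ins for e.g. AcceptanceRateScheme, DalyScheme):
halve = lambda prev_temperature: prev_temperature / 2
fixed_step = lambda prev_temperature: prev_temperature - 3.0
temp = aggregate_temperature([halve, fixed_step], prev_temperature=10.0)
# min(5.0, 7.0) == 5.0
```

Taking the minimum is a conservative default: the schedule cools at least as fast as the most aggressive scheme proposes.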

__call__
(t: int) → float[source]¶ Get epsilon value for generation t.
 Parameters
t (int) – The time point to get the epsilon threshold for.
 Returns
eps – The epsilon for population t.
 Return type
float

__init__
(schemes: Union[Callable, List[Callable]] = None, aggregate_fun: Callable[[List[float]], float] = None, initial_temperature: float = None, enforce_exact_final_temperature: bool = True, log_file: str = None)[source]¶ Constructor.

configure_sampler
(sampler: pyabc.sampler.base.Sampler)[source]¶ This is called by the ABCSMC class and gives the epsilon the opportunity to configure the sampler. For example, it might request the sampler to also return rejected particles in order to adapt the epsilon to the statistics of the sample. The method is called by the ABCSMC framework before the first use of the epsilon (at the beginning of ABCSMC.run()), after initialize().
The default is to do nothing.
 Parameters
sampler (Sampler) – The sampler used in ABCSMC.

initialize
(t: int, get_weighted_distances: Callable[[], pandas.core.frame.DataFrame], get_all_records: Callable[[], List[dict]], max_nr_populations: int, acceptor_config: dict)[source]¶ This method is called by the ABCSMC framework before the first usage of the epsilon and can be used to calibrate it to the statistics of the samples.
Default: Do nothing.
 Parameters
t (int) – The time point to initialize the epsilon for.
get_weighted_distances (Callable[[], pd.DataFrame]) – Returns on demand the distances for initializing the epsilon.
get_all_records (Callable[[], List[dict]]) – Returns on demand a list of information obtained from all particles sampled in the previous iteration.
max_nr_populations (int) – The maximum number of populations.
acceptor_config (dict) – An object provided by the Acceptor class.

update
(t: int, get_weighted_distances: Callable[[], pandas.core.frame.DataFrame], get_all_records: Callable[[], List[dict]], acceptance_rate: float, acceptor_config: dict)[source]¶ Update epsilon value to be used as acceptance criterion for generation t.
Default: Do nothing.
 Parameters
t (int) – The generation index to update / set epsilon for. Counting is zero-based, so the first population has t=0.
get_weighted_distances (Callable[[], pd.DataFrame]) – The distances that should be used to update epsilon, as returned by Population.get_weighted_distances(). These are usually the distances of samples accepted in population t-1. The distances may differ from those used for acceptance in population t-1, if the distance function for population t has been updated.
get_all_records (Callable[[], List[dict]]) – Returns on demand a list of information obtained from all particles.
acceptance_rate (float) – The current generation’s acceptance rate.
acceptor_config (dict) – An object provided by the Acceptor class.

class
pyabc.epsilon.
TemperatureBase
[source]¶ Bases:
pyabc.epsilon.base.Epsilon
A temperature scheme handles the decrease of the temperatures employed by a
pyabc.acceptor.StochasticAcceptor
 over time.
This class is not functional on its own; its derivatives must be used.

class
pyabc.epsilon.
TemperatureScheme
[source]¶ Bases:
object
A TemperatureScheme suggests the next temperature value. It is used as one of potentially multiple schemes employed in the Temperature class. This class is abstract.
 Parameters
t – The time to compute for.
get_weighted_distances – Callable to obtain the weights and kernel values to be used for the scheme.
get_all_records – Callable returning a List[dict] of all recorded particles.
max_nr_populations – The maximum number of populations that are supposed to be taken.
pdf_norm – The normalization constant c that will be used in the acceptance step.
kernel_scale – The scale on which the pdf values are given (linear or logarithmic).
prev_temperature – The temperature that was used last time (or None if not applicable).
acceptance_rate – The most recently obtained acceptance rate.

__call__
(t: int, get_weighted_distances: Callable[[], pandas.core.frame.DataFrame], get_all_records: Callable[[], List[dict]], max_nr_populations: int, pdf_norm: float, kernel_scale: str, prev_temperature: float, acceptance_rate: float)[source]¶ Call self as a function.

configure_sampler
(sampler: pyabc.sampler.base.Sampler)[source]¶ Modify the sampler. As in, and redirected from,
pyabc.epsilon.Temperature.configure_sampler()
.
Data store¶
Purpose of the data store¶
The most important class here is the History class. The History class is the interface to the database in which pyABC stores and logs information during the ABCSMC run, but also the interface which allows you to query that information later on.
Initializing the database interface from a file¶
For querying, you initialize a History object with a valid SQLAlchemy database identifier. For example, if your ABCSMC data is stored in a file “data.db”, you initialize the History with:
history = History("sqlite:///data.db")
Don’t mind the three slashes. This is SQLAlchemy syntax.
If more than one ABCSMC run is stored in your database file, these runs will have ids. The first run has id=1, the second run id=2, and so on. Per default, the first run found in the database is automatically selected. To select a specific run n (e.g. n=3), do
history.id = n
Querying the database¶
The History class has a number of methods which are relevant for querying the stored data. The most important ones are:
History.get_distribution to retrieve information on the parameter posteriors,
History.get_model_probabilities to retrieve information on the model probabilities in case you’re doing model selection,
History.get_all_populations to retrieve information on the evolution of the acceptance threshold and the number of sample attempts per population,
History.get_nr_particles_per_population to retrieve the number of particles per population (this number is not necessarily constant),
History.get_weighted_distances to retrieve the distances the parameter samples achieved,
History.n_populations to get the total number of populations, and
History.total_nr_simulations to get the total number of simulations, i.e. sample attempts.
Use get_distribution to retrieve your posterior particle population. For example,
df, w = history.get_distribution(m)
will return a DataFrame df of parameters and an array w of weights of the particles of model m in the last available population. If you’re interested in intermediate populations, add the optional t parameter, which indicates the population number (the first population is t=0):
df, w = history.get_distribution(m, t)
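From the returned df and w, weighted posterior summaries follow directly. A sketch with plain lists standing in for the DataFrame column and weight array (the parameter name “theta” and the values are illustrative):

```python
# Sketch: a weighted posterior mean from the particles returned by
# history.get_distribution (theta stands in for df["theta"].values,
# w for the weight array).
theta = [0.9, 1.1, 1.0, 1.2]
w = [0.4, 0.3, 0.2, 0.1]  # particle weights, summing to 1

posterior_mean = sum(t_i * w_i for t_i, w_i in zip(theta, w)) / sum(w)
# posterior_mean == 1.01
```

The division by sum(w) makes the estimate robust even if the weights are not exactly normalized.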
What can be stored as summary statistics¶
Currently, integers, floats, strings, and in general everything that can be converted to a numpy array, can be stored. In addition, it is also possible to store pandas DataFrames.
Warning
Storage of pandas DataFrames is considered experimental at this point.

class
pyabc.storage.
History
(db: str, stores_sum_stats: bool = True, _id: int = None, create: bool = True)[source]¶ Bases:
object
History for ABCSMC.
This class records the evolution of the populations and stores the ABCSMC results.

db
¶ SQLalchemy database identifier. For a relative path use the template “sqlite:///file.db”, for an absolute path “sqlite:////path/to/file.db”, and for an inmemory database “sqlite://”.
 Type
str

stores_sum_stats
¶ Whether to store summary statistics in the database. Note: this is True by default and should be set to False only for testing purposes (i.e. to speed up writing to the file system), as it cannot be guaranteed that all methods of pyabc work correctly if the summary statistics are not stored.
 Type
bool, optional (default = True)

id
¶ The id of the ABCSMC analysis that is currently in use. If there are analyses in the database already, this defaults to the latest id. Manually set if another run is wanted.
 Type
int

__init__
(db: str, stores_sum_stats: bool = True, _id: int = None, create: bool = True)[source]¶ Initialize history object.
 Parameters
create – If False, an error is thrown if the database does not exist.

alive_models
(*args, **kwargs) → List[source]¶ Get the models which are still alive at time t.
 Parameters
t (int, optional (default = self.max_t)) – Population index.
 Returns
alive – A list which contains the indices of those models which are still alive.
 Return type
List

append_population
(t: int, current_epsilon: float, population: pyabc.storage.db_model.Population, nr_simulations: int, model_names)[source]¶ Append population to database.
 Parameters
t (int) – Population number.
current_epsilon (float) – Current epsilon value.
population (Population) – List of sampled particles.
nr_simulations (int) – The number of model evaluations for this population.
model_names (list) – The model names.
Note. This function is called by the
pyabc.ABCSMC
class internally. You should most likely not find it necessary to call this method under normal circumstances.

property
db_size
¶ Size of the database.
 Returns
db_size – Size of the SQLite database in MB. Currently this only works for SQLite databases.
Returns an error string if the DB size cannot be calculated.
 Return type
int, str

done
(*args, **kwargs)[source]¶ Close database sessions and store end time of population.
Note. This function is called by the
pyabc.ABCSMC
class internally. You should most likely not find it necessary to call this method under normal circumstances.

get_all_populations
(*args, **kwargs)[source]¶ Returns a pandas DataFrame with columns
t: Population number
population_end_time: The end time of the population
 samples: The number of sample attempts performed
for a population
epsilon: The acceptance threshold for the population.
 Returns
all_populations – DataFrame with population info
 Return type
pd.DataFrame

get_distribution
(*args, **kwargs) → (pandas.DataFrame, numpy.ndarray)[source]¶ Returns the weighted population sample for model m and timepoint t as a pandas DataFrame.
 Parameters
m (int, optional (default = 0)) – Model index.
t (int, optional (default = self.max_t)) – Population index. If t is not specified, then the last population is returned.
 Returns
df, w –
df: a DataFrame of parameters
w: the weights associated with each parameter
 Return type
pandas.DataFrame, np.ndarray

get_ground_truth_parameter
(*args, **kwargs) → pyabc.parameters.Parameter[source]¶ Create a pyabc.Parameter object from the ground truth parameters saved in the database, if existent.
 Returns
 Return type
A pyabc.Parameter dictionary.

get_model_probabilities
(*args, **kwargs) → pandas.core.frame.DataFrame[source]¶ Model probabilities.
 Parameters
t (int or None (default = None)) – Population index. If None, all populations of indices >= 0 are considered.
 Returns
probabilities – Model probabilities.
 Return type
pd.DataFrame

get_nr_particles_per_population
(*args, **kwargs) → pandas.core.series.Series[source]¶ Get the number of particles per population.
 Returns
nr_particles_per_population – A pandas Series containing the number of particles for each population.
 Return type
pd.Series

get_population
(*args, **kwargs)[source]¶ Create a pyabc.Population object containing all particles, as far as those can be recreated from the database. In particular, rejected particles are currently not stored.
 Parameters
t (int, optional (default = self.max_t)) – The population index.

get_population_extended
(*args, **kwargs) → pandas.core.frame.DataFrame[source]¶ Get extended population information, including parameters, distances, summary statistics, weights and more.
 Parameters
m (int or None, optional (default = None)) – The model to query. If omitted, all models are returned.
t (int or str, optional (default = "last")) – Can be “last”, “all”, or a population index (i.e. an int). In case of “all”, all populations are returned; if “last”, only the last population; for an int value, only the population at that time index.
tidy (bool, optional) – If True, try to return a tidy DataFrame, where the individual parameters and summary statistics are pivoted. Setting tidy to True will only work for a single model and a single population.
 Returns
full_population
 Return type
DataFrame

get_population_strategy
(*args, **kwargs)[source]¶  Returns
The population strategy.
 Return type
population_strategy

get_weighted_distances
(*args, **kwargs) → pandas.core.frame.DataFrame[source]¶ Population’s weighted distances to the measured sample. These weights do not necessarily sum up to 1. In case more than one simulation per parameter is performed and accepted, the sum might be larger.
 Parameters
t (int, optional (default = self.max_t)) – Population index. If t is None, the last population is selected.
 Returns
df_weighted – Weighted distances. The dataframe has column “w” for the weights and column “distance” for the distances.
 Return type
pd.DataFrame

get_weighted_sum_stats
(*args, **kwargs)[source]¶ Population’s weighted summary statistics. These weights do not necessarily sum up to 1. In case more than one simulation per parameter is performed and accepted, the sum might be larger.
 Parameters
t (int, optional (default = self.max_t)) – Population index. If t is None, the latest population is selected.
 Returns
(weights, sum_stats) – In the first array the weights (multiplied by the model probabilities), and in the second array the summary statistics, in the same order.
 Return type
(List[float], List[dict])

get_weighted_sum_stats_for_model
(*args, **kwargs) → (numpy.ndarray, List)[source]¶ Summary statistics for model m. The weights sum to 1, unless there were multiple acceptances per particle.
 Parameters
m (int, optional (default = 0)) – Model index.
t (int, optional (default = self.max_t)) – Population index.
 Returns
w, sum_stats –
w: the weights associated with the summary statistics
sum_stats: list of summary statistics
 Return type
np.ndarray, list

property
max_t
¶ The population number of the last population. This is equivalent to
n_populations - 1
.

model_names
(*args, **kwargs)[source]¶ Get the names of alive models for population t.
 Parameters
t (int, optional (default = 1)) – Population index.
 Returns
model_names – List of model names.
 Return type
List[str]

property
n_populations
¶ Number of populations stored in the database. This is equivalent to
max_t + 1
.

nr_of_models_alive
(t: int = None) → int[source]¶ Number of models still alive.
 Parameters
t (int, optional (default = self.max_t)) – Population index.
 Returns
nr_alive – Number of models still alive; None refers to the last population.
 Return type
int >= 0 or None

observed_sum_stat
(*args, **kwargs)[source]¶ Get the observed summary statistics.
 Returns
sum_stats_dct – The observed summary statistics.
 Return type
dict

store_initial_data
(*args, **kwargs)[source]¶ Store the initial configuration data.
 Parameters
ground_truth_model (int) – Index of the ground truth model.
options (dict) – Dictionary of ABC metadata.
observed_summary_statistics (dict) – The measured summary statistics.
ground_truth_parameter (dict) – The ground truth parameters.
model_names (List) – A list of model names.
distance_function_json_str (str) – The distance function represented as json string.
eps_function_json_str (str) – The epsilon represented as json string.
population_strategy_json_str (str) – The population strategy represented as json string.
Note. This function is called by the
pyabc.ABCSMC
class internally. You should most likely not find it necessary to call this method under normal circumstances.

store_pre_population
(*args, **kwargs)[source]¶ Store a dummy prepopulation containing some configuration data and in particular some ground truth values.
For the parameters, see store_initial_data.
Note. This function is called by the
pyabc.ABCSMC
class internally. You should most likely not find it necessary to call this method under normal circumstances.

property
total_nr_simulations
¶ Number of sample attempts for the ABC run.
 Returns
nr_sim – Total nr of sample attempts for the ABC run.
 Return type
int

update_nr_samples
(*args, **kwargs)[source]¶ Update the number of samples used in iteration t. The default time of PRE_TIME implies an update of the number of samples used in calibration.
 Parameters
t (int, optional (default = 1)) – Time to update for.
nr_samples (int, optional (default = 0)) – Number of samples reported.
Note. This function is called by the
pyabc.ABCSMC
class internally. You should most likely not find it necessary to call this method under normal circumstances.


pyabc.storage.
create_sqlite_db_id
(dir_: str = None, file_: str = 'pyabc_test.db')[source]¶ Convenience function to create an SQLite database identifier which can be understood by SQLAlchemy.
 Parameters
dir – The base folder name. Optional, defaults to the system’s temporary directory, i.e. “/tmp/” on Linux. While this makes sense for testing purposes, a non-temporary location should be used in production.
file – The database file name. Optional, defaults to “pyabc_test.db”.
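The resulting identifier is just an SQLAlchemy URL. A minimal sketch of how such an identifier can be assembled by hand (sqlite_db_id is a hypothetical stand-in for the convenience function above):

```python
import os
import tempfile

def sqlite_db_id(dir_=None, file_="pyabc_test.db"):
    # Mimic the convenience function: default to the system's temporary
    # directory and prepend the "sqlite:///" scheme.
    if dir_ is None:
        dir_ = tempfile.gettempdir()
    return "sqlite:///" + os.path.join(dir_, file_)

db_id = sqlite_db_id()  # e.g. "sqlite:////tmp/pyabc_test.db" on Linux
```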

pyabc.storage.
load_dict_from_json
(file_: str, key_type: type = <class 'int'>)[source]¶ Read in a json file and convert the keys to key_type. Inverse to save_dict_to_json.
 Parameters
file – Name of the file to read in.
key_type – Type to convert the keys into.
 Returns
dct
 Return type
The json file contents.
Transitions (Perturbation kernels)¶
Perturbation strategies. The classes defined here transition the current population to the next one. pyABC implements global and local transitions. Proposals for the subsequent generation are drawn from density estimates of the current generation, which is equivalent to perturbing randomly chosen particles.
These can be passed to pyabc.smc.ABCSMC
via the transitions
keyword argument.

class
pyabc.transition.
DiscreteRandomWalkTransition
(n_steps: int = 1, p_l: float = 0.3333333333333333, p_r: float = 0.3333333333333333, p_c: float = 0.3333333333333333)[source]¶ Bases:
pyabc.transition.base.DiscreteTransition
This transition is based on a discrete random walk. This may be useful for discrete ordinal parameter distributions that can be described as lying on an integer grid.
Note
This transition does not adapt to the problem structure and thus has potentially slow convergence. Further, the transition does not satisfy proposal >> prior, so that it is indeed not valid as an importance sampling distribution. This can be overcome by selecting the number of steps as a random variable.
 Parameters
n_steps (int, optional (default = 1)) – Number of random walk steps to take.
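The perturbation itself can be sketched in plain numpy: each step moves a particle left, right, or keeps it in place with probabilities p_l, p_r, p_c. An illustrative sketch of the idea, not pyABC's implementation:

```python
import numpy as np

def discrete_random_walk(x, n_steps=1, p_l=1/3, p_r=1/3, p_c=1/3, seed=None):
    # Each step moves the integer parameter left (-1), right (+1),
    # or keeps it in place (0) with probabilities p_l, p_r, p_c.
    rng = np.random.default_rng(seed)
    steps = rng.choice([-1, 1, 0], size=n_steps, p=[p_l, p_r, p_c])
    return x + steps.sum()

perturbed = discrete_random_walk(5, n_steps=3, seed=0)
```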

__init__
(n_steps: int = 1, p_l: float = 0.3333333333333333, p_r: float = 0.3333333333333333, p_c: float = 0.3333333333333333)[source]¶ Initialize self. See help(type(self)) for accurate signature.

fit
(X: pandas.core.frame.DataFrame, w: numpy.ndarray)[source]¶ Fit the density estimator (perturber) to the sampled data. Concrete implementations might do something like fitting a KDE.
The parameters given as X and w are automatically stored in self.X and self.w.
 Parameters
X (pd.DataFrame) – The parameters.
w (array) – The corresponding weights

pdf
(x: Union[pandas.core.series.Series, pandas.core.frame.DataFrame]) → Union[float, numpy.ndarray][source]¶ Evaluate the probability mass function (PMF) at x.

rvs
(size: int = None) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]¶ Sample from the density.
 Parameters
size (int, optional) – Number of independent samples to draw. Defaults to 1 and is in this case equivalent to calling “rvs_single”.
 Returns
samples
 Return type
The samples as pandas DataFrame
Note
This method can be overridden for efficient implementations. The default is to call rvs_single repeatedly (which might not be the most efficient way).

class
pyabc.transition.
DiscreteTransition
[source]¶ Bases:
pyabc.transition.base.Transition
This is a base class for discrete transition kernels.

abstract
fit
(X: pandas.core.frame.DataFrame, w: numpy.ndarray) → None¶ Fit the density estimator (perturber) to the sampled data. Concrete implementations might do something like fitting a KDE.
The parameters given as X and w are automatically stored in self.X and self.w.
 Parameters
X (pd.DataFrame) – The parameters.
w (array) – The corresponding weights

abstract
pdf
(x: Union[pandas.core.series.Series, pandas.core.frame.DataFrame]) → Union[float, numpy.ndarray]¶ Evaluate the probability density function (PDF) at x.
 Parameters
x (pd.Series, pd.DataFrame) – Parameter. If x is a series, then x should have the columns from X passed to the fit method as indices. If x is a DataFrame, then x should have the same columns as X passed before to the fit method. The order of the columns is not important.
 Returns
density – Probability density at x.
 Return type
float

rvs
(size: int = None) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]¶ Sample from the density.
 Parameters
size (int, optional) – Number of independent samples to draw. Defaults to 1 and is in this case equivalent to calling “rvs_single”.
 Returns
samples
 Return type
The samples as pandas DataFrame
Note
This method can be overridden for efficient implementations. The default is to call rvs_single repeatedly (which might not be the most efficient way).

abstract
rvs_single
() → pandas.core.series.Series¶ Random variable sample (rvs).
Sample from the fitted distribution.
 Returns
sample – A sample from the fitted model.
 Return type
pd.Series


class
pyabc.transition.
GridSearchCV
(estimator=None, param_grid=None, scoring=None, n_jobs=1, iid=True, refit=True, cv=5, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score=True)[source]¶ Bases:
sklearn.model_selection._search.GridSearchCV
Do a grid search to automatically select the best parameters for transition classes such as the
pyabc.transition.MultivariateNormalTransition
. This is essentially a thin wrapper around sklearn.model_selection.GridSearchCV. It translates the scikit-learn interface to the interface used in pyABC and is hence a thin adapter.
The parameters are just as for sklearn.model_selection.GridSearchCV. Major default values:
estimator = MultivariateNormalTransition()
param_grid = {‘scaling’: np.linspace(0.05, 1.0, 5)}
cv = 5
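The selection performed here can be illustrated without scikit-learn: score each candidate scaling by a cross-validated density fit and keep the best. A 1D leave-one-out sketch with a Gaussian kernel (illustrative only; the class itself delegates the real work to sklearn.model_selection.GridSearchCV):

```python
import numpy as np

def kde_loo_score(x, h):
    # Leave-one-out log-likelihood of a 1D Gaussian KDE with bandwidth h.
    n = len(x)
    diffs = x[:, None] - x[None, :]
    dens = np.exp(-0.5 * (diffs / h) ** 2) / (h * np.sqrt(2 * np.pi))
    np.fill_diagonal(dens, 0.0)       # exclude each point's own kernel
    return np.log(dens.sum(axis=1) / (n - 1)).sum()

rng = np.random.default_rng(1)
sample = rng.normal(size=50)
scalings = np.linspace(0.05, 1.0, 5)  # the default grid mentioned above
best = max(scalings, key=lambda h: kde_loo_score(sample, h))
```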

class
pyabc.transition.
LocalTransition
(k=None, k_fraction=0.25, scaling=1)[source]¶ Bases:
pyabc.transition.base.Transition
Local KDE fit. Takes into account only the k nearest neighbors, similar to [Filippi].
 Parameters
k (int) – Number of nearest neighbors for local covariance calculation.
scaling (float) – Scaling factor for the local covariance matrices.
k_fraction (float, optional) – If given, the number of nearest neighbors to use is calculated as k = round(k_fraction * population_size).

EPS
¶ Scaling of the identity matrix to be added to the covariance in case the covariances are not invertible.
 Type
float
 Filippi
Filippi, Sarah, Chris P. Barnes, Julien Cornebise, and Michael P. H. Stumpf. “On Optimality of Kernels for Approximate Bayesian Computation Using Sequential Monte Carlo.” Statistical Applications in Genetics and Molecular Biology 12, no. 1 (2013): 87–107. doi:10.1515/sagmb-2012-0069.

__init__
(k=None, k_fraction=0.25, scaling=1)[source]¶ Initialize self. See help(type(self)) for accurate signature.

fit
(X, w)[source]¶ Fit the density estimator (perturber) to the sampled data. Concrete implementations might do something like fitting a KDE.
The parameters given as X and w are automatically stored in self.X and self.w.
 Parameters
X (pd.DataFrame) – The parameters.
w (array) – The corresponding weights

pdf
(x)[source]¶ Evaluate the probability density function (PDF) at x.
 Parameters
x (pd.Series, pd.DataFrame) – Parameter. If x is a series, then x should have the columns from X passed to the fit method as indices. If x is a DataFrame, then x should have the same columns as X passed before to the fit method. The order of the columns is not important.
 Returns
density – Probability density at x.
 Return type
float

rvs
(size: int = None) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]¶ Sample from the density.
 Parameters
size (int, optional) – Number of independent samples to draw. Defaults to 1 and is in this case equivalent to calling “rvs_single”.
 Returns
samples
 Return type
The samples as pandas DataFrame
Note
This method can be overridden for efficient implementations. The default is to call rvs_single repeatedly (which might not be the most efficient way).

class
pyabc.transition.
MultivariateNormalTransition
(scaling: float = 1, bandwidth_selector: Callable[[int, int], float] = <function silverman_rule_of_thumb>)[source]¶ Bases:
pyabc.transition.base.Transition
Transition via a multivariate Gaussian KDE estimate.
 Parameters
scaling (float) – A factor by which the covariance is additionally multiplied. Since the Silverman and Scott rules usually yield too large bandwidths, it should make most sense to have 0 < scaling <= 1.
bandwidth_selector (optional) – Defaults to silverman_rule_of_thumb. The bandwidth selector is a function of the form f(n_samples: float, dimension: int), where n_samples denotes the (effective) sample size (and is therefore a float) and dimension is the parameter dimension.
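The default bandwidth selector follows Silverman's rule of thumb. A sketch of the commonly used multivariate form (the exact formula in pyABC's silverman_rule_of_thumb may differ in detail):

```python
def silverman_factor(n_samples: float, dimension: int) -> float:
    # Common multivariate Silverman rule of thumb: the sample covariance
    # is scaled by this factor, which shrinks as the sample size grows.
    return (4.0 / (n_samples * (dimension + 2))) ** (1.0 / (dimension + 4))

factor = silverman_factor(100.0, 1)  # roughly 0.42
```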

__init__
(scaling: float = 1, bandwidth_selector: Callable[[int, int], float] = <function silverman_rule_of_thumb>)[source]¶ Initialize self. See help(type(self)) for accurate signature.

fit
(X: pandas.core.frame.DataFrame, w: numpy.ndarray) → None[source]¶ Fit the density estimator (perturber) to the sampled data. Concrete implementations might do something like fitting a KDE.
The parameters given as X and w are automatically stored in self.X and self.w.
 Parameters
X (pd.DataFrame) – The parameters.
w (array) – The corresponding weights

pdf
(x: Union[pandas.core.series.Series, pandas.core.frame.DataFrame]) → Union[float, numpy.ndarray][source]¶ Evaluate the probability density function (PDF) at x.
 Parameters
x (pd.Series, pd.DataFrame) – Parameter. If x is a series, then x should have the columns from X passed to the fit method as indices. If x is a DataFrame, then x should have the same columns as X passed before to the fit method. The order of the columns is not important.
 Returns
density – Probability density at x.
 Return type
float

rvs
(size: int = None) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame][source]¶ Sample from the density.
 Parameters
size (int, optional) – Number of independent samples to draw. Defaults to 1 and is in this case equivalent to calling “rvs_single”.
 Returns
samples
 Return type
The samples as pandas DataFrame
Note
This method can be overridden for efficient implementations. The default is to call rvs_single repeatedly (which might not be the most efficient way).

class
pyabc.transition.
Transition
[source]¶ Bases:
sklearn.base.BaseEstimator
Abstract Transition base class. Derive all transitions from this class.
Note
This class does a little bit of metaprogramming.
The fit, pdf and rvs methods are automatically wrapped to handle the special case of no parameters.
Hence, you can safely assume that you encounter at least one parameter. All the defined transitions will then automatically generalize to the case of no parameter.

abstract
fit
(X: pandas.core.frame.DataFrame, w: numpy.ndarray) → None[source]¶ Fit the density estimator (perturber) to the sampled data. Concrete implementations might do something like fitting a KDE.
The parameters given as X and w are automatically stored in self.X and self.w.
 Parameters
X (pd.DataFrame) – The parameters.
w (array) – The corresponding weights

mean_cv
(n_samples: Union[None, int] = None) → float[source]¶ Estimate the uncertainty on the KDE.
 Parameters
n_samples (int, optional) – Estimate the CV for n_samples samples. If this parameter is not given, the sample size of the last fit is used.
 Returns
mean_cv – The estimated average coefficient of variation.
 Return type
float
Note
A call to this method, as a side effect, also sets the attributes test_points_, test_weights_ and variation_at_test_points_. These are the individual points, weights and variations used to calculate the mean.
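The coefficient of variation underlying this estimate is simply the (weighted) standard deviation of the density values at the test points relative to their mean; a minimal sketch:

```python
import numpy as np

def coefficient_of_variation(values, weights=None):
    # CV = weighted standard deviation divided by the weighted mean.
    values = np.asarray(values, dtype=float)
    if weights is None:
        weights = np.ones_like(values) / len(values)
    mean = np.average(values, weights=weights)
    var = np.average((values - mean) ** 2, weights=weights)
    return np.sqrt(var) / mean

cv = coefficient_of_variation([0.9, 1.0, 1.1])  # about 0.082
```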

abstract
pdf
(x: Union[pandas.core.series.Series, pandas.core.frame.DataFrame]) → Union[float, numpy.ndarray][source]¶ Evaluate the probability density function (PDF) at x.
 Parameters
x (pd.Series, pd.DataFrame) – Parameter. If x is a series, then x should have the columns from X passed to the fit method as indices. If x is a DataFrame, then x should have the same columns as X passed before to the fit method. The order of the columns is not important.
 Returns
density – Probability density at x.
 Return type
float

rvs
(size: int = None) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame][source]¶ Sample from the density.
 Parameters
size (int, optional) – Number of independent samples to draw. Defaults to 1 and is in this case equivalent to calling “rvs_single”.
 Returns
samples
 Return type
The samples as pandas DataFrame
Note
This method can be overridden for efficient implementations. The default is to call rvs_single repeatedly (which might not be the most efficient way).

Population strategies¶
Strategies to choose the population size.
The population size can be constant or can change over the course of the generations.

class
pyabc.populationstrategy.
AdaptivePopulationSize
(start_nr_particles, mean_cv: float = 0.05, max_population_size: int = inf, min_population_size: int = 10, nr_samples_per_parameter: int = 1, n_bootstrap: int = 10, nr_calibration_particles: int = None)[source]¶ Bases:
pyabc.populationstrategy.PopulationStrategy
Adapt the population size according to the mean coefficient of variation error criterion, as detailed in [1]. This strategy tries to respond to the shape of the current posterior approximation by selecting the population size such that the variation of the density estimates matches the target variation given via the mean_cv argument.
 Parameters
start_nr_particles – Number of particles in the first population.
mean_cv – The error criterion. Defaults to 0.05. A smaller value leads generally to larger populations. The error criterion is the mean coefficient of variation of the estimated KDE.
max_population_size – Max nr of allowed particles in a population. Defaults to infinity.
min_population_size – Min number of particles allowed in a population. Defaults to 10.
nr_samples_per_parameter – Defaults to 1.
n_bootstrap – Number of bootstrapped populations to use to estimate the CV. Defaults to 10.
nr_calibration_particles – Number of calibration particles.
 [1]
Klinger, Emmanuel, and Jan Hasenauer. “A Scheme for Adaptive Selection of Population Sizes in Approximate Bayesian Computation - Sequential Monte Carlo.” Computational Methods in Systems Biology, 128–44. Lecture Notes in Computer Science. Springer, Cham, 2017. https://doi.org/10.1007/978-3-319-67471-1_8.
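The adaptation idea can be sketched as a simple feedback rule on the population size (an illustrative sketch only, not pyABC's exact update; the 1/sqrt(n) error scaling is an assumption):

```python
def adapt_population_size(current_size, estimated_cv, target_cv=0.05,
                          min_size=10, max_size=10**6):
    # Monte Carlo error scales roughly like 1 / sqrt(n), so matching the
    # target CV suggests scaling the size by (estimated_cv / target_cv)**2.
    proposed = int(current_size * (estimated_cv / target_cv) ** 2)
    return max(min_size, min(max_size, proposed))

# If the estimated CV is twice the target, quadruple the population:
next_size = adapt_population_size(500, estimated_cv=0.10)
```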

__init__
(start_nr_particles, mean_cv: float = 0.05, max_population_size: int = inf, min_population_size: int = 10, nr_samples_per_parameter: int = 1, n_bootstrap: int = 10, nr_calibration_particles: int = None)[source]¶ Initialize self. See help(type(self)) for accurate signature.

class
pyabc.populationstrategy.
ConstantPopulationSize
(nr_particles: int, nr_calibration_particles: int = None, nr_samples_per_parameter: int = 1)[source]¶ Bases:
pyabc.populationstrategy.PopulationStrategy
Constant size of the different populations.
 Parameters
nr_particles – Number of particles per population.
nr_calibration_particles – Number of calibration particles.
nr_samples_per_parameter – Number of samples to draw for a proposed parameter.

class
pyabc.populationstrategy.
ListPopulationSize
(values: Union[List[int], Dict[int, int]], nr_calibration_particles: int = None, nr_samples_per_parameter: int = 1)[source]¶ Bases:
pyabc.populationstrategy.PopulationStrategy
Return population size values from a predefined list. For every time point enquired later (specified by time t), an entry must exist in the list.
 Parameters
values (List[int] or Dict[int, int]) – Population size values; values[t] is the value for population t.
nr_calibration_particles – Number of calibration particles.

class
pyabc.populationstrategy.
PopulationStrategy
(nr_calibration_particles: int = None, nr_samples_per_parameter: int = 1)[source]¶ Bases:
abc.ABC
Strategy to select the sizes of the populations.
This is a non-functional abstract base implementation. Do not use this class directly. Subclasses must override the update method.
 Parameters
nr_calibration_particles – Number of calibration particles.
nr_samples_per_parameter – Number of samples to draw for a proposed parameter. Default is 1.