# Script based external simulators¶

In this notebook, we demonstrate the usage of the script based external simulators, summary statistics, and distance functions features.

These allow to use near-arbitrary programing languages and output for pyABC. The main concept is that all communication happens via the file system. This comes at the cost of a considerable overhead, making this feature not optimal for models with a low simulation time. For more expensive models, the overhead should be negligible.

This notebook is similar to the using_R notebook, but circumvents usage of the rpy2 package.

[1]:

import pyabc
import pyabc.external

/home/yannik/anaconda3/lib/python3.7/site-packages/distributed/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.


Here, we define model, summary statistics and distance. Note that if possible, alternatively FileIdSumStat can be used to read in the summary statistics directly to python and then use a python based distance function.

[2]:

model = pyabc.external.ExternalModel(executable="Rscript", file="rmodel/model.r")
sumstat = pyabc.external.ExternalSumStat(executable="Rscript", file="rmodel/sumstat.r")
distance = pyabc.external.ExternalDistance(executable="Rscript", file="rmodel/distance.r")

dummy_sumstat = pyabc.external.create_sum_stat()  # can also use a real file here

Note that here we have the observed summary statistics encoded directly in the distance.r file, such that we don't need any here. Alternatively, we could also specifiy the summary statistics for real here via a file. Example of how the functions can be called:
[3]:

pars = {'meanX': 3, 'meanY': 4}
mf = model(pars)
print(mf)
sf = sumstat(mf)
print(sf)
distance(sf, dummy_sumstat)

{'loc': '/tmp/modelsim_n6oahdmc', 'returncode': 0}
{'loc': '/tmp/sumstat_nys4kf_l', 'returncode': 0}

[3]:

2.994722

[4]:

from pyabc import Distribution, RV, ABCSMC

prior = Distribution(meanX=RV("uniform", 0, 10),
meanY=RV("uniform", 0, 10))
abc = ABCSMC(model, prior, distance,
summary_statistics=sumstat,
population_size=20)

import os
from tempfile import gettempdir

db = "sqlite:///" + os.path.join(gettempdir(), "test.db")
abc.new(db, dummy_sumstat)

history = abc.run(minimum_epsilon=0.9, max_nr_populations=4)

INFO:History:Start <ABCSMC(id=1, start_time=2019-09-23 18:36:16.815211, end_time=None)>
INFO:Epsilon:initial epsilon is 5.276756500000001
INFO:ABC:t:0 eps:5.276756500000001
INFO:ABC:t:1 eps:4.03523
INFO:ABC:t:2 eps:2.3500524869456942
INFO:ABC:t:3 eps:1.857179536835703
INFO:History:Done <ABCSMC(id=1, start_time=2019-09-23 18:36:16.815211, end_time=2019-09-23 18:37:14.892234)>


Note the significantly longer runtimes compared to using rpy2. This is because the simulation time of this model is very short, such that repeatedly accessing the file system creates a notable overhead. For more expensive models, this overhead however becomes less notable. Still, if applicable, more efficient ways of communication between model and simulator are preferable.

[5]:

from pyabc.visualization import plot_kde_2d

for t in range(history.n_populations):
df, w = abc.history.get_distribution(0, t)
ax = plot_kde_2d(df, w, "meanX", "meanY",
xmin=0, xmax=10,
ymin=0, ymax=10,
numx=100, numy=100)
ax.scatter([4], [8],
edgecolor="black",
facecolor="white",
label="Observation");
ax.legend();
ax.set_title("PDF t={}".format(t))