Introduction to RBi.helpers

Sebastian Funk on July 10, 2019

RBi.helpers is collection of helper functions to use with RBi, an R interface to LibBi, a library for Bayesian Inference.

Latest Version: 0.3.1

This vignette builds on the RBi vignette, applying the higher-level functions contained in RBi.helpers to the same model introduced there. For the lower-level functions to run LibBi and read the results, please refer to the documentation and vignette that comes with RBi.

Installation

The RBi.helpers package requires R (>= 3.2.0) as well as the packages:

Most functions also require a working installation of LibBi. Please see the RBi vignette for how to get one via homebrew or linuxbrew.

The easiest way to install the latest stable version of RBi.helpers is via CRAN. The package name is rbi.helpers (all lowercase):

install.packages('rbi.helpers')

Alternatively, the current development version can be installed using the devtools package

# install.packages("devtools")
library('devtools')
install_github("sbfnk/rbi.helpers")

Loading the package

Use

library('rbi.helpers')

to load the package.

Loading the model and generating a synthetic dataset

These steps are reproduced from the RBi vignette, where there is more information on the individual steps

model_file <- system.file(package="rbi", "SIR.bi") # get full file name from package
SIRmodel <- bi_model(model_file) # load model
set.seed(1001912)
SIRdata <- bi_generate_dataset(SIRmodel, end_time=16*7, noutputs=16)

Adapt the number of particles

In the RBi vignette, a stochastic SIR model is fitted to simulated data from the same model using particle Markov-chain Monte Carlo with 32 particles. Given a model and data, how do we know how many particles we need? This question does not have a simple answer, as the “optimal” number of particles may depend on the state of the Markov chain. A possible rule-of-thumb is to choose the number of particles such that the variance of the log-likelihood near the mode is approximately one. This suggests a strategy by which first and approximate location of the mode or mean of the posterior distribution is obtained in a trial run, before the numer of particles is adjusted by monitoring the variance of the log-likelihood while keeping the parameters fixed. RBi.helpers implements the second part of this strategy (adjusting the number of particles at a given location in parameter space) with the adapt_particles method. For the first part (finding the mode), a crude method is to take a fixed number of samples from the prior distribution and choose the one that maximises the posterior. In RBi, this can be achieved with

bi_prior <- sample(proposal="prior", SIRmodel, nsamples=1000, end_time=16*7, nparticles=4, obs=SIRdata, seed=1234)

This runs particle MCMC with the prior distribution as proposal distribution (because proposal="prior"), in this case with 4 particles. In other words, when sampling from the posterior the proposals will be drawn independently from the prior distribution. Note that we set a seed to make the results reproducible. It is worth trying the commands with a different seed and seeing the difference to the results obtained below. The location in parameters of the sampler at the end of the 1000 iterations will give an approximation of the mode of the posterior distribution. This can then be used to adjust the number of particles using

adapted <- adapt_particles(bi_prior)
## Wed Jul 10 11:50:32 2019 Adapting the number of particles
## Wed Jul 10 11:50:42 2019 4 particles, loglikelihod variance: 1023.55412595736
## Wed Jul 10 11:50:46 2019 8 particles, loglikelihod variance: 33.6754090462797
## Wed Jul 10 11:50:51 2019 16 particles, loglikelihod variance: 10.2735034458929
## Wed Jul 10 11:50:59 2019 32 particles, loglikelihod variance: 1.83073049163988
## Wed Jul 10 11:51:14 2019 64 particles, loglikelihod variance: 1.80600281507059
## Wed Jul 10 11:51:44 2019 128 particles, loglikelihod variance: 1.33257553201926
## Wed Jul 10 11:52:39 2019 256 particles, loglikelihod variance: 0.872477311579686
## Wed Jul 10 11:52:39 2019 Choosing 256 particles.

This will take the last sample of the output file contained in the libbi object bi_prior, and use it to adjust the number of particles by starting with 1 particle (or a given min) and doubling it until the variance of the loglikelihood crosses 1. The number of particles is then saved in the adapted object:

adapted$options$nparticles
## [1] 256

Adapt the proposal distribution

Having adjusted the number of particles, the second important information to give the posterior sampler is the proposal distribution. This can, again, be obtained using a sequence of trial runs, whereby the proposal distribution is sequentially adjusted from previous samples to be proportional to the empirical covariance of the posterior samples. The way this is implemented in the adapt_proposal function in RBi.helpers is that the proposal distribution is adjusted to come from a multivariate normal taking into account the covariance of samples obtained so far, until the acceptance rate lies between the given minimum and maximumad. For example, to adjust the proposal distribution for an acceptance rate between 0.05 and 0.4, we can run:

adapted <- adapt_proposal(adapted, min=0.05, max=0.4)
## Wed Jul 10 11:52:39 2019 Adapting the proposal distribution
## Wed Jul 10 11:52:40 2019 Initial trial run
## Wed Jul 10 11:53:46 2019 Acceptance rate 0.406406406406406, adapting size with scale 1
## Wed Jul 10 11:54:43 2019 Acceptance rate: 0.387387387387387

The covariance matrices for parameters and initial conditions are stored in the input file contained in the libbi object adapted

bi_read(adapted, file="input")
## $`__proposal_parameter_cov`
##   __dim_parameter_cov.1 __dim_parameter_cov.2      value
## 1                 p_rep                 p_rep 0.08880033
## 2                 p_rep                  p_R0 0.00000000
## 3                  p_R0                 p_rep 0.00000000
## 4                  p_R0                  p_R0 0.35165480

Compute DIC

To compute the Deviance Information Criterion (DIC), use DIC:

posterior <- sample(adapted)
DIC(posterior)
## [1] 126.5054

This samples from the posterior distribution using the adapted number of particles and proposal distribution and then calculates the DIC.

Convert between LibBi times and actual times or dates

LibBi uses real-valued times. To convert these to time or date objects for use in R, use the numeric_to_time function:

res <- numeric_to_time(posterior, unit="day", origin=as.Date("2018-04-01"))
head(res$Z)
##   np       time      value
## 1  0 2018-04-01   1.000000
## 2  0 2018-04-08   9.567902
## 3  0 2018-04-15  95.638054
## 4  0 2018-04-22  85.915564
## 5  0 2018-04-29 280.458185
## 6  0 2018-05-06 134.560334

The function time_to_numeric does the converse, converting R times or dates into numeric values for use by LibBi:

orig <- time_to_numeric(res, unit="day", origin=as.Date("2018-04-01"))
head(orig$Z)
##   np time      value
## 1  0    0   1.000000
## 2  0    7   9.567902
## 3  0   14  95.638054
## 4  0   21  85.915564
## 5  0   28 280.458185
## 6  0   35 134.560334

Create inference chains

In combination with the magrittr package, the it is possible to construct inference chains more elegantly. For example, to get posterior samples from adapted Metropolis-Hastings including sampled observations, we could have written

library('magrittr')
posterior <- sample(proposal="prior", SIRmodel, nsamples=1000, end_time=16*7, nparticles=4, obs=SIRdata, seed=1234) %>%
  adapt_particles %>%
  adapt_proposal(min=0.05, max=0.4) %>%
  sample(nsamples=5000) %>%
  sample_obs