Title: | Data Cloning and MCMC Tools for Maximum Likelihood Methods |
---|---|
Description: | Low level functions for implementing maximum likelihood estimating procedures for complex models using data cloning and Bayesian Markov chain Monte Carlo methods as described in Solymos 2010 <doi:10.32614/RJ-2010-011>. Sequential and parallel MCMC support for 'JAGS', 'WinBUGS', 'OpenBUGS', and 'Stan'. |
Authors: | Peter Solymos [aut, cre] |
Maintainer: | Peter Solymos <[email protected]> |
License: | GPL-2 |
Version: | 2.3-3 |
Built: | 2024-11-23 02:46:28 UTC |
Source: | https://github.com/datacloning/dclone |
Low level functions for implementing maximum likelihood estimating procedures for complex models using data cloning and Bayesian Markov chain Monte Carlo methods. Sequential and parallel MCMC support for JAGS, WinBUGS, OpenBUGS, and Stan.
Main functions include:
dclone
, dcdim
, dciid
,
dctr
:
cloning R objects in various ways.
jags.fit
, bugs.fit
, stan.fit
:
conveniently fit JAGS/BUGS/Stan models.
jags.parfit
, bugs.parfit
, stan.parfit
fits chains on parallel workers.
dc.fit
: iterative model fitting by
the data cloning algorithm.
dc.parfit
is the parallelized version.
dctable
, dcdiag
:
helps evaluating data cloning
convergence by descriptive statistics and diagnostic tools.
(These are based on e.g. chisq.diag
and lambdamax.diag
.)
coef.mcmc.list
, confint.mcmc.list.dc
,
dcsd.mcmc.list
, quantile.mcmc.list
,
vcov.mcmc.list.dc
, mcmcapply
,
stack.mcmc.list
:
methods for mcmc.list
objects.
write.jags.model
, clean.jags.model
,
custommodel
:
convenient functions for handling JAGS/BUGS/Stan
models.
jagsModel
, codaSamples
: basic functions
from rjags package rewrote to recognize data cloning
attributes from data (parJagsModel
,
parUpdate
, parCodaSamples
are the parallel versions).
Author: Peter Solymos
Maintainer: Peter Solymos
Forum: https://groups.google.com/forum/#!forum/dclone-users
Issues: https://github.com/datacloning/dcmle/issues
Data cloning website: https://datacloning.org
Solymos, P., 2010. dclone: Data Cloning in R. The R Journal 2(2), 29–37. URL: https://journal.r-project.org/archive/2010-2/RJournal_2010-2_Solymos.pdf
Lele, S.R., B. Dennis and F. Lutscher, 2007. Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods. Ecology Letters 10, 551–563.
Lele, S. R., K. Nadeem and B. Schmuland, 2010. Estimability and likelihood inference for generalized linear mixed models using data cloning. Journal of the American Statistical Association 105, 1617–1625.
This is the workhorse for dc.fit
and
dc.parfit
.
.dcFit(data, params, model, inits, n.clones, multiply = NULL, unchanged = NULL, update = NULL, updatefun = NULL, initsfun = NULL, flavour = c("jags", "bugs", "stan"), n.chains=3, cl = NULL, parchains = FALSE, return.all=FALSE, check.nclones=TRUE, ...)
.dcFit(data, params, model, inits, n.clones, multiply = NULL, unchanged = NULL, update = NULL, updatefun = NULL, initsfun = NULL, flavour = c("jags", "bugs", "stan"), n.chains=3, cl = NULL, parchains = FALSE, return.all=FALSE, check.nclones=TRUE, ...)
data |
A named list (or environment) containing the data. |
params |
Character vector of parameters to be sampled. It can be a list of 2 vectors, 1st element is used as parameters to monitor, the 2nd is used as parameters to use in calculating the data cloning diagnostics. |
model |
Character string (name of the model file), a function containing
the model, or a |
inits |
Optional specification of initial values in the form of a list or a
function (see Initialization at |
n.clones |
An integer vector containing the numbers of clones to use iteratively. |
multiply |
Numeric or character index for list element(s) in the |
unchanged |
Numeric or character index for list element(s) in the |
update |
Numeric or character index for list element(s) in the |
updatefun |
A function to use for updating |
initsfun |
A function to use for generating initial values, |
flavour |
If |
n.chains |
Number of chains to generate. |
cl |
A cluster object created by |
parchains |
Logical, whether parallel chains should be run. |
return.all |
Logical. If |
check.nclones |
Logical, whether to check and ensure that values of |
... |
Other values supplied to |
An object inheriting from the class 'mcmc.list'.
Peter Solymos, implementation is based on many discussions with Khurram Nadeem and Subhash Lele.
Convenient functions designed to work well with cloned data arguments and WinBUGS and OpenBUGS.
bugs.fit(data, params, model, inits = NULL, n.chains = 3, format = c("mcmc.list", "bugs"), program = c("winbugs", "openbugs", "brugs"), seed, ...) ## S3 method for class 'bugs' as.mcmc.list(x, ...)
bugs.fit(data, params, model, inits = NULL, n.chains = 3, format = c("mcmc.list", "bugs"), program = c("winbugs", "openbugs", "brugs"), seed, ...) ## S3 method for class 'bugs' as.mcmc.list(x, ...)
data |
A list (or environment) containing the data. |
params |
Character vector of parameters to be sampled. |
model |
Character string (name of the model file), a function containing
the model, or a |
inits |
Optional specification of initial values in the
form of a list or a function.
If |
n.chains |
number of Markov chains. |
format |
Required output format. |
program |
The program to use, not case sensitive.
|
seed |
Random seed ( |
x |
A fitted 'bugs' object. |
... |
Further arguments of the |
By default, an mcmc.list
object. If data cloning is used via the
data
argument, summary
returns a modified summary
containing scaled data cloning standard errors
(scaled by sqrt(n.clones)
), and
values (as returned by
gelman.diag
).
bugs.fit
can return a bugs
object if
format = "bugs"
.
In this case, summary is not changed, but the number of clones
used is attached as attribute
and can be retrieved by the function nclones
.
The function as.mcmc.list.bugs
converts a 'bugs' object
into 'mcmc.list' and retrieves
data cloning information as well.
Peter Solymos
Underlying functions:
bugs
in package R2WinBUGS,
openbugs
in package R2WinBUGS,
bugs
in package R2OpenBUGS
Methods: dcsd
, confint.mcmc.list.dc
,
coef.mcmc.list
, quantile.mcmc.list
,
vcov.mcmc.list.dc
## Not run: ## fitting with WinBUGS, bugs example if (require(R2WinBUGS)) { data(schools) dat <- list(J = nrow(schools), y = schools$estimate, sigma.y = schools$sd) bugs.model <- function(){ for (j in 1:J){ y[j] ~ dnorm (theta[j], tau.y[j]) theta[j] ~ dnorm (mu.theta, tau.theta) tau.y[j] <- pow(sigma.y[j], -2) } mu.theta ~ dnorm (0.0, 1.0E-6) tau.theta <- pow(sigma.theta, -2) sigma.theta ~ dunif (0, 1000) } inits <- function(){ list(theta=rnorm(nrow(schools), 0, 100), mu.theta=rnorm(1, 0, 100), sigma.theta=runif(1, 0, 100)) } param <- c("mu.theta", "sigma.theta") if (.Platform$OS.type == "windows") { sim <- bugs.fit(dat, param, bugs.model, inits) summary(sim) } dat2 <- dclone(dat, 2, multiply="J") if (.Platform$OS.type == "windows") { sim2 <- bugs.fit(dat2, param, bugs.model, program="winbugs", n.iter=2000, n.thin=1) summary(sim2) } } if (require(BRugs)) { ## fitting the model with OpenBUGS ## using the less preferred BRugs interface sim3 <- bugs.fit(dat2, param, bugs.model, program="brugs", n.iter=2000, n.thin=1) summary(sim3) } if (require(R2OpenBUGS)) { ## fitting the model with OpenBUGS ## using the preferred R2OpenBUGS interface sim4 <- bugs.fit(dat2, param, bugs.model, program="openbugs", n.iter=2000, n.thin=1) summary(sim4) } if (require(rjags)) { ## fitting the model with JAGS sim5 <- jags.fit(dat2, param, bugs.model) summary(sim5) } ## End(Not run)
## Not run: ## fitting with WinBUGS, bugs example if (require(R2WinBUGS)) { data(schools) dat <- list(J = nrow(schools), y = schools$estimate, sigma.y = schools$sd) bugs.model <- function(){ for (j in 1:J){ y[j] ~ dnorm (theta[j], tau.y[j]) theta[j] ~ dnorm (mu.theta, tau.theta) tau.y[j] <- pow(sigma.y[j], -2) } mu.theta ~ dnorm (0.0, 1.0E-6) tau.theta <- pow(sigma.theta, -2) sigma.theta ~ dunif (0, 1000) } inits <- function(){ list(theta=rnorm(nrow(schools), 0, 100), mu.theta=rnorm(1, 0, 100), sigma.theta=runif(1, 0, 100)) } param <- c("mu.theta", "sigma.theta") if (.Platform$OS.type == "windows") { sim <- bugs.fit(dat, param, bugs.model, inits) summary(sim) } dat2 <- dclone(dat, 2, multiply="J") if (.Platform$OS.type == "windows") { sim2 <- bugs.fit(dat2, param, bugs.model, program="winbugs", n.iter=2000, n.thin=1) summary(sim2) } } if (require(BRugs)) { ## fitting the model with OpenBUGS ## using the less preferred BRugs interface sim3 <- bugs.fit(dat2, param, bugs.model, program="brugs", n.iter=2000, n.thin=1) summary(sim3) } if (require(R2OpenBUGS)) { ## fitting the model with OpenBUGS ## using the preferred R2OpenBUGS interface sim4 <- bugs.fit(dat2, param, bugs.model, program="openbugs", n.iter=2000, n.thin=1) summary(sim4) } if (require(rjags)) { ## fitting the model with JAGS sim5 <- jags.fit(dat2, param, bugs.model) summary(sim5) } ## End(Not run)
Does the same job as bugs.fit
,
but parallel chains are run on parallel workers, thus
computations can be faster (up to 1/n.chains
) for long MCMC runs.
bugs.parfit(cl, data, params, model, inits=NULL, n.chains = 3, seed, program=c("winbugs", "openbugs", "brugs"), ...)
bugs.parfit(cl, data, params, model, inits=NULL, n.chains = 3, seed, program=c("winbugs", "openbugs", "brugs"), ...)
cl |
A cluster object created by |
data |
A named list or environment containing the data. If an environment,
|
params |
Character vector of parameters to be sampled. |
model |
Character string (name of the model file), a function
containing the model, or a or |
inits |
Specification of initial values in the form of a
list or a function, can be missing.
If this is a function and using 'snow' type
cluster as |
n.chains |
Number of chains to generate, must be higher than 1. Ideally, this is equal to the number of parallel workers in the cluster. |
seed |
Vector of random seeds, must have |
program |
The program to use, not case sensitive. See |
... |
Other arguments passed to |
Chains are run on parallel workers, and the results are combined in the end.
The seed must be supplied, as it is the user's responsibility to make sure that pseudo random sequences do not seriously overlap.
The WinBUGS implementation is quite unsafe from this regard,
because the pseudo-random number generator used by WinBUGS
generates a finite (albeit very long) sequence of distinct numbers,
which would eventually be repeated if the sampler
were run for a sufficiently long time.
Thus it's usage must be discouraged. That is the reason for the
warning that is issued when program = "winbugs"
.
OpenBUGS (starting from version 3.2.2) implemented a system
where internal state of the pseudo random number generator can be
set to one of 14 predefined states (seed values in 1:14
).
Each predefined state is 10^12 draws apart to avoid overlap in
pseudo random number sequences.
Note: the default setting working.directory = NULL
cannot be changed
when running parallel chains with bugs.parfit
because
the multiple instances would try to read/write the same directory.
An mcmc.list
object.
Peter Solymos
Sequential version: bugs.fit
## Not run: ## fitting with WinBUGS, bugs example if (require(R2WinBUGS)) { data(schools) dat <- list(J = nrow(schools), y = schools$estimate, sigma.y = schools$sd) bugs.model <- function(){ for (j in 1:J){ y[j] ~ dnorm (theta[j], tau.y[j]) theta[j] ~ dnorm (mu.theta, tau.theta) tau.y[j] <- pow(sigma.y[j], -2) } mu.theta ~ dnorm (0.0, 1.0E-6) tau.theta <- pow(sigma.theta, -2) sigma.theta ~ dunif (0, 1000) } param <- c("mu.theta", "sigma.theta") SEED <- floor(runif(3, 100000, 999999)) cl <- makePSOCKcluster(3) if (.Platform$OS.type == "windows") { sim <- bugs.parfit(cl, dat, param, bugs.model, seed=SEED) summary(sim) } dat2 <- dclone(dat, 2, multiply="J") if (.Platform$OS.type == "windows") { sim2 <- bugs.parfit(cl, dat2, param, bugs.model, program="winbugs", n.iter=2000, n.thin=1, seed=SEED) summary(sim2) } } if (require(BRugs)) { ## fitting the model with OpenBUGS ## using the less preferred BRugs interface sim3 <- bugs.parfit(cl, dat2, param, bugs.model, program="brugs", n.iter=2000, n.thin=1, seed=1:3) summary(sim3) } if (require(R2OpenBUGS)) { ## fitting the model with OpenBUGS ## using the preferred R2OpenBUGS interface sim4 <- bugs.parfit(cl, dat2, param, bugs.model, program="openbugs", n.iter=2000, n.thin=1, seed=1:3) summary(sim4) } stopCluster(cl) ## multicore type forking if (require(R2OpenBUGS) && .Platform$OS.type != "windows") { sim7 <- bugs.parfit(3, dat2, param, bugs.model, program="openbugs", n.iter=2000, n.thin=1, seed=1:3) summary(sim7) } ## End(Not run)
## Not run: ## fitting with WinBUGS, bugs example if (require(R2WinBUGS)) { data(schools) dat <- list(J = nrow(schools), y = schools$estimate, sigma.y = schools$sd) bugs.model <- function(){ for (j in 1:J){ y[j] ~ dnorm (theta[j], tau.y[j]) theta[j] ~ dnorm (mu.theta, tau.theta) tau.y[j] <- pow(sigma.y[j], -2) } mu.theta ~ dnorm (0.0, 1.0E-6) tau.theta <- pow(sigma.theta, -2) sigma.theta ~ dunif (0, 1000) } param <- c("mu.theta", "sigma.theta") SEED <- floor(runif(3, 100000, 999999)) cl <- makePSOCKcluster(3) if (.Platform$OS.type == "windows") { sim <- bugs.parfit(cl, dat, param, bugs.model, seed=SEED) summary(sim) } dat2 <- dclone(dat, 2, multiply="J") if (.Platform$OS.type == "windows") { sim2 <- bugs.parfit(cl, dat2, param, bugs.model, program="winbugs", n.iter=2000, n.thin=1, seed=SEED) summary(sim2) } } if (require(BRugs)) { ## fitting the model with OpenBUGS ## using the less preferred BRugs interface sim3 <- bugs.parfit(cl, dat2, param, bugs.model, program="brugs", n.iter=2000, n.thin=1, seed=1:3) summary(sim3) } if (require(R2OpenBUGS)) { ## fitting the model with OpenBUGS ## using the preferred R2OpenBUGS interface sim4 <- bugs.parfit(cl, dat2, param, bugs.model, program="openbugs", n.iter=2000, n.thin=1, seed=1:3) summary(sim4) } stopCluster(cl) ## multicore type forking if (require(R2OpenBUGS) && .Platform$OS.type != "windows") { sim7 <- bugs.parfit(3, dat2, param, bugs.model, program="openbugs", n.iter=2000, n.thin=1, seed=1:3) summary(sim7) } ## End(Not run)
These functions help in optimizing workload for the workers if problems are of different size.
clusterSize(size) plotClusterSize(n, size, balancing = c("none", "load", "size", "both"), plot = TRUE, col = NA, xlim = NULL, ylim = NULL, main, ...)
clusterSize(size) plotClusterSize(n, size, balancing = c("none", "load", "size", "both"), plot = TRUE, col = NA, xlim = NULL, ylim = NULL, main, ...)
n |
Number of workers. |
size |
Vector of problem sizes (recycled if needed).
The default |
balancing |
Character, type of balancing to perform, one of
|
plot |
Logical, if a plot should be drawn. |
col |
Color of the polygons for work load pieces. |
xlim , ylim
|
Limits for the x and the y axis, respectively (optional). |
main |
Title of the plot, can be missing. |
... |
Other arguments passed to |
These functions help determine the optimal number of workers needed for different sized problems ('size' indicates approximate processing time here). The number of workers needed depends on the type of balancing.
For the description of the balancing types, see
parDosa
.
clusterSize
returns a data frame with approximate
processing time as the function of
the number of workers (rows, in 1:length(size)
) and
the type of balancing (c("none", "load", "size", "both")
).
Approximate processing time is calculated from values in size
without taking into account any communication overhead.
plotClusterSize
invisibly returns the total
processing time needed for a setting given
its arguments. As a side effect, a plot is produced
(if plot = TRUE
).
Peter Solymos
## determine the number of workers needed clusterSize(1:5) ## visually compare balancing options opar <- par(mfrow=c(2, 2)) plotClusterSize(2,1:5, "none") plotClusterSize(2,1:5, "load") plotClusterSize(2,1:5, "size") plotClusterSize(2,1:5, "both") par(opar)
## determine the number of workers needed clusterSize(1:5) ## visually compare balancing options opar <- par(mfrow=c(2, 2)) plotClusterSize(2,1:5, "none") plotClusterSize(2,1:5, "load") plotClusterSize(2,1:5, "size") plotClusterSize(2,1:5, "both") par(opar)
Functions for size balancing.
clusterSplitSB(cl = NULL, seq, size = 1) parLapplySB(cl = NULL, x, size = 1, fun, ...) parLapplySLB(cl = NULL, x, size = 1, fun, ...)
clusterSplitSB(cl = NULL, seq, size = 1) parLapplySB(cl = NULL, x, size = 1, fun, ...) parLapplySLB(cl = NULL, x, size = 1, fun, ...)
cl |
A cluster object created by |
x , seq
|
A vector to split. |
fun |
A function or character string naming a function. |
size |
Vector of problem sizes (approximate processing times)
corresponding to elements of |
... |
Other arguments of |
clusterSplitSB
splits seq
into subsets,
with respect to size
.
In size balancing, the problem is re-ordered from
largest to smallest, and then subsets are
determined by minimizing the total approximate processing time.
This splitting is deterministic (reproducible).
parLapplySB
and parLapplySLB
evaluates fun
on elements of x
in parallel, similarly to
parLapply
. parLapplySB
uses size balancing (via clusterSplitSB
).
parLapplySLB
uses size and load balancing.
This means that the problem is re-ordered from largest to smallest,
and then undeterministic load balancing
is used (see clusterApplyLB
). If size
is
correct, this is identical to size balancing.
This splitting is non-deterministic (might not be reproducible).
clusterSplitSB
returns a list of subsets
split with respect to size
.
parLapplySB
and parLapplySLB
evaluates
fun
on elements of x
, and return a result
corresponding to x
. Usually a list with results
returned by the cluster.
Peter Solymos
Related functions without size balancing:
clusterSplit
, parLapply
.
Underlying functions: clusterApply
,
clusterApplyLB
.
Optimizing the number of workers: clusterSize
,
plotClusterSize
.
## Not run: cl <- makePSOCKcluster(2) ## equal sizes, same as clusterSplit(cl, 1:5) clusterSplitSB(cl, 1:5) ## different sizes clusterSplitSB(cl, 1:5, 5:1) x <- list(1, 2, 3, 4) parLapplySB(cl, x, function(z) z^2, size=1:4) stopCluster(cl) ## End(Not run)
## Not run: cl <- makePSOCKcluster(2) ## equal sizes, same as clusterSplit(cl, 1:5) clusterSplitSB(cl, 1:5) ## different sizes clusterSplitSB(cl, 1:5, 5:1) x <- list(1, 2, 3, 4) parLapplySB(cl, x, function(z) z^2, size=1:4) stopCluster(cl) ## End(Not run)
This function sets a trace
monitor for all requested nodes, updates the model and coerces the
output to a single mcmc.list
object.
This function uses coda.samples
but keeps track
of data cloning information supplied via the model
argument.
codaSamples(model, variable.names, n.iter, thin = 1, na.rm = TRUE, ...)
codaSamples(model, variable.names, n.iter, thin = 1, na.rm = TRUE, ...)
model |
a jags model object |
variable.names |
a character vector giving the names of variables to be monitored |
n.iter |
number of iterations to monitor |
thin |
thinning interval for monitors |
na.rm |
logical flag that indicates whether variables containing
missing values should be omitted. See details in help page
of |
... |
optional arguments that are passed to the update method for jags model objects |
An mcmc.list
object. An n.clones
attribute is attached
to the object, but unlike in jags.fit
there is no
updated.model
attribute as it is equivalent to the
input jags model object.
Peter Solymos
coda.samples
,
update.jags
,
jags.model
Parallel version: parCodaSamples
## Not run: model <- function() { for (i in 1:N) { Y[i] ~ dnorm(mu[i], tau) mu[i] <- alpha + beta * (x[i] - x.bar) } x.bar <- mean(x[]) alpha ~ dnorm(0.0, 1.0E-4) beta ~ dnorm(0.0, 1.0E-4) sigma <- 1.0/sqrt(tau) tau ~ dgamma(1.0E-3, 1.0E-3) } ## data generation set.seed(1234) N <- 100 alpha <- 1 beta <- -1 sigma <- 0.5 x <- runif(N) linpred <- crossprod(t(model.matrix(~x)), c(alpha, beta)) Y <- rnorm(N, mean = linpred, sd = sigma) jdata <- dclone(list(N = N, Y = Y, x = x), 2, multiply="N") jpara <- c("alpha", "beta", "sigma") ## jags model res <- jagsModel(file=model, data=jdata, n.chains = 3, n.adapt=1000) nclones(res) update(res, n.iter=1000) nclones(res) m <- codaSamples(res, jpara, n.iter=2000) summary(m) nclones(m) ## End(Not run)
## Not run: model <- function() { for (i in 1:N) { Y[i] ~ dnorm(mu[i], tau) mu[i] <- alpha + beta * (x[i] - x.bar) } x.bar <- mean(x[]) alpha ~ dnorm(0.0, 1.0E-4) beta ~ dnorm(0.0, 1.0E-4) sigma <- 1.0/sqrt(tau) tau ~ dgamma(1.0E-3, 1.0E-3) } ## data generation set.seed(1234) N <- 100 alpha <- 1 beta <- -1 sigma <- 0.5 x <- runif(N) linpred <- crossprod(t(model.matrix(~x)), c(alpha, beta)) Y <- rnorm(N, mean = linpred, sd = sigma) jdata <- dclone(list(N = N, Y = Y, x = x), 2, multiply="N") jpara <- c("alpha", "beta", "sigma") ## jags model res <- jagsModel(file=model, data=jdata, n.chains = 3, n.adapt=1000) nclones(res) update(res, n.iter=1000) nclones(res) m <- codaSamples(res, jpara, n.iter=2000) summary(m) nclones(m) ## End(Not run)
jags.fit
or bugs.fit
is
iteratively used to fit a model with
increasing the number of clones.
dc.fit(data, params, model, inits, n.clones, multiply = NULL, unchanged = NULL, update = NULL, updatefun = NULL, initsfun = NULL, flavour = c("jags", "bugs", "stan"), n.chains = 3, return.all=FALSE, check.nclones=TRUE, ...)
dc.fit(data, params, model, inits, n.clones, multiply = NULL, unchanged = NULL, update = NULL, updatefun = NULL, initsfun = NULL, flavour = c("jags", "bugs", "stan"), n.chains = 3, return.all=FALSE, check.nclones=TRUE, ...)
data |
A named list (or environment) containing the data. |
params |
Character vector of parameters to be sampled. It can be a list of 2 vectors, 1st element is used as parameters to monitor, the 2nd is used as parameters to use in calculating the data cloning diagnostics. |
model |
Character string (name of the model file), a function containing
the model, or a |
inits |
Optional specification of initial values in the form of a list or a
function (see Initialization at |
n.clones |
An integer vector containing the numbers of clones to use iteratively. |
multiply |
Numeric or character index for list element(s) in the |
unchanged |
Numeric or character index for list element(s) in the |
update |
Character, the name of the list element(s) in the |
updatefun |
A function to use for updating named elements in |
initsfun |
A function to use for generating initial values, |
flavour |
If |
n.chains |
Number of chains to generate. |
return.all |
Logical. If |
check.nclones |
Logical, whether to check and ensure that values of |
... |
Other values supplied to |
The function fits a JAGS/BUGS model with increasing numbers of clones,
as supplied by the argument n.clones
. Data cloning is done by the
function dclone
using
the arguments multiply
and unchanged
.
An updating function can be provided, see Examples.
An object inheriting from the class 'mcmc.list'.
Peter Solymos, implementation is based on many discussions with Khurram Nadeem and Subhash Lele.
Lele, S.R., B. Dennis and F. Lutscher, 2007. Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods. Ecology Letters 10, 551–563.
Lele, S. R., K. Nadeem and B. Schmuland, 2010. Estimability and likelihood inference for generalized linear mixed models using data cloning. Journal of the American Statistical Association 105, 1617–1625.
Solymos, P., 2010. dclone: Data Cloning in R. The R Journal 2(2), 29–37. URL: https://journal.r-project.org/archive/2010-2/RJournal_2010-2_Solymos.pdf
Data cloning: dclone
.
Parallel computations: dc.parfit
Model fitting: jags.fit
, bugs.fit
Convergence diagnostics: dctable
, dcdiag
## Not run: ## simulation for Poisson GLMM set.seed(1234) n <- 20 beta <- c(2, -1) sigma <- 0.1 alpha <- rnorm(n, 0, sigma) x <- runif(n) X <- model.matrix(~x) linpred <- crossprod(t(X), beta) + alpha Y <- rpois(n, exp(linpred)) ## JAGS model as a function jfun1 <- function() { for (i in 1:n) { Y[i] ~ dpois(lambda[i]) log(lambda[i]) <- alpha[i] + inprod(X[i,], beta) alpha[i] ~ dnorm(0, 1/sigma^2) } for (j in 1:np) { beta[j] ~ dnorm(0, 0.001) } sigma ~ dlnorm(0, 0.001) } ## data jdata <- list(n = n, Y = Y, X = X, np = NCOL(X)) ## inits with latent variable and parameters ini <- list(alpha=rep(0,n), beta=rep(0, NCOL(X))) ## function to update inits ifun <- function(model, n.clones) { list(alpha=dclone(rep(0,n), n.clones), beta=coef(model)[-length(coef(model))]) } ## iteartive fitting jmod <- dc.fit(jdata, c("beta", "sigma"), jfun1, ini, n.clones = 1:5, multiply = "n", unchanged = "np", initsfun=ifun) ## summary with DC SE and R hat summary(jmod) dct <- dctable(jmod) plot(dct) ## How to use estimates to make priors more informative? glmm.model.up <- function() { for (i in 1:n) { Y[i] ~ dpois(lambda[i]) log(lambda[i]) <- alpha[i] + inprod(X[i,], beta[1,]) alpha[i] ~ dnorm(0, 1/sigma^2) } for (j in 1:p) { beta[1,j] ~ dnorm(priors[j,1], priors[j,2]) } sigma ~ dgamma(priors[(p+1),2], priors[(p+1),1]) } ## function for updating, x is an MCMC object upfun <- function(x) { if (missing(x)) { p <- ncol(X) return(cbind(c(rep(0, p), 0.001), rep(0.001, p+1))) } else { par <- coef(x) return(cbind(par, rep(0.01, length(par)))) } } updat <- list(n = n, Y = Y, X = X, p = ncol(X), priors = upfun()) dcmod <- dc.fit(updat, c("beta", "sigma"), glmm.model.up, n.clones = 1:5, multiply = "n", unchanged = "p", update = "priors", updatefun = upfun) summary(dcmod) ## time series example ## data and model taken from Ponciano et al. 2009 ## Ecology 90, 356-362. paurelia <- c(17,29,39,63,185,258,267,392,510, 570,650,560,575,650,550,480,520,500) dat <- list(ncl=1, n=length(paurelia), Y=dcdim(data.matrix(paurelia))) beverton.holt <- function() { for (k in 1:ncl) { for(i in 2:(n+1)){ ## observations Y[(i-1), k] ~ dpois(exp(X[i, k])) ## state X[i, k] ~ dnorm(mu[i, k], 1 / sigma^2) mu[i, k] <- X[(i-1), k] + log(lambda) - log(1 + beta * exp(X[(i-1), k])) } ## state at t0 X[1, k] ~ dnorm(mu0, 1 / sigma^2) } # Priors on model parameters beta ~ dlnorm(-1, 1) sigma ~ dlnorm(0, 1) tmp ~ dlnorm(0, 1) lambda <- tmp + 1 mu0 <- log(2) + log(lambda) - log(1 + beta * 2) } mod <- dc.fit(dat, c("lambda","beta","sigma"), beverton.holt, n.clones=c(1, 2, 5, 10), multiply="ncl", unchanged="n") ## compare with results from the paper: ## beta = 0.00235 ## lambda = 2.274 ## sigma = 0.1274 summary(mod) ## Using WinBUGS/OpenBUGS library(R2WinBUGS) data(schools) dat <- list(J = nrow(schools), y = schools$estimate, sigma.y = schools$sd) bugs.model <- function(){ for (j in 1:J){ y[j] ~ dnorm (theta[j], tau.y[j]) theta[j] ~ dnorm (mu.theta, tau.theta) tau.y[j] <- pow(sigma.y[j], -2) } mu.theta ~ dnorm (0.0, 1.0E-6) tau.theta <- pow(sigma.theta, -2) sigma.theta ~ dunif (0, 1000) } inits <- function(){ list(theta=rnorm(nrow(schools), 0, 100), mu.theta=rnorm(1, 0, 100), sigma.theta=runif(1, 0, 100)) } param <- c("mu.theta", "sigma.theta") if (.Platform$OS.type == "windows") { sim2 <- dc.fit(dat, param, bugs.model, n.clones=1:2, flavour="bugs", program="WinBUGS", multiply="J", n.iter=2000, n.thin=1) summary(sim2) } sim3 <- dc.fit(dat, param, bugs.model, n.clones=1:2, flavour="bugs", program="brugs", multiply="J", n.iter=2000, n.thin=1) summary(sim3) library(R2OpenBUGS) sim4 <- dc.fit(dat, param, bugs.model, n.clones=1:2, flavour="bugs", program="openbugs", multiply="J", n.iter=2000, n.thin=1) summary(sim4) ## Using Stan if (require(rstan)) { model <- custommodel("data { int<lower=0> N; vector[N] y; vector[N] x; } parameters { real alpha; real beta; real<lower=0> sigma; } model { alpha ~ normal(0,10); beta ~ normal(0,10); sigma ~ cauchy(0,5); for (n in 1:N) y[n] ~ normal(alpha + beta * x[n], sigma); }") N <- 100 alpha <- 1 beta <- -1 sigma <- 0.5 x <- runif(N) y <- rnorm(N, alpha + beta * x, sigma) dat <- list(N=N, y=y, x=x) params <- c("alpha", "beta", "sigma") ## compile on 1st time only fit0 <- stan.fit(dat, params, model) ## reuse compiled fit0 dcfit <- dc.fit(dat, params, model, n.clones=1:2, flavour="stan", multiply="N", fit=fit0) summary(dcfit) stan.model(dcfit) dcdiag(dcfit) } ## End(Not run)
## Not run: ## simulation for Poisson GLMM set.seed(1234) n <- 20 beta <- c(2, -1) sigma <- 0.1 alpha <- rnorm(n, 0, sigma) x <- runif(n) X <- model.matrix(~x) linpred <- crossprod(t(X), beta) + alpha Y <- rpois(n, exp(linpred)) ## JAGS model as a function jfun1 <- function() { for (i in 1:n) { Y[i] ~ dpois(lambda[i]) log(lambda[i]) <- alpha[i] + inprod(X[i,], beta) alpha[i] ~ dnorm(0, 1/sigma^2) } for (j in 1:np) { beta[j] ~ dnorm(0, 0.001) } sigma ~ dlnorm(0, 0.001) } ## data jdata <- list(n = n, Y = Y, X = X, np = NCOL(X)) ## inits with latent variable and parameters ini <- list(alpha=rep(0,n), beta=rep(0, NCOL(X))) ## function to update inits ifun <- function(model, n.clones) { list(alpha=dclone(rep(0,n), n.clones), beta=coef(model)[-length(coef(model))]) } ## iteartive fitting jmod <- dc.fit(jdata, c("beta", "sigma"), jfun1, ini, n.clones = 1:5, multiply = "n", unchanged = "np", initsfun=ifun) ## summary with DC SE and R hat summary(jmod) dct <- dctable(jmod) plot(dct) ## How to use estimates to make priors more informative? glmm.model.up <- function() { for (i in 1:n) { Y[i] ~ dpois(lambda[i]) log(lambda[i]) <- alpha[i] + inprod(X[i,], beta[1,]) alpha[i] ~ dnorm(0, 1/sigma^2) } for (j in 1:p) { beta[1,j] ~ dnorm(priors[j,1], priors[j,2]) } sigma ~ dgamma(priors[(p+1),2], priors[(p+1),1]) } ## function for updating, x is an MCMC object upfun <- function(x) { if (missing(x)) { p <- ncol(X) return(cbind(c(rep(0, p), 0.001), rep(0.001, p+1))) } else { par <- coef(x) return(cbind(par, rep(0.01, length(par)))) } } updat <- list(n = n, Y = Y, X = X, p = ncol(X), priors = upfun()) dcmod <- dc.fit(updat, c("beta", "sigma"), glmm.model.up, n.clones = 1:5, multiply = "n", unchanged = "p", update = "priors", updatefun = upfun) summary(dcmod) ## time series example ## data and model taken from Ponciano et al. 2009 ## Ecology 90, 356-362. paurelia <- c(17,29,39,63,185,258,267,392,510, 570,650,560,575,650,550,480,520,500) dat <- list(ncl=1, n=length(paurelia), Y=dcdim(data.matrix(paurelia))) beverton.holt <- function() { for (k in 1:ncl) { for(i in 2:(n+1)){ ## observations Y[(i-1), k] ~ dpois(exp(X[i, k])) ## state X[i, k] ~ dnorm(mu[i, k], 1 / sigma^2) mu[i, k] <- X[(i-1), k] + log(lambda) - log(1 + beta * exp(X[(i-1), k])) } ## state at t0 X[1, k] ~ dnorm(mu0, 1 / sigma^2) } # Priors on model parameters beta ~ dlnorm(-1, 1) sigma ~ dlnorm(0, 1) tmp ~ dlnorm(0, 1) lambda <- tmp + 1 mu0 <- log(2) + log(lambda) - log(1 + beta * 2) } mod <- dc.fit(dat, c("lambda","beta","sigma"), beverton.holt, n.clones=c(1, 2, 5, 10), multiply="ncl", unchanged="n") ## compare with results from the paper: ## beta = 0.00235 ## lambda = 2.274 ## sigma = 0.1274 summary(mod) ## Using WinBUGS/OpenBUGS library(R2WinBUGS) data(schools) dat <- list(J = nrow(schools), y = schools$estimate, sigma.y = schools$sd) bugs.model <- function(){ for (j in 1:J){ y[j] ~ dnorm (theta[j], tau.y[j]) theta[j] ~ dnorm (mu.theta, tau.theta) tau.y[j] <- pow(sigma.y[j], -2) } mu.theta ~ dnorm (0.0, 1.0E-6) tau.theta <- pow(sigma.theta, -2) sigma.theta ~ dunif (0, 1000) } inits <- function(){ list(theta=rnorm(nrow(schools), 0, 100), mu.theta=rnorm(1, 0, 100), sigma.theta=runif(1, 0, 100)) } param <- c("mu.theta", "sigma.theta") if (.Platform$OS.type == "windows") { sim2 <- dc.fit(dat, param, bugs.model, n.clones=1:2, flavour="bugs", program="WinBUGS", multiply="J", n.iter=2000, n.thin=1) summary(sim2) } sim3 <- dc.fit(dat, param, bugs.model, n.clones=1:2, flavour="bugs", program="brugs", multiply="J", n.iter=2000, n.thin=1) summary(sim3) library(R2OpenBUGS) sim4 <- dc.fit(dat, param, bugs.model, n.clones=1:2, flavour="bugs", program="openbugs", multiply="J", n.iter=2000, n.thin=1) summary(sim4) ## Using Stan if (require(rstan)) { model <- custommodel("data { int<lower=0> N; vector[N] y; vector[N] x; } parameters { real alpha; real beta; real<lower=0> sigma; } model { alpha ~ normal(0,10); beta ~ normal(0,10); sigma ~ cauchy(0,5); for (n in 1:N) y[n] ~ normal(alpha + beta * x[n], sigma); }") N <- 100 alpha <- 1 beta <- -1 sigma <- 0.5 x <- runif(N) y <- rnorm(N, alpha + beta * x, sigma) dat <- list(N=N, y=y, x=x) params <- c("alpha", "beta", "sigma") ## compile on 1st time only fit0 <- stan.fit(dat, params, model) ## reuse compiled fit0 dcfit <- dc.fit(dat, params, model, n.clones=1:2, flavour="stan", multiply="N", fit=fit0) summary(dcfit) stan.model(dcfit) dcdiag(dcfit) } ## End(Not run)
Iterative model fitting on parallel workers with different numbers of clones.
dc.parfit(cl, data, params, model, inits, n.clones, multiply=NULL, unchanged=NULL, update = NULL, updatefun = NULL, initsfun = NULL, flavour = c("jags", "bugs", "stan"), n.chains = 3, partype=c("balancing", "parchains", "both"), return.all=FALSE, check.nclones=TRUE, ...)
dc.parfit(cl, data, params, model, inits, n.clones, multiply=NULL, unchanged=NULL, update = NULL, updatefun = NULL, initsfun = NULL, flavour = c("jags", "bugs", "stan"), n.chains = 3, partype=c("balancing", "parchains", "both"), return.all=FALSE, check.nclones=TRUE, ...)
cl |
A cluster object created by |
data |
A named list (or environment) containing the data. |
params |
Character vector of parameters to be sampled.
It can be a list of 2 vectors, 1st element
is used as parameters to monitor, the 2nd is used
as parameters to use in calculating the data cloning
diagnostics. ( |
model |
Character string (name of the model file), a function containing the
model, or a or |
inits |
Optional specification of initial values in the form of a list
or a function (see Initialization at |
n.clones |
An integer vector containing the numbers of clones to use iteratively. |
multiply |
Numeric or character index for list element(s) in the |
unchanged |
Numeric or character index for list element(s) in the |
update |
Numeric or character index for list element(s) in the |
updatefun |
A function to use for updating |
initsfun |
A function to use for generating initial values, |
flavour |
If |
partype |
Type of parallel workload distribution, see Details. |
n.chains |
Number of chains to generate. |
return.all |
Logical. If |
check.nclones |
Logical, whether to check and ensure that values of |
... |
Other values supplied to |
The dc.parfit
is a parallel computing version of
dc.fit
.
After parallel computations, temporary objects passed to workers and
the dclone package is cleaned up. It is not guaranteed
that objects already on the workers and independently loaded packages
are not affected. Best to start new instances beforehand.
partype="balancing"
distributes each model corresponding
to values in n.clones
as jobs to workers according to size
balancing (see parDosa
). partype="parchains"
makes repeated calls
to jags.parfit
for each value in n.clones
.
partype="both"
also calls jags.parfit
but
each chain of each cloned model is distributed as separate job to the
workers.
The vector n.clones
is used to determine size balancing.
If load balancing is also desired
besides of size balancing (e.g. due to unequal performance of
the workers, the option "dclone.LB"
should be set to TRUE
(by using options("dclone.LB" = TRUE)
).
By default, the "dclone.LB"
option is FALSE
for reproducibility reasons.
Some arguments from dc.fit
are not available in parallel
version (update
, updatefun
, initsfun
)
when size balancing is used
(partype
is "balancing"
or "both"
).
These arguments are evaluated only when partype="parchains"
.
Size balancing is recommended if n.clones
is a relatively long
vector, while parallel chains might be more efficient when
n.clones
has few elements.
For efficiency reasons, a combination of the two
(partype="both"
) is preferred if cluster size allows it.
Additionally loaded JAGS modules (e.g. "glm"
) need to be loaded
to the workers.
An object inheriting from the class 'mcmc.list'.
Peter Solymos
Lele, S.R., B. Dennis and F. Lutscher, 2007. Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods. Ecology Letters 10, 551–563.
Lele, S. R., K. Nadeem and B. Schmuland, 2010. Estimability and likelihood inference for generalized linear mixed models using data cloning. Journal of the American Statistical Association 105, 1617–1625.
Solymos, P., 2010. dclone: Data Cloning in R. The R Journal 2(2), 29–37. URL: https://journal.r-project.org/archive/2010-2/RJournal_2010-2_Solymos.pdf
Sequential version: dc.fit
.
Optimizing the number of workers: clusterSize
,
plotClusterSize
.
Underlying functions: jags.fit
,
bugs.fit
.
## Not run: set.seed(1234) n <- 20 x <- runif(n, -1, 1) X <- model.matrix(~x) beta <- c(2, -1) mu <- crossprod(t(X), beta) Y <- rpois(n, exp(mu)) glm.model <- function() { for (i in 1:n) { Y[i] ~ dpois(lambda[i]) log(lambda[i]) <- inprod(X[i,], beta[1,]) } for (j in 1:np) { beta[1,j] ~ dnorm(0, 0.001) } } dat <- list(Y=Y, X=X, n=n, np=ncol(X)) k <- 1:3 ## sequential version dcm <- dc.fit(dat, "beta", glm.model, n.clones=k, multiply="n", unchanged="np") ## parallel version cl <- makePSOCKcluster(3) pdcm1 <- dc.parfit(cl, dat, "beta", glm.model, n.clones=k, multiply="n", unchanged="np", partype="balancing") pdcm2 <- dc.parfit(cl, dat, "beta", glm.model, n.clones=k, multiply="n", unchanged="np", partype="parchains") pdcm3 <- dc.parfit(cl, dat, "beta", glm.model, n.clones=k, multiply="n", unchanged="np", partype="both") summary(dcm) summary(pdcm1) summary(pdcm2) summary(pdcm3) stopCluster(cl) ## multicore type forking if (.Platform$OS.type != "windows") { mcdcm1 <- dc.parfit(3, dat, "beta", glm.model, n.clones=k, multiply="n", unchanged="np", partype="balancing") mcdcm2 <- dc.parfit(3, dat, "beta", glm.model, n.clones=k, multiply="n", unchanged="np", partype="parchains") mcdcm3 <- dc.parfit(3, dat, "beta", glm.model, n.clones=k, multiply="n", unchanged="np", partype="both") } ## Using WinBUGS/OpenBUGS library(R2WinBUGS) data(schools) dat <- list(J = nrow(schools), y = schools$estimate, sigma.y = schools$sd) bugs.model <- function(){ for (j in 1:J){ y[j] ~ dnorm (theta[j], tau.y[j]) theta[j] ~ dnorm (mu.theta, tau.theta) tau.y[j] <- pow(sigma.y[j], -2) } mu.theta ~ dnorm (0.0, 1.0E-6) tau.theta <- pow(sigma.theta, -2) sigma.theta ~ dunif (0, 1000) } inits <- function(){ list(theta=rnorm(nrow(schools), 0, 100), mu.theta=rnorm(1, 0, 100), sigma.theta=runif(1, 0, 100)) } param <- c("mu.theta", "sigma.theta") cl <- makePSOCKcluster(2) if (.Platform$OS.type == "windows") { sim2 <- dc.parfit(cl, dat, param, bugs.model, n.clones=1:2, flavour="bugs", program="WinBUGS", multiply="J", n.iter=2000, n.thin=1) summary(sim2) } sim3 <- dc.parfit(cl, dat, param, bugs.model, n.clones=1:2, flavour="bugs", program="brugs", multiply="J", n.iter=2000, n.thin=1) summary(sim3) library(R2OpenBUGS) sim4 <- dc.parfit(cl, dat, param, bugs.model, n.clones=1:2, flavour="bugs", program="openbugs", multiply="J", n.iter=2000, n.thin=1) summary(sim4) stopCluster(cl) ## simulation for Poisson GLMM with inits set.seed(1234) n <- 5 beta <- c(2, -1) sigma <- 0.1 alpha <- rnorm(n, 0, sigma) x <- runif(n) X <- model.matrix(~x) linpred <- crossprod(t(X), beta) + alpha Y <- rpois(n, exp(linpred)) ## JAGS model as a function jfun1 <- function() { for (i in 1:n) { Y[i] ~ dpois(lambda[i]) log(lambda[i]) <- alpha[i] + inprod(X[i,], beta) alpha[i] ~ dnorm(0, 1/sigma^2) } for (j in 1:np) { beta[j] ~ dnorm(0, 0.001) } sigma ~ dlnorm(0, 0.001) } ## data jdata <- list(n = n, Y = Y, X = X, np = NCOL(X)) ## inits with latent variable and parameters ini <- list(alpha=rep(0,n), beta=rep(0, NCOL(X))) ## model arg is necessary as 1st arg, ## but not used when partype!=parchains ifun <- function(model, n.clones) { list(alpha=dclone(rep(0,n), n.clones), beta=c(0,0)) } ## make cluster cl <- makePSOCKcluster(2) ## pass global n variable used in ifun to workers tmp <- clusterExport(cl, "n") ## fit the model jmod2 <- dc.parfit(cl, jdata, c("beta", "sigma"), jfun1, ini, n.clones = 1:2, multiply = "n", unchanged = "np", initsfun=ifun, partype="balancing") stopCluster(cl) ## Using Stan if (require(rstan)) { model <- custommodel("data { int<lower=0> N; vector[N] y; vector[N] x; } parameters { real alpha; real beta; real<lower=0> sigma; } model { alpha ~ normal(0,10); beta ~ normal(0,10); sigma ~ cauchy(0,5); for (n in 1:N) y[n] ~ normal(alpha + beta * x[n], sigma); }") N <- 100 alpha <- 1 beta <- -1 sigma <- 0.5 x <- runif(N) y <- rnorm(N, alpha + beta * x, sigma) dat <- list(N=N, y=y, x=x) params <- c("alpha", "beta", "sigma") ## compile on 1st time only fit0 <- stan.fit(dat, params, model) if (.Platform$OS.type != "windows") { ## utilize compiled fit0 dcfit <- dc.parfit(cl=2, dat, params, model, n.clones=1:2, flavour="stan", multiply="N", fit=fit0) summary(dcfit) stan.model(dcfit) dcdiag(dcfit) } } ## End(Not run)
## Not run: set.seed(1234) n <- 20 x <- runif(n, -1, 1) X <- model.matrix(~x) beta <- c(2, -1) mu <- crossprod(t(X), beta) Y <- rpois(n, exp(mu)) glm.model <- function() { for (i in 1:n) { Y[i] ~ dpois(lambda[i]) log(lambda[i]) <- inprod(X[i,], beta[1,]) } for (j in 1:np) { beta[1,j] ~ dnorm(0, 0.001) } } dat <- list(Y=Y, X=X, n=n, np=ncol(X)) k <- 1:3 ## sequential version dcm <- dc.fit(dat, "beta", glm.model, n.clones=k, multiply="n", unchanged="np") ## parallel version cl <- makePSOCKcluster(3) pdcm1 <- dc.parfit(cl, dat, "beta", glm.model, n.clones=k, multiply="n", unchanged="np", partype="balancing") pdcm2 <- dc.parfit(cl, dat, "beta", glm.model, n.clones=k, multiply="n", unchanged="np", partype="parchains") pdcm3 <- dc.parfit(cl, dat, "beta", glm.model, n.clones=k, multiply="n", unchanged="np", partype="both") summary(dcm) summary(pdcm1) summary(pdcm2) summary(pdcm3) stopCluster(cl) ## multicore type forking if (.Platform$OS.type != "windows") { mcdcm1 <- dc.parfit(3, dat, "beta", glm.model, n.clones=k, multiply="n", unchanged="np", partype="balancing") mcdcm2 <- dc.parfit(3, dat, "beta", glm.model, n.clones=k, multiply="n", unchanged="np", partype="parchains") mcdcm3 <- dc.parfit(3, dat, "beta", glm.model, n.clones=k, multiply="n", unchanged="np", partype="both") } ## Using WinBUGS/OpenBUGS library(R2WinBUGS) data(schools) dat <- list(J = nrow(schools), y = schools$estimate, sigma.y = schools$sd) bugs.model <- function(){ for (j in 1:J){ y[j] ~ dnorm (theta[j], tau.y[j]) theta[j] ~ dnorm (mu.theta, tau.theta) tau.y[j] <- pow(sigma.y[j], -2) } mu.theta ~ dnorm (0.0, 1.0E-6) tau.theta <- pow(sigma.theta, -2) sigma.theta ~ dunif (0, 1000) } inits <- function(){ list(theta=rnorm(nrow(schools), 0, 100), mu.theta=rnorm(1, 0, 100), sigma.theta=runif(1, 0, 100)) } param <- c("mu.theta", "sigma.theta") cl <- makePSOCKcluster(2) if (.Platform$OS.type == "windows") { sim2 <- dc.parfit(cl, dat, param, bugs.model, n.clones=1:2, flavour="bugs", program="WinBUGS", multiply="J", n.iter=2000, n.thin=1) summary(sim2) } sim3 <- dc.parfit(cl, dat, param, bugs.model, n.clones=1:2, flavour="bugs", program="brugs", multiply="J", n.iter=2000, n.thin=1) summary(sim3) library(R2OpenBUGS) sim4 <- dc.parfit(cl, dat, param, bugs.model, n.clones=1:2, flavour="bugs", program="openbugs", multiply="J", n.iter=2000, n.thin=1) summary(sim4) stopCluster(cl) ## simulation for Poisson GLMM with inits set.seed(1234) n <- 5 beta <- c(2, -1) sigma <- 0.1 alpha <- rnorm(n, 0, sigma) x <- runif(n) X <- model.matrix(~x) linpred <- crossprod(t(X), beta) + alpha Y <- rpois(n, exp(linpred)) ## JAGS model as a function jfun1 <- function() { for (i in 1:n) { Y[i] ~ dpois(lambda[i]) log(lambda[i]) <- alpha[i] + inprod(X[i,], beta) alpha[i] ~ dnorm(0, 1/sigma^2) } for (j in 1:np) { beta[j] ~ dnorm(0, 0.001) } sigma ~ dlnorm(0, 0.001) } ## data jdata <- list(n = n, Y = Y, X = X, np = NCOL(X)) ## inits with latent variable and parameters ini <- list(alpha=rep(0,n), beta=rep(0, NCOL(X))) ## model arg is necessary as 1st arg, ## but not used when partype!=parchains ifun <- function(model, n.clones) { list(alpha=dclone(rep(0,n), n.clones), beta=c(0,0)) } ## make cluster cl <- makePSOCKcluster(2) ## pass global n variable used in ifun to workers tmp <- clusterExport(cl, "n") ## fit the model jmod2 <- dc.parfit(cl, jdata, c("beta", "sigma"), jfun1, ini, n.clones = 1:2, multiply = "n", unchanged = "np", initsfun=ifun, partype="balancing") stopCluster(cl) ## Using Stan if (require(rstan)) { model <- custommodel("data { int<lower=0> N; vector[N] y; vector[N] x; } parameters { real alpha; real beta; real<lower=0> sigma; } model { alpha ~ normal(0,10); beta ~ normal(0,10); sigma ~ cauchy(0,5); for (n in 1:N) y[n] ~ normal(alpha + beta * x[n], sigma); }") N <- 100 alpha <- 1 beta <- -1 sigma <- 0.5 x <- runif(N) y <- rnorm(N, alpha + beta * x, sigma) dat <- list(N=N, y=y, x=x) params <- c("alpha", "beta", "sigma") ## compile on 1st time only fit0 <- stan.fit(dat, params, model) if (.Platform$OS.type != "windows") { ## utilize compiled fit0 dcfit <- dc.parfit(cl=2, dat, params, model, n.clones=1:2, flavour="stan", multiply="N", fit=fit0) summary(dcfit) stan.model(dcfit) dcdiag(dcfit) } } ## End(Not run)
Makes clones of R objects, that is values in the object are repeated times,
leaving the original structure of the object intact (in most of the cases).
dclone(x, n.clones=1, ...) ## Default S3 method: dclone(x, n.clones = 1, attrib=TRUE, ...) ## S3 method for class 'dcdim' dclone(x, n.clones = 1, attrib=TRUE, ...) ## S3 method for class 'dciid' dclone(x, n.clones = 1, attrib=TRUE, ...) ## S3 method for class 'dctr' dclone(x, n.clones = 1, attrib=TRUE, ...) ## S3 method for class 'list' dclone(x, n.clones = 1, multiply = NULL, unchanged = NULL, attrib=TRUE, ...) ## S3 method for class 'environment' dclone(x, n.clones = 1, multiply = NULL, unchanged = NULL, attrib=TRUE, ...) dcdim(x, drop = TRUE, perm = NULL) dciid(x, iid=character(0)) dctr(x)
dclone(x, n.clones=1, ...) ## Default S3 method: dclone(x, n.clones = 1, attrib=TRUE, ...) ## S3 method for class 'dcdim' dclone(x, n.clones = 1, attrib=TRUE, ...) ## S3 method for class 'dciid' dclone(x, n.clones = 1, attrib=TRUE, ...) ## S3 method for class 'dctr' dclone(x, n.clones = 1, attrib=TRUE, ...) ## S3 method for class 'list' dclone(x, n.clones = 1, multiply = NULL, unchanged = NULL, attrib=TRUE, ...) ## S3 method for class 'environment' dclone(x, n.clones = 1, multiply = NULL, unchanged = NULL, attrib=TRUE, ...) dcdim(x, drop = TRUE, perm = NULL) dciid(x, iid=character(0)) dctr(x)
x |
An R object to be cloned, or a cloned object to print. |
n.clones |
Number of clones. |
multiply |
Numeric or character index for list element(s) to be multiplied by
|
unchanged |
Numeric or character index for list element(s) to be left unchanged. |
attrib |
Logical, |
drop |
Logical, if |
perm |
The subscript permutation value, if the cloning dimension is not the last. |
iid |
Character (or optionally numeric or logical).
Column(s) to be treated as i.i.d. observations.
Ignored when |
... |
Other arguments passed to function. |
dclone
is a generic function for cloning objects.
It is separate from rep
,
because there are different ways of cloning, depending on the
BUGS
code implementation:
(1) Unchanged: no cloning at all (for e.g. constants).
(2) Repeat: this is the most often used cloning method,
repeating the observations row-wise as if there
were more samples. The dctr
option allows repeating
the data column-wise.
(3) Multiply: sometimes it is enough to multiply the numbers (e.g. for Binomial distribution).
(4) Add dimension: under specific circumstances, it is easier to
add another dimension for clones,
but otherwise repeat the observations (e.g. in case of time series,
or for addressing special
indexing conventions in the BUGS
code, see examples
dcdim
and dclone.dcdim
).
(5) Repeat pattern (i.i.d.): this is useful for example when
a grouping variable is considered, and more i.i.d. groups are to be
added to the data set. E.g. c(1, 1, 2, 2)
is to be cloned as
c(1, 1, 2, 2, 3, 3, 4, 4)
instead of
c(1, 1, 2, 2, 1, 1, 2, 2)
.
An object with class attributes "dclone"
plus the original
one(s). Dimensions of the original object might change according to
n.clones
. The function tries to take care of names, sometimes
replacing those with the combination of the original names and an
integer for number of clones.
dcdim
sets the class attribute of an object to "dcdim"
,
thus dclone
will clone the object by adding an extra dimension
for the clones.
dciid
sets the class attribute of an object to "dciid"
,
thus dclone
will clone the object by treating columns
defined by the iid
argument as i.i.d. observations.
These columns must be numeric.
This aims to facilitates working with the INLA
package to generate approximate marginals based on DC.
Columns specified by iid
will be replaced by an increasing
sequence of values respecting possible grouping structure (see
Examples).
Lists (i.e. BUGS data objects) are handled differently to enable element
specific determination of the mode of cloning. This can be done via the
unchanged
and multiply
arguments, or by setting the
behaviour by the dcdim
function.
Environments are coerced into a list, and return value is identical to
dclone(as.list(x), ...)
.
Peter Solymos, implementation is based on many discussions with Khurram Nadeem and Subhash Lele.
Lele, S.R., B. Dennis and F. Lutscher, 2007. Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods. Ecology Letters 10, 551–563.
Lele, S. R., K. Nadeem and B. Schmuland, 2010. Estimability and likelihood inference for generalized linear mixed models using data cloning. Journal of the American Statistical Association 105, 1617–1625.
Solymos, P., 2010. dclone: Data Cloning in R. The R Journal 2(2), 29–37. URL: https://journal.r-project.org/archive/2010-2/RJournal_2010-2_Solymos.pdf
## scalar dclone(4, 2) ## vector (x <- 1:6) dclone(x, 2) ## matrix (m <- matrix(x, 2, 3)) dclone(m, 2) ## data frame (dfr <- as.data.frame(t(m))) dclone(dfr, 2) ## list (l <- list(n = 10, y = 1:10, x = 1:10, p = 1)) dclone(l, 2) dclone(as.environment(l), 2) dclone(l, 2, attrib = FALSE) dclone(l, 2, multiply = "n", unchanged = "p") ## effect of dcdim l$y <- dcdim(l$y) dclone(l, 2, multiply = "n", unchanged = "p") ## time series like usage of dcdim z <- data.matrix(rnorm(10)) dclone(dcdim(z), 2) ## usage if dciid ll <- dciid(data.frame(x=1:10, y=1:10), iid="y") dclone(ll, 2) ## respecting grouping structure in iid ll$y <- rep(1:5, each=2) (dci <- dclone(ll, 2)) nclones(dci) ## repeating the data column-wise dclone(dctr(m), 2)
## scalar dclone(4, 2) ## vector (x <- 1:6) dclone(x, 2) ## matrix (m <- matrix(x, 2, 3)) dclone(m, 2) ## data frame (dfr <- as.data.frame(t(m))) dclone(dfr, 2) ## list (l <- list(n = 10, y = 1:10, x = 1:10, p = 1)) dclone(l, 2) dclone(as.environment(l), 2) dclone(l, 2, attrib = FALSE) dclone(l, 2, multiply = "n", unchanged = "p") ## effect of dcdim l$y <- dcdim(l$y) dclone(l, 2, multiply = "n", unchanged = "p") ## time series like usage of dcdim z <- data.matrix(rnorm(10)) dclone(dcdim(z), 2) ## usage if dciid ll <- dciid(data.frame(x=1:10, y=1:10), iid="y") dclone(ll, 2) ## respecting grouping structure in iid ll$y <- rep(1:5, each=2) (dci <- dclone(ll, 2)) nclones(dci) ## repeating the data column-wise dclone(dctr(m), 2)
Manipulating dclone environments.
pullDcloneEnv(x, type = c("model", "results")) pushDcloneEnv(x, value, type = c("model", "results")) clearDcloneEnv(..., list = character(), type = c("model", "results")) listDcloneEnv(type = c("model", "results")) existsDcloneEnv(x, type = c("model", "results"), mode = "any", inherits = TRUE)
pullDcloneEnv(x, type = c("model", "results")) pushDcloneEnv(x, value, type = c("model", "results")) clearDcloneEnv(..., list = character(), type = c("model", "results")) listDcloneEnv(type = c("model", "results")) existsDcloneEnv(x, type = c("model", "results"), mode = "any", inherits = TRUE)
x |
a variable name, given as a character string. No coercion is done, and the first element of a character vector of length greater than one will be used, with a warning. |
value |
a value to be assigned to |
type |
character, the type of environment to be accessed, see Details. |
... |
the objects to be removed, as names (unquoted) or character strings (quoted). |
list |
a character vector naming objects to be removed. |
mode |
the mode or type of object sought: see the |
inherits |
logical, should the enclosing frames of the environment be searched? |
type = "model"
manipulates the .DcloneEnvModel
environment, which is meant to store temporary objects
for model fitting with ‘snow’ type parallelism
(see parDosa
for the implementation).
This is swiped clean after use.
Thetype = "results"
manipulates the .DcloneEnvResults
environment, which is meant to store result objects on the workers.
This is not swiped clean after use.
pullDcloneEnv
pulls an object from these environments,
similar to get
in effect.
pushDcloneEnv
pushes an object to these environments,
similar to assign
in effect.
clearDcloneEnv
removes object(s) from these environments,
similar to rm
in effect.
listDcloneEnv
lists name(s) of object(s) in these environments,
similar to ls
in effect.
existsDcloneEnv
tests if an object exists in these environments,
similar to exists
in effect.
For pullDcloneEnv
, the object found.
If no object is found an error results.
pushDcloneEnv
is invoked for its side effect,
which is assigning value
to the variable x
.
For clearDcloneEnv
its is the side effect of an object removed.
No value returned.
listDcloneEnv
returns a character vector.
existsDcloneEnv
returns logical, TRUE
if and only if an
object of the correct name and mode is found.
Peter Solymos
Setting options.
dcoptions(...)
dcoptions(...)
... |
Arguments in |
dcoptions
is a convenient way of handling options related to the
package.
When parameters are set by dcoptions
, their former values are
returned in an invisible named list. Such a list can be passed as an
argument to dcoptions
to restore the parameter values.
Tags are the following:
autoburnin |
logical, to use in |
diag |
critical value to use for data cloning convergence diagnostics, default is 0.05. |
LB |
logical, should load balancing be used,
default is |
overwrite |
logical, should existing model file be overwritten,
default is |
rhat |
critical value for testing chain convergence, default is 1.1. |
RNG |
parallel RNG type, either
|
verbose |
integer, should output be verbose (>0) or not (0), default is 1. |
Peter Solymos
## set LB option, but store old value ov <- dcoptions("LB"=TRUE) ## this is old value ov ## this is new value getOption("dcoptions") ## reset to old value dcoptions(ov) ## check reset getOption("dcoptions")
## set LB option, but store old value ov <- dcoptions("LB"=TRUE) ## this is old value ov ## this is new value getOption("dcoptions") ## reset to old value dcoptions(ov) ## check reset getOption("dcoptions")
The function is used to retrieve descriptive statistics from fitted objects on order to evaluate convergence of the data cloning algorithm. This is best done via visual display of the results, separately for each parameters of interest.
dctable(x, ...) ## Default S3 method: dctable(x, ...) ## S3 method for class 'dctable' plot(x, which = 1:length(x), type = c("all", "var", "log.var"), position = "topright", box.cex = 0.75, box.bg, ...) extractdctable(x, ...) ## Default S3 method: extractdctable(x, ...) dcdiag(x, ...) ## Default S3 method: dcdiag(x, ...) ## S3 method for class 'dcdiag' plot(x, which = c("all", "lambda.max", "ms.error", "r.squared", "log.lambda.max"), position = "topright", ...) extractdcdiag(x, ...) ## Default S3 method: extractdcdiag(x, ...)
dctable(x, ...) ## Default S3 method: dctable(x, ...) ## S3 method for class 'dctable' plot(x, which = 1:length(x), type = c("all", "var", "log.var"), position = "topright", box.cex = 0.75, box.bg, ...) extractdctable(x, ...) ## Default S3 method: extractdctable(x, ...) dcdiag(x, ...) ## Default S3 method: dcdiag(x, ...) ## S3 method for class 'dcdiag' plot(x, which = c("all", "lambda.max", "ms.error", "r.squared", "log.lambda.max"), position = "topright", ...) extractdcdiag(x, ...) ## Default S3 method: extractdcdiag(x, ...)
x |
An MCMC or a 'dctable' object. |
... |
Optionally more fitted model objects for function |
which |
What to plot. For |
type |
Type of plot to be drawn. See Details. |
position |
Position for the legend, as for |
box.cex |
Scaling factor for the interquartile boxes. |
box.bg |
Background color for the interquartile boxes. |
dctable
returns the "dctable"
attribute of the MCMC
object, or if it is NULL
, calculates the dctable
summaries. If more than one fitted objects are provided, summaries are
calculated for all objects, and results are ordered by the number of
clones.
The plot
method for dctable
helps in graphical
representation of the descriptive statistics.
type = "all"
results in plotting means,
standard deviations and quantiles
against the number of clones as boxplot. type = "var"
results in plotting the scaled variances
against the number of clones. In this case variances are
divided by the variance of the
model with smallest number of clones, min(n.clones)
.
type = "log.var"
is the same as "var"
,
but on the log scale. Along with the values, the
min(n.clones) / n.clones
line is plotted for reference.
Lele et al. (2010) introduced diagnostic measures
for checking the convergence of the data cloning algorithm
which are based on the joint posterior distribution
and not only on single parameters. These
include to calculate the largest eigenvalue of the posterior
variance covariance matrix (lambda.max
as returned by
lambdamax.diag
),
or to calculate mean squared error (ms.error
) and another
correlation-like fit statistic (r.squared
) based on a
Chi-squared approximation
(as returned by chisq.diag
). The maximum
eigenvalue reflects the degenerateness of the
posterior distribution, while the two fit measures reflect
if the Normal approximation is adequate. All
three statistics should converge to zero as the number of clones
increases. If this happens, different prior specifications are no
longer influencing the results (Lele et al., 2007, 2010).
These are conveniently collected by the dcdiag
function.
IMPORTANT! Have you checked if different prior specifications lead to the same results?
An object of class 'dctable'. It is a list, and contains as many data frames as the number of parameters in the fitted object. Each data frame contains descriptives as the function of the number of clones.
dcdiag
returns a data frame with convergence diagnostics.
The plot
methods produce graphs as side effect.
Peter Solymos, implementation is based on many discussions with Khurram Nadeem and Subhash Lele.
Lele, S.R., B. Dennis and F. Lutscher, 2007. Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods. Ecology Letters 10, 551–563.
Lele, S. R., K. Nadeem and B. Schmuland, 2010. Estimability and likelihood inference for generalized linear mixed models using data cloning. Journal of the American Statistical Association 105, 1617–1625.
Solymos, P., 2010. dclone: Data Cloning in R. The R Journal 2(2), 29–37. URL: https://journal.r-project.org/archive/2010-2/RJournal_2010-2_Solymos.pdf
Data cloning: dclone
Model fitting: jags.fit
, bugs.fit
,
dc.fit
## Not run: ## simulation for Poisson GLMM set.seed(1234) n <- 20 beta <- c(2, -1) sigma <- 0.1 alpha <- rnorm(n, 0, sigma) x <- runif(n) X <- model.matrix(~x) linpred <- crossprod(t(X), beta) + alpha Y <- rpois(n, exp(linpred)) ## JAGS model as a function jfun1 <- function() { for (i in 1:n) { Y[i] ~ dpois(lambda[i]) log(lambda[i]) <- alpha[i] + inprod(X[i,], beta[1,]) alpha[i] ~ dnorm(0, 1/sigma^2) } for (j in 1:np) { beta[1,j] ~ dnorm(0, 0.001) } sigma ~ dlnorm(0, 0.001) } ## data jdata <- list(n = n, Y = Y, X = X, np = NCOL(X)) ## number of clones to be used, etc. ## iteartive fitting jmod <- dc.fit(jdata, c("beta", "sigma"), jfun1, n.clones = 1:5, multiply = "n", unchanged = "np") ## summary with DC SE and R hat summary(jmod) dct <- dctable(jmod) plot(dct) ## How to use estimates to make priors more informative? glmm.model.up <- function() { for (i in 1:n) { Y[i] ~ dpois(lambda[i]) log(lambda[i]) <- alpha[i] + inprod(X[i,], beta[1,]) alpha[i] ~ dnorm(0, 1/sigma^2) } for (j in 1:p) { beta[1,j] ~ dnorm(priors[j,1], priors[j,2]) } sigma ~ dgamma(priors[(p+1),2], priors[(p+1),1]) } ## function for updating, x is an MCMC object upfun <- function(x) { if (missing(x)) { p <- ncol(X) return(cbind(c(rep(0, p), 0.001), rep(0.001, p+1))) } else { par <- coef(x) return(cbind(par, rep(0.01, length(par)))) } } updat <- list(n = n, Y = Y, X = X, p = ncol(X), priors = upfun()) dcmod <- dc.fit(updat, c("beta", "sigma"), glmm.model.up, n.clones = 1:5, multiply = "n", unchanged = "p", update = "priors", updatefun = upfun) summary(dcmod) dct <- dctable(dcmod) plot(dct) plot(dct, type = "var") ## End(Not run)
## Not run: ## simulation for Poisson GLMM set.seed(1234) n <- 20 beta <- c(2, -1) sigma <- 0.1 alpha <- rnorm(n, 0, sigma) x <- runif(n) X <- model.matrix(~x) linpred <- crossprod(t(X), beta) + alpha Y <- rpois(n, exp(linpred)) ## JAGS model as a function jfun1 <- function() { for (i in 1:n) { Y[i] ~ dpois(lambda[i]) log(lambda[i]) <- alpha[i] + inprod(X[i,], beta[1,]) alpha[i] ~ dnorm(0, 1/sigma^2) } for (j in 1:np) { beta[1,j] ~ dnorm(0, 0.001) } sigma ~ dlnorm(0, 0.001) } ## data jdata <- list(n = n, Y = Y, X = X, np = NCOL(X)) ## number of clones to be used, etc. ## iteartive fitting jmod <- dc.fit(jdata, c("beta", "sigma"), jfun1, n.clones = 1:5, multiply = "n", unchanged = "np") ## summary with DC SE and R hat summary(jmod) dct <- dctable(jmod) plot(dct) ## How to use estimates to make priors more informative? glmm.model.up <- function() { for (i in 1:n) { Y[i] ~ dpois(lambda[i]) log(lambda[i]) <- alpha[i] + inprod(X[i,], beta[1,]) alpha[i] ~ dnorm(0, 1/sigma^2) } for (j in 1:p) { beta[1,j] ~ dnorm(priors[j,1], priors[j,2]) } sigma ~ dgamma(priors[(p+1),2], priors[(p+1),1]) } ## function for updating, x is an MCMC object upfun <- function(x) { if (missing(x)) { p <- ncol(X) return(cbind(c(rep(0, p), 0.001), rep(0.001, p+1))) } else { par <- coef(x) return(cbind(par, rep(0.01, length(par)))) } } updat <- list(n = n, Y = Y, X = X, p = ncol(X), priors = upfun()) dcmod <- dc.fit(updat, c("beta", "sigma"), glmm.model.up, n.clones = 1:5, multiply = "n", unchanged = "p", update = "priors", updatefun = upfun) summary(dcmod) dct <- dctable(dcmod) plot(dct) plot(dct, type = "var") ## End(Not run)
The function plots error bars to existing plot.
errlines(x, ...) ## Default S3 method: errlines(x, y, type = "l", code = 0, width = 0, vertical = TRUE, col = 1, bg = NA, ...)
errlines(x, ...) ## Default S3 method: errlines(x, y, type = "l", code = 0, width = 0, vertical = TRUE, col = 1, bg = NA, ...)
x |
Numeric vector with coordinates along the horizontal axis
(if |
y |
A matrix-like object with 2 columns for lower and upper values on the
vertical axis (if |
type |
Character, |
code |
Integer code, determining the kind of ticks to be drawn. See Details. |
width |
Numeric, width of the ticks (if |
vertical |
Logical, if errorbars should be plotted vertically or horizontally. |
col |
Color of the error lines to be drawn, recycled if needed. |
bg |
If |
... |
Other arguments passed to the function |
The errlines
function uses lines
to draw error bars
to existing plot when type = "l"
.
polygon
is used for boxes when type = "b"
.
If code = 0
no ticks are drawn, if code = 1
, only
lower ticks are drawn, if code = 2
only
lower ticks are drawn, if code = 3
both
lower and upper ticks are drawn.
Adds error bars to an existing plot as a side effect.
Returns NULL
invisibly.
Peter Solymos
x <- 1:10 a <- rnorm(10,10) a <- a[order(a)] b <- runif(10) y <- cbind(a-b, a+b+rev(b)) opar <- par(mfrow=c(2, 3)) plot(x, a, ylim = range(y)) errlines(x, y) plot(x, a, ylim = range(y)) errlines(x, y, width = 0.5, code = 1) plot(x, a, ylim = range(y), col = 1:10) errlines(x, y, width = 0.5, code = 3, col = 1:10) plot(x, a, ylim = range(y)) errlines(x, y, width = 0.5, code = 2, type = "b") plot(x, a, ylim = range(y)) errlines(x, y, width = 0.5, code = 3, type = "b") plot(x, a, ylim = range(y), type = "n") errlines(x, y, width = 0.5, code = 3, type = "b", bg = 1:10) errlines(x, cbind(a-b/2, a+b/2+rev(b)/2)) points(x, a) par(opar)
x <- 1:10 a <- rnorm(10,10) a <- a[order(a)] b <- runif(10) y <- cbind(a-b, a+b+rev(b)) opar <- par(mfrow=c(2, 3)) plot(x, a, ylim = range(y)) errlines(x, y) plot(x, a, ylim = range(y)) errlines(x, y, width = 0.5, code = 1) plot(x, a, ylim = range(y), col = 1:10) errlines(x, y, width = 0.5, code = 3, col = 1:10) plot(x, a, ylim = range(y)) errlines(x, y, width = 0.5, code = 2, type = "b") plot(x, a, ylim = range(y)) errlines(x, y, width = 0.5, code = 3, type = "b") plot(x, a, ylim = range(y), type = "n") errlines(x, y, width = 0.5, code = 3, type = "b", bg = 1:10) errlines(x, cbind(a-b/2, a+b/2+rev(b)/2)) points(x, a) par(opar)
Evaluates parallel argument.
evalParallelArgument(cl, quit = FALSE)
evalParallelArgument(cl, quit = FALSE)
cl |
|
quit |
Logical, whether it should stop with error when ambiguous parallel definition is found (conflicting default environmental variable settings). |
NULL
for sequential evaluation or
the original value of cl
if parallel
evaluation is meaningful.
Peter Solymos
evalParallelArgument() evalParallelArgument(NULL) evalParallelArgument(1) evalParallelArgument(2) cl <- makePSOCKcluster(2) evalParallelArgument(cl) stopCluster(cl) oop <- options("mc.cores"=2) evalParallelArgument() options(oop)
evalParallelArgument() evalParallelArgument(NULL) evalParallelArgument(1) evalParallelArgument(2) cl <- makePSOCKcluster(2) evalParallelArgument(cl) stopCluster(cl) oop <- options("mc.cores"=2) evalParallelArgument() options(oop)
Convenient functions designed to work well with cloned data arguments and JAGS.
jags.fit(data, params, model, inits = NULL, n.chains = 3, n.adapt = 1000, n.update = 1000, thin = 1, n.iter = 5000, updated.model = TRUE, ...)
jags.fit(data, params, model, inits = NULL, n.chains = 3, n.adapt = 1000, n.update = 1000, thin = 1, n.iter = 5000, updated.model = TRUE, ...)
data |
A named list or environment containing the data.
If an environment, |
params |
Character vector of parameters to be sampled. |
model |
Character string (name of the model file), a function containing
the model, or a or |
inits |
Optional specification of initial values in the form of a
list or a function (see Initialization at
|
n.chains |
Number of chains to generate. |
n.adapt |
Number of steps for adaptation. |
n.update |
Number of updates before iterations.
It is usually a bad idea to use |
thin |
Thinning value. |
n.iter |
Number of iterations. |
updated.model |
Logical, if the updated model should be attached as attribute
(this can be used to further update if convergence
was not satisfactory, see |
... |
Further arguments passed to |
An mcmc.list
object. If data cloning is used via the
data
argument, summary
returns a modified summary
containing scaled data cloning standard errors
(scaled by sqrt(n.clones)
, see dcsd
),
and values
(as returned by
gelman.diag
).
Peter Solymos
Underlying functions: jags.model
,
update.jags
,
coda.samples
Parallel chain computations: jags.parfit
Methods: dcsd
, confint.mcmc.list.dc
,
coef.mcmc.list
, quantile.mcmc.list
,
vcov.mcmc.list.dc
## Not run: if (require(rjags)) { ## simple regression example from the JAGS manual jfun <- function() { for (i in 1:N) { Y[i] ~ dnorm(mu[i], tau) mu[i] <- alpha + beta * (x[i] - x.bar) } x.bar <- mean(x[]) alpha ~ dnorm(0.0, 1.0E-4) beta ~ dnorm(0.0, 1.0E-4) sigma <- 1.0/sqrt(tau) tau ~ dgamma(1.0E-3, 1.0E-3) } ## data generation set.seed(1234) N <- 100 alpha <- 1 beta <- -1 sigma <- 0.5 x <- runif(N) linpred <- crossprod(t(model.matrix(~x)), c(alpha, beta)) Y <- rnorm(N, mean = linpred, sd = sigma) ## list of data for the model jdata <- list(N = N, Y = Y, x = x) ## what to monitor jpara <- c("alpha", "beta", "sigma") ## fit the model with JAGS regmod <- jags.fit(jdata, jpara, jfun, n.chains = 3) ## model summary summary(regmod) ## data cloning dcdata <- dclone(jdata, 5, multiply = "N") dcmod <- jags.fit(dcdata, jpara, jfun, n.chains = 3) summary(dcmod) } ## End(Not run)
## Not run: if (require(rjags)) { ## simple regression example from the JAGS manual jfun <- function() { for (i in 1:N) { Y[i] ~ dnorm(mu[i], tau) mu[i] <- alpha + beta * (x[i] - x.bar) } x.bar <- mean(x[]) alpha ~ dnorm(0.0, 1.0E-4) beta ~ dnorm(0.0, 1.0E-4) sigma <- 1.0/sqrt(tau) tau ~ dgamma(1.0E-3, 1.0E-3) } ## data generation set.seed(1234) N <- 100 alpha <- 1 beta <- -1 sigma <- 0.5 x <- runif(N) linpred <- crossprod(t(model.matrix(~x)), c(alpha, beta)) Y <- rnorm(N, mean = linpred, sd = sigma) ## list of data for the model jdata <- list(N = N, Y = Y, x = x) ## what to monitor jpara <- c("alpha", "beta", "sigma") ## fit the model with JAGS regmod <- jags.fit(jdata, jpara, jfun, n.chains = 3) ## model summary summary(regmod) ## data cloning dcdata <- dclone(jdata, 5, multiply = "N") dcmod <- jags.fit(dcdata, jpara, jfun, n.chains = 3) summary(dcmod) } ## End(Not run)
Does the same job as jags.fit
,
but parallel chains are run on parallel workers, thus
computations can be faster (up to 1/n.chains
) for long MCMC runs.
jags.parfit(cl, data, params, model, inits = NULL, n.chains = 3, ...)
jags.parfit(cl, data, params, model, inits = NULL, n.chains = 3, ...)
cl |
A cluster object created by |
data |
A named list or environment containing the data. If an environment,
|
params |
Character vector of parameters to be sampled. |
model |
Character string (name of the model file), a function
containing the model, or a or |
inits |
Specification of initial values in the form of a
list or a function, can be missing.
Missing value setting can include RNG seed information,
see Initialization at |
n.chains |
Number of chains to generate, must be higher than 1. Ideally, this is equal to the number of parallel workers in the cluster. |
... |
Other arguments passed to |
Chains are run on parallel workers, and the results are combined in the end.
No update method is available for parallel mcmc.list
objects.
See parUpdate
and related parallel functions
(parJagsModel
, parCodaSamples
)
for such purpose.
Additionally loaded JAGS modules (e.g. "glm"
,
"lecuyer"
) need to be loaded to the workers
when using 'snow' type cluster as cl
argument. See Examples.
The use of the "lecuyer"
module is recommended when
running more than 4 chains. See Examples and
parallel.inits
.
An mcmc.list
object.
Peter Solymos
Sequential version: jags.fit
Function for stepwise modeling with JAGS: parJagsModel
,
parUpdate
, parCodaSamples
## Not run: if (require(rjags)) { set.seed(1234) n <- 20 x <- runif(n, -1, 1) X <- model.matrix(~x) beta <- c(2, -1) mu <- crossprod(t(X), beta) Y <- rpois(n, exp(mu)) glm.model <- function() { for (i in 1:n) { Y[i] ~ dpois(lambda[i]) log(lambda[i]) <- inprod(X[i,], beta[1,]) } for (j in 1:np) { beta[1,j] ~ dnorm(0, 0.001) } } dat <- list(Y=Y, X=X, n=n, np=ncol(X)) load.module("glm") m <- jags.fit(dat, "beta", glm.model) cl <- makePSOCKcluster(3) ## load glm module tmp <- clusterEvalQ(cl, library(dclone)) parLoadModule(cl, "glm") pm <- jags.parfit(cl, dat, "beta", glm.model) ## chains are not identical -- this is good pm[1:2,] summary(pm) ## examples on how to use initial values ## fixed initial values inits <- list(list(beta=matrix(c(0,1),1,2)), list(beta=matrix(c(1,0),1,2)), list(beta=matrix(c(0,0),1,2))) pm2 <- jags.parfit(cl, dat, "beta", glm.model, inits) ## random numbers generated prior to jags.parfit inits <- list(list(beta=matrix(rnorm(2),1,2)), list(beta=matrix(rnorm(2),1,2)), list(beta=matrix(rnorm(2),1,2))) pm3 <- jags.parfit(cl, dat, "beta", glm.model, inits) ## self contained function inits <- function() list(beta=matrix(rnorm(2),1,2)) pm4 <- jags.parfit(cl, dat, "beta", glm.model, inits) ## function pointing to the global environment fun <- function() list(beta=matrix(rnorm(2),1,2)) inits <- function() fun() clusterExport(cl, "fun") ## using the L'Ecuyer module with 6 chains load.module("lecuyer") parLoadModule(cl,"lecuyer") pm5 <- jags.parfit(cl, dat, "beta", glm.model, inits, n.chains=6) nchain(pm5) unload.module("lecuyer") parUnloadModule(cl,"lecuyer") stopCluster(cl) ## multicore type forking if (.Platform$OS.type != "windows") { pm6 <- jags.parfit(3, dat, "beta", glm.model) } } ## End(Not run)
## Not run: if (require(rjags)) { set.seed(1234) n <- 20 x <- runif(n, -1, 1) X <- model.matrix(~x) beta <- c(2, -1) mu <- crossprod(t(X), beta) Y <- rpois(n, exp(mu)) glm.model <- function() { for (i in 1:n) { Y[i] ~ dpois(lambda[i]) log(lambda[i]) <- inprod(X[i,], beta[1,]) } for (j in 1:np) { beta[1,j] ~ dnorm(0, 0.001) } } dat <- list(Y=Y, X=X, n=n, np=ncol(X)) load.module("glm") m <- jags.fit(dat, "beta", glm.model) cl <- makePSOCKcluster(3) ## load glm module tmp <- clusterEvalQ(cl, library(dclone)) parLoadModule(cl, "glm") pm <- jags.parfit(cl, dat, "beta", glm.model) ## chains are not identical -- this is good pm[1:2,] summary(pm) ## examples on how to use initial values ## fixed initial values inits <- list(list(beta=matrix(c(0,1),1,2)), list(beta=matrix(c(1,0),1,2)), list(beta=matrix(c(0,0),1,2))) pm2 <- jags.parfit(cl, dat, "beta", glm.model, inits) ## random numbers generated prior to jags.parfit inits <- list(list(beta=matrix(rnorm(2),1,2)), list(beta=matrix(rnorm(2),1,2)), list(beta=matrix(rnorm(2),1,2))) pm3 <- jags.parfit(cl, dat, "beta", glm.model, inits) ## self contained function inits <- function() list(beta=matrix(rnorm(2),1,2)) pm4 <- jags.parfit(cl, dat, "beta", glm.model, inits) ## function pointing to the global environment fun <- function() list(beta=matrix(rnorm(2),1,2)) inits <- function() fun() clusterExport(cl, "fun") ## using the L'Ecuyer module with 6 chains load.module("lecuyer") parLoadModule(cl,"lecuyer") pm5 <- jags.parfit(cl, dat, "beta", glm.model, inits, n.chains=6) nchain(pm5) unload.module("lecuyer") parUnloadModule(cl,"lecuyer") stopCluster(cl) ## multicore type forking if (.Platform$OS.type != "windows") { pm6 <- jags.parfit(3, dat, "beta", glm.model) } } ## End(Not run)
jagsModel
is used to create an object representing a
Bayesian graphical model, specified with a BUGS-language description
of the prior distribution, and a set of data.
This function uses jags.model
but keeps track
of data cloning information supplied via the data
argument.
The model argument can also accept functions or 'custommodel' objects.
jagsModel(file, data=sys.frame(sys.parent()), inits, n.chains = 1, n.adapt=1000, quiet=FALSE)
jagsModel(file, data=sys.frame(sys.parent()), inits, n.chains = 1, n.adapt=1000, quiet=FALSE)
file |
the name of the file containing a
description of the model in the
JAGS dialect of the BUGS language.
Alternatively, |
data |
a list or environment containing the data. Any numeric
objects in |
inits |
optional specification of initial values in the form of a list or a function. If omitted, initial values will be generated automatically. It is an error to supply an initial value for an observed node. |
n.chains |
the number of chains for the model |
n.adapt |
the number of iterations for adaptation. See
|
quiet |
if |
parJagsModel
returns an object inheriting from class jags
which can be used to generate dependent samples from the posterior
distribution of the parameters.
An object of class jags
is a list of functions that share a
common environment, see jags.model
for details.
An n.clones
attribute is attached to the object when applicable.
Peter Solymos
Underlying functions: jags.model
,
update.jags
See example on help page of codaSamples
.
Parallel version: parJagsModel
These functions calculates diagnostics for evaluating data cloning convergence.
lambdamax.diag(x, ...) ## S3 method for class 'mcmc.list' lambdamax.diag(x, ...) chisq.diag(x, ...) ## S3 method for class 'mcmc.list' chisq.diag(x, ...)
lambdamax.diag(x, ...) ## S3 method for class 'mcmc.list' lambdamax.diag(x, ...) chisq.diag(x, ...) ## S3 method for class 'mcmc.list' chisq.diag(x, ...)
x |
An object of class |
... |
Other arguments to be passed. |
These diagnostics can be used to test for the data cloning convergence
(Lele et al. 2007, 2010).
Asymptotically the posterior distribution of the parameters approaches
a degenerate multivariate normal distribution. As the distribution
is getting more degenerate, the maximal eigenvalue ()
of the unscaled covariance matrix is decreasing.
There is no critical value under which
is good
enough. By default, 0.05 is used (see
getOption("dclone")$diag
).
Another diagnostic tool is to check if the joint posterior distribution
is multivariate normal. It is done by chisq.diag
as described by
Lele et al. (2010).
lambdamax.diag
returns a single value, the maximum of the
eigenvalues of the
unscaled variance covariance matrix of the estimated parameters.
chisq.diag
returns two test statistic values
(mean squared error and r-squared) with empirical and theoretical
quantiles.
Khurram Nadeem, [email protected]
Peter Solymos
Lele, S.R., B. Dennis and F. Lutscher, 2007. Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods. Ecology Letters 10, 551–563.
Lele, S. R., K. Nadeem and B. Schmuland, 2010. Estimability and likelihood inference for generalized linear mixed models using data cloning. Journal of the American Statistical Association 105, 1617–1625.
Solymos, P., 2010. dclone: Data Cloning in R. The R Journal 2(2), 29–37. URL: https://journal.r-project.org/archive/2010-2/RJournal_2010-2_Solymos.pdf
Eigen decomposition: eigen
data(regmod) lambdamax.diag(regmod) chisq.diag(regmod)
data(regmod) lambdamax.diag(regmod) chisq.diag(regmod)
Matrix symmetry might depend on numerical precision issues. The older version of JAGS had a bug related to this issue for multivariate normal nodes. This simple function can fix the issue, but new JAGS versions do not require such intervention.
make.symmetric(x)
make.symmetric(x)
x |
A square matrix. |
The function takes the average as (x[i, j] + x[j, i]) / 2
for each off diagonal cells.
A symmetric square matrix.
The function works for any matrix, even for those not intended to be symmetric.
Peter Solymos
x <- as.matrix(as.dist(matrix(1:25, 5, 5))) diag(x) <- 100 x[lower.tri(x)] <- x[lower.tri(x)] - 0.1 x[upper.tri(x)] <- x[upper.tri(x)] + 0.1 x make.symmetric(x)
x <- as.matrix(as.dist(matrix(1:25, 5, 5))) diag(x) <- 100 x[lower.tri(x)] <- x[lower.tri(x)] - 0.1 x[upper.tri(x)] <- x[upper.tri(x)] + 0.1 x make.symmetric(x)
mclapplySB
is a size balancing version of
mclapply
.
mclapplySB(X, FUN, ..., mc.preschedule = TRUE, mc.set.seed = TRUE, mc.silent = FALSE, mc.cores = 1L, mc.cleanup = TRUE, mc.allow.recursive = TRUE, size = 1)
mclapplySB(X, FUN, ..., mc.preschedule = TRUE, mc.set.seed = TRUE, mc.silent = FALSE, mc.cores = 1L, mc.cleanup = TRUE, mc.allow.recursive = TRUE, size = 1)
X |
a vector (atomic or list) or an expressions vector. Other
objects (including classed objects) will be coerced by
|
FUN |
the function to be applied to each element of |
... |
optional arguments to |
mc.preschedule |
see |
mc.set.seed |
see |
mc.silent |
see |
mc.cores |
The number of cores to use, i.e. how many processes will be spawned (at most) |
mc.cleanup |
see |
mc.allow.recursive |
see |
size |
Vector of problem sizes
(or relative performance information)
corresponding to elements of |
mclapply
gives details of the forking
mechanism.
mclapply
is used unmodified
if sizes of the jobs are equal (length(unique(size)) == 1
).
Size balancing (as described in parDosa
)
is used to balance workload on the child processes
otherwise.
A list.
Peter Solymos
Methods for 'mcmc.list' objects.
dcsd(object, ...) ## S3 method for class 'mcmc.list' dcsd(object, ...) ## S3 method for class 'mcmc.list' coef(object, ...) ## S3 method for class 'mcmc.list.dc' confint(object, parm, level = 0.95, ...) ## S3 method for class 'mcmc.list' vcov(object, ...) ## S3 method for class 'mcmc.list.dc' vcov(object, invfisher = TRUE, ...) ## S3 method for class 'mcmc.list' quantile(x, ...)
dcsd(object, ...) ## S3 method for class 'mcmc.list' dcsd(object, ...) ## S3 method for class 'mcmc.list' coef(object, ...) ## S3 method for class 'mcmc.list.dc' confint(object, parm, level = 0.95, ...) ## S3 method for class 'mcmc.list' vcov(object, ...) ## S3 method for class 'mcmc.list.dc' vcov(object, invfisher = TRUE, ...) ## S3 method for class 'mcmc.list' quantile(x, ...)
x , object
|
MCMC object to be processed. |
parm |
A specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level |
The confidence level required. |
... |
Further arguments passed to functions. |
invfisher |
Logical, if the inverse of the Fisher information matrix
( |
dcsd
returns the data cloning standard errors of a posterior
MCMC chain calculated as standard deviation times the square root
of the number of clones.
The coef
method returns mean of the posterior MCMC chains
for the monitored parameters.
The confint
method returns Wald-type confidence intervals
for the parameters assuming asymptotic normality.
The vcov
method returns the inverse of the Fisher
information matrix (invfisher = TRUE
) or the covariance matrix
of the joint posterior distribution (invfisher = FALSE
).
The invfisher
is valid only for mcmc.list.dc
(data cloned) objects.
The quantile
method returns quantiles for each variable.
Some functions only available for the 'mcmc.list.dc' class which inherits from class 'mcmc.list'.
Peter Solymos
## Not run: ## simple regression example from the JAGS manual jfun <- function() { for (i in 1:N) { Y[i] ~ dnorm(mu[i], tau) mu[i] <- alpha + beta * (x[i] - x.bar) } x.bar <- mean(x) alpha ~ dnorm(0.0, 1.0E-4) beta ~ dnorm(0.0, 1.0E-4) sigma <- 1.0/sqrt(tau) tau ~ dgamma(1.0E-3, 1.0E-3) } ## data generation set.seed(1234) N <- 100 alpha <- 1 beta <- -1 sigma <- 0.5 x <- runif(N) linpred <- crossprod(t(model.matrix(~x)), c(alpha, beta)) Y <- rnorm(N, mean = linpred, sd = sigma) ## data for the model dcdata <- dclone(list(N = N, Y = Y, x = x), 5, multiply = "N") ## data cloning dcmod <- jags.fit(dcdata, c("alpha", "beta", "sigma"), jfun, n.chains = 3) summary(dcmod) coef(dcmod) dcsd(dcmod) confint(dcmod) vcov(dcmod) vcov(dcmod, invfisher = FALSE) quantile(dcmod) ## End(Not run)
## Not run: ## simple regression example from the JAGS manual jfun <- function() { for (i in 1:N) { Y[i] ~ dnorm(mu[i], tau) mu[i] <- alpha + beta * (x[i] - x.bar) } x.bar <- mean(x) alpha ~ dnorm(0.0, 1.0E-4) beta ~ dnorm(0.0, 1.0E-4) sigma <- 1.0/sqrt(tau) tau ~ dgamma(1.0E-3, 1.0E-3) } ## data generation set.seed(1234) N <- 100 alpha <- 1 beta <- -1 sigma <- 0.5 x <- runif(N) linpred <- crossprod(t(model.matrix(~x)), c(alpha, beta)) Y <- rnorm(N, mean = linpred, sd = sigma) ## data for the model dcdata <- dclone(list(N = N, Y = Y, x = x), 5, multiply = "N") ## data cloning dcmod <- jags.fit(dcdata, c("alpha", "beta", "sigma"), jfun, n.chains = 3) summary(dcmod) coef(dcmod) dcsd(dcmod) confint(dcmod) vcov(dcmod) vcov(dcmod, invfisher = FALSE) quantile(dcmod) ## End(Not run)
Conveniently calculates statistics for mcmc.list objects.
mcmcapply(x, FUN, ...) ## S3 method for class 'mcmc.list' stack(x, ...)
mcmcapply(x, FUN, ...) ## S3 method for class 'mcmc.list' stack(x, ...)
x |
Objects of class |
FUN |
A function to be used in the calculations, returning a single value. |
... |
Other arguments passed to |
mcmcapply
returns a certain statistics based on FUN
after coercing into a matrix. FUN
can be missing,
in this case mcmcapply
is equivalent
to calling as.matrix
on an 'mcmc.list' object.
stack
can be used to concatenates 'mcmc.list'
objects into a single vector
along with index variables indicating where each observation originated
from (e.g. iteration, variable, chain).
mcmcapply
returns statistic value for each variable
based on FUN
, using all values in all chains of the MCMC object.
stack
returns a data frame with columns:
iter, variable, chain, value.
Peter Solymos
data(regmod) mcmcapply(regmod, mean) mcmcapply(regmod, sd) x <- stack(regmod) head(x) summary(x) library(lattice) xyplot(value ~ iter | variable, data=x, type="l", scales = "free", groups=chain)
data(regmod) mcmcapply(regmod, mean) mcmcapply(regmod, sd) x <- stack(regmod) head(x) summary(x) library(lattice) xyplot(value ~ iter | variable, data=x, type="l", scales = "free", groups=chain)
Retrieves the number of clones from an object.
nclones(x, ...) ## Default S3 method: nclones(x, ...) ## S3 method for class 'list' nclones(x, ...)
nclones(x, ...) ## Default S3 method: nclones(x, ...) ## S3 method for class 'list' nclones(x, ...)
x |
An object. |
... |
Other arguments to be passed. |
Returns the number of of clones, or NULL
.
Peter Solymos
x <- dclone(1:10, 10) nclones(x) nclones(1:10) # this is NULL
x <- dclone(1:10, 10) nclones(x) nclones(1:10) # this is NULL
The data set contains observations (point counts) of 198 sites of the Alberta Biodiversity Monitoring Institute.
count
: integer, ovenbird counts per site.
site, year
: numeric, site number and year of data collection.
ecosite
: factor with 5 levels,
ecological categorization of the sites.
uplow
: factor with 2 levels, ecological
categorization of the sites
(same es ecosite but levels are grouped into
upland
and lowland
).
dsucc, dalien, thd
: numeric, percentage of successional,
alienating and total human disturbance
based on interpreted 3 x 7 km photoplots centered on each site.
long, lat
: numeric, public
longitude/latitude coordinates of the sites.
data(ovenbird)
data(ovenbird)
Alberta Biodiversity Monitoring Institute, https://www.abmi.ca
data(ovenbird) summary(ovenbird) str(ovenbird)
data(ovenbird) summary(ovenbird) str(ovenbird)
A matrix of scatterplots is produced.
## S3 method for class 'mcmc.list' pairs(x, n = 25, col = 1:length(x), col.hist = "gold", col.image = terrain.colors(50), density = TRUE, contour = TRUE, mean = TRUE, ...)
## S3 method for class 'mcmc.list' pairs(x, n = 25, col = 1:length(x), col.hist = "gold", col.image = terrain.colors(50), density = TRUE, contour = TRUE, mean = TRUE, ...)
x |
an 'mcmc.list' object. |
n |
number of of grid points in each direction for two-dimensional kernel density estimation. Can be scalar or a length-2 integer vector. |
col |
color for chains in upper panel scatterplots. |
col.hist |
color for histogram fill in diagonal panels. |
col.image |
color palette for image plot in lower panel scatterplots. |
density |
logical, if image plot based on the two-dimensional kernel density estimation should be plotted in lower panel. |
contour |
logical, if contour plot based on the two-dimensional kernel density estimation should be plotted in lower panel. |
mean |
logical, if lines should indicate means of the posterior densities in the panels. |
... |
additional graphical parameters/arguments. |
The function produces a scatterplot matrix for 'mcmc.list' objects.
Diagonal panels are posterior densities with labels and
rug on the top. Upper panels are pairwise bivariate scatterplots
with coloring corresponding to chains, thus highlighting mixing properties
although not as clearly as trace plots. Lower panels
are two-dimensional kernel density estimates based on
kde2d
function
of MASS package using image
and contour
.
The function returns NULL
invisibly and
produces a plot as a side effect.
Peter Solymos
Two-dimensional kernel density estimation:
kde2d
in MASS package
data(regmod) pairs(regmod)
data(regmod) pairs(regmod)
This function takes care of initial values with safe RNGs based on
parallel.seeds
of the rjags package.
parallel.inits(inits, n.chains)
parallel.inits(inits, n.chains)
inits |
Initial values (see Initialization at |
n.chains |
Number of chains to generate. |
Initial values are handled similar to as it is done in
jags.model
.
RNGs are based on values returned by
parallel.seeds
.
If the "lecuyer"
JAGS module is active, RNGs are based on
the "lecuyer::RngStream"
factory, otherwise those are based on
the "base::BaseRNG"
factory.
Returns a list of initial values with RNGs.
Peter Solymos. Based on Martyn Plummer's
parallel.seeds
function and code in
jags.model
for initial value handling in the
rjags package.
This seeding function is used in all of dclone's
parallel functions that do initialization:
parJagsModel
, jags.parfit
,
dc.parfit
if (require(rjags)) { ## "base::BaseRNG" factory. parallel.inits(NULL, 2) ## "lecuyer::RngStream" factory load.module("lecuyer") parallel.inits(NULL, 2) unload.module("lecuyer") ## some non NULL inits specifications parallel.inits(list(a=0), 2) parallel.inits(list(list(a=0), list(a=0)), 2) parallel.inits(function() list(a=0), 2) parallel.inits(function(chain) list(a=chain), 2) }
if (require(rjags)) { ## "base::BaseRNG" factory. parallel.inits(NULL, 2) ## "lecuyer::RngStream" factory load.module("lecuyer") parallel.inits(NULL, 2) unload.module("lecuyer") ## some non NULL inits specifications parallel.inits(list(a=0), 2) parallel.inits(list(list(a=0), list(a=0)), 2) parallel.inits(function() list(a=0), 2) parallel.inits(function(chain) list(a=chain), 2) }
This function sets a trace
monitor for all requested nodes, updates the model on each
workers. Finally, it return the chains to the master and coerces the
output to a single mcmc.list
object.
parCodaSamples(cl, model, variable.names, n.iter, thin = 1, na.rm = TRUE, ...)
parCodaSamples(cl, model, variable.names, n.iter, thin = 1, na.rm = TRUE, ...)
cl |
A cluster object created by |
model |
character, name of a jags model object |
variable.names |
a character vector giving the names of variables to be monitored |
n.iter |
number of iterations to monitor |
thin |
thinning interval for monitors |
na.rm |
logical flag that indicates whether variables containing
missing values should be omitted. See details in help page
of |
... |
optional arguments that are passed to the update method for jags model objects |
An mcmc.list
object with possibly an n.clones
attribute.
Peter Solymos
Original sequential function in rjags:
coda.samples
Sequential dclone-ified version: codaSamples
## Not run: if (require(rjags)) { model <- function() { for (i in 1:N) { Y[i] ~ dnorm(mu[i], tau) mu[i] <- alpha + beta * (x[i] - x.bar) } x.bar <- mean(x[]) alpha ~ dnorm(0.0, 1.0E-4) beta ~ dnorm(0.0, 1.0E-4) sigma <- 1.0/sqrt(tau) tau ~ dgamma(1.0E-3, 1.0E-3) } ## data generation set.seed(1234) N <- 100 alpha <- 1 beta <- -1 sigma <- 0.5 x <- runif(N) linpred <- crossprod(t(model.matrix(~x)), c(alpha, beta)) Y <- rnorm(N, mean = linpred, sd = sigma) jdata <- list(N = N, Y = Y, x = x) jpara <- c("alpha", "beta", "sigma") ## jags model on parallel workers ## n.chains must be <= no. of workers cl <- makePSOCKcluster(4) parJagsModel(cl, name="res", file=model, data=jdata, n.chains = 2, n.adapt=1000) parUpdate(cl, "res", n.iter=1000) m <- parCodaSamples(cl, "res", jpara, n.iter=2000) stopifnot(2==nchain(m)) ## with data cloning dcdata <- dclone(list(N = N, Y = Y, x = x), 2, multiply="N") parJagsModel(cl, name="res2", file=model, data=dcdata, n.chains = 2, n.adapt=1000) parUpdate(cl, "res2", n.iter=1000) m2 <- parCodaSamples(cl, "res2", jpara, n.iter=2000) stopifnot(2==nchain(m2)) nclones(m2) stopCluster(cl) } ## End(Not run)
## Not run: if (require(rjags)) { model <- function() { for (i in 1:N) { Y[i] ~ dnorm(mu[i], tau) mu[i] <- alpha + beta * (x[i] - x.bar) } x.bar <- mean(x[]) alpha ~ dnorm(0.0, 1.0E-4) beta ~ dnorm(0.0, 1.0E-4) sigma <- 1.0/sqrt(tau) tau ~ dgamma(1.0E-3, 1.0E-3) } ## data generation set.seed(1234) N <- 100 alpha <- 1 beta <- -1 sigma <- 0.5 x <- runif(N) linpred <- crossprod(t(model.matrix(~x)), c(alpha, beta)) Y <- rnorm(N, mean = linpred, sd = sigma) jdata <- list(N = N, Y = Y, x = x) jpara <- c("alpha", "beta", "sigma") ## jags model on parallel workers ## n.chains must be <= no. of workers cl <- makePSOCKcluster(4) parJagsModel(cl, name="res", file=model, data=jdata, n.chains = 2, n.adapt=1000) parUpdate(cl, "res", n.iter=1000) m <- parCodaSamples(cl, "res", jpara, n.iter=2000) stopifnot(2==nchain(m)) ## with data cloning dcdata <- dclone(list(N = N, Y = Y, x = x), 2, multiply="N") parJagsModel(cl, name="res2", file=model, data=dcdata, n.chains = 2, n.adapt=1000) parUpdate(cl, "res2", n.iter=1000) m2 <- parCodaSamples(cl, "res2", jpara, n.iter=2000) stopifnot(2==nchain(m2)) nclones(m2) stopCluster(cl) } ## End(Not run)
parDosa
is a wrapper function around many
functionalities of the parallel package.
It is designed to work closely with MCMC fitting functions,
e.g. can easily be called from inside of a function.
parDosa(cl, seq, fun, cldata, lib = NULL, dir = NULL, evalq=NULL, size = 1, balancing = c("none", "load", "size", "both"), rng.type = c("none", "RNGstream"), cleanup = TRUE, unload = FALSE, iseed=NULL, ...)
parDosa(cl, seq, fun, cldata, lib = NULL, dir = NULL, evalq=NULL, size = 1, balancing = c("none", "load", "size", "both"), rng.type = c("none", "RNGstream"), cleanup = TRUE, unload = FALSE, iseed=NULL, ...)
cl |
A cluster object created by |
seq |
A vector to split. |
fun |
A function or character string naming a function. |
cldata |
A list containing data.
This list is then exported to the cluster by
|
lib |
Character, name of package(s). Optionally packages can be loaded onto the cluster. More than one package can be specified as character vector. Packages already loaded are skipped. |
dir |
Working directory to use, if |
evalq |
Character, expressions to evaluate,
e.g. for changing global options (passed to |
balancing |
Character, type of balancing to perform (see Details). |
size |
Vector of problem sizes (or relative performance information)
corresponding to elements of |
rng.type |
Character, |
cleanup |
logical, if |
unload |
logical, if |
iseed |
integer or |
... |
Other arguments of |
The function uses 'snow' type clusters when cl
is a cluster
object. The function uses 'multicore' type forking (shared memory)
when cl
is an integer.
The value from getOption("mc.cores")
is used if the
argument is NULL
.
The function sets the random seeds, loads packages lib
onto the cluster, sets the working directory as dir
,
exports cldata
and evaluates fun
on seq
.
No balancing (balancing = "none"
) means, that the problem
is split into roughly equal
subsets, without respect to size
(see clusterSplit
). This splitting
is deterministic (reproducible).
Load balancing (balancing = "load"
) means,
that the problem is not splitted into subsets
a priori, but subsequent items are placed on the
worker which is empty
(see clusterApplyLB
for load balancing).
This splitting is non-deterministic (might not be reproducible).
Size balancing (balancing = "size"
) means,
that the problem is splitted into
subsets, with respect to size
(see clusterSplitSB
and parLapplySB
).
In size balancing, the problem is re-ordered from
largest to smallest, and then subsets are
determined by minimizing the total approximate processing time.
This splitting is deterministic (reproducible).
Size and load balancing (balancing = "both"
) means,
that the problem is re-ordered from largest to smallest,
and then undeterministic load balancing
is used (see parLapplySLB
).
If size
is correct, this is identical to size balancing.
This splitting is non-deterministic (might not be reproducible).
Usually a list with results returned by the cluster.
Peter Solymos
Size balancing: parLapplySB
, parLapplySLB
,
mclapplySB
Optimizing the number of workers:
clusterSize
, plotClusterSize
.
parDosa
is used internally by parallel dclone
functions: jags.parfit
, dc.parfit
,
parJagsModel
, parUpdate
,
parCodaSamples
.
parDosa
manipulates specific environments
described on the help page DcloneEnv
.
parJagsModel
is used to create an object representing a
Bayesian graphical model, specified with a BUGS-language description
of the prior distribution, and a set of data.
parJagsModel(cl, name, file, data=sys.frame(sys.parent()), inits, n.chains = 1, n.adapt=1000, quiet=FALSE)
parJagsModel(cl, name, file, data=sys.frame(sys.parent()), inits, n.chains = 1, n.adapt=1000, quiet=FALSE)
cl |
A cluster object created by |
name |
character, name for the model to be assigned on the workers. |
file |
the name of the file containing a
description of the model in the
JAGS dialect of the BUGS language.
Alternatively, |
data |
a list or environment containing the data. Any numeric
objects in |
inits |
optional specification of initial values in the form of a
list or a function (see |
n.chains |
the number of parallel chains for the model |
n.adapt |
the number of iterations for adaptation. See
|
quiet |
if |
parJagsModel
returns an object inheriting from class jags
which can be used to generate dependent samples from the posterior
distribution of the parameters. These jags
models are
residing on the workers, thus updating/sampling is possible.
Length of cl
must be equal to or greater than n.chains
.
RNG seed generation takes place first on the master,
and chains then initialized on
each worker by distributing inits
and single chained models.
An object of class jags
is a list of functions that share a
common environment, see jags.model
for details.
Data cloning information is attached to the returned
object if data argument has n.clones
attribute.
Peter Solymos
Original sequential function in rjags:
jags.model
Sequential dclone-ified version: jagsModel
See example on help page of parCodaSamples
.
A JAGS module is a dynamically loaded library that extends the functionality of JAGS. These functions load and unload JAGS modules and show the names of the currently loaded modules on parallel workers.
parLoadModule(cl, name, path, quiet=FALSE) parUnloadModule(cl, name, quiet=FALSE) parListModules(cl)
parLoadModule(cl, name, path, quiet=FALSE) parUnloadModule(cl, name, quiet=FALSE) parListModules(cl)
cl |
a cluster object created by the parallel package. |
name |
character, name of the module to be loaded |
path |
file path to the location of the DLL. If omitted,
the option |
quiet |
a logical. If |
Peter Solymos
list.modules
,
load.module
, unload.module
## Not run: if (require(rjags)) { cl <- makePSOCKcluster(3) parListModules(cl) parLoadModule(cl, "glm") parListModules(cl) parUnloadModule(cl, "glm") parListModules(cl) stopCluster(cl) } ## End(Not run)
## Not run: if (require(rjags)) { cl <- makePSOCKcluster(3) parListModules(cl) parLoadModule(cl, "glm") parListModules(cl) parUnloadModule(cl, "glm") parListModules(cl) stopCluster(cl) } ## End(Not run)
JAGS modules contain factory objects for samplers, monitors, and random number generators for a JAGS model. These functions allow fine-grained control over which factories are active on parallel workers.
parListFactories(cl, type) parSetFactory(cl, name, type, state)
parListFactories(cl, type) parSetFactory(cl, name, type, state)
cl |
a cluster object created by the parallel package. |
name |
name of the factory to set |
type |
type of factory to query or set. Possible values are
|
state |
a logical. If |
parListFactories
returns a a list of data frame
with two columns per each worker, the first
column shows the names of the factory objects in the currently loaded
modules, and the second column is a logical vector indicating whether
the corresponding factory is active or not.
sparStFactory
is called to change
the future behaviour of factory
objects. If a factory is set to inactive then it will be skipped.
When a module is loaded, all of its factory objects are active. This is also true if a module is unloaded and then reloaded.
Peter Solymos
## Not run: if (require(rjags)) { cl <- makePSOCKcluster(3) parListFactories(cl, "sampler") parListFactories(cl, "monitor") parListFactories(cl, "rng") parSetFactory(cl, "base::Slice", "sampler", FALSE) parListFactories(cl, "sampler") parSetFactory(cl, "base::Slice", "sampler", TRUE) stopCluster(cl) } ## End(Not run)
## Not run: if (require(rjags)) { cl <- makePSOCKcluster(3) parListFactories(cl, "sampler") parListFactories(cl, "monitor") parListFactories(cl, "rng") parSetFactory(cl, "base::Slice", "sampler", FALSE) parListFactories(cl, "sampler") parSetFactory(cl, "base::Slice", "sampler", TRUE) stopCluster(cl) } ## End(Not run)
Update the Markov chain associated with the model on parallel workers. (This represents the 'burn-in' phase when nodes are not monitored.)
parUpdate(cl, object, n.iter=1, ...)
parUpdate(cl, object, n.iter=1, ...)
cl |
A cluster object created by |
object |
character, name of a jags model object |
n.iter |
number of iterations of the Markov chain to run |
... |
additional arguments to the update method, see
|
The parUpdate
function modifies the
original object on parallel workers and returns NULL
.
Peter Solymos
See example on help page of parCodaSamples
.
This data set was made via the jags.fit
function.
data(regmod)
data(regmod)
See Example.
data(regmod) summary(regmod) plot(regmod) ## Not run: ## DATA GENERATION ## simple regression example from the JAGS manual jfun <- function() { for (i in 1:N) { Y[i] ~ dnorm(mu[i], tau) mu[i] <- alpha + beta * (x[i] - x.bar) } x.bar <- mean(x[]) alpha ~ dnorm(0.0, 1.0E-4) beta ~ dnorm(0.0, 1.0E-4) sigma <- 1.0/sqrt(tau) tau ~ dgamma(1.0E-3, 1.0E-3) } ## data generation set.seed(1234) N <- 100 alpha <- 1 beta <- -1 sigma <- 0.5 x <- runif(N) linpred <- crossprod(t(model.matrix(~x)), c(alpha, beta)) Y <- rnorm(N, mean = linpred, sd = sigma) ## list of data for the model jdata <- list(N = N, Y = Y, x = x) ## what to monitor jpara <- c("alpha", "beta", "sigma") ## fit the model with JAGS regmod <- jags.fit(jdata, jpara, jfun, n.chains = 3, updated.model = FALSE) ## End(Not run)
data(regmod) summary(regmod) plot(regmod) ## Not run: ## DATA GENERATION ## simple regression example from the JAGS manual jfun <- function() { for (i in 1:N) { Y[i] ~ dnorm(mu[i], tau) mu[i] <- alpha + beta * (x[i] - x.bar) } x.bar <- mean(x[]) alpha ~ dnorm(0.0, 1.0E-4) beta ~ dnorm(0.0, 1.0E-4) sigma <- 1.0/sqrt(tau) tau ~ dgamma(1.0E-3, 1.0E-3) } ## data generation set.seed(1234) N <- 100 alpha <- 1 beta <- -1 sigma <- 0.5 x <- runif(N) linpred <- crossprod(t(model.matrix(~x)), c(alpha, beta)) Y <- rnorm(N, mean = linpred, sd = sigma) ## list of data for the model jdata <- list(N = N, Y = Y, x = x) ## what to monitor jpara <- c("alpha", "beta", "sigma") ## fit the model with JAGS regmod <- jags.fit(jdata, jpara, jfun, n.chains = 3, updated.model = FALSE) ## End(Not run)
Convenient functions designed to work well with cloned data arguments and Stan.
stan.fit(data, params, model, inits = NULL, seed = sample.int(.Machine$integer.max, 1), n.chains = 3, format = c("mcmc.list", "stanfit"), stan.model = TRUE, fit = NA, ...) stan.model(object, ...) stan.parfit(cl, data, params, model, inits = NULL, seed = sample.int(.Machine$integer.max, n.chains), n.chains = 3, format = c("mcmc.list", "stanfit"), stan.model = TRUE, fit = NA, ...)
stan.fit(data, params, model, inits = NULL, seed = sample.int(.Machine$integer.max, 1), n.chains = 3, format = c("mcmc.list", "stanfit"), stan.model = TRUE, fit = NA, ...) stan.model(object, ...) stan.parfit(cl, data, params, model, inits = NULL, seed = sample.int(.Machine$integer.max, n.chains), n.chains = 3, format = c("mcmc.list", "stanfit"), stan.model = TRUE, fit = NA, ...)
data |
A list (or environment) containing the data. |
params |
Character vector of parameters to be sampled. |
model |
Character string (name of the model file), a function containing
the model, or a |
inits |
Optional specification of initial values in the
form of a list or a function.
If |
seed |
Random seed. |
n.chains |
number of Markov chains. |
format |
Desired output format. |
stan.model |
Logical, if |
fit |
Fitted Stan object. |
cl |
A cluster object created by |
object |
A fitted MCMC object ('mcmc.list' class for example),
with |
... |
Further arguments. |
By default, an stan.fit
returns an
mcmc.list
object. If data cloning is used via the
data
argument, summary
returns a modified summary
containing scaled data cloning standard errors
(scaled by sqrt(n.clones)
), and
values (as returned by
gelman.diag
).
stan.model
returns the stanmodel
object.
stan.parfit
runs chains using multiple cores when cl
is an integer. Using a cluster object leads to recompiling the
model (therefore fit
is ignored), and might not be
very quick to run.
Peter Solymos
Underlying functions:
stan
and
stanfit
in package rstan
Methods: dcsd
, confint.mcmc.list.dc
,
coef.mcmc.list
, quantile.mcmc.list
,
vcov.mcmc.list.dc
## Not run: if (require(rstan)) { model <- custommodel("data { int<lower=0> N; vector[N] y; vector[N] x; } parameters { real alpha; real beta; real<lower=0> sigma; } model { alpha ~ normal(0,10); beta ~ normal(0,10); sigma ~ cauchy(0,5); for (n in 1:N) y[n] ~ normal(alpha + beta * x[n], sigma); }") N <- 100 alpha <- 1 beta <- -1 sigma <- 0.5 x <- runif(N) y <- rnorm(N, alpha + beta * x, sigma) dat <- list(N=N, y=y, x=x) params <- c("alpha", "beta", "sigma") ## compile on 1st time only fit0 <- stan.fit(dat, params, model) ## reuse compiled fit0 fit <- stan.fit(dat, params, model, fit=fit0) sm <- stan.model(fit) summary(fit) sm ## data cloning dcdat <- dclone(dat, n.clones=2, multiply="N") dcfit <- stan.fit(dcdat, params, model, fit=fit0) summary(dcfit) nclones(dcfit) ## using parallel options cl <- makeCluster(2) ## cannot utilize compiled fit0 fit2 <- stan.parfit(cl=cl, dat, params, model) stopCluster(cl) if (.Platform$OS.type != "windows") { ## utilize compiled fit0 fit3 <- stan.parfit(cl=2, dat, params, model, fit=fit0) } } ## End(Not run)
## Not run: if (require(rstan)) { model <- custommodel("data { int<lower=0> N; vector[N] y; vector[N] x; } parameters { real alpha; real beta; real<lower=0> sigma; } model { alpha ~ normal(0,10); beta ~ normal(0,10); sigma ~ cauchy(0,5); for (n in 1:N) y[n] ~ normal(alpha + beta * x[n], sigma); }") N <- 100 alpha <- 1 beta <- -1 sigma <- 0.5 x <- runif(N) y <- rnorm(N, alpha + beta * x, sigma) dat <- list(N=N, y=y, x=x) params <- c("alpha", "beta", "sigma") ## compile on 1st time only fit0 <- stan.fit(dat, params, model) ## reuse compiled fit0 fit <- stan.fit(dat, params, model, fit=fit0) sm <- stan.model(fit) summary(fit) sm ## data cloning dcdat <- dclone(dat, n.clones=2, multiply="N") dcfit <- stan.fit(dcdat, params, model, fit=fit0) summary(dcfit) nclones(dcfit) ## using parallel options cl <- makeCluster(2) ## cannot utilize compiled fit0 fit2 <- stan.parfit(cl=cl, dat, params, model) stopCluster(cl) if (.Platform$OS.type != "windows") { ## utilize compiled fit0 fit3 <- stan.parfit(cl=2, dat, params, model, fit=fit0) } } ## End(Not run)
Automatic updating of an MCMC object until a desired statistic value reached.
updated.model(object, ...) ## S3 method for class 'mcmc.list' update(object, fun, times = 1, n.update = 0, n.iter, thin, ...)
updated.model(object, ...) ## S3 method for class 'mcmc.list' update(object, fun, times = 1, n.update = 0, n.iter, thin, ...)
object |
A fitted MCMC object ('mcmc.list' class for example), with
|
fun |
A function that evaluates convergence of the MCMC chains,
must return logical result. See Examples.
The iterative updating quits when return value is |
times |
Number of times the updating should be repeated.
If |
n.update |
Number of updating iterations. The default 0 indicates,
that only |
n.iter |
Number of iterations for sampling and evaluating |
thin |
Thinning value. If missing, value is taken from |
... |
Other arguments passed to |
updated.model
can be used to retrieve the updated model
from an MCMC object fitted via the function jags.fit
and dc.fit
(with flavour = "jags"
).
The update
method is a wrapper for this purpose,
specifically designed
for the case when MCMC convergence is problematic. A function
is evaluated on the updated model in each iteration of
the updating process,
and an MCMC object is returned when iteration ends,
or when the evaluated
function returns TRUE
value.
n.update
and n.iter
can be vectors, if lengths are
shorter then times
, values are recycled.
Data cloning information is preserved.
updated.model
returns the state of the JAGS model after updating
and sampling. This can be further updated by the function
update.jags
and sampled by coda.samples
if
convergence diagnostics were not satisfactory.
update
returns an MCMC object with
"updated.model"
attribute.
Peter Solymos
jags.fit
, coda.samples
,
update.jags
## Not run: ## simple regression example from the JAGS manual jfun <- function() { for (i in 1:N) { Y[i] ~ dnorm(mu[i], tau) mu[i] <- alpha + beta * (x[i] - x.bar) } x.bar <- mean(x[]) alpha ~ dnorm(0.0, 1.0E-4) beta ~ dnorm(0.0, 1.0E-4) sigma <- 1.0/sqrt(tau) tau ~ dgamma(1.0E-3, 1.0E-3) } ## data generation set.seed(1234) N <- 100 alpha <- 1 beta <- -1 sigma <- 0.5 x <- runif(N) linpred <- crossprod(t(model.matrix(~x)), c(alpha, beta)) Y <- rnorm(N, mean = linpred, sd = sigma) ## list of data for the model jdata <- list(N = N, Y = Y, x = x) ## what to monitor jpara <- c("alpha", "beta", "sigma") ## fit the model with JAGS regmod <- jags.fit(jdata, jpara, jfun, n.chains = 3) ## get the updated model upmod <- updated.model(regmod) upmod ## automatic updating ## using R_hat < 1.1 as criteria critfun <- function(x) all(gelman.diag(x)$psrf[,1] < 1.1) mod <- update(regmod, critfun, 5) ## update just once mod2 <- update(regmod) summary(mod) ## End(Not run)
## Not run: ## simple regression example from the JAGS manual jfun <- function() { for (i in 1:N) { Y[i] ~ dnorm(mu[i], tau) mu[i] <- alpha + beta * (x[i] - x.bar) } x.bar <- mean(x[]) alpha ~ dnorm(0.0, 1.0E-4) beta ~ dnorm(0.0, 1.0E-4) sigma <- 1.0/sqrt(tau) tau ~ dgamma(1.0E-3, 1.0E-3) } ## data generation set.seed(1234) N <- 100 alpha <- 1 beta <- -1 sigma <- 0.5 x <- runif(N) linpred <- crossprod(t(model.matrix(~x)), c(alpha, beta)) Y <- rnorm(N, mean = linpred, sd = sigma) ## list of data for the model jdata <- list(N = N, Y = Y, x = x) ## what to monitor jpara <- c("alpha", "beta", "sigma") ## fit the model with JAGS regmod <- jags.fit(jdata, jpara, jfun, n.chains = 3) ## get the updated model upmod <- updated.model(regmod) upmod ## automatic updating ## using R_hat < 1.1 as criteria critfun <- function(x) all(gelman.diag(x)$psrf[,1] < 1.1) mod <- update(regmod, critfun, 5) ## update just once mod2 <- update(regmod) summary(mod) ## End(Not run)
Writes or removes a BUGS model file to or from the hard drive.
write.jags.model(model, filename = "model.txt", digits = 5, dir = tempdir(), overwrite = getOption("dcoptions")$overwrite) clean.jags.model(filename = "model.txt") custommodel(model, exclude = NULL, digits = 5)
write.jags.model(model, filename = "model.txt", digits = 5, dir = tempdir(), overwrite = getOption("dcoptions")$overwrite) clean.jags.model(filename = "model.txt") custommodel(model, exclude = NULL, digits = 5)
model |
JAGS model to write onto the hard drive (see Example).
For |
digits |
Number of significant digits used in the output. |
filename |
Character, the name of the file to write/remove.
It can be a |
dir |
Optional argument for directory where to write the file.
The default is to use a temporary directory and use
|
overwrite |
Logical, if |
exclude |
Numeric, lines of the model to exclude (see Details). |
write.jags.model
is built upon the function
write.model
of the R2WinBUGS package.
clean.jags.model
is built upon the function
file.remove
, and
intended to be used internally to clean up the JAGS
model file after estimating sessions,
ideally via the on.exit
function.
It requires the full path as returned by write.jags.model
.
The function custommodel
can be used to exclude some lines
of the model. This is handy when there are variations of the same model.
write.jags.model
accepts results returned by custommodel
.
This is also the preferred way of including BUGS models into
R packages, because the function form often includes
undefined functions.
Use the %_%
operator if the model is a function and the model
contains truncation (I()
in WinBUGS, T()
in JAGS).
See explanation on help page of write.model
.
write.jags.model
invisibly returns the name of the file
that was written eventually (possibly including random string).
The return value includes the full path.
clean.jags.model
invisibly returns the result of
file.remove
(logical).
custommodel
returns an object of class 'custommodel',
which is a character vector.
Peter Solymos
## Not run: ## simple regression example from the JAGS manual jfun <- function() { for (i in 1:N) { Y[i] ~ dnorm(mu[i], tau) mu[i] <- alpha + beta * (x[i] - x.bar) } x.bar <- mean(x) alpha ~ dnorm(0.0, 1.0E-4) beta ~ dnorm(0.0, 1.0E-4) sigma <- 1.0/sqrt(tau) tau ~ dgamma(1.0E-3, 1.0E-3) } ## data generation set.seed(1234) N <- 100 alpha <- 1 beta <- -1 sigma <- 0.5 x <- runif(N) linpred <- crossprod(t(model.matrix(~x)), c(alpha, beta)) Y <- rnorm(N, mean = linpred, sd = sigma) ## list of data for the model jdata <- list(N = N, Y = Y, x = x) ## what to monitor jpara <- c("alpha", "beta", "sigma") ## write model onto hard drive jmodnam <- write.jags.model(jfun) ## fit the model regmod <- jags.fit(jdata, jpara, jmodnam, n.chains = 3) ## cleanup clean.jags.model(jmodnam) ## model summary summary(regmod) ## End(Not run) ## let's customize this model jfun2 <- structure( c(" model { ", " for (i in 1:n) { ", " Y[i] ~ dpois(lambda[i]) ", " Y[i] <- alpha[i] + inprod(X[i,], beta[1,]) ", " log(lambda[i]) <- alpha[i] + inprod(X[i,], beta[1,]) ", " alpha[i] ~ dnorm(0, 1/sigma^2) ", " } ", " for (j in 1:np) { ", " beta[1,j] ~ dnorm(0, 0.001) ", " } ", " sigma ~ dlnorm(0, 0.001) ", " } "), class = "custommodel") custommodel(jfun2) ## GLMM custommodel(jfun2, 4) ## LM custommodel(jfun2, c(3,5)) ## deparse when print print(custommodel(jfun2), deparse=TRUE)
## Not run: ## simple regression example from the JAGS manual jfun <- function() { for (i in 1:N) { Y[i] ~ dnorm(mu[i], tau) mu[i] <- alpha + beta * (x[i] - x.bar) } x.bar <- mean(x) alpha ~ dnorm(0.0, 1.0E-4) beta ~ dnorm(0.0, 1.0E-4) sigma <- 1.0/sqrt(tau) tau ~ dgamma(1.0E-3, 1.0E-3) } ## data generation set.seed(1234) N <- 100 alpha <- 1 beta <- -1 sigma <- 0.5 x <- runif(N) linpred <- crossprod(t(model.matrix(~x)), c(alpha, beta)) Y <- rnorm(N, mean = linpred, sd = sigma) ## list of data for the model jdata <- list(N = N, Y = Y, x = x) ## what to monitor jpara <- c("alpha", "beta", "sigma") ## write model onto hard drive jmodnam <- write.jags.model(jfun) ## fit the model regmod <- jags.fit(jdata, jpara, jmodnam, n.chains = 3) ## cleanup clean.jags.model(jmodnam) ## model summary summary(regmod) ## End(Not run) ## let's customize this model jfun2 <- structure( c(" model { ", " for (i in 1:n) { ", " Y[i] ~ dpois(lambda[i]) ", " Y[i] <- alpha[i] + inprod(X[i,], beta[1,]) ", " log(lambda[i]) <- alpha[i] + inprod(X[i,], beta[1,]) ", " alpha[i] ~ dnorm(0, 1/sigma^2) ", " } ", " for (j in 1:np) { ", " beta[1,j] ~ dnorm(0, 0.001) ", " } ", " sigma ~ dlnorm(0, 0.001) ", " } "), class = "custommodel") custommodel(jfun2) ## GLMM custommodel(jfun2, 4) ## LM custommodel(jfun2, c(3,5)) ## deparse when print print(custommodel(jfun2), deparse=TRUE)