Title: | Community Ecology Package |
---|---|
Description: | Ordination methods, diversity analysis and other functions for community and vegetation ecologists. |
Authors: | Jari Oksanen [aut, cre], Gavin L. Simpson [aut], F. Guillaume Blanchet [aut], Roeland Kindt [aut], Pierre Legendre [aut], Peter R. Minchin [aut], R.B. O'Hara [aut], Peter Solymos [aut], M. Henry H. Stevens [aut], Eduard Szoecs [aut], Helene Wagner [aut], Matt Barbour [aut], Michael Bedward [aut], Ben Bolker [aut], Daniel Borcard [aut], Gustavo Carvalho [aut], Michael Chirico [aut], Miquel De Caceres [aut], Sebastien Durand [aut], Heloisa Beatriz Antoniazi Evangelista [aut], Rich FitzJohn [aut], Michael Friendly [aut], Brendan Furneaux [aut], Geoffrey Hannigan [aut], Mark O. Hill [aut], Leo Lahti [aut], Dan McGlinn [aut], Marie-Helene Ouellette [aut], Eduardo Ribeiro Cunha [aut], Tyler Smith [aut], Adrian Stier [aut], Cajo J.F. Ter Braak [aut], James Weedon [aut] |
Maintainer: | Jari Oksanen <[email protected]> |
License: | GPL-2 |
Version: | 2.7-0 |
Built: | 2024-10-09 12:24:37 UTC |
Source: | https://github.com/vegandevs/vegan |
The vegan package provides tools for descriptive community ecology. It has most basic functions of diversity analysis, community ordination and dissimilarity analysis. Most of its multivariate tools can be used for other data types as well.
The functions in the vegan package contain tools for diversity analysis, ordination methods and tools for the analysis of dissimilarities. Together with the labdsv package, the vegan package provides most standard tools of descriptive community analysis. Package ade4 provides an alternative comprehensive package, and several other packages complement vegan and provide tools for deeper analysis in specific fields. Package BiodiversityR provides a GUI for a large subset of vegan functionality.
The vegan package is developed at GitHub (https://github.com/vegandevs/vegan/). GitHub provides up-to-date information and forums for bug reports.
Most important changes in vegan documents can be read with
news(package="vegan")
and vignettes can be browsed with
browseVignettes("vegan")
. The vignettes include a vegan
FAQ, discussion on design decisions, short introduction to ordination
and discussion on diversity methods.
To see the preferable citation of the package, type
citation("vegan")
.
The vegan development team is Jari Oksanen, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre, Peter R. Minchin, R. B. O'Hara, Gavin L. Simpson, Peter Solymos, M. Henry H. Stevens, Helene Wagner. Many other people have contributed to individual functions: see credits in function help pages.
### Example 1: Unconstrained ordination ## NMDS data(varespec) data(varechem) ord <- metaMDS(varespec) plot(ord, type = "t") ## Fit environmental variables ef <- envfit(ord, varechem) ef plot(ef, p.max = 0.05) ### Example 2: Constrained ordination (RDA) ## The example uses formula interface to define the model data(dune) data(dune.env) ## No constraints: PCA mod0 <- rda(dune ~ 1, dune.env) mod0 plot(mod0) ## All environmental variables: Full model mod1 <- rda(dune ~ ., dune.env) mod1 plot(mod1) ## Automatic selection of variables by permutation P-values mod <- ordistep(mod0, scope=formula(mod1)) mod plot(mod) ## Permutation test for all variables anova(mod) ## Permutation test of "type III" effects, or significance when a term ## is added to the model after all other terms anova(mod, by = "margin") ## Plot only sample plots, use different symbols and draw SD ellipses ## for Managemenet classes plot(mod, display = "sites", type = "n") with(dune.env, points(mod, disp = "si", pch = as.numeric(Management))) with(dune.env, legend("topleft", levels(Management), pch = 1:4, title = "Management")) with(dune.env, ordiellipse(mod, Management, label = TRUE)) ## add fitted surface of diversity to the model ordisurf(mod, diversity(dune), add = TRUE) ### Example 3: analysis of dissimilarites a.k.a. non-parametric ### permutational anova adonis2(dune ~ ., dune.env) adonis2(dune ~ Management + Moisture, dune.env)
### Example 1: Unconstrained ordination ## NMDS data(varespec) data(varechem) ord <- metaMDS(varespec) plot(ord, type = "t") ## Fit environmental variables ef <- envfit(ord, varechem) ef plot(ef, p.max = 0.05) ### Example 2: Constrained ordination (RDA) ## The example uses formula interface to define the model data(dune) data(dune.env) ## No constraints: PCA mod0 <- rda(dune ~ 1, dune.env) mod0 plot(mod0) ## All environmental variables: Full model mod1 <- rda(dune ~ ., dune.env) mod1 plot(mod1) ## Automatic selection of variables by permutation P-values mod <- ordistep(mod0, scope=formula(mod1)) mod plot(mod) ## Permutation test for all variables anova(mod) ## Permutation test of "type III" effects, or significance when a term ## is added to the model after all other terms anova(mod, by = "margin") ## Plot only sample plots, use different symbols and draw SD ellipses ## for Managemenet classes plot(mod, display = "sites", type = "n") with(dune.env, points(mod, disp = "si", pch = as.numeric(Management))) with(dune.env, legend("topleft", levels(Management), pch = 1:4, title = "Management")) with(dune.env, ordiellipse(mod, Management, label = TRUE)) ## add fitted surface of diversity to the model ordisurf(mod, diversity(dune), add = TRUE) ### Example 3: analysis of dissimilarites a.k.a. non-parametric ### permutational anova adonis2(dune ~ ., dune.env) adonis2(dune ~ Management + Moisture, dune.env)
Compute all single terms that can be added to or dropped from a constrained ordination model.
## S3 method for class 'cca' add1(object, scope, test = c("none", "permutation"), permutations = how(nperm=199), ...) ## S3 method for class 'cca' drop1(object, scope, test = c("none", "permutation"), permutations = how(nperm=199), ...)
## S3 method for class 'cca' add1(object, scope, test = c("none", "permutation"), permutations = how(nperm=199), ...) ## S3 method for class 'cca' drop1(object, scope, test = c("none", "permutation"), permutations = how(nperm=199), ...)
object |
A constrained ordination object from
|
scope |
A formula giving the terms to be considered for adding
or dropping; see |
test |
Should a permutation test be added using |
permutations |
a list of control values for the permutations
as returned by the function |
... |
Other arguments passed to |
With argument test = "none"
the functions will only call
add1.default
or drop1.default
. With
argument test = "permutation"
the functions will add test
results from anova.cca
. Function drop1.cca
will
call anova.cca
with argument by = "margin"
.
Function add1.cca
will implement a test for single term
additions that is not directly available in anova.cca
.
Functions are used implicitly in step
,
ordiR2step
and ordistep
. The
deviance.cca
and deviance.rda
used in
step
have no firm basis, and setting argument test
= "permutation"
may help in getting useful insight into validity of
model building. Function ordistep
calls alternately
drop1.cca
and add1.cca
with argument
test = "permutation"
and selects variables by their permutation
-values. Meticulous use of
add1.cca
and
drop1.cca
will allow more judicious model building.
The default number of permutations
is set to a low value, because
permutation tests can take a long time. It should be sufficient to
give a impression on the significances of the terms, but higher
values of permutations
should be used if values really
are important.
Returns a similar object as add1
and drop1
.
Jari Oksanen
add1
, drop1
and
anova.cca
for basic methods. You probably need these
functions with step
and ordistep
. Functions
deviance.cca
and extractAIC.cca
are used
to produce the other arguments than test results in the
output.
data(dune) data(dune.env) ## Automatic model building based on AIC but with permutation tests step(cca(dune ~ 1, dune.env), reformulate(names(dune.env)), test="perm") ## see ?ordistep to do the same, but based on permutation P-values ## Not run: ordistep(cca(dune ~ 1, dune.env), reformulate(names(dune.env))) ## End(Not run) ## Manual model building ## -- define the maximal model for scope mbig <- rda(dune ~ ., dune.env) ## -- define an empty model to start with m0 <- rda(dune ~ 1, dune.env) ## -- manual selection and updating add1(m0, scope=formula(mbig), test="perm") m0 <- update(m0, . ~ . + Management) add1(m0, scope=formula(mbig), test="perm") m0 <- update(m0, . ~ . + Moisture) ## -- included variables still significant? drop1(m0, test="perm") add1(m0, scope=formula(mbig), test="perm")
data(dune) data(dune.env) ## Automatic model building based on AIC but with permutation tests step(cca(dune ~ 1, dune.env), reformulate(names(dune.env)), test="perm") ## see ?ordistep to do the same, but based on permutation P-values ## Not run: ordistep(cca(dune ~ 1, dune.env), reformulate(names(dune.env))) ## End(Not run) ## Manual model building ## -- define the maximal model for scope mbig <- rda(dune ~ ., dune.env) ## -- define an empty model to start with m0 <- rda(dune ~ 1, dune.env) ## -- manual selection and updating add1(m0, scope=formula(mbig), test="perm") m0 <- update(m0, . ~ . + Management) add1(m0, scope=formula(mbig), test="perm") m0 <- update(m0, . ~ . + Moisture) ## -- included variables still significant? drop1(m0, test="perm") add1(m0, scope=formula(mbig), test="perm")
In additive diversity partitioning, mean values of alpha diversity at lower levels of a sampling
hierarchy are compared to the total diversity in the entire data set (gamma diversity).
In hierarchical null model testing, a statistic returned by a function is evaluated
according to a nested hierarchical sampling design (hiersimu
).
adipart(...) ## Default S3 method: adipart(y, x, index=c("richness", "shannon", "simpson"), weights=c("unif", "prop"), relative = FALSE, nsimul=99, method = "r2dtable", ...) ## S3 method for class 'formula' adipart(formula, data, index=c("richness", "shannon", "simpson"), weights=c("unif", "prop"), relative = FALSE, nsimul=99, method = "r2dtable", ...) hiersimu(...) ## Default S3 method: hiersimu(y, x, FUN, location = c("mean", "median"), relative = FALSE, drop.highest = FALSE, nsimul=99, method = "r2dtable", ...) ## S3 method for class 'formula' hiersimu(formula, data, FUN, location = c("mean", "median"), relative = FALSE, drop.highest = FALSE, nsimul=99, method = "r2dtable", ...)
adipart(...) ## Default S3 method: adipart(y, x, index=c("richness", "shannon", "simpson"), weights=c("unif", "prop"), relative = FALSE, nsimul=99, method = "r2dtable", ...) ## S3 method for class 'formula' adipart(formula, data, index=c("richness", "shannon", "simpson"), weights=c("unif", "prop"), relative = FALSE, nsimul=99, method = "r2dtable", ...) hiersimu(...) ## Default S3 method: hiersimu(y, x, FUN, location = c("mean", "median"), relative = FALSE, drop.highest = FALSE, nsimul=99, method = "r2dtable", ...) ## S3 method for class 'formula' hiersimu(formula, data, FUN, location = c("mean", "median"), relative = FALSE, drop.highest = FALSE, nsimul=99, method = "r2dtable", ...)
y |
A community matrix. |
x |
A matrix with same number of rows as in |
formula |
A two sided model formula in the form |
data |
A data frame where to look for variables defined in the
right hand side of |
index |
Character, the diversity index to be calculated (see Details). |
weights |
Character, |
relative |
Logical, if |
nsimul |
Number of permutations to use. If |
method |
Null model method: either a name (character string) of
a method defined in |
FUN |
A function to be used by |
location |
Character, identifies which function (mean or median) is to be used to calculate location of the samples. |
drop.highest |
Logical, to drop the highest level or not. When
|
... |
Other arguments passed to functions, e.g. base of
logarithm for Shannon diversity, or |
Additive diversity partitioning means that mean alpha and beta
diversities add up to gamma diversity, thus beta diversity is measured
in the same dimensions as alpha and gamma (Lande 1996). This additive
procedure is then extended across multiple scales in a hierarchical
sampling design with levels of sampling
(Crist et al. 2003). Samples in lower hierarchical levels are nested
within higher level units, thus from
to
grain size
is increasing under constant survey extent. At each level
,
denotes average diversity found within samples.
At the highest sampling level, the diversity components are calculated as
For each lower sampling level as
Then, the additive partition of diversity is
Average alpha components can be weighted uniformly
(weight="unif"
) to calculate it as simple average, or
proportionally to sample abundances (weight="prop"
) to
calculate it as weighted average as follows
where
is the diversity index and
is the weight
calculated for the
th sample at the
th sampling level.
The implementation of additive diversity partitioning in
adipart
follows Crist et al. 2003. It is based on species
richness (, not
), Shannon's and Simpson's diversity
indices stated as the
index
argument.
The expected diversity components are calculated nsimul
times
by individual based randomisation of the community data matrix. This
is done by the "r2dtable"
method in oecosimu
by
default.
hiersimu
works almost in the same way as adipart
, but
without comparing the actual statistic values returned by FUN
to the highest possible value (cf. gamma diversity). This is so,
because in most of the cases, it is difficult to ensure additive
properties of the mean statistic values along the hierarchy.
An object of class "adipart"
or "hiersimu"
with same
structure as oecosimu
objects.
Péter Sólymos, [email protected]
Crist, T.O., Veech, J.A., Gering, J.C. and Summerville,
K.S. (2003). Partitioning species diversity across landscapes and
regions: a hierarchical analysis of ,
, and
-diversity. Am. Nat., 162, 734–743.
Lande, R. (1996). Statistics and partitioning of species diversity, and similarity among multiple communities. Oikos, 76, 5–13.
See oecosimu
for permutation settings and
calculating -values.
multipart
for multiplicative
diversity partitioning.
## NOTE: 'nsimul' argument usually needs to be >= 99 ## here much lower value is used for demonstration data(mite) data(mite.xy) data(mite.env) ## Function to get equal area partitions of the mite data cutter <- function (x, cut = seq(0, 10, by = 2.5)) { out <- rep(1, length(x)) for (i in 2:(length(cut) - 1)) out[which(x > cut[i] & x <= cut[(i + 1)])] <- i return(out)} ## The hierarchy of sample aggregation levsm <- with(mite.xy, data.frame( l1=1:nrow(mite), l2=cutter(y, cut = seq(0, 10, by = 2.5)), l3=cutter(y, cut = seq(0, 10, by = 5)), l4=rep(1, nrow(mite)))) ## Let's see in a map par(mfrow=c(1,3)) plot(mite.xy, main="l1", col=as.numeric(levsm$l1)+1, asp = 1) plot(mite.xy, main="l2", col=as.numeric(levsm$l2)+1, asp = 1) plot(mite.xy, main="l3", col=as.numeric(levsm$l3)+1, asp = 1) par(mfrow=c(1,1)) ## Additive diversity partitioning adipart(mite, index="richness", nsimul=19) ## the next two define identical models adipart(mite, levsm, index="richness", nsimul=19) adipart(mite ~ l2 + l3, levsm, index="richness", nsimul=19) ## Hierarchical null model testing ## diversity analysis (similar to adipart) hiersimu(mite, FUN=diversity, relative=TRUE, nsimul=19) hiersimu(mite ~ l2 + l3, levsm, FUN=diversity, relative=TRUE, nsimul=19) ## Hierarchical testing with the Morisita index morfun <- function(x) dispindmorisita(x)$imst hiersimu(mite ~., levsm, morfun, drop.highest=TRUE, nsimul=19)
## NOTE: 'nsimul' argument usually needs to be >= 99 ## here much lower value is used for demonstration data(mite) data(mite.xy) data(mite.env) ## Function to get equal area partitions of the mite data cutter <- function (x, cut = seq(0, 10, by = 2.5)) { out <- rep(1, length(x)) for (i in 2:(length(cut) - 1)) out[which(x > cut[i] & x <= cut[(i + 1)])] <- i return(out)} ## The hierarchy of sample aggregation levsm <- with(mite.xy, data.frame( l1=1:nrow(mite), l2=cutter(y, cut = seq(0, 10, by = 2.5)), l3=cutter(y, cut = seq(0, 10, by = 5)), l4=rep(1, nrow(mite)))) ## Let's see in a map par(mfrow=c(1,3)) plot(mite.xy, main="l1", col=as.numeric(levsm$l1)+1, asp = 1) plot(mite.xy, main="l2", col=as.numeric(levsm$l2)+1, asp = 1) plot(mite.xy, main="l3", col=as.numeric(levsm$l3)+1, asp = 1) par(mfrow=c(1,1)) ## Additive diversity partitioning adipart(mite, index="richness", nsimul=19) ## the next two define identical models adipart(mite, levsm, index="richness", nsimul=19) adipart(mite ~ l2 + l3, levsm, index="richness", nsimul=19) ## Hierarchical null model testing ## diversity analysis (similar to adipart) hiersimu(mite, FUN=diversity, relative=TRUE, nsimul=19) hiersimu(mite ~ l2 + l3, levsm, FUN=diversity, relative=TRUE, nsimul=19) ## Hierarchical testing with the Morisita index morfun <- function(x) dispindmorisita(x)$imst hiersimu(mite ~., levsm, morfun, drop.highest=TRUE, nsimul=19)
Analysis of variance using distance matrices — for
partitioning distance matrices among sources of variation and fitting
linear models (e.g., factors, polynomial regression) to distance
matrices; uses a permutation test with pseudo- ratios.
adonis2(formula, data, permutations = 999, method = "bray", sqrt.dist = FALSE, add = FALSE, by = NULL, parallel = getOption("mc.cores"), na.action = na.fail, strata = NULL, ...)
adonis2(formula, data, permutations = 999, method = "bray", sqrt.dist = FALSE, add = FALSE, by = NULL, parallel = getOption("mc.cores"), na.action = na.fail, strata = NULL, ...)
formula |
Model formula. The left-hand side (LHS) of the formula
must be either a community data matrix or a dissimilarity matrix,
e.g., from |
data |
the data frame for the independent variables, with rows
in the same order as the community data matrix or dissimilarity
matrix named on the LHS of |
permutations |
a list of control values for the permutations
as returned by the function |
method |
the name of any method used in |
sqrt.dist |
Take square root of dissimilarities. This often euclidifies dissimilarities. |
add |
Add a constant to the non-diagonal dissimilarities such
that all eigenvalues are non-negative in the underlying Principal
Co-ordinates Analysis (see |
by |
|
parallel |
Number of parallel processes or a predefined socket
cluster. With |
na.action |
Handling of missing values on the right-hand-side
of the formula (see |
strata |
Groups within which to constrain permutations. The traditional non-movable strata are set as Blocks in the permute package, but some more flexible alternatives may be more appropriate. |
... |
Other arguments passed to |
adonis2
is a function for the analysis and partitioning sums of
squares using dissimilarities. The function is based on the principles
of McArdle & Anderson (2001) and can perform sequential, marginal and
overall tests. The function also allows using additive constants or
squareroot of dissimilarities to avoid negative eigenvalues, but can
also handle semimetric indices (such as Bray-Curtis) that produce
negative eigenvalues. The adonis2
tests are identical to
anova.cca
of dbrda
. With Euclidean
distances, the tests are also identical to anova.cca
of
rda
.
The function partitions sums of squares of a multivariate data set,
and they are directly analogous to MANOVA (multivariate analysis of
variance). McArdle and Anderson (2001) and Anderson (2001) refer to
the method as “permutational MANOVA” (formerly
“nonparametric MANOVA”). Further, as the inputs are linear
predictors, and a response matrix of an arbitrary number of columns,
they are a robust alternative to both parametric MANOVA and to
ordination methods for describing how variation is attributed to
different experimental treatments or uncontrolled covariates. The
method is also analogous to distance-based redundancy analysis and
algorithmically similar to dbrda
(Legendre and Anderson
1999), and provides an alternative to AMOVA (nested analysis of
molecular variance, Excoffier, Smouse, and Quattro, 1992; amova
in the ade4 package) for both crossed and nested factors.
The function returns an anova.cca
result object with a
new column for partial : This is the proportion
of sum of squares from the total, and in marginal models
(
by = "margin"
) the terms do not add up to
1.
Anderson (2001, Fig. 4) warns that the method may confound
location and dispersion effects: significant differences may be caused
by different within-group variation (dispersion) instead of different
mean values of the groups (see Warton et al. 2012 for a general
analysis). However, it seems that adonis2
is less sensitive to
dispersion effects than some of its alternatives (anosim
,
mrpp
). Function betadisper
is a sister
function to adonis2
to study the differences in dispersion
within the same geometric framework.
Martin Henry H. Stevens and Jari Oksanen.
Anderson, M.J. 2001. A new method for non-parametric multivariate analysis of variance. Austral Ecology, 26: 32–46.
Excoffier, L., P.E. Smouse, and J.M. Quattro. 1992. Analysis of molecular variance inferred from metric distances among DNA haplotypes: Application to human mitochondrial DNA restriction data. Genetics, 131:479–491.
Legendre, P. and M.J. Anderson. 1999. Distance-based redundancy analysis: Testing multispecies responses in multifactorial ecological experiments. Ecological Monographs, 69:1–24.
McArdle, B.H. and M.J. Anderson. 2001. Fitting multivariate models to community data: A comment on distance-based redundancy analysis. Ecology, 82: 290–297.
Warton, D.I., Wright, T.W., Wang, Y. 2012. Distance-based multivariate analyses confound location and dispersion effects. Methods in Ecology and Evolution, 3, 89–101.
mrpp
, anosim
,
mantel
, varpart
.
data(dune) data(dune.env) ## default is overall (omnibus) test adonis2(dune ~ Management*A1, data = dune.env) ## sequential tests adonis2(dune ~ Management*A1, data = dune.env, by = "terms") ### Example of use with strata, for nested (e.g., block) designs. dat <- expand.grid(rep=gl(2,1), NO3=factor(c(0,10)),field=gl(3,1) ) dat Agropyron <- with(dat, as.numeric(field) + as.numeric(NO3)+2) +rnorm(12)/2 Schizachyrium <- with(dat, as.numeric(field) - as.numeric(NO3)+2) +rnorm(12)/2 total <- Agropyron + Schizachyrium dotplot(total ~ NO3, dat, jitter.x=TRUE, groups=field, type=c('p','a'), xlab="NO3", auto.key=list(columns=3, lines=TRUE) ) Y <- data.frame(Agropyron, Schizachyrium) mod <- metaMDS(Y, trace = FALSE) plot(mod) ### Ellipsoid hulls show treatment with(dat, ordiellipse(mod, NO3, kind = "ehull", label = TRUE)) ### Spider shows fields with(dat, ordispider(mod, field, lty=3, col="red", label = TRUE)) ### Incorrect (no strata) adonis2(Y ~ NO3, data = dat, permutations = 199) ## Correct with strata with(dat, adonis2(Y ~ NO3, data = dat, permutations = 199, strata = field))
data(dune) data(dune.env) ## default is overall (omnibus) test adonis2(dune ~ Management*A1, data = dune.env) ## sequential tests adonis2(dune ~ Management*A1, data = dune.env, by = "terms") ### Example of use with strata, for nested (e.g., block) designs. dat <- expand.grid(rep=gl(2,1), NO3=factor(c(0,10)),field=gl(3,1) ) dat Agropyron <- with(dat, as.numeric(field) + as.numeric(NO3)+2) +rnorm(12)/2 Schizachyrium <- with(dat, as.numeric(field) - as.numeric(NO3)+2) +rnorm(12)/2 total <- Agropyron + Schizachyrium dotplot(total ~ NO3, dat, jitter.x=TRUE, groups=field, type=c('p','a'), xlab="NO3", auto.key=list(columns=3, lines=TRUE) ) Y <- data.frame(Agropyron, Schizachyrium) mod <- metaMDS(Y, trace = FALSE) plot(mod) ### Ellipsoid hulls show treatment with(dat, ordiellipse(mod, NO3, kind = "ehull", label = TRUE)) ### Spider shows fields with(dat, ordispider(mod, field, lty=3, col="red", label = TRUE)) ### Incorrect (no strata) adonis2(Y ~ NO3, data = dat, permutations = 199) ## Correct with strata with(dat, adonis2(Y ~ NO3, data = dat, permutations = 199, strata = field))
Analysis of similarities (ANOSIM) provides a way to test statistically whether there is a significant difference between two or more groups of sampling units.
anosim(x, grouping, permutations = 999, distance = "bray", strata = NULL, parallel = getOption("mc.cores"))
anosim(x, grouping, permutations = 999, distance = "bray", strata = NULL, parallel = getOption("mc.cores"))
x |
Data matrix or data frame in which rows are samples and columns are response variable(s), or a dissimilarity object or a symmetric square matrix of dissimilarities. |
grouping |
Factor for grouping observations. |
permutations |
a list of control values for the permutations
as returned by the function |
distance |
Choice of distance metric that measures the
dissimilarity between two observations. See |
strata |
An integer vector or factor specifying the strata for permutation. If supplied, observations are permuted only within the specified strata. |
parallel |
Number of parallel processes or a predefined socket
cluster. With |
Analysis of similarities (ANOSIM) provides a way to test statistically
whether there is a significant difference between two or more groups
of sampling units. Function anosim
operates directly on a
dissimilarity matrix. A suitable dissimilarity matrix is produced by
functions dist
or vegdist
. The
method is philosophically allied with NMDS ordination
(monoMDS
), in that it uses only the rank order of
dissimilarity values.
If two groups of sampling units are really different in their species
composition, then compositional dissimilarities between the groups
ought to be greater than those within the groups. The anosim
statistic is based on the difference of mean ranks between
groups (
) and within groups (
):
The divisor is chosen so that will be in the interval
, value
indicating completely random
grouping.
The statistical significance of observed is assessed by
permuting the grouping vector to obtain the empirical distribution
of
under null-model. See
permutations
for
additional details on permutation tests in Vegan. The distribution
of simulated values can be inspected with the permustats
function.
The function has summary
and plot
methods. These both
show valuable information to assess the validity of the method: The
function assumes that all ranked dissimilarities within groups
have about equal median and range. The plot
method uses
boxplot
with options notch=TRUE
and
varwidth=TRUE
.
The function returns a list of class "anosim"
with following
items:
call |
Function call. |
statistic |
The value of ANOSIM statistic |
signif |
Significance from permutation. |
perm |
Permutation values of |
class.vec |
Factor with value |
dis.rank |
Rank of dissimilarity entry. |
dissimilarity |
The name of the dissimilarity index: the
|
control |
A list of control values for the permutations
as returned by the function |
The anosim
function can confound the differences between groups
and dispersion within groups and the results can be difficult to
interpret (cf. Warton et al. 2012). The function returns a lot of
information to ease studying its performance. Most anosim
models could be analysed with adonis2
which seems to be a
more robust alternative.
Jari Oksanen, with a help from Peter R. Minchin.
Clarke, K. R. (1993). Non-parametric multivariate analysis of changes in community structure. Australian Journal of Ecology 18, 117–143.
Warton, D.I., Wright, T.W., Wang, Y. 2012. Distance-based multivariate analyses confound location and dispersion effects. Methods in Ecology and Evolution, 3, 89–101
mrpp
for a similar function using original
dissimilarities instead of their ranks.
dist
and vegdist
for obtaining
dissimilarities, and rank
for ranking real values. For
comparing dissimilarities against continuous variables, see
mantel
. Function adonis2
is a more robust
alternative that should preferred.
data(dune) data(dune.env) dune.dist <- vegdist(dune) dune.ano <- with(dune.env, anosim(dune.dist, Management)) summary(dune.ano) plot(dune.ano)
data(dune) data(dune.env) dune.dist <- vegdist(dune) dune.ano <- with(dune.env, anosim(dune.dist, Management)) summary(dune.ano) plot(dune.ano)
The function performs an ANOVA like permutation test for Constrained
Correspondence Analysis (cca
), Redundancy Analysis
(rda
) or distance-based Redundancy Analysis (dbRDA,
dbrda
) to assess the significance of constraints.
## S3 method for class 'cca' anova(object, ..., permutations = how(nperm=999), by = NULL, model = c("reduced", "direct", "full"), parallel = getOption("mc.cores"), strata = NULL, cutoff = 1, scope = NULL) ## S3 method for class 'cca' permutest(x, permutations = how(nperm = 99), model = c("reduced", "direct", "full"), by = NULL, first = FALSE, strata = NULL, parallel = getOption("mc.cores"), ...)
## S3 method for class 'cca' anova(object, ..., permutations = how(nperm=999), by = NULL, model = c("reduced", "direct", "full"), parallel = getOption("mc.cores"), strata = NULL, cutoff = 1, scope = NULL) ## S3 method for class 'cca' permutest(x, permutations = how(nperm = 99), model = c("reduced", "direct", "full"), by = NULL, first = FALSE, strata = NULL, parallel = getOption("mc.cores"), ...)
object |
One or several result objects from |
x |
A single ordination result object. |
permutations |
a list of control values for the permutations
as returned by the function |
by |
Setting |
model |
Permutation model: |
parallel |
Use parallel processing with the given number of cores. |
strata |
An integer vector or factor specifying the strata for
permutation. If supplied, observations are permuted only within
the specified strata. It is an error to use this when
|
cutoff |
Only effective with |
scope |
Only effective with |
first |
Analyse only significance of the first axis. |
... |
Parameters passed to other functions. |
Functions anova.cca
and permutest.cca
implement ANOVA
like permutation tests for the joint effect of constraints in
cca
, rda
, dbrda
or
capscale
. Function anova
is intended as a more
user-friendly alternative to permutest
(that is the real
workhorse).
Function anova
can analyse a sequence of constrained
ordination models. The analysis is based on the differences in
residual deviance in permutations of nested models.
The default test is for the sum of all constrained eigenvalues.
Setting first = TRUE
will perform a test for the first
constrained eigenvalue. Argument first
can be set either in
anova.cca
or in permutest.cca
. It is also possible to
perform significance tests for each axis or for each term
(constraining variable) using argument by
in anova.cca
.
Setting by = "axis"
will perform separate significance tests
for each constrained axis. All previous constrained axes will be used
as conditions (“partialled out”) and a test for the first
constrained eigenvalues is performed (Legendre et al. 2011). You can
stop permutation tests after exceeding a given significance level with
argument cutoff
to speed up calculations in large
models. Setting by = "terms"
will perform separate significance
test for each term (constraining variable). The terms are assessed
sequentially from first to last, and the order of the terms will
influence their significances. Setting by = "onedf"
will
perform a similar sequential test for one-degree-of-freedom effects,
where multi-level factors are split in their contrasts. Setting
by = "margin"
will perform separate significance test for each
marginal term in a model with all other terms. The marginal test also
accepts a scope
argument for the drop.scope
which
can be a character vector of term labels that are analysed, or a
fitted model of lower scope. The marginal effects are also known as
“Type III” effects, but the current function only evaluates
marginal terms. It will, for instance, ignore main effects that are
included in interaction terms. In calculating pseudo-, all
terms are compared to the same residual of the full model.
Community data are permuted with choice model="direct"
, and
residuals after partial CCA/ RDA/ dbRDA with choice
model="reduced"
(default). If there is no partial CCA/ RDA/
dbRDA stage, model="reduced"
simply permutes the data and is
equivalent to model="direct"
. The test statistic is
“pseudo-”, which is the ratio of constrained and
unconstrained total Inertia (Chi-squares, variances or something
similar), each divided by their respective degrees of freedom. If
there are no conditions (“partial” terms), the sum of all
eigenvalues remains constant, so that pseudo-
and eigenvalues
would give equal results. In partial CCA/ RDA/ dbRDA, the effect of
conditioning variables (“covariables”) is removed before
permutation, and the total Chi-square is not fixed, and test based on
pseudo-
would differ from the test based on plain
eigenvalues.
The function anova.cca
calls permutest.cca
and fills an
anova
table. Additional attributes are
Random.seed
(the random seeds used),
control
(the permutation design, see how) and
F.perm
(the permuted test statistics).
Jari Oksanen
Legendre, P. and Legendre, L. (2012). Numerical Ecology. 3rd English ed. Elsevier.
Legendre, P., Oksanen, J. and ter Braak, C.J.F. (2011). Testing the significance of canonical axes in redundancy analysis. Methods in Ecology and Evolution 2, 269–277.
anova.cca
, cca
,
rda
, dbrda
to get something to
analyse. Function drop1.cca
calls anova.cca
with by = "margin"
, and add1.cca
an analysis
for single terms additions, which can be used in automatic or
semiautomatic model building (see deviance.cca
).
data(dune, dune.env) mod <- cca(dune ~ Moisture + Management, dune.env) ## overall test anova(mod) ## tests for individual terms anova(mod, by="term") anova(mod, by="margin") ## sequential test for contrasts anova(mod, by = "onedf") ## test for adding all environmental variables anova(mod, cca(dune ~ ., dune.env))
data(dune, dune.env) mod <- cca(dune ~ Moisture + Management, dune.env) ## overall test anova(mod) ## tests for individual terms anova(mod, by="term") anova(mod, by="margin") ## sequential test for contrasts anova(mod, by = "onedf") ## test for adding all environmental variables anova(mod, cca(dune ~ ., dune.env))
The function computes the dissimilarity matrix of a dataset multiple
times using vegdist
while randomly subsampling the
dataset each time. All of the subsampled iterations are then averaged
(mean) to provide a distance matrix that represents the average of
multiple subsampling iterations. This emulates the behavior of the
distance matrix calculator within the Mothur microbial ecology toolkit.
avgdist(x, sample, distfun = vegdist, meanfun = mean, transf = NULL, iterations = 100, dmethod = "bray", diag = TRUE, upper = TRUE, ...)
avgdist(x, sample, distfun = vegdist, meanfun = mean, transf = NULL, iterations = 100, dmethod = "bray", diag = TRUE, upper = TRUE, ...)
x |
Community data matrix. |
sample |
The subsampling depth to be used in each iteration. Samples that do not meet this threshold will be removed from the analysis, and their identity returned to the user in stdout. |
distfun |
The dissimilarity matrix function to be used. Default is the
vegan |
meanfun |
The calculation to use for the average (mean or median). |
transf |
Option for transforming the count data before calculating the
distance matrix. Any base transformation option can be used (e.g.
|
iterations |
The number of random iterations to perform before averaging. Default is 100 iterations. |
dmethod |
Dissimilarity index to be used with the specified dissimilarity matrix function. Default is Bray-Curtis |
diag , upper
|
Return dissimilarities with diagonal and upper
triangle. NB. the default differs from |
... |
Any additional arguments to add to the distance function or mean/median function specified. |
The function builds on the function rrarefy
and and
additional distance matrix function (e.g. vegdist
) to
add more meaningful representations of distances among randomly
subsampled datasets by presenting the average of multiple random
iterations. This function runs using the vegdist
. This
functionality has been utilized in the Mothur standalone microbial
ecology toolkit here.
Geoffrey Hannigan, with some minor tweaks by Gavin L. Simpson.
This function utilizes the vegdist
and rrarefy
functions.
# Import an example count dataset data(BCI) # Test the base functionality mean.avg.dist <- avgdist(BCI, sample = 50, iterations = 10) # Test the transformation function mean.avg.dist.t <- avgdist(BCI, sample = 50, iterations = 10, transf = sqrt) # Test the median functionality median.avg.dist <- avgdist(BCI, sample = 50, iterations = 10, meanfun = median) # Print the resulting tables head(as.matrix(mean.avg.dist)) head(as.matrix(mean.avg.dist.t)) head(as.matrix(median.avg.dist)) # Run example to illustrate low variance of mean, median, and stdev results # Mean and median std dev are around 0.05 sdd <- avgdist(BCI, sample = 50, iterations = 100, meanfun = sd) summary(mean.avg.dist) summary(median.avg.dist) summary(sdd) # Test for when subsampling depth excludes some samples # Return samples that are removed for not meeting depth filter depth.avg.dist <- avgdist(BCI, sample = 450, iterations = 10) # Print the result depth.avg.dist
# Import an example count dataset data(BCI) # Test the base functionality mean.avg.dist <- avgdist(BCI, sample = 50, iterations = 10) # Test the transformation function mean.avg.dist.t <- avgdist(BCI, sample = 50, iterations = 10, transf = sqrt) # Test the median functionality median.avg.dist <- avgdist(BCI, sample = 50, iterations = 10, meanfun = median) # Print the resulting tables head(as.matrix(mean.avg.dist)) head(as.matrix(mean.avg.dist.t)) head(as.matrix(median.avg.dist)) # Run example to illustrate low variance of mean, median, and stdev results # Mean and median std dev are around 0.05 sdd <- avgdist(BCI, sample = 50, iterations = 100, meanfun = sd) summary(mean.avg.dist) summary(median.avg.dist) summary(sdd) # Test for when subsampling depth excludes some samples # Return samples that are removed for not meeting depth filter depth.avg.dist <- avgdist(BCI, sample = 450, iterations = 10) # Print the result depth.avg.dist
Tree counts in 1-hectare plots in the Barro Colorado Island and associated site information.
data(BCI) data(BCI.env)
data(BCI) data(BCI.env)
A data frame with 50 plots (rows) of 1 hectare with counts of trees on
each plot with total of 225 species (columns). Full Latin names are
used for tree species. The names were updated against
http://www.theplantlist.org and Kress et al. (2009) which allows
matching 207 of species against doi:10.5061/dryad.63q27 (Zanne et
al., 2014). The original species names are available as attribute
original.names
of BCI
. See Examples for changed names.
For BCI.env
, a data frame with 50 plots (rows) and nine site
variables derived from Pyke et al. (2001) and Harms et al. (2001):
UTM.EW
: UTM coordinates (zone 17N) East-West.
UTM.NS
: UTM coordinates (zone 17N) North-South.
Precipitation
: Precipitation in mm per year.
Elevation
: Elevation in m above sea level.
Age.cat
: Forest age category.
Geology
: The Underlying geological formation.
Habitat
: Dominant habitat type based on the map of
habitat types in 25 grid cells in each plot (Harms et al. 2001,
excluding streamside habitat). The habitat types are Young
forests (ca. 100 years), old forests on > 7 degree slopes
(OldSlope
), old forests under 152 m elevation
(OldLow
) and at higher elevation (OldHigh
) and
Swamp
forests.
River
: "Yes"
if there is streamside habitat
in the plot.
EnvHet
: Environmental Heterogeneity assessed as the
Simpson diversity of frequencies of Habitat
types in 25
grid cells in the plot.
Data give the numbers of trees at least 10 cm in diameter at breast
height (DBH) in each one hectare quadrat in the 1982 BCI
plot. Within each plot, all individuals were tallied and are
recorded in this table. The full survey included smaller trees with
DBH 1 cm or larger, but the BCI
dataset is a subset of larger
trees as compiled by Condit et al. (2002). The full data with
thinner trees has densities above 4000 stems per hectare, or about
ten times more stems than these data. The dataset BCI
was
provided (in 2003) to illustrate analysis methods in
vegan. For scientific research on ecological issues we
strongly recommend to access complete and more modern data (Condit
et al. 2019) with updated taxonomy (Condit et al. 2020).
The data frame contains only the Barro Colorado Island subset of the full data table of Condit et al. (2002).
The quadrats are located in a regular grid. See BCI.env
for the
coordinates.
A full description of the site information in BCI.env
is
given in Pyke et al. (2001) and Harms et al. (2001). N.B.
Pyke et al. (2001) and Harms et al. (2001) give conflicting
information about forest age categories and elevation.
https://www.science.org/doi/10.1126/science.1066854 for community data and References for environmental data. For updated complete data (incl. thinner trees down to 1 cm), see Condit et al. (2019).
Condit, R, Pitman, N, Leigh, E.G., Chave, J., Terborgh, J., Foster, R.B., Nuñez, P., Aguilar, S., Valencia, R., Villa, G., Muller-Landau, H.C., Losos, E. & Hubbell, S.P. (2002). Beta-diversity in tropical forest trees. Science 295, 666–669.
Condit R., Pérez, R., Aguilar, S., Lao, S., Foster, R. & Hubbell, S. (2019). Complete data from the Barro Colorado 50-ha plot: 423617 trees, 35 years [Dataset]. Dryad. doi:10.15146/5xcp-0d46
Condit, R., Aguilar, S., Lao, S., Foster, R., Hubbell, S. (2020). BCI 50-ha Plot Taxonomy [Dataset]. Dryad. doi:10.15146/R3FH61
Harms K.E., Condit R., Hubbell S.P. & Foster R.B. (2001) Habitat associations of trees and shrubs in a 50-ha neotropical forest plot. J. Ecol. 89, 947–959.
Kress W.J., Erickson D.L, Jones F.A., Swenson N.G, Perez R., Sanjur O. & Bermingham E. (2009) Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama. PNAS 106, 18621–18626.
Pyke, C. R., Condit, R., Aguilar, S., & Lao, S. (2001). Floristic composition across a climatic gradient in a neotropical lowland forest. Journal of Vegetation Science 12, 553–566. doi:10.2307/3237007
Zanne A.E., Tank D.C., Cornwell, W.K., Eastman J.M., Smith, S.A., FitzJohn, R.G., McGlinn, D.J., O’Meara, B.C., Moles, A.T., Reich, P.B., Royer, D.L., Soltis, D.E., Stevens, P.F., Westoby, M., Wright, I.J., Aarssen, L., Bertin, R.I., Calaminus, A., Govaerts, R., Hemmings, F., Leishman, M.R., Oleksyn, J., Soltis, P.S., Swenson, N.G., Warman, L. & Beaulieu, J.M. (2014) Three keys to the radiation of angiosperms into freezing environments. Nature 506, 89–92. doi:10.1038/nature12872 (published online Dec 22, 2013).
Extra-CRAN package natto
(https://github.com/jarioksa/natto) has data set
BCI.env2
with original grid data of Harms et al. (2001)
habitat classification, and data set BCI.taxon
of APG III
classification of tree species.
data(BCI, BCI.env) head(BCI.env) ## see changed species names oldnames <- attr(BCI, "original.names") taxa <- cbind("Old Names" = oldnames, "Current Names" = names(BCI)) noquote(taxa[taxa[,1] != taxa[,2], ])
data(BCI, BCI.env) head(BCI.env) ## see changed species names oldnames <- attr(BCI, "original.names") taxa <- cbind("Old Names" = oldnames, "Current Names" = names(BCI)) noquote(taxa[taxa[,1] != taxa[,2], ])
Beals smoothing replaces each entry in the community data with a probability of a target species occurring in that particular site, based on the joint occurrences of the target species with the species that actually occur in the site. Swan's (1970) degree of absence applies Beals smoothing to zero items so long that all zeros are replaced with smoothed values.
beals(x, species = NA, reference = x, type = 0, include = TRUE) swan(x, maxit = Inf, type = 0)
beals(x, species = NA, reference = x, type = 0, include = TRUE) swan(x, maxit = Inf, type = 0)
x |
Community data frame or matrix. |
species |
Column index used to compute Beals function for a single species.
The default ( |
reference |
Community data frame or matrix to be used to compute
joint occurrences. By default, |
type |
Numeric. Specifies if and how abundance values have to be
used in function |
include |
This logical flag indicates whether the target species has to be
included when computing the mean of the conditioned probabilities. The
original Beals (1984) definition is equivalent to |
maxit |
Maximum number of iterations. The default |
Beals smoothing is the estimated probability that
species
occurs at site
. It is defined as
, where
is the number of
species at site
,
is the number of joint
occurrences of species
and
,
is the
number of occurrences of species
, and
is the incidence
(0 or 1) of species (this last term is usually omitted from the
equation, but it is necessary). As
can be
interpreted as a mean of conditional probability, the
beals
function can be interpreted as a mean of conditioned probabilities (De
Cáceres & Legendre 2008). The present function is
generalized to abundance values (De Cáceres & Legendre
2008).
The type
argument specifies if and how abundance values have to be
used. type = 0
presence/absence mode. type = 1
abundances in reference
(or x
) are used to compute
conditioned probabilities. type = 2
abundances in x
are
used to compute weighted averages of conditioned
probabilities. type = 3
abundances are used to compute both
conditioned probabilities and weighted averages.
Beals smoothing was originally suggested as a method of data
transformation to remove excessive zeros (Beals 1984, McCune 1994).
However, it is not a suitable method for this purpose since it does
not maintain the information on species presences: a species may have
a higher probability of occurrence at a site where it does not occur
than at sites where it occurs. Moreover, it regularizes data too
strongly. The method may be useful in identifying species that belong
to the species pool (Ewald 2002) or to identify suitable unoccupied
patches in metapopulation analysis (Münzbergová &
Herben 2004). In this case, the function should be called with
include=FALSE
for cross-validation smoothing for species;
argument species
can be used if only one species is studied.
Swan (1970) suggested replacing zero values with degrees of absence of
a species in a community data matrix. Swan expressed the method in
terms of a similarity matrix, but it is equivalent to applying Beals
smoothing to zero values, at each step shifting the smallest initially
non-zero item to value one, and repeating this so many times that
there are no zeros left in the data. This is actually very similar to
extended dissimilarities (implemented in function
stepacross
), but very rarely used.
The function returns a transformed data matrix or a vector if Beals smoothing is requested for a single species.
Miquel De Cáceres and Jari Oksanen
Beals, E.W. 1984. Bray-Curtis ordination: an effective strategy for analysis of multivariate ecological data. Pp. 1–55 in: MacFadyen, A. & E.D. Ford [eds.] Advances in Ecological Research, 14. Academic Press, London.
De Cáceres, M. & Legendre, P. 2008. Beals smoothing revisited. Oecologia 156: 657–669.
Ewald, J. 2002. A probabilistic approach to estimating species pools from large compositional matrices. J. Veg. Sci. 13: 191–198.
McCune, B. 1994. Improving community ordination with the Beals smoothing function. Ecoscience 1: 82–86.
Münzbergová, Z. & Herben, T. 2004. Identification of suitable unoccupied habitats in metapopulation studies using co-occurrence of species. Oikos 105: 408–414.
Swan, J.M.A. 1970. An examination of some ordination problems by use of simulated vegetational data. Ecology 51: 89–102.
decostand
for proper standardization methods,
specpool
for an attempt to assess the size of species
pool. Function indpower
assesses the power of each species
to estimate the probabilities predicted by beals
.
data(dune) ## Default x <- beals(dune) ## Remove target species x <- beals(dune, include = FALSE) ## Smoothed values against presence or absence of species pa <- decostand(dune, "pa") boxplot(as.vector(x) ~ unlist(pa), xlab="Presence", ylab="Beals") ## Remove the bias of tarbet species: Yields lower values. beals(dune, type =3, include = FALSE) ## Uses abundance information. ## Vector with beals smoothing values corresponding to the first species ## in dune. beals(dune, species=1, include=TRUE)
data(dune) ## Default x <- beals(dune) ## Remove target species x <- beals(dune, include = FALSE) ## Smoothed values against presence or absence of species pa <- decostand(dune, "pa") boxplot(as.vector(x) ~ unlist(pa), xlab="Presence", ylab="Beals") ## Remove the bias of tarbet species: Yields lower values. beals(dune, type =3, include = FALSE) ## Uses abundance information. ## Vector with beals smoothing values corresponding to the first species ## in dune. beals(dune, species=1, include=TRUE)
Implements Marti Anderson's PERMDISP2 procedure for the analysis of
multivariate homogeneity of group dispersions (variances).
betadisper
is a multivariate analogue of Levene's test for
homogeneity of variances. Non-euclidean distances between objects and
group centres (centroids or medians) are handled by reducing the
original distances to principal coordinates. This procedure has
latterly been used as a means of assessing beta diversity. There are
anova
, scores
, plot
and boxplot
methods.
TukeyHSD.betadisper
creates a set of confidence intervals on
the differences between the mean distance-to-centroid of the levels of
the grouping factor with the specified family-wise probability of
coverage. The intervals are based on the Studentized range statistic,
Tukey's 'Honest Significant Difference' method.
betadisper(d, group, type = c("median","centroid"), bias.adjust = FALSE, sqrt.dist = FALSE, add = FALSE) ## S3 method for class 'betadisper' anova(object, ...) ## S3 method for class 'betadisper' scores(x, display = c("sites", "centroids"), choices = c(1,2), ...) ## S3 method for class 'betadisper' eigenvals(x, ...) ## S3 method for class 'betadisper' plot(x, axes = c(1,2), cex = 0.7, pch = seq_len(ng), col = NULL, lty = "solid", lwd = 1, hull = TRUE, ellipse = FALSE, conf, segments = TRUE, seg.col = "grey", seg.lty = lty, seg.lwd = lwd, label = TRUE, label.cex = 1, ylab, xlab, main, sub, ...) ## S3 method for class 'betadisper' boxplot(x, ylab = "Distance to centroid", ...) ## S3 method for class 'betadisper' TukeyHSD(x, which = "group", ordered = FALSE, conf.level = 0.95, ...) ## S3 method for class 'betadisper' print(x, digits = max(3, getOption("digits") - 3), neigen = 8, ...)
betadisper(d, group, type = c("median","centroid"), bias.adjust = FALSE, sqrt.dist = FALSE, add = FALSE) ## S3 method for class 'betadisper' anova(object, ...) ## S3 method for class 'betadisper' scores(x, display = c("sites", "centroids"), choices = c(1,2), ...) ## S3 method for class 'betadisper' eigenvals(x, ...) ## S3 method for class 'betadisper' plot(x, axes = c(1,2), cex = 0.7, pch = seq_len(ng), col = NULL, lty = "solid", lwd = 1, hull = TRUE, ellipse = FALSE, conf, segments = TRUE, seg.col = "grey", seg.lty = lty, seg.lwd = lwd, label = TRUE, label.cex = 1, ylab, xlab, main, sub, ...) ## S3 method for class 'betadisper' boxplot(x, ylab = "Distance to centroid", ...) ## S3 method for class 'betadisper' TukeyHSD(x, which = "group", ordered = FALSE, conf.level = 0.95, ...) ## S3 method for class 'betadisper' print(x, digits = max(3, getOption("digits") - 3), neigen = 8, ...)
d |
a distance structure such as that returned by
|
group |
vector describing the group structure, usually a factor
or an object that can be coerced to a factor using
|
type |
the type of analysis to perform. Use the spatial median or the group centroid? The spatial median is now the default. |
bias.adjust |
logical: adjust for small sample bias in beta diversity estimates? |
sqrt.dist |
Take square root of dissimilarities. This often euclidifies dissimilarities. |
add |
Add a constant to the non-diagonal dissimilarities such
that all eigenvalues are non-negative in the underlying Principal
Co-ordinates Analysis (see |
display |
character; partial match to access scores for
|
object , x
|
an object of class |
choices , axes
|
the principal coordinate axes wanted. |
hull |
logical; should the convex hull for each group be plotted? |
ellipse |
logical; should the standard deviation data ellipse for each group be plotted? |
conf |
Expected fractions of data coverage for data ellipses,
e.g. 0.95. The default is to draw a 1 standard deviation data
ellipse, but if supplied, |
pch |
plot symbols for the groups, a vector of length equal to the number of groups. |
col |
colors for the plot symbols and centroid labels for the groups, a vector of length equal to the number of groups. |
lty , lwd
|
linetype, linewidth for convex hulls and confidence ellipses. |
segments |
logical; should segments joining points to their centroid be drawn? |
seg.col |
colour to draw segments between points and their centroid. Can be a vector, in which case one colour per group. |
seg.lty , seg.lwd
|
linetype and line width for segments. |
label |
logical; should the centroids by labelled with their respective factor label? |
label.cex |
numeric; character expansion for centroid labels. |
cex , ylab , xlab , main , sub
|
graphical parameters. For details,
see |
which |
A character vector listing terms in the fitted model for which the intervals should be calculated. Defaults to the grouping factor. |
ordered |
logical; see |
conf.level |
A numeric value between zero and one giving the family-wise confidence level to use. |
digits , neigen
|
numeric; for the |
... |
arguments, including graphical parameters (for
|
One measure of multivariate dispersion (variance) for a group of samples is to calculate the average distance of group members to the group centroid or spatial median (both referred to as 'centroid' from now on unless stated otherwise) in multivariate space. To test if the dispersions (variances) of one or more groups are different, the distances of group members to the group centroid are subject to ANOVA. This is a multivariate analogue of Levene's test for homogeneity of variances if the distances between group members and group centroids is the Euclidean distance.
However, better measures of distance than the Euclidean distance are available for ecological data. These can be accommodated by reducing the distances produced using any dissimilarity coefficient to principal coordinates, which embeds them within a Euclidean space. The analysis then proceeds by calculating the Euclidean distances between group members and the group centroid on the basis of the principal coordinate axes rather than the original distances.
Non-metric dissimilarity coefficients can produce principal coordinate axes that have negative Eigenvalues. These correspond to the imaginary, non-metric part of the distance between objects. If negative Eigenvalues are produced, we must correct for these imaginary distances.
The distance to its centroid of a point is
where
is the squared Euclidean distance between
, the principal coordinate for the
th
point in the
th group, and
, the
coordinate of the centroid for the
th group. The
super-scripted ‘
’ and ‘
’ indicate the
real and imaginary parts respectively. This is equation (3) in
Anderson (2006). If the imaginary part is greater in magnitude than
the real part, then we would be taking the square root of a negative
value, resulting in NaN, and these cases are changed to zero distances
(with a warning). This is in line with the behaviour of Marti Anderson's
PERMDISP2 programme.
To test if one or more groups is more variable than the others, ANOVA
of the distances to group centroids can be performed and parametric
theory used to interpret the significance of . An alternative is to
use a permutation test.
permutest.betadisper
permutes model
residuals to generate a permutation distribution of under the Null
hypothesis of no difference in dispersion between groups.
Pairwise comparisons of group mean dispersions can also be performed
using permutest.betadisper
. An alternative to the classical
comparison of group dispersions, is to calculate Tukey's Honest
Significant Differences between groups, via
TukeyHSD.betadisper
. This is a simple wrapper to
TukeyHSD
. The user is directed to read the help file
for TukeyHSD
before using this function. In particular,
note the statement about using the function with
unbalanced designs.
The results of the analysis can be visualised using the plot
and boxplot
methods. The distances of points to all centroids
(group
) can be found with function centredist
.
One additional use of these functions is in assessing beta diversity
(Anderson et al 2006). Function betadiver
provides some popular dissimilarity measures for this purpose.
As noted in passing by Anderson (2006) and in a related
context by O'Neill (2000), estimates of dispersion around a
central location (median or centroid) that is calculated from the same data
will be biased downward. This bias matters most when comparing diversity
among treatments with small, unequal numbers of samples. Setting
bias.adjust=TRUE
when using betadisper
imposes a
correction (Stier et al. 2013).
The anova
method returns an object of class "anova"
inheriting from class "data.frame"
.
The scores
method returns a list with one or both of the
components "sites"
and "centroids"
.
The plot
function invisibly returns an object of class
"ordiplot"
, a plotting structure which can be used by
identify.ordiplot
(to identify the points) or other
functions in the ordiplot
family.
The boxplot
function invisibly returns a list whose components
are documented in boxplot
.
eigenvals.betadisper
returns a named vector of eigenvalues.
TukeyHSD.betadisper
returns a list. See TukeyHSD
for further details.
betadisper
returns a list of class "betadisper"
with the
following components:
eig |
numeric; the eigenvalues of the principal coordinates analysis. |
vectors |
matrix; the eigenvectors of the principal coordinates analysis. |
distances |
numeric; the Euclidean distances in principal coordinate space between the samples and their respective group centroid or median. |
group |
factor; vector describing the group structure |
centroids |
matrix; the locations of the group centroids or medians on the principal coordinates. |
group.distances |
numeric; the mean distance to each group centroid or median. |
call |
the matched function call. |
Stewart Schultz noticed that the permutation test for
type="centroid"
had the wrong type I error and was
anti-conservative. As such, the default for type
has been
changed to "median"
, which uses the spatial median as the group
centroid. Tests suggests that the permutation test for this type of
analysis gives the correct error rates.
If group
consists of a single level or group, then the
anova
and permutest
methods are not appropriate and if
used on such data will stop with an error.
Missing values in either d
or group
will be removed
prior to performing the analysis.
Gavin L. Simpson; bias correction by Adrian Stier and Ben Bolker.
Anderson, M.J. (2006) Distance-based tests for homogeneity of multivariate dispersions. Biometrics 62, 245–253.
Anderson, M.J., Ellingsen, K.E. & McArdle, B.H. (2006) Multivariate dispersion as a measure of beta diversity. Ecology Letters 9, 683–693.
O'Neill, M.E. (2000) A Weighted Least Squares Approach to Levene's Test of Homogeneity of Variance. Australian & New Zealand Journal of Statistics 42, 81-–100.
Stier, A.C., Geange, S.W., Hanson, K.M., & Bolker, B.M. (2013) Predator density and timing of arrival affect reef fish community assembly. Ecology 94, 1057–1068.
permutest.betadisper
, centredist
,
anova.lm
,
scores
, boxplot
,
TukeyHSD
. Further measure of beta diversity
can be found in betadiver
.
data(varespec) ## Bray-Curtis distances between samples dis <- vegdist(varespec) ## First 16 sites grazed, remaining 8 sites ungrazed groups <- factor(c(rep(1,16), rep(2,8)), labels = c("grazed","ungrazed")) ## Calculate multivariate dispersions mod <- betadisper(dis, groups) mod ## Perform test anova(mod) ## Permutation test for F permutest(mod, pairwise = TRUE, permutations = 99) ## Tukey's Honest Significant Differences (mod.HSD <- TukeyHSD(mod)) plot(mod.HSD) ## Plot the groups and distances to centroids on the ## first two PCoA axes plot(mod) ## with data ellipses instead of hulls plot(mod, ellipse = TRUE, hull = FALSE) # 1 sd data ellipse plot(mod, ellipse = TRUE, hull = FALSE, conf = 0.90) # 90% data ellipse # plot with manual colour specification my_cols <- c("#1b9e77", "#7570b3") plot(mod, col = my_cols, pch = c(16,17), cex = 1.1) ## can also specify which axes to plot, ordering respected plot(mod, axes = c(3,1), seg.col = "forestgreen", seg.lty = "dashed") ## Draw a boxplot of the distances to centroid for each group boxplot(mod) ## `scores` and `eigenvals` also work scrs <- scores(mod) str(scrs) head(scores(mod, 1:4, display = "sites")) # group centroids/medians scores(mod, 1:4, display = "centroids") # eigenvalues from the underlying principal coordinates analysis eigenvals(mod) ## try out bias correction; compare with mod3 (mod3B <- betadisper(dis, groups, type = "median", bias.adjust=TRUE)) anova(mod3B) permutest(mod3B, permutations = 99) ## should always work for a single group group <- factor(rep("grazed", NROW(varespec))) (tmp <- betadisper(dis, group, type = "median")) (tmp <- betadisper(dis, group, type = "centroid")) ## simulate missing values in 'd' and 'group' ## using spatial medians groups[c(2,20)] <- NA dis[c(2, 20)] <- NA mod2 <- betadisper(dis, groups) ## messages mod2 permutest(mod2, permutations = 99) anova(mod2) plot(mod2) boxplot(mod2) plot(TukeyHSD(mod2)) ## Using group centroids mod3 <- betadisper(dis, groups, type = "centroid") mod3 permutest(mod3, permutations = 99) anova(mod3) plot(mod3) boxplot(mod3) plot(TukeyHSD(mod3))
data(varespec) ## Bray-Curtis distances between samples dis <- vegdist(varespec) ## First 16 sites grazed, remaining 8 sites ungrazed groups <- factor(c(rep(1,16), rep(2,8)), labels = c("grazed","ungrazed")) ## Calculate multivariate dispersions mod <- betadisper(dis, groups) mod ## Perform test anova(mod) ## Permutation test for F permutest(mod, pairwise = TRUE, permutations = 99) ## Tukey's Honest Significant Differences (mod.HSD <- TukeyHSD(mod)) plot(mod.HSD) ## Plot the groups and distances to centroids on the ## first two PCoA axes plot(mod) ## with data ellipses instead of hulls plot(mod, ellipse = TRUE, hull = FALSE) # 1 sd data ellipse plot(mod, ellipse = TRUE, hull = FALSE, conf = 0.90) # 90% data ellipse # plot with manual colour specification my_cols <- c("#1b9e77", "#7570b3") plot(mod, col = my_cols, pch = c(16,17), cex = 1.1) ## can also specify which axes to plot, ordering respected plot(mod, axes = c(3,1), seg.col = "forestgreen", seg.lty = "dashed") ## Draw a boxplot of the distances to centroid for each group boxplot(mod) ## `scores` and `eigenvals` also work scrs <- scores(mod) str(scrs) head(scores(mod, 1:4, display = "sites")) # group centroids/medians scores(mod, 1:4, display = "centroids") # eigenvalues from the underlying principal coordinates analysis eigenvals(mod) ## try out bias correction; compare with mod3 (mod3B <- betadisper(dis, groups, type = "median", bias.adjust=TRUE)) anova(mod3B) permutest(mod3B, permutations = 99) ## should always work for a single group group <- factor(rep("grazed", NROW(varespec))) (tmp <- betadisper(dis, group, type = "median")) (tmp <- betadisper(dis, group, type = "centroid")) ## simulate missing values in 'd' and 'group' ## using spatial medians groups[c(2,20)] <- NA dis[c(2, 20)] <- NA mod2 <- betadisper(dis, groups) ## messages mod2 permutest(mod2, permutations = 99) anova(mod2) plot(mod2) boxplot(mod2) plot(TukeyHSD(mod2)) ## Using group centroids mod3 <- betadisper(dis, groups, type = "centroid") mod3 permutest(mod3, permutations = 99) anova(mod3) plot(mod3) boxplot(mod3) plot(TukeyHSD(mod3))
The function estimates any of the 24 indices of beta diversity reviewed by Koleff et al. (2003). Alternatively, it finds the co-occurrence frequencies for triangular plots (Koleff et al. 2003).
betadiver(x, method = NA, order = FALSE, help = FALSE, ...) ## S3 method for class 'betadiver' plot(x, ...) ## S3 method for class 'betadiver' scores(x, triangular = TRUE, ...)
betadiver(x, method = NA, order = FALSE, help = FALSE, ...) ## S3 method for class 'betadiver' plot(x, ...) ## S3 method for class 'betadiver' scores(x, triangular = TRUE, ...)
x |
Community data matrix, or the |
method |
The index of beta diversity as defined in Koleff et al.
(2003), Table 1. You can use either the subscript of |
order |
Order sites by increasing number of species. This will influence the configuration in the triangular plot and non-symmetric indices. |
help |
Show the numbers, subscript names and the defining equations of the indices and exit. |
triangular |
Return scores suitable for triangular plotting of
proportions. If |
... |
Other arguments to functions. |
The most commonly used index of beta diversity is
, where
is the total number of
species, and
is the average number of species per site
(Whittaker 1960). A drawback of this model is that
increases
with sample size, but the expectation of
remains
constant, and so the beta diversity increases with sample size. A
solution to this problem is to study the beta diversity of pairs of
sites (Marion et al. 2017). If we denote the number of species
shared between two sites as
and the numbers of unique
species (not shared) as
and
, then
and
so that
. This is the Sørensen
dissimilarity as defined in vegan function
vegdist
with argument binary = TRUE
. Many other
indices are dissimilarity indices as well.
Function betadiver
finds all indices reviewed by Koleff et
al. (2003). All these indices could be found with function
designdist
, but the current function provides a
conventional shortcut. The function only finds the indices. The proper
analysis must be done with functions such as betadisper
,
adonis2
or mantel
.
The indices are directly taken from Table 1 of Koleff et al. (2003),
and they can be selected either by the index number or the subscript
name used by Koleff et al. The numbers, names and defining equations
can be seen using betadiver(help = TRUE)
. In all cases where
there are two alternative forms, the one with the term is
used. There are several duplicate indices, and the number of distinct
alternatives is much lower than 24 formally provided. The formulations
used in functions differ occasionally from those in Koleff et
al. (2003), but they are still mathematically equivalent. With
method = NA
, no index is calculated, but instead an object of
class betadiver
is returned. This is a list of elements
a
, b
and c
. Function plot
can be used to
display the proportions of these elements in triangular plot as
suggested by Koleff et al. (2003), and scores
extracts the
triangular coordinates or the raw scores. Function plot
returns
invisibly the triangular coordinates as an "ordiplot"
object.
With method = NA
, the function returns an object of class
"betadisper"
with elements a
, b
, and c
. If
method
is specified, the function returns a "dist"
object which can be used in any function analysing
dissimilarities. For beta diversity, particularly useful functions are
betadisper
to study the betadiversity in groups,
adonis2
for any model, and mantel
to
compare beta diversities to other dissimilarities or distances
(including geographical distances). Although betadiver
returns
a "dist"
object, some indices are similarities and cannot be
used as such in place of dissimilarities, but that is a user
error. Functions 10 ("j"
), 11 ("sor"
) and 21
("rlb"
) are similarity indices. Function sets argument
"maxdist"
similarly as vegdist
, using NA
when there is no fixed upper limit, and 0 for similarities.
Some indices return similarities instead of dissimilarities.
Jari Oksanen
Baselga, A. (2010) Partitioning the turnover and nestedness components of beta diversity. Global Ecology and Biogeography 19, 134–143.
Koleff, P., Gaston, K.J. and Lennon, J.J. (2003) Measuring beta diversity for presence-absence data. Journal of Animal Ecology 72, 367–382.
Marion, Z.H., Fordyce, J.A. and Fitzpatrick, B.M. (2017) Pairwise beta diversity resolves an underappreciated source of confusion in calculating species turnover. Ecology 98, 933–939.
Whittaker, R.H. (1960) Vegetation of Siskiyou mountains, Oregon and California. Ecological Monographs 30, 279–338.
designdist
can be used to implement all these
functions, and also allows using notation with alpha
and
gamma
diversities. vegdist
has some canned
alternatives. Functions betadisper
,
adonis2
and mantel
can be used for
analysing beta diversity objects. The returned dissimilarities can
be used in any distance-based methods, such as
metaMDS
, capscale
and
dbrda
. Functions nestedbetasor
and
nestedbetajac
implement decomposition beta diversity
measures (Sørensen and Jaccard) into turnover and
nestedness components following Baselga (2010).
## Raw data and plotting data(sipoo) m <- betadiver(sipoo) plot(m) ## The indices betadiver(help=TRUE) ## The basic Whittaker index d <- betadiver(sipoo, "w") ## This should be equal to Sorensen index (binary Bray-Curtis in ## vegan) range(d - vegdist(sipoo, binary=TRUE))
## Raw data and plotting data(sipoo) m <- betadiver(sipoo) plot(m) ## The indices betadiver(help=TRUE) ## The basic Whittaker index d <- betadiver(sipoo, "w") ## This should be equal to Sorensen index (binary Bray-Curtis in ## vegan) range(d - vegdist(sipoo, binary=TRUE))
This function computes coefficients of dispersal direction between geographically connected areas, as defined by Legendre and Legendre (1984), and also described in Legendre and Legendre (2012, section 13.3.4).
bgdispersal(mat, PAonly = FALSE, abc = FALSE)
bgdispersal(mat, PAonly = FALSE, abc = FALSE)
mat |
Data frame or matrix containing a community composition data table (species presence-absence or abundance data). |
PAonly |
|
abc |
If |
The signs of the DD coefficients indicate the direction of dispersal, provided that the asymmetry is significant. A positive sign indicates dispersal from the first (row in DD tables) to the second region (column); a negative sign indicates the opposite. A McNemar test of asymmetry is computed from the presence-absence data to test the hypothesis of a significant asymmetry between the two areas under comparison.
In the input data table, the rows are sites or
areas, the columns are taxa. Most often, the taxa
are species, but the coefficients can be computed
from genera or families as well. DD1 and DD2 only
are computed for presence-absence data. The four
types of coefficients are computed for
quantitative data, which are converted to
presence-absence for the computation of DD1 and
DD2. PAonly = FALSE
indicates that the four types
of coefficients are requested. PAonly = TRUE
if DD1
and DD2 only are sought.
Function bgdispersal
returns a list containing the following matrices:
DD1 |
|
DD2 |
|
DD3 |
|
DD4 |
|
McNemar |
McNemar chi-square statistic of asymmetry (Sokal and
Rohlf 1995):
|
prob.McNemar |
probabilities associated
with McNemar statistics, chi-square test. H0: no
asymmetry in |
The function uses a more powerful alternative for the McNemar test
than the classical formula. The classical formula was constructed in
the spirit of Pearson's Chi-square, but the formula in this function
was constructed in the spirit of Wilks Chi-square or the
statistic. Function
mcnemar.test
uses the classical
formula. The new formula was introduced in vegan version
1.10-11, and the older implementations of bgdispersal
used the
classical formula.
Pierre Legendre, Departement de Sciences Biologiques, Universite de Montreal
Legendre, P. and V. Legendre. 1984. Postglacial dispersal of freshwater fishes in the Québec peninsula. Can. J. Fish. Aquat. Sci. 41: 1781-1802.
Legendre, P. and L. Legendre. 2012. Numerical ecology, 3rd English edition. Elsevier Science BV, Amsterdam.
Sokal, R. R. and F. J. Rohlf. 1995. Biometry. The principles and practice of statistics in biological research. 3rd edn. W. H. Freeman, New York.
mat <- matrix(c(32,15,14,10,70,30,100,4,10,30,25,0,18,0,40, 0,0,20,0,0,0,0,4,0,30,20,0,0,0,0,25,74,42,1,45,89,5,16,16,20), 4, 10, byrow=TRUE) bgdispersal(mat)
mat <- matrix(c(32,15,14,10,70,30,100,4,10,30,25,0,18,0,40, 0,0,20,0,0,0,0,4,0,30,20,0,0,0,0,25,74,42,1,45,89,5,16,16,20), 4, 10, byrow=TRUE) bgdispersal(mat)
Function finds the best subset of environmental variables, so that the Euclidean distances of scaled environmental variables have the maximum (rank) correlation with community dissimilarities.
## Default S3 method: bioenv(comm, env, method = "spearman", index = "bray", upto = ncol(env), trace = FALSE, partial = NULL, metric = c("euclidean", "mahalanobis", "manhattan", "gower"), parallel = getOption("mc.cores"), ...) ## S3 method for class 'formula' bioenv(formula, data, ...) bioenvdist(x, which = "best")
## Default S3 method: bioenv(comm, env, method = "spearman", index = "bray", upto = ncol(env), trace = FALSE, partial = NULL, metric = c("euclidean", "mahalanobis", "manhattan", "gower"), parallel = getOption("mc.cores"), ...) ## S3 method for class 'formula' bioenv(formula, data, ...) bioenvdist(x, which = "best")
comm |
Community data frame or a dissimilarity object or a square matrix that can be interpreted as dissimilarities. |
env |
Data frame of continuous environmental variables. |
method |
The correlation method used in |
index |
The dissimilarity index used for community data ( |
upto |
Maximum number of parameters in studied subsets. |
formula , data
|
Model |
trace |
Trace the calculations |
partial |
Dissimilarities partialled out when inspecting
variables in |
metric |
Metric used for distances of environmental distances. See Details. |
parallel |
Number of parallel processes or a predefined socket
cluster. With |
x |
|
which |
The number of the model for which the environmental
distances are evaluated, or the |
... |
Other arguments passed to |
The function calculates a community dissimilarity matrix using
vegdist
. Then it selects all possible subsets of
environmental variables, scale
s the variables, and
calculates Euclidean distances for this subset using
dist
. The function finds the correlation between
community dissimilarities and environmental distances, and for each
size of subsets, saves the best result. There are
subsets of
variables, and an exhaustive search may take a
very, very, very long time (parameter
upto
offers a partial
relief).
The argument metric
defines distances in the given set of
environmental variables. With metric = "euclidean"
, the
variables are scaled to unit variance and Euclidean distances are
calculated. With metric = "mahalanobis"
, the Mahalanobis
distances are calculated: in addition to scaling to unit variance,
the matrix of the current set of environmental variables is also
made orthogonal (uncorrelated). With metric = "manhattan"
,
the variables are scaled to unit range and Manhattan distances are
calculated, so that the distances are sums of differences of
environmental variables. With metric = "gower"
, the Gower
distances are calculated using function
daisy
. This allows also using factor
variables, but with continuous variables the results are equal to
metric = "manhattan"
.
The function can be called with a model formula
where
the LHS is the data matrix and RHS lists the environmental variables.
The formula interface is practical in selecting or transforming
environmental variables.
With argument partial
you can perform “partial”
analysis. The partializing item must be a dissimilarity object of
class dist
. The
partial
item can be used with any correlation method
,
but it is strictly correct only for Pearson.
Function bioenvdist
recalculates the environmental distances
used within the function. The default is to calculate distances for
the best model, but the number of any model can be given.
Clarke & Ainsworth (1993) suggested this method to be used for selecting the best subset of environmental variables in interpreting results of nonmetric multidimensional scaling (NMDS). They recommended a parallel display of NMDS of community dissimilarities and NMDS of Euclidean distances from the best subset of scaled environmental variables. They warned against the use of Procrustes analysis, but to me this looks like a good way of comparing these two ordinations.
Clarke & Ainsworth wrote a computer program BIO-ENV giving the name to the current function. Presumably BIO-ENV was later incorporated in Clarke's PRIMER software (available for Windows). In addition, Clarke & Ainsworth suggested a novel method of rank correlation which is not available in the current function.
The function returns an object of class bioenv
with a
summary
method.
If you want to study the ‘significance’ of bioenv
results, you can use function mantel
or
mantel.partial
which use the same definition of
correlation. However, bioenv
standardizes environmental
variables depending on the used metric, and you must do the same in
mantel
for comparable results (the standardized data are
returned as item x
in the result object). It is safest to use
bioenvdist
to extract the environmental distances that really
were used within bioenv
. NB., bioenv
selects variables
to maximize the Mantel correlation, and significance tests based on
a priori selection of variables are biased.
Jari Oksanen
Clarke, K. R & Ainsworth, M. 1993. A method of linking multivariate community structure to environmental variables. Marine Ecology Progress Series, 92, 205–219.
vegdist
, dist
, cor
for underlying routines, monoMDS
and
metaMDS
for ordination, procrustes
for
Procrustes analysis, protest
for an alternative, and
rankindex
for studying alternatives to the default
Bray-Curtis index.
# The method is very slow for large number of possible subsets. # Therefore only 6 variables in this example. data(varespec) data(varechem) sol <- bioenv(wisconsin(varespec) ~ log(N) + P + K + Ca + pH + Al, varechem) sol ## IGNORE_RDIFF_BEGIN summary(sol) ## IGNORE_RDIFF_END
# The method is very slow for large number of possible subsets. # Therefore only 6 variables in this example. data(varespec) data(varechem) sol <- bioenv(wisconsin(varespec) ~ log(N) + P + K + Ca + pH + Al, varechem) sol ## IGNORE_RDIFF_BEGIN summary(sol) ## IGNORE_RDIFF_END
Draws a PCA biplot with species scores indicated by biplot arrows
## S3 method for class 'rda' biplot(x, choices = c(1, 2), scaling = "species", display = c("sites", "species"), type, xlim, ylim, col = c(1,2), const, correlation = FALSE, ...)
## S3 method for class 'rda' biplot(x, choices = c(1, 2), scaling = "species", display = c("sites", "species"), type, xlim, ylim, col = c(1,2), const, correlation = FALSE, ...)
x |
A |
choices |
Axes to show. |
scaling |
Scaling for species and site scores. Either species
( The type of scores can also be specified as one of |
correlation |
logical; if |
display |
Scores shown. These must some of the alternatives
|
type |
Type of plot: partial match to |
xlim , ylim
|
the x and y limits (min, max) of the plot. |
col |
Colours used for sites and species (in this order). If only one colour is given, it is used for both. |
const |
General scaling constant for |
... |
Other parameters for plotting functions. |
Produces a plot or biplot of the results of a call to
rda
. It is common for the "species" scores in a PCA to
be drawn as biplot arrows that point in the direction of increasing
values for that variable. The biplot.rda
function provides a
wrapper to plot.cca
to allow the easy production of such a
plot.
biplot.rda
is only suitable for unconstrained models. If
used on an ordination object with constraints, an error is issued.
If species scores are drawn using "text"
, the arrows are drawn
from the origin to 0.85 * species score, whilst the labels are
drawn at the species score. If the type used is "points"
, then
no labels are drawn and therefore the arrows are drawn from the origin
to the actual species score.
The plot
function returns invisibly a plotting structure which
can be used by identify.ordiplot
to identify
the points or other functions in the ordiplot
family.
Gavin Simpson, based on plot.cca
by Jari Oksanen.
plot.cca
, rda
for something to
plot, ordiplot
for an alternative plotting routine
and more support functions, and text
,
points
and arrows
for the basic routines.
data(dune) mod <- rda(dune, scale = TRUE) biplot(mod, scaling = "symmetric") ## plot.cca can do the same plot(mod, scaling = "symmetric", spe.par = list(arrows=TRUE)) ## different type for species and site scores biplot(mod, scaling = "symmetric", type = c("text", "points")) ## We can use ordiplot pipes to build similar plots with flexible ## control plot(mod, scaling = "symmetric", type="n") |> points("sites", cex=0.7) |> text("species", arrows=TRUE, length=0.05, col=2, cex=0.7, font=3)
data(dune) mod <- rda(dune, scale = TRUE) biplot(mod, scaling = "symmetric") ## plot.cca can do the same plot(mod, scaling = "symmetric", spe.par = list(arrows=TRUE)) ## different type for species and site scores biplot(mod, scaling = "symmetric", type = c("text", "points")) ## We can use ordiplot pipes to build similar plots with flexible ## control plot(mod, scaling = "symmetric", type="n") |> points("sites", cex=0.7) |> text("species", arrows=TRUE, length=0.05, col=2, cex=0.7, font=3)
This function is a wrapper for the kmeans
function. It creates
several partitions forming a cascade from a small to a large number of
groups.
cascadeKM(data, inf.gr, sup.gr, iter = 100, criterion = "calinski", parallel = getOption("mc.cores")) cIndexKM(y, x, index = "all") ## S3 method for class 'cascadeKM' plot(x, min.g, max.g, grpmts.plot = TRUE, sortg = FALSE, gridcol = NA, ...)
cascadeKM(data, inf.gr, sup.gr, iter = 100, criterion = "calinski", parallel = getOption("mc.cores")) cIndexKM(y, x, index = "all") ## S3 method for class 'cascadeKM' plot(x, min.g, max.g, grpmts.plot = TRUE, sortg = FALSE, gridcol = NA, ...)
data |
The data matrix. The objects (samples) are the rows. |
inf.gr |
The number of groups for the partition with the smallest number of groups of the cascade (min). |
sup.gr |
The number of groups for the partition with the largest number of groups of the cascade (max). |
iter |
The number of random starting configurations for each value
of |
criterion |
The criterion that will be used to select the best
partition. The default value is |
y |
Object of class |
x |
Data matrix where columns correspond to variables and rows to
observations, or the plotting object in |
index |
The available indices are: |
min.g , max.g
|
The minimum and maximum numbers of groups to be displayed. |
grpmts.plot |
Show the plot ( |
sortg |
Sort the objects as a function of their group membership
to produce a more easily interpretable graph. See Details. The
original object names are kept; they are used as labels in the
output table |
gridcol |
The colour of the grid lines in the plots. |
... |
Other parameters to the functions (ignored). |
parallel |
Number of parallel processes or a predefined socket
cluster. With |
The function creates several partitions forming a cascade from a small
to a large number of groups formed by kmeans
. Most
of the work is performed by function cIndex
which is based on the
clustIndex
in package cclust).
Some of the criteria were removed from this version because computation
errors were generated when only one object was found in a group.
The default value is "calinski"
, which refers to the well-known
Calinski-Harabasz (1974) criterion. The other available index is the
simple structure index "ssi"
(Dolnicar et al. 1999).
In the case of groups of equal
sizes, "calinski"
is generally a good criterion to indicate the
correct number of groups. Users should not take its indications
literally when the groups are not equal in size. Type "all"
to
obtain both indices. The indices are defined as:
, where
is the
number of data points and
is the number of clusters.
is the sum of squares within the clusters while
is the sum of squares among the clusters. This index
is simply an
(ANOVA) statistic.
the “Simple Structure Index” multiplicatively combines several elements which influence the interpretability of a partitioning solution. The best partition is indicated by the highest SSI value.
In a simulation study, Milligan and Cooper (1985) found
that the Calinski-Harabasz criterion recovered the correct number of
groups the most often. We recommend this criterion because, if the
groups are of equal sizes, the maximum value of "calinski"
usually indicates the correct number of groups. Another available
index is the simple structure index "ssi"
. Users should not
take the indications of these indices literally when the groups are
not equal in size and explore the groups corresponding to other values
of .
Function cascadeKM
has a plot
method. Two plots are
produced. The graph on the left has the objects in
abscissa and the number of groups in ordinate. The groups are
represented by colours. The graph on the right shows the values of the
criterion ("calinski"
or "ssi"
) for determining the best
partition. The highest value of the criterion is marked in red. Points
marked in orange, if any, indicate partitions producing an increase in
the criterion value as the number of groups increases; they may
represent other interesting partitions.
If sortg=TRUE
, the objects are reordered by the following
procedure: (1) a simple matching distance matrix is computed among the
objects, based on the table of K-means assignments to groups, from
=
min.g
to =
max.g
. (2) A principal
coordinate analysis (PCoA, Gower 1966) is computed on the centred
distance matrix. (3) The first principal coordinate is used as the new
order of the objects in the graph. A simplified algorithm is used to
compute the first principal coordinate only, using the iterative
algorithm described in Legendre & Legendre (2012). The
full distance matrix among objects is never computed; this avoids
the problem of storing it when the number of objects is
large. Distance values are computed as they are needed by the
algorithm.
Function cascadeKM
returns an object of class
cascadeKM
with items:
partition |
Table with the partitions found for different numbers
of groups |
results |
Values of the criterion to select the best partition. |
criterion |
The name of the criterion used. |
size |
The number of objects found in each group, for all partitions (columns). |
Function cIndex
returns a vector with the index values. The
maximum value of these indices is supposed to indicate the best
partition. These indices work best with groups of equal sizes. When
the groups are not of equal sizes, one should not put too much faith
in the maximum of these indices, and also explore the groups
corresponding to other values of .
Marie-Helene Ouellette [email protected], Sebastien Durand [email protected] and Pierre Legendre [email protected]. Parallel processing by Virgilio Gómez-Rubio. Edited for vegan by Jari Oksanen.
Calinski, T. and J. Harabasz. 1974. A dendrite method for cluster analysis. Commun. Stat. 3: 1–27.
Dolnicar, S., K. Grabler and J. A. Mazanec. 1999. A tale of three cities: perceptual charting for analyzing destination images. Pp. 39-62 in: Woodside, A. et al. [eds.] Consumer psychology of tourism, hospitality and leisure. CAB International, New York.
Gower, J. C. 1966. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53: 325–338.
Legendre, P. & L. Legendre. 2012. Numerical ecology, 3rd English edition. Elsevier Science BV, Amsterdam.
Milligan, G. W. & M. C. Cooper. 1985. An examination of procedures for determining the number of clusters in a data set. Psychometrika 50: 159–179.
Weingessel, A., Dimitriadou, A. and Dolnicar, S. 2002. An examination of indexes for determining the number of clusters in binary data sets. Psychometrika 67: 137–160.
# Partitioning a (10 x 10) data matrix of random numbers mat <- matrix(runif(100),10,10) res <- cascadeKM(mat, 2, 5, iter = 25, criterion = 'calinski') toto <- plot(res) # Partitioning an autocorrelated time series vec <- sort(matrix(runif(30),30,1)) res <- cascadeKM(vec, 2, 5, iter = 25, criterion = 'calinski') toto <- plot(res) # Partitioning a large autocorrelated time series # Note that we remove the grid lines vec <- sort(matrix(runif(1000),1000,1)) res <- cascadeKM(vec, 2, 7, iter = 10, criterion = 'calinski') toto <- plot(res, gridcol=NA)
# Partitioning a (10 x 10) data matrix of random numbers mat <- matrix(runif(100),10,10) res <- cascadeKM(mat, 2, 5, iter = 25, criterion = 'calinski') toto <- plot(res) # Partitioning an autocorrelated time series vec <- sort(matrix(runif(30),30,1)) res <- cascadeKM(vec, 2, 5, iter = 25, criterion = 'calinski') toto <- plot(res) # Partitioning a large autocorrelated time series # Note that we remove the grid lines vec <- sort(matrix(runif(1000),1000,1)) res <- cascadeKM(vec, 2, 7, iter = 10, criterion = 'calinski') toto <- plot(res, gridcol=NA)
Function cca
performs correspondence analysis, or optionally
constrained correspondence analysis (a.k.a. canonical correspondence
analysis), or optionally partial constrained correspondence
analysis. Function rda
performs redundancy analysis, or
optionally principal components analysis.
These are all very popular ordination techniques in community ecology.
## S3 method for class 'formula' cca(formula, data, na.action = na.fail, subset = NULL, ...) ## S3 method for class 'formula' rda(formula, data, scale=FALSE, na.action = na.fail, subset = NULL, ...) ## Default S3 method: cca(X, Y, Z, ...) ## Default S3 method: rda(X, Y, Z, scale=FALSE, ...) ca(X, ...) pca(X, scale=FALSE, ...)
## S3 method for class 'formula' cca(formula, data, na.action = na.fail, subset = NULL, ...) ## S3 method for class 'formula' rda(formula, data, scale=FALSE, na.action = na.fail, subset = NULL, ...) ## Default S3 method: cca(X, Y, Z, ...) ## Default S3 method: rda(X, Y, Z, scale=FALSE, ...) ca(X, ...) pca(X, scale=FALSE, ...)
formula |
Model formula, where the left hand side gives the
community data matrix, right hand side gives the constraining variables,
and conditioning variables can be given within a special function
|
data |
Data frame containing the variables on the right hand side of the model formula. |
X |
Community data matrix. |
Y |
Constraining matrix, typically of environmental variables.
Can be missing. If this is a |
Z |
Conditioning matrix, the effect of which is removed
(“partialled out”) before next step. Can be missing. If this is a
|
scale |
Scale species to unit variance (like correlations). |
na.action |
Handling of missing values in constraints or
conditions. The default ( |
subset |
Subset of data rows. This can be a logical vector which
is |
... |
Other arguments for |
Since their introduction (ter Braak 1986), constrained, or canonical,
correspondence analysis and its spin-off, redundancy analysis, have
been the most popular ordination methods in community ecology.
Functions cca
and rda
are similar to popular
proprietary software Canoco
, although the implementation is
completely different. The functions are based on Legendre &
Legendre's (2012) algorithm: in cca
Chi-square transformed data matrix is subjected to weighted linear
regression on constraining variables, and the fitted values are
submitted to correspondence analysis performed via singular value
decomposition (svd
). Function rda
is similar, but uses
ordinary, unweighted linear regression and unweighted SVD. Legendre &
Legendre (2012), Table 11.5 (p. 650) give a skeleton of the RDA
algorithm of vegan. The algorithm of CCA is similar, but
involves standardization by row and column weights.
The functions cca()
and rda()
can be called either with
matrix-like entries for community data and constraints, or with formula
interface. In general, the formula interface is preferred, because it
allows a better control of the model and allows factor constraints. Some
analyses of ordination results are only possible if model was fitted
with formula (e.g., most cases of anova.cca
, automatic
model building).
In the following sections, X
, Y
and Z
, although
referred to as matrices, are more commonly data frames.
In the matrix interface, the
community data matrix X
must be given, but the other data
matrices may be omitted, and the corresponding stage of analysis is
skipped. If matrix Z
is supplied, its effects are removed from
the community matrix, and the residual matrix is submitted to the next
stage. This is called partial correspondence or redundancy
analysis. If matrix
Y
is supplied, it is used to constrain the ordination,
resulting in constrained or canonical correspondence analysis, or
redundancy analysis.
Finally, the residual is submitted to ordinary correspondence
analysis (or principal components analysis). If both matrices
Z
and Y
are missing, the
data matrix is analysed by ordinary correspondence analysis (or
principal components analysis).
Instead of separate matrices, the model can be defined using a model
formula
. The left hand side must be the
community data matrix (X
). The right hand side defines the
constraining model.
The constraints can contain ordered or unordered factors,
interactions among variables and functions of variables. The defined
contrasts
are honoured in factor
variables. The constraints can also be matrices (but not data
frames).
The formula can include a special term Condition
for conditioning variables (“covariables”) partialled out before
analysis. So the following commands are equivalent:
cca(X, Y, Z)
, cca(X ~ Y + Condition(Z))
, where Y
and Z
refer to constraints and conditions matrices respectively.
Constrained correspondence analysis is indeed a constrained method:
CCA does not try to display all variation in the
data, but only the part that can be explained by the used constraints.
Consequently, the results are strongly dependent on the set of
constraints and their transformations or interactions among the
constraints. The shotgun method is to use all environmental variables
as constraints. However, such exploratory problems are better
analysed with
unconstrained methods such as correspondence analysis
(decorana
, corresp
) or non-metric
multidimensional scaling (metaMDS
) and
environmental interpretation after analysis
(envfit
, ordisurf
).
CCA is a good choice if the user has
clear and strong a priori hypotheses on constraints and is not
interested in the major structure in the data set.
CCA is able to correct the curve artefact commonly found in correspondence analysis by forcing the configuration into linear constraints. However, the curve artefact can be avoided only with a low number of constraints that do not have a curvilinear relation with each other. The curve can reappear even with two badly chosen constraints or a single factor. Although the formula interface makes it easy to include polynomial or interaction terms, such terms often produce curved artefacts (that are difficult to interpret), these should probably be avoided.
According to folklore, rda
should be used with “short
gradients” rather than cca
. However, this is not based
on research which finds methods based on Euclidean metric as uniformly
weaker than those based on Chi-squared metric. However, standardized
Euclidean distance may be an appropriate measures (see Hellinger
standardization in decostand
in particular).
Partial CCA (pCCA; or alternatively partial RDA) can be used to remove
the effect of some
conditioning or background or random variables or
covariables before CCA proper. In fact, pCCA compares models
cca(X ~ Z)
and cca(X ~ Y + Z)
and attributes their
difference to the effect of Y
cleansed of the effect of
Z
. Some people have used the method for extracting
“components of variance” in CCA. However, if the effect of
variables together is stronger than sum of both separately, this can
increase total Chi-square after partialling out some
variation, and give negative “components of variance”. In general,
such components of “variance” are not to be trusted due to
interactions between two sets of variables.
The unconstrained ordination methods, Principal Components Analysis (PCA) and
Correspondence Analysis (CA), may be performed using pca()
and
ca()
, which are simple wrappers around rda()
and cca()
,
respectively. Functions pca()
and ca()
can only be called with
matrix-like objects.
The functions have summary
and plot
methods which are
documented separately (see plot.cca
, summary.cca
).
Function cca
returns a huge object of class cca
, which
is described separately in cca.object
.
Function rda
returns an object of class rda
which
inherits from class cca
and is described in cca.object
.
The scaling used in rda
scores is described in a separate
vignette with this package.
Functions pca()
and ca()
return objects of class
"vegan_pca"
and "vegan_ca"
respectively to avoid
clashes with other packages. These classes inherit from "rda"
and "cca"
respectively.
The responsible author was Jari Oksanen, but the code borrows heavily from Dave Roberts (Montana State University, USA).
The original method was by ter Braak, but the current implementation follows Legendre and Legendre.
Legendre, P. and Legendre, L. (2012) Numerical Ecology. 3rd English ed. Elsevier.
McCune, B. (1997) Influence of noisy environmental data on canonical correspondence analysis. Ecology 78, 2617-2623.
Palmer, M. W. (1993) Putting things in even better order: The advantages of canonical correspondence analysis. Ecology 74,2215-2230.
Ter Braak, C. J. F. (1986) Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67, 1167-1179.
This help page describes two constrained ordination functions,
cca
and rda
and their corresponding unconstrained
ordination functions, ca
and pca
. A related method,
distance-based redundancy analysis (dbRDA) is described separately
(capscale
), as is dbRDA's unconstrained variant,
principal coordinates analysis (PCO). All these functions return
similar objects (described in cca.object
). There are
numerous support functions that can be used to access the result object.
In the list below, functions of type cca
will handle all three
constrained ordination objects, and functions of rda
only handle
rda
and capscale
results.
The main plotting functions are plot.cca
for all
methods, and biplot.rda
for RDA and dbRDA. However,
generic vegan plotting functions can also handle the results.
The scores can be accessed and scaled with scores.cca
,
and summarized with summary.cca
. The eigenvalues can
be accessed with eigenvals.cca
and the regression
coefficients for constraints with coef.cca
. The
eigenvalues can be plotted with screeplot.cca
, and the
(adjusted) can be found with
RsquareAdj.rda
. The scores can be also calculated for
new data sets with predict.cca
which allows adding
points to ordinations. The values of constraints can be inferred
from ordination and community composition with
calibrate.cca
.
Diagnostic statistics can be found with goodness.cca
,
inertcomp
, spenvcor
,
intersetcor
, tolerance.cca
, and
vif.cca
. Function as.mlm.cca
refits the
result object as a multiple lm
object, and this allows
finding influence statistics (lm.influence
,
cooks.distance
etc.).
Permutation based significance for the overall model, single
constraining variables or axes can be found with
anova.cca
. Automatic model building with R
step
function is possible with
deviance.cca
, add1.cca
and
drop1.cca
. Functions ordistep
and
ordiR2step
(for RDA) are special functions for
constrained ordination. Randomized data sets can be generated with
simulate.cca
.
Separate methods based on constrained ordination model are principal
response curves (prc
) and variance partitioning between
several components (varpart
).
Design decisions are explained in vignette
on “Design decisions” which can be accessed with
browseVignettes("vegan")
.
data(varespec) data(varechem) ## Common but bad way: use all variables you happen to have in your ## environmental data matrix vare.cca <- cca(varespec, varechem) vare.cca plot(vare.cca) ## Formula interface and a better model vare.cca <- cca(varespec ~ Al + P*(K + Baresoil), data=varechem) vare.cca plot(vare.cca) ## Partialling out and negative components of variance cca(varespec ~ Ca, varechem) cca(varespec ~ Ca + Condition(pH), varechem) ## RDA data(dune) data(dune.env) dune.Manure <- rda(dune ~ Manure, dune.env) plot(dune.Manure)
data(varespec) data(varechem) ## Common but bad way: use all variables you happen to have in your ## environmental data matrix vare.cca <- cca(varespec, varechem) vare.cca plot(vare.cca) ## Formula interface and a better model vare.cca <- cca(varespec ~ Al + P*(K + Baresoil), data=varechem) vare.cca plot(vare.cca) ## Partialling out and negative components of variance cca(varespec ~ Ca, varechem) cca(varespec ~ Ca + Condition(pH), varechem) ## RDA data(dune) data(dune.env) dune.Manure <- rda(dune ~ Manure, dune.env) plot(dune.Manure)
Ordination methods cca
, rda
,
dbrda
and capscale
return similar result
objects. All these methods use the same internal function
ordConstrained
. They differ only in (1) initial
transformation of the data and in defining inertia, (2) weighting,
and (3) the use of rectangular rows columns data or
symmetric rows
rows dissimilarities:
rda
initializes data to give variance or correlations
as inertia, cca
is based on double-standardized data
to give Chi-square inertia and uses row and column weights,
capscale
maps the real part of dissimilarities to
rectangular data and performs RDA, and dbrda
performs
an RDA-like analysis directly on symmetric dissimilarities.
Function ordConstrained
returns the same result components
for all these methods, and the calling function may add some more
components to the final result. However, you should not access these
result components directly (using $
): the internal structure
is not regarded as stable application interface (API), but it can
change at any release. If you access the results components
directly, you take a risk of breakage at any vegan release.
The vegan provides a wide set of accessor functions to those
components, and these functions are updated when the result object
changes. This documentation gives an overview of accessor functions
to the cca
result object.
ordiYbar(x, model = c("CCA", "CA", "pCCA", "partial", "initial")) ## S3 method for class 'cca' model.frame(formula, ...) ## S3 method for class 'cca' model.matrix(object, ...) ## S3 method for class 'cca' weights(object, display = "sites", ...)
ordiYbar(x, model = c("CCA", "CA", "pCCA", "partial", "initial")) ## S3 method for class 'cca' model.frame(formula, ...) ## S3 method for class 'cca' model.matrix(object, ...) ## S3 method for class 'cca' weights(object, display = "sites", ...)
object , x , formula
|
|
model |
Show constrained ( |
display |
Display either |
... |
Other arguments passed to the the function. |
The internal (“working”) form of the dependent (community)
data can be accessed with function ordiYbar
. The form depends
on the ordination method: for instance, in cca
the
data are weighted and Chi-square transformed, and in
dbrda
they are Gower-centred dissimilarities. The
input data in the original (“response”) form can be accessed
with fitted.cca
and residuals.cca
.
Function predict.cca
can return either working or
response data, and also their lower-rank approximations.
The model matrix of independent data (“Constraints” and
“Conditions”) can be extracted with model.matrix
. In
partial analysis, the function returns a list of design matrices
called Conditions
and Constraints
. If either component
was missing, a single matrix is returned. The redundant (aliased)
terms do not appear in the model matrix. These terms can be found
with alias.cca
. Function model.frame
tries to
reconstruct the data frame from which the model matrices were
derived. This is only possible if the original model was fitted with
formula
and data
arguments, and still fails if the
data
are unavailable.
The number of observations can be accessed with
nobs.cca
, and the residual degrees of freedom with
df.residual.cca
. The information on observations with
missing values can be accessed with na.action
. The
terms and formula of the fitted model can be accessed with
formula
and terms
.
The weights used in cca
can be accessed with
weights
. In unweighted methods (rda
) all
weights are equal.
The ordination results are saved in separate components for partial terms, constraints and residual unconstrained ordination. There is no guarantee that these components will have the same internal names as currently, and you should be cautious when developing scripts and functions that directly access these components.
The constrained ordination algorithm is based on QR decomposition of
constraints and conditions (environmental data), and the QR
component is saved separately for partial and constrained
components. The QR decomposition of constraints can be accessed
with qr.cca
. This will also include the residual
effects of partial terms (Conditions), and it should be used
together with ordiYbar(x, "partial")
. The environmental data
are first centred in rda
or weighted and centred in
cca
. The QR decomposition is used in many functions that
access cca
results, and it can be used to find many items
that are not directly stored in the object. For examples, see
coef.cca
, coef.rda
,
vif.cca
, permutest.cca
,
predict.cca
, predict.rda
,
calibrate.cca
. See qr
for other possible
uses of this component. For instance, the rank of the constraints
can be found from the QR decomposition.
The eigenvalues of the solution can be accessed with
eigenvals.cca
. Eigenvalues are not evaluated for
partial component, and they will only be available for constrained
and residual components.
The ordination scores are internally stored as (weighted)
orthonormal scores matrices. These results can be accessed with
scores.cca
and scores.rda
functions. The
ordination scores are scaled when accessed with scores
functions, but internal (weighted) orthonormal scores can be
accessed by setting scaling = FALSE
. Unconstrained residual
component has species and site scores, and constrained component has
also fitted site scores or linear combination scores for sites and
biplot scores and centroids for constraint variables. The biplot
scores correspond to the model.matrix
, and centroids
are calculated for factor variables when they were used. The scores
can be selected by defining the axes, and there is no direct way of
accessing all scores of a certain component. The number of dimensions
can be assessed from eigenvals
. In addition, some
other types can be derived from the results although not saved in
the results. For instance, regression scores and model coefficients
can be accessed with scores
and coef
functions. Partial component will have no scores.
Distance-based methods (dbrda
, capscale
)
can have negative eigenvalues and associated imaginary axis scores. In
addition, species scores are initially missing in dbrda
and they are accessory and found after analysis in
capscale
(and may be misleading). Function
sppscores
can be used to add species scores or replace
them with more meaningful ones.
The latest large change of result object was made in release 2.5-1 in
2016. You can modernize ancient stray results with
modernobject <- update(ancientobject)
.
Jari Oksanen
Legendre, P. and Legendre, L. (2012) Numerical Ecology. 3rd English ed. Elsevier.
The core function is ordConstrained
which is called by
cca
, rda
, dbrda
and
capscale
. The basic class is "cca"
for all
methods, and the following functions are defined for this class:
RsquareAdj.cca
, SSD.cca
, add1.cca
, alias.cca
, anova.cca
, as.mlm.cca
, biplot.cca
, bstick.cca
, calibrate.cca
, centredist.cca
, coef.cca
, cooks.distance.cca
, deviance.cca
, df.residual.cca
, drop1.cca
, eigenvals.cca
, extractAIC.cca
, fitted.cca
, goodness.cca
, hatvalues.cca
, labels.cca
, model.frame.cca
, model.matrix.cca
, nobs.cca
, permutest.cca
, plot.cca
, points.cca
, predict.cca
, print.cca
, qr.cca
, residuals.cca
, rstandard.cca
, rstudent.cca
, scores.cca
, screeplot.cca
, sigma.cca
, simulate.cca
, stressplot.cca
, summary.cca
, text.cca
, tolerance.cca
, vcov.cca
, weights.cca
.
Other functions handling "cca"
objects include inertcomp
,
intersetcor
, mso
, ordiresids
,
ordistep
and ordiR2step
.
Canonical correlation analysis, following Brian McArdle's unpublished graduate course notes, plus improvements to allow the calculations in the case of very sparse and collinear matrices, and permutation test of Pillai's trace statistic.
CCorA(Y, X, stand.Y=FALSE, stand.X=FALSE, permutations = 0, ...) ## S3 method for class 'CCorA' biplot(x, plot.type="ov", xlabs, plot.axes = 1:2, int=0.5, col.Y="red", col.X="blue", cex=c(0.7,0.9), ...)
CCorA(Y, X, stand.Y=FALSE, stand.X=FALSE, permutations = 0, ...) ## S3 method for class 'CCorA' biplot(x, plot.type="ov", xlabs, plot.axes = 1:2, int=0.5, col.Y="red", col.X="blue", cex=c(0.7,0.9), ...)
Y |
Left matrix (object class: |
X |
Right matrix (object class: |
stand.Y |
Logical; should |
stand.X |
Logical; should |
permutations |
a list of control values for the permutations
as returned by the function |
x |
|
plot.type |
A character string indicating which of the following
plots should be produced: |
xlabs |
Row labels. The default is to use row names, |
plot.axes |
A vector with 2 values containing the order numbers of the canonical axes to be plotted. Default: first two axes. |
int |
Radius of the inner circles plotted as visual references in
the plots of the variables. Default: |
col.Y |
Color used for objects and variables in the first data table (Y) plots. In biplots, the objects are in black. |
col.X |
Color used for objects and variables in the second data table (X) plots. |
cex |
A vector with 2 values containing the size reduction factors
for the object and variable names, respectively, in the plots.
Default values: |
... |
Other arguments passed to these functions. The function
|
Canonical correlation analysis (Hotelling 1936) seeks linear
combinations of the variables of Y
that are maximally
correlated to linear combinations of the variables of X
. The
analysis estimates the relationships and displays them in graphs.
Pillai's trace statistic is computed and tested parametrically (F-test);
a permutation test is also available.
Algorithmic note –
The blunt approach would be to read the two matrices, compute the
covariance matrices, then the matrix
S12 %*% inv(S22) %*% t(S12) %*% inv(S11)
.
Its trace is Pillai's trace statistic.
This approach may fail, however, when there is heavy multicollinearity
in very sparse data matrices. The safe approach is to replace all data
matrices by their PCA object scores.
The function can produce different types of plots depending on the option
chosen:
"objects"
produces two plots of the objects, one in the space
of Y, the second in the space of X;
"variables"
produces two plots of the variables, one of the variables
of Y in the space of Y, the second of the variables of X in the space of X;
"ov"
produces four plots, two of the objects and two of the variables;
"biplots"
produces two biplots, one for the first matrix (Y) and
one for second matrix (X) solutions. For biplots, the function passes all arguments
to biplot.default
; consult its help page for configuring biplots.
Function CCorA
returns a list containing the following elements:
Pillai |
Pillai's trace statistic = sum of the canonical eigenvalues. |
Eigenvalues |
Canonical eigenvalues. They are the squares of the canonical correlations. |
CanCorr |
Canonical correlations. |
Mat.ranks |
Ranks of matrices |
RDA.Rsquares |
Bimultivariate redundancy coefficients (R-squares) of RDAs of Y|X and X|Y. |
RDA.adj.Rsq |
|
nperm |
Number of permutations. |
p.Pillai |
Parametric probability value associated with Pillai's trace. |
p.perm |
Permutational probability associated with Pillai's trace. |
Cy |
Object scores in Y biplot. |
Cx |
Object scores in X biplot. |
corr.Y.Cy |
Scores of Y variables in Y biplot, computed as cor(Y,Cy). |
corr.X.Cx |
Scores of X variables in X biplot, computed as cor(X,Cx). |
corr.Y.Cx |
cor(Y,Cy) available for plotting variables Y in space of X manually. |
corr.X.Cy |
cor(X,Cx) available for plotting variables X in space of Y manually. |
control |
A list of control values for the permutations
as returned by the function |
call |
Call to the CCorA function. |
Pierre Legendre, Departement de Sciences Biologiques, Universite de Montreal. Implemented in vegan with the help of Jari Oksanen.
Hotelling, H. 1936. Relations between two sets of variates. Biometrika 28: 321-377.
Legendre, P. 2005. Species associations: the Kendall coefficient of concordance revisited. Journal of Agricultural, Biological, and Environmental Statistics 10: 226-245.
# Example using two mite groups. The mite data are available in vegan data(mite) # Two mite species associations (Legendre 2005, Fig. 4) group.1 <- c(1,2,4:8,10:15,17,19:22,24,26:30) group.2 <- c(3,9,16,18,23,25,31:35) # Separate Hellinger transformations of the two groups of species mite.hel.1 <- decostand(mite[,group.1], "hel") mite.hel.2 <- decostand(mite[,group.2], "hel") rownames(mite.hel.1) = paste("S",1:nrow(mite),sep="") rownames(mite.hel.2) = paste("S",1:nrow(mite),sep="") out <- CCorA(mite.hel.1, mite.hel.2) out biplot(out, "ob") # Two plots of objects biplot(out, "v", cex=c(0.7,0.6)) # Two plots of variables biplot(out, "ov", cex=c(0.7,0.6)) # Four plots (2 for objects, 2 for variables) biplot(out, "b", cex=c(0.7,0.6)) # Two biplots biplot(out, xlabs = NA, plot.axes = c(3,5)) # Plot axes 3, 5. No object names biplot(out, plot.type="biplots", xlabs = NULL) # Replace object names by numbers # Example using random numbers. No significant relationship is expected mat1 <- matrix(rnorm(60),20,3) mat2 <- matrix(rnorm(100),20,5) out2 = CCorA(mat1, mat2, permutations=99) out2 biplot(out2, "b")
# Example using two mite groups. The mite data are available in vegan data(mite) # Two mite species associations (Legendre 2005, Fig. 4) group.1 <- c(1,2,4:8,10:15,17,19:22,24,26:30) group.2 <- c(3,9,16,18,23,25,31:35) # Separate Hellinger transformations of the two groups of species mite.hel.1 <- decostand(mite[,group.1], "hel") mite.hel.2 <- decostand(mite[,group.2], "hel") rownames(mite.hel.1) = paste("S",1:nrow(mite),sep="") rownames(mite.hel.2) = paste("S",1:nrow(mite),sep="") out <- CCorA(mite.hel.1, mite.hel.2) out biplot(out, "ob") # Two plots of objects biplot(out, "v", cex=c(0.7,0.6)) # Two plots of variables biplot(out, "ov", cex=c(0.7,0.6)) # Four plots (2 for objects, 2 for variables) biplot(out, "b", cex=c(0.7,0.6)) # Two biplots biplot(out, xlabs = NA, plot.axes = c(3,5)) # Plot axes 3, 5. No object names biplot(out, plot.type="biplots", xlabs = NULL) # Replace object names by numbers # Example using random numbers. No significant relationship is expected mat1 <- matrix(rnorm(60),20,3) mat2 <- matrix(rnorm(100),20,5) out2 = CCorA(mat1, mat2, permutations=99) out2 biplot(out2, "b")
Function finds Euclidean or squared Mahalanobis distances of points
to class centroids in betadisper
or ordination results.
## Default S3 method: centredist(x, group, distance = c("euclidean", "mahalanobis"), display = "sites", w, ...) ## S3 method for class 'betadisper' centredist(x, ...) ## S3 method for class 'cca' centredist(x, group, distance = c("euclidean", "mahalanobis"), display = c("sites", "species"), rank = 2, ...) ## S3 method for class 'rda' centredist(x, group, distance = c("euclidean", "mahalanobis"), display = "sites", rank = 2, ...) ## S3 method for class 'dbrda' centredist(x, group, distance = c("euclidean","mahalanobis"), display = "sites", rank = 2, ...) ## S3 method for class 'wcmdscale' centredist(x, group, distance= c("euclidean","mahalanobis"), display = "sites", rank = 2, ...)
## Default S3 method: centredist(x, group, distance = c("euclidean", "mahalanobis"), display = "sites", w, ...) ## S3 method for class 'betadisper' centredist(x, ...) ## S3 method for class 'cca' centredist(x, group, distance = c("euclidean", "mahalanobis"), display = c("sites", "species"), rank = 2, ...) ## S3 method for class 'rda' centredist(x, group, distance = c("euclidean", "mahalanobis"), display = "sites", rank = 2, ...) ## S3 method for class 'dbrda' centredist(x, group, distance = c("euclidean","mahalanobis"), display = "sites", rank = 2, ...) ## S3 method for class 'wcmdscale' centredist(x, group, distance= c("euclidean","mahalanobis"), display = "sites", rank = 2, ...)
x |
An ordination result object, or a matrix for the default
method or a |
group |
Factor to define classes for which centroids are found. |
distance |
Use either simple Euclidean distance or squared Mahalanobis distances that take into account the shape of covariance ellipses. |
display |
Kind of |
w |
Weights. If |
rank |
Number of axes used in ordination methods. |
... |
Other arguments to functions (ignored). |
Function finds either simple Euclidean distances of all points to
their centroid as given by argument group
or squared
Mahalanobis distances of points to the centroid. In addition to the
distances, it returns information of the original group and the
nearest group for each point.
Currently the distances are scaled so that the sum of squared
Euclidean distance will give the residual (unexplained) eigenvalue
in constrained ordination with given group. The distances may not
correspond to the distances in ordination plots with specific
scaling
and setting argument scaling
gives an
error. Due to restriction of this scaling, Euclidean distances for
species scores are only allowed in ca
and
cca
. These restrictions may be relaxed in the future
versions of the method.
Squared Mahalanobis distances (mahalanobis
) are scaled
by the within group covariance matrix, and they are consistent with
ordiellipse
.
Mahalanobis distances can only be calculated for group
that
are larger than rank
. For rank = 2
, minimum
permissible group
has three points. However, even this can
fail in degenerate cases, such as three points on a line. Euclidean
distances can be calculated for any rank
. With rank = "full"
distance-based methods wcmdscale
, pco
,
dbrda
and betadisper
will handle
negative eigenvalues and associated eigenvectors, but with numeric
rank they will only use real eigenvectors with positive eigenvalues.
There are specific methods for some ordination models, but many will
be handled by the default method (including metaMDS
and monoMDS
).
The specific method for betadisper
ignores most
arguments and calculates the distances as specified in model
fitting. The betadisper
result with argument
type = "centroid"
is equal to wcmdscale
,
pco
or dbrda
with the same
group
and rank = "full"
.
Function returns a list with components
centre |
Centre to which a point was originally allocated. |
nearest |
Nearest centre to a point. |
distances |
Distance matrix where each observation point is a row, and column gives the distance of the point to that centre. |
centerdist
is a synonym of centredist
.
Jari Oksanen
betadisper
, goodness.cca
,
mahalanobis
and various ordination methods.
data(dune, dune.env) d <- vegdist(dune) # Bray-Curtis mod <- with(dune.env, betadisper(d, Management)) centredist(mod) ## equal to metric scaling when using centroids instead of medians modc <- with(dune.env, betadisper(d, Management, type = "centroid")) ord <- pco(d) all.equal(centredist(modc), centredist(ord, dune.env$Management, rank="full"), check.attributes = FALSE)
data(dune, dune.env) d <- vegdist(dune) # Bray-Curtis mod <- with(dune.env, betadisper(d, Management)) centredist(mod) ## equal to metric scaling when using centroids instead of medians modc <- with(dune.env, betadisper(d, Management, type = "centroid")) ord <- pco(d) all.equal(centredist(modc), centredist(ord, dune.env$Management, rank="full"), check.attributes = FALSE)
The CLAM statistical approach for classifying generalists and specialists in two distinct habitats is described in Chazdon et al. (2011).
clamtest(comm, groups, coverage.limit = 10, specialization = 2/3, npoints = 20, alpha = 0.05/20) ## S3 method for class 'clamtest' summary(object, ...) ## S3 method for class 'clamtest' plot(x, xlab, ylab, main, pch = 21:24, col.points = 1:4, col.lines = 2:4, lty = 1:3, position = "bottomright", ...)
clamtest(comm, groups, coverage.limit = 10, specialization = 2/3, npoints = 20, alpha = 0.05/20) ## S3 method for class 'clamtest' summary(object, ...) ## S3 method for class 'clamtest' plot(x, xlab, ylab, main, pch = 21:24, col.points = 1:4, col.lines = 2:4, lty = 1:3, position = "bottomright", ...)
comm |
Community matrix, consisting of counts. |
groups |
A vector identifying the two habitats. Must have exactly
two unique values or levels. Habitat IDs in the grouping vector
must match corresponding rows in the community matrix |
coverage.limit |
Integer, the sample coverage based correction
is applied to rare species with counts below this limit.
Sample coverage is calculated separately
for the two habitats. Sample relative abundances are used for species
with higher than or equal to |
specialization |
Numeric, specialization threshold value between 0 and 1.
The value of |
npoints |
Integer, number of points used to determine the boundary lines in the plots. |
alpha |
Numeric, nominal significance level for individual
tests. The default value reduces the conventional limit of
|
x , object
|
Fitted model object of class |
xlab , ylab
|
Labels for the plot axes. |
main |
Main title of the plot. |
pch , col.points
|
Symbols and colors used in plotting species groups. |
lty , col.lines
|
Line types and colors for boundary lines in plot to separate species groups. |
position |
Position of figure legend, see |
... |
Additional arguments passed to methods. |
The method uses a multinomial model based on estimated
species relative abundance in two habitats (A, B). It minimizes bias
due to differences in sampling intensities between two habitat types
as well as bias due to insufficient sampling within each
habitat. The method permits a robust statistical classification of
habitat specialists and generalists, without excluding rare species
a priori (Chazdon et al. 2011). Based on a user-defined
specialization
threshold, the model classifies species into
one of four groups: (1) generalists; (2) habitat A specialists; (3)
habitat B specialists; and (4) too rare to classify with confidence.
A data frame (with class attribute "clamtest"
),
with columns:
Species: |
species name (column names from |
Total_*A*: |
total count in habitat A, |
Total_*B*: |
total count in habitat B, |
Classes: |
species classification, a factor with
levels |
*A*
and *B*
are placeholders for habitat names/labels found in the
data.
The summary
method returns descriptive statistics of the results.
The plot
method returns values invisibly and produces a bivariate
scatterplot of species total abundances in the two habitats. Symbols and
boundary lines are shown for species groups.
The code was tested against standalone CLAM software provided on the website of Anne Chao (which were then at http://chao.stat.nthu.edu.tw/wordpress); minor inconsistencies were found, especially for finding the threshold for 'too rare' species. These inconsistencies are probably due to numerical differences between the two implementation. The current R implementation uses root finding for iso-lines instead of iterative search.
The original method (Chazdon et al. 2011) has two major problems:
It assumes that the error distribution is multinomial. This is a justified choice if individuals are freely distributed, and there is no over-dispersion or clustering of individuals. In most ecological data, the variance is much higher than multinomial assumption, and therefore test statistic are too optimistic.
The original authors suggest that multiple testing adjustment
for multiple testing should be based on the number of points
(npoints
) used to draw the critical lines on the plot,
whereas the adjustment should be based on the number of tests (i.e.,
tested species). The function uses the same numerical values as
the original paper, but there is no automatic connection between
npoints
and alpha
arguments, but you must work out
the adjustment yourself.
Peter Solymos [email protected]
Chazdon, R. L., Chao, A., Colwell, R. K., Lin, S.-Y., Norden, N., Letcher, S. G., Clark, D. B., Finegan, B. and Arroyo J. P.(2011). A novel statistical method for classifying habitat generalists and specialists. Ecology 92, 1332–1343.
data(mite) data(mite.env) sol <- with(mite.env, clamtest(mite, Shrub=="None", alpha=0.005)) summary(sol) head(sol) plot(sol)
data(mite) data(mite.env) sol <- with(mite.env, clamtest(mite, Shrub=="None", alpha=0.005)) summary(sol) head(sol) plot(sol)
The commsim
function can be used to feed Null Model algorithms into
nullmodel
analysis.
The make.commsim
function returns various predefined algorithm types
(see Details).
These functions represent low level interface for community null model
infrastructure in vegan with the intent of extensibility,
and less emphasis on direct use by users.
commsim(method, fun, binary, isSeq, mode) make.commsim(method) ## S3 method for class 'commsim' print(x, ...)
commsim(method, fun, binary, isSeq, mode) make.commsim(method) ## S3 method for class 'commsim' print(x, ...)
method |
Character, name of the algorithm. |
fun |
A function. For possible formal arguments of this function see Details. |
binary |
Logical, if the algorithm applies to presence-absence or count matrices. |
isSeq |
Logical, if the algorithm is sequential (needs burnin and thinning) or not. |
mode |
Character, storage mode of the community matrix, either
|
x |
An object of class |
... |
Additional arguments. |
The function fun
must return an array of dim(nr, nc, n)
,
and must take some of the following arguments:
x
: input matrix,
n
: number of permuted matrices in output,
nr
: number of rows,
nc
: number of columns,
rs
: vector of row sums,
cs
: vector of column sums,
rf
: vector of row frequencies (non-zero cells),
cf
: vector of column frequencies (non-zero cells),
s
: total sum of x
,
fill
: matrix fill (non-zero cells),
thin
: thinning value for sequential algorithms,
...
: additional arguments.
You can define your own null model, but
several null model algorithm are pre-defined and can be called by
their name. The predefined algorithms are described in detail in the
following chapters. The binary null models produce matrices of zeros
(absences) and ones (presences) also when input matrix is
quantitative. There are two types of quantitative data: Counts are
integers with a natural unit so that individuals can be shuffled, but
abundances can have real (floating point) values and do not have a
natural subunit for shuffling. All quantitative models can handle
counts, but only some are able to handle real values. Some of the null
models are sequential so that the next matrix is derived from the
current one. This makes models dependent from previous models, and usually
you must thin these matrices and study the sequences for stability:
see oecosimu
for details and instructions.
See Examples for structural constraints imposed by each algorithm and defining your own null model.
An object of class commsim
with elements
corresponding to the arguments (method
, binary
,
isSeq
, mode
, fun
).
If the input of make.comsimm
is a commsim
object,
it is returned without further evaluation. If this is not the case,
the character method
argument is matched against
predefined algorithm names. An error message is issued
if none such is found. If the method
argument is missing,
the function returns names of all currently available
null model algorithms as a character vector.
All binary null models preserve fill: number of presences or
conversely the number of absences. The classic models may also
preserve column (species) frequencies (c0
) or row frequencies
or species richness of each site (r0
) and take into account
commonness and rarity of species (r1
, r2
). Algorithms
swap
, tswap
, curveball
, quasiswap
and
backtracking
preserve both row and column frequencies. Three
first ones are sequential but the two latter are non-sequential
and produce independent matrices. Basic algorithms are reviewed by
Wright et al. (1998).
"r00"
: non-sequential algorithm for binary matrices that only preserves the number of presences (fill).
"r0"
: non-sequential algorithm for binary matrices that preserves the site (row) frequencies.
"r1"
: non-sequential algorithm for binary matrices that preserves the site (row) frequencies, but uses column marginal frequencies as probabilities of selecting species.
"r2"
: non-sequential algorithm for binary matrices that preserves the site (row) frequencies, and uses squared column marginal frequencies as as probabilities of selecting species.
"c0"
: non-sequential algorithm for binary matrices that preserves species frequencies (Jonsson 2001).
"swap"
: sequential algorithm for binary matrices that
changes the matrix structure, but does not influence marginal sums
(Gotelli & Entsminger 2003). This inspects submatrices so long that a swap can be done.
"tswap"
: sequential algorithm for binary matrices.
Same as the "swap"
algorithm, but it tries a fixed
number of times and performs zero to many swaps at one step
(according to the thin argument in the call). This
approach was suggested by Miklós & Podani (2004)
because they found that ordinary swap may lead to biased
sequences, since some columns or rows are more easily swapped.
"curveball"
: sequential method for binary matrices that implements the ‘Curveball’ algorithm of Strona et al. (2014). The algorithm selects two random rows and finds the set of unique species that occur only in one of these rows. The algorithm distributes the set of unique species to rows preserving the original row frequencies. Zero to several species are swapped in one step, and usually the matrix is perturbed more strongly than in other sequential methods.
"quasiswap"
: non-sequential algorithm for binary
matrices that implements a method where matrix is first filled
honouring row and column totals, but with integers that may be
larger than one. Then the method inspects random
matrices and performs a quasiswap on
them. In addition to ordinary swaps, quasiswap can reduce numbers
above one to ones preserving marginal totals (Miklós &
Podani 2004). The method is non-sequential, but it accepts
thin
argument: the convergence is checked at every
thin
steps. This allows performing several ordinary swaps in
addition to fill changing swaps which helps in reducing or removing
the bias.
"greedyqswap"
: A greedy variant of quasiswap. In
greedy step, one element of the matrix is
taken from
elements. The greedy steps are biased, but
the method can be thinned, and only the first of
thin
steps is greedy. Even modest thinning (say thin = 20
)
removes or reduces the bias, and thin = 100
(1% greedy
steps) looks completely safe and still speeds up simulation. The
code is experimental and it is provided here for further scrutiny,
and should be tested for bias before use.
"backtracking"
: non-sequential algorithm for binary matrices that implements a filling method with constraints both for row and column frequencies (Gotelli & Entsminger 2001). The matrix is first filled randomly, but typically row and column sums are reached before all incidences are filled in. After this begins "backtracking", where some of the incidences are removed, and filling is started again, and this backtracking is done so many times that all incidences will be filled into matrix. The results may be biased and should be inspected carefully before use.
These models shuffle individuals of counts and keep marginal sums
fixed, but marginal frequencies are not preserved. Algorithm
r2dtable
uses standard R function r2dtable
also
used for simulated -values in
chisq.test
.
Algorithm quasiswap_count
uses the same, but preserves the
original fill. Typically this means increasing numbers of zero cells
and the result is zero-inflated with respect to r2dtable
.
"r2dtable"
: non-sequential algorithm for count
matrices. This algorithm keeps matrix sum and row/column sums
constant. Based on r2dtable
.
"quasiswap_count"
: non-sequential algorithm for count
matrices. This algorithm is similar as Carsten Dormann's
swap.web
function in the package
bipartite. First, a random matrix is generated by the
r2dtable
function preserving row and column sums. Then
the original matrix fill is reconstructed by sequential steps to
increase or decrease matrix fill in the random matrix. These steps
are based on swapping submatrices (see
"swap_count"
algorithm for details) to maintain row and
column totals.
Quantitative swap models are similar to binary swap
, but they
swap the largest permissible value. The models in this section all
maintain the fill and perform a quantitative swap only if this can
be done without changing the fill. Single step of swap often changes
the matrix very little. In particular, if cell counts are variable,
high values change very slowly. Checking the chain stability and
independence is even more crucial than in binary swap, and very
strong thin
ning is often needed. These models should never be
used without inspecting their properties for the current data. These
null models can also be defined using permatswap
function.
"swap_count"
: sequential algorithm for count matrices.
This algorithm find submatrices that can be
swapped leaving column and row totals and fill unchanged. The
algorithm finds the largest value in the submatrix that can be
swapped (
). Swap means that the values in diagonal or
antidiagonal positions are decreased by
, while remaining
cells are increased by
. A swap is made only if fill does not
change.
"abuswap_r"
: sequential algorithm for count or
nonnegative real valued matrices with fixed row frequencies (see
also permatswap
). The algorithm is similar to
swap_count
, but uses different swap value for each row of the
submatrix. Each step changes the the
corresponding column sums, but honours matrix fill, row sums, and
row/column frequencies (Hardy 2008; randomization scheme 2x).
"abuswap_c"
: sequential algorithm for count or
nonnegative real valued matrices with fixed column frequencies
(see also permatswap
). The algorithm is similar as
the previous one, but operates on columns. Each step changes the
the corresponding row sums, but honours matrix fill, column sums,
and row/column frequencies (Hardy 2008; randomization scheme 3x).
Quantitative Swap and Shuffle methods (swsh
methods) preserve
fill and column and row frequencies, and also either row or column
sums. The methods first perform a binary quasiswap
and then
shuffle original quantitative data to non-zero cells. The
samp
methods shuffle original non-zero cell values and can be
used also with non-integer data. The both
methods
redistribute individuals randomly among non-zero cells and can only
be used with integer data. The shuffling is either free over the
whole matrix, or within rows (r
methods) or within columns
(c
methods). Shuffling within a row preserves row sums, and
shuffling within a column preserves column sums. These models can
also be defined with permatswap
.
"swsh_samp"
: non-sequential algorithm for quantitative data (either integer counts or non-integer values). Original non-zero values values are shuffled.
"swsh_both"
: non-sequential algorithm for count data. Individuals are shuffled freely over non-zero cells.
"swsh_samp_r"
: non-sequential algorithm for quantitative data. Non-zero values (samples) are shuffled separately for each row.
"swsh_samp_c"
: non-sequential algorithm for quantitative data. Non-zero values (samples) are shuffled separately for each column.
"swsh_both_r"
: non-sequential algorithm for count matrices. Individuals are shuffled freely for non-zero values within each row.
"swsh_both_c"
: non-sequential algorithm for count matrices. Individuals are shuffled freely for non-zero values with each column.
Quantitative shuffle methods are generalizations of binary models
r00
, r0
and c0
. The _ind
methods
shuffle individuals so that the grand sum, row sum or column sums
are preserved. These methods are similar as r2dtable
but
with still slacker constraints on marginal sums. The _samp
and _both
methods first apply the corresponding binary model
with similar restriction on marginal frequencies and then distribute
quantitative values over non-zero cells. The _samp
models
shuffle original cell values and can therefore handle also non-count
real values. The _both
models shuffle individuals among
non-zero values. The shuffling is over the whole matrix in
r00_
, and within row in r0_
and within column in
c0_
in all cases.
"r00_ind"
: non-sequential algorithm for count matrices. This algorithm preserves grand sum and individuals are shuffled among cells of the matrix.
"r0_ind"
: non-sequential algorithm for count matrices. This algorithm preserves row sums and individuals are shuffled among cells of each row of the matrix.
"c0_ind"
: non-sequential algorithm for count matrices. This algorithm preserves column sums and individuals are shuffled among cells of each column of the matrix.
"r00_samp"
: non-sequential algorithm for count
or nonnegative real valued (mode = "double"
) matrices.
This algorithm preserves grand sum and
cells of the matrix are shuffled.
"r0_samp"
: non-sequential algorithm for count
or nonnegative real valued (mode = "double"
) matrices.
This algorithm preserves row sums and
cells within each row are shuffled.
"c0_samp"
: non-sequential algorithm for count
or nonnegative real valued (mode = "double"
) matrices.
This algorithm preserves column sums constant and
cells within each column are shuffled.
"r00_both"
: non-sequential algorithm for count matrices. This algorithm preserves grand sum and cells and individuals among cells of the matrix are shuffled.
"r0_both"
: non-sequential algorithm for count matrices. This algorithm preserves grand sum and cells and individuals among cells of each row are shuffled.
"c0_both"
: non-sequential algorithm for count matrices. This algorithm preserves grand sum and cells and individuals among cells of each column are shuffled.
Jari Oksanen and Peter Solymos
Gotelli, N.J. & Entsminger, N.J. (2001). Swap and fill algorithms in null model analysis: rethinking the knight's tour. Oecologia 129, 281–291.
Gotelli, N.J. & Entsminger, N.J. (2003). Swap algorithms in null model analysis. Ecology 84, 532–535.
Hardy, O. J. (2008) Testing the spatial phylogenetic structure of local communities: statistical performances of different null models and test statistics on a locally neutral community. Journal of Ecology 96, 914–926.
Jonsson, B.G. (2001) A null model for randomization tests of nestedness in species assemblages. Oecologia 127, 309–313.
Miklós, I. & Podani, J. (2004). Randomization of presence-absence matrices: comments and new algorithms. Ecology 85, 86–92.
Patefield, W. M. (1981) Algorithm AS159. An efficient method of generating r x c tables with given row and column totals. Applied Statistics 30, 91–97.
Strona, G., Nappo, D., Boccacci, F., Fattorini, S. & San-Miguel-Ayanz, J. (2014). A fast and unbiased procedure to randomize ecological binary matrices with fixed row and column totals. Nature Communications 5:4114 doi:10.1038/ncomms5114.
Wright, D.H., Patterson, B.D., Mikkelson, G.M., Cutler, A. & Atmar, W. (1998). A comparative analysis of nested subset patterns of species composition. Oecologia 113, 1–20.
See permatfull
, permatswap
for
alternative specification of quantitative null models. Function
oecosimu
gives a higher-level interface for applying null
models in hypothesis testing and analysis of models. Function
nullmodel
and simulate.nullmodel
are used to
generate arrays of simulated null model matrices.
## write the r00 algorithm f <- function(x, n, ...) array(replicate(n, sample(x)), c(dim(x), n)) (cs <- commsim("r00", fun=f, binary=TRUE, isSeq=FALSE, mode="integer")) ## retrieving the sequential swap algorithm (cs <- make.commsim("swap")) ## feeding a commsim object as argument make.commsim(cs) ## making the missing c1 model using r1 as a template ## non-sequential algorithm for binary matrices ## that preserves the species (column) frequencies, ## but uses row marginal frequencies ## as probabilities of selecting sites f <- function (x, n, nr, nc, rs, cs, ...) { out <- array(0L, c(nr, nc, n)) J <- seq_len(nc) storage.mode(rs) <- "double" for (k in seq_len(n)) for (j in J) out[sample.int(nr, cs[j], prob = rs), j, k] <- 1L out } cs <- make.commsim("r1") cs$method <- "c1" cs$fun <- f ## structural constraints diagfun <- function(x, y) { c(sum = sum(y) == sum(x), fill = sum(y > 0) == sum(x > 0), rowSums = all(rowSums(y) == rowSums(x)), colSums = all(colSums(y) == colSums(x)), rowFreq = all(rowSums(y > 0) == rowSums(x > 0)), colFreq = all(colSums(y > 0) == colSums(x > 0))) } evalfun <- function(meth, x, n) { m <- nullmodel(x, meth) y <- simulate(m, nsim=n) out <- rowMeans(sapply(1:dim(y)[3], function(i) diagfun(attr(y, "data"), y[,,i]))) z <- as.numeric(c(attr(y, "binary"), attr(y, "isSeq"), attr(y, "mode") == "double")) names(z) <- c("binary", "isSeq", "double") c(z, out) } x <- matrix(rbinom(10*12, 1, 0.5)*rpois(10*12, 3), 12, 10) algos <- make.commsim() a <- t(sapply(algos, evalfun, x=x, n=10)) print(as.table(ifelse(a==1,1,0)), zero.print = ".")
## write the r00 algorithm f <- function(x, n, ...) array(replicate(n, sample(x)), c(dim(x), n)) (cs <- commsim("r00", fun=f, binary=TRUE, isSeq=FALSE, mode="integer")) ## retrieving the sequential swap algorithm (cs <- make.commsim("swap")) ## feeding a commsim object as argument make.commsim(cs) ## making the missing c1 model using r1 as a template ## non-sequential algorithm for binary matrices ## that preserves the species (column) frequencies, ## but uses row marginal frequencies ## as probabilities of selecting sites f <- function (x, n, nr, nc, rs, cs, ...) { out <- array(0L, c(nr, nc, n)) J <- seq_len(nc) storage.mode(rs) <- "double" for (k in seq_len(n)) for (j in J) out[sample.int(nr, cs[j], prob = rs), j, k] <- 1L out } cs <- make.commsim("r1") cs$method <- "c1" cs$fun <- f ## structural constraints diagfun <- function(x, y) { c(sum = sum(y) == sum(x), fill = sum(y > 0) == sum(x > 0), rowSums = all(rowSums(y) == rowSums(x)), colSums = all(colSums(y) == colSums(x)), rowFreq = all(rowSums(y > 0) == rowSums(x > 0)), colFreq = all(colSums(y > 0) == colSums(x > 0))) } evalfun <- function(meth, x, n) { m <- nullmodel(x, meth) y <- simulate(m, nsim=n) out <- rowMeans(sapply(1:dim(y)[3], function(i) diagfun(attr(y, "data"), y[,,i]))) z <- as.numeric(c(attr(y, "binary"), attr(y, "isSeq"), attr(y, "mode") == "double")) names(z) <- c("binary", "isSeq", "double") c(z, out) } x <- matrix(rbinom(10*12, 1, 0.5)*rpois(10*12, 3), 12, 10) algos <- make.commsim() a <- t(sapply(algos, evalfun, x=x, n=10)) print(as.table(ifelse(a==1,1,0)), zero.print = ".")
The contribution diversity approach is based in the differentiation of within-unit and among-unit diversity by using additive diversity partitioning and unit distinctiveness.
contribdiv(comm, index = c("richness", "simpson"), relative = FALSE, scaled = TRUE, drop.zero = FALSE) ## S3 method for class 'contribdiv' plot(x, sub, xlab, ylab, ylim, col, ...)
contribdiv(comm, index = c("richness", "simpson"), relative = FALSE, scaled = TRUE, drop.zero = FALSE) ## S3 method for class 'contribdiv' plot(x, sub, xlab, ylab, ylim, col, ...)
comm |
The community data matrix with samples as rows and species as column. |
index |
Character, the diversity index to be calculated. |
relative |
Logical, if |
scaled |
Logical, if |
drop.zero |
Logical, should empty rows dropped from the result?
If empty rows are not dropped, their corresponding results will be |
x |
An object of class |
sub , xlab , ylab , ylim , col
|
Graphical arguments passed to plot. |
... |
Other arguments passed to plot. |
This approach was proposed by Lu et al. (2007).
Additive diversity partitioning (see adipart
for more references)
deals with the relation of mean alpha and the total (gamma) diversity. Although
alpha diversity values often vary considerably. Thus, contributions of the sites
to the total diversity are uneven. This site specific contribution is measured by
contribution diversity components. A unit that has e.g. many unique species will
contribute more to the higher level (gamma) diversity than another unit with the
same number of species, but all of which common.
Distinctiveness of species can be defined as the number of sites where it
occurs (
), or the sum of its relative frequencies (
). Relative
frequencies are computed sitewise and
s at site
sum up
to
.
The contribution of site to the total diversity is given by
when dealing with richness and
for the Simpson index.
The unit distinctiveness of site is the average of the species
distinctiveness, averaging only those species which occur at site
.
For species richness:
(in the paper, the second
equation contains a typo,
is without index). For the Simpson index:
.
The Lu et al. (2007) gives an in-depth description of the different indices.
An object of class "contribdiv"
inheriting from data frame.
Returned values are alpha, beta and gamma components for each sites (rows)
of the community matrix. The "diff.coef"
attribute gives the
differentiation coefficient (see Examples).
Péter Sólymos, [email protected]
Lu, H. P., Wagner, H. H. and Chen, X. Y. 2007. A contribution diversity approach to evaluate species diversity. Basic and Applied Ecology, 8, 1–12.
## Artificial example given in ## Table 2 in Lu et al. 2007 x <- matrix(c( 1/3,1/3,1/3,0,0,0, 0,0,1/3,1/3,1/3,0, 0,0,0,1/3,1/3,1/3), 3, 6, byrow = TRUE, dimnames = list(LETTERS[1:3],letters[1:6])) x ## Compare results with Table 2 contribdiv(x, "richness") contribdiv(x, "simpson") ## Relative contribution (C values), compare with Table 2 (cd1 <- contribdiv(x, "richness", relative = TRUE, scaled = FALSE)) (cd2 <- contribdiv(x, "simpson", relative = TRUE, scaled = FALSE)) ## Differentiation coefficients attr(cd1, "diff.coef") # D_ST attr(cd2, "diff.coef") # D_DT ## BCI data set data(BCI) opar <- par(mfrow=c(2,2)) plot(contribdiv(BCI, "richness"), main = "Absolute") plot(contribdiv(BCI, "richness", relative = TRUE), main = "Relative") plot(contribdiv(BCI, "simpson")) plot(contribdiv(BCI, "simpson", relative = TRUE)) par(opar)
## Artificial example given in ## Table 2 in Lu et al. 2007 x <- matrix(c( 1/3,1/3,1/3,0,0,0, 0,0,1/3,1/3,1/3,0, 0,0,0,1/3,1/3,1/3), 3, 6, byrow = TRUE, dimnames = list(LETTERS[1:3],letters[1:6])) x ## Compare results with Table 2 contribdiv(x, "richness") contribdiv(x, "simpson") ## Relative contribution (C values), compare with Table 2 (cd1 <- contribdiv(x, "richness", relative = TRUE, scaled = FALSE)) (cd2 <- contribdiv(x, "simpson", relative = TRUE, scaled = FALSE)) ## Differentiation coefficients attr(cd1, "diff.coef") # D_ST attr(cd2, "diff.coef") # D_DT ## BCI data set data(BCI) opar <- par(mfrow=c(2,2)) plot(contribdiv(BCI, "richness"), main = "Absolute") plot(contribdiv(BCI, "richness", relative = TRUE), main = "Relative") plot(contribdiv(BCI, "simpson")) plot(contribdiv(BCI, "simpson", relative = TRUE)) par(opar)
Distance-based redundancy analysis (dbRDA) is an ordination method
similar to Redundancy Analysis (rda
), but it allows
non-Euclidean dissimilarity indices, such as Manhattan or
Bray-Curtis distance. Despite this non-Euclidean feature, the
analysis is strictly linear and metric. If called with Euclidean
distance, the results are identical to rda
, but dbRDA
will be less efficient. Functions dbrda
is constrained
versions of metric scaling, a.k.a. principal coordinates analysis,
which are based on the Euclidean distance but can be used, and are
more useful, with other dissimilarity measures. Function
capscale
is a simplified version based on Euclidean
approximation of dissimilarities. The functions can also perform
unconstrained principal coordinates analysis (PCO), optionally using
extended dissimilarities. pco()
is a wrapper to dbrda()
,
which performs PCO.
dbrda(formula, data, distance = "euclidean", sqrt.dist = FALSE, add = FALSE, dfun = vegdist, metaMDSdist = FALSE, na.action = na.fail, subset = NULL, ...) capscale(formula, data, distance = "euclidean", sqrt.dist = FALSE, comm = NULL, add = FALSE, dfun = vegdist, metaMDSdist = FALSE, na.action = na.fail, subset = NULL, ...) pco(X, ...)
dbrda(formula, data, distance = "euclidean", sqrt.dist = FALSE, add = FALSE, dfun = vegdist, metaMDSdist = FALSE, na.action = na.fail, subset = NULL, ...) capscale(formula, data, distance = "euclidean", sqrt.dist = FALSE, comm = NULL, add = FALSE, dfun = vegdist, metaMDSdist = FALSE, na.action = na.fail, subset = NULL, ...) pco(X, ...)
formula |
Model formula. The function can be called only with the
formula interface. Most usual features of |
X |
Community data matrix. |
data |
Data frame containing the variables on the right hand side of the model formula. |
distance |
The name of the dissimilarity (or distance) index if
the LHS of the |
sqrt.dist |
Take square roots of dissimilarities. See section
|
comm |
Community data frame which will be used for finding
species scores when the LHS of the |
add |
Add a constant to the non-diagonal dissimilarities such
that all eigenvalues are non-negative in the underlying Principal
Co-ordinates Analysis (see |
dfun |
Distance or dissimilarity function used. Any function
returning standard |
metaMDSdist |
Use |
na.action |
Handling of missing values in constraints or
conditions. The default ( |
subset |
Subset of data rows. This can be a logical vector
which is |
... |
Other parameters passed to underlying functions (e.g.,
|
Functions dbrda
and capscale
provide two alternative
implementations of dbRDA. Function dbrda
is based on McArdle
& Anderson (2001) and directly decomposes dissimilarities. With
Euclidean distances results are identical to rda
.
Non-Euclidean dissimilarities may give negative eigenvalues
associated with imaginary axes. Function capscale
is based on
Legendre & Anderson (1999): the dissimilarity data are first
ordinated using metric scaling, and the ordination results are
analysed as rda
. capscale
ignores the imaginary
component and will not give negative eigenvalues (but will report
the magnitude on imaginary component).
If the user supplied a community data frame instead of
dissimilarities, the functions will find dissimilarities using
vegdist
or distance function given in dfun
with
specified distance
. The functions will accept distance
objects from vegdist
, dist
, or any other
method producing compatible objects. The constraining variables can be
continuous or factors or both, they can have interaction terms, or
they can be transformed in the call. Moreover, there can be a
special term Condition
just like in rda
and
cca
so that “partial” analysis can be performed.
Function dbrda
does not return species scores, and they can
also be missing in capscale
, but they can be added after the
analysis using function sppscores
.
Non-Euclidean dissimilarities can produce negative eigenvalues
(Legendre & Anderson 1999, McArdle & Anderson 2001). If there are
negative eigenvalues, the printed output of capscale
will add
a column with sums of positive eigenvalues and an item of sum of
negative eigenvalues, and dbrda
will add a column giving the
number of real dimensions with positive eigenvalues. If negative
eigenvalues are disturbing, functions let you distort the
dissimilarities so that only non-negative eigenvalues will be
produced with argument add = TRUE
. Alternatively, with
sqrt.dist = TRUE
, square roots of dissimilarities can be
used which may help in avoiding negative eigenvalues (Legendre &
Anderson 1999).
The functions can be also used to perform ordinary metric scaling
a.k.a. principal coordinates analysis by using a formula with only a
constant on the right hand side, or comm ~ 1
. The new function
pco()
implements principal coordinates analysis via
dbrda()
directly, using this formula. With
metaMDSdist = TRUE
, the function can do automatic data
standardization and use extended dissimilarities using function
stepacross
similarly as in non-metric multidimensional
scaling with metaMDS
.
The functions return an object of class dbrda
or
capscale
which inherit from rda
. See
cca.object
for description of the result object. Function
pco()
returns an object of class "vegan_pco"
(which
inherits from class "dbrda"
) to avoid clashes with other packages.
Function dbrda
implements real distance-based RDA and is
preferred over capscale
. capscale
was originally
developed as a variant of constrained analysis of proximities
(Anderson & Willis 2003), but these developments made it more
similar to dbRDA. However, it discards the imaginary dimensions with
negative eigenvalues and ordination and significance tests area only
based on real dimensions and positive eigenvalues. capscale
may be removed from vegan in the future. It has been in
vegan
since 2003 (CRAN release 1.6-0) while dbrda
was
first released in 2016 (version 2.4-0), and removal of
capscale
may be disruptive to historical examples and
scripts, but in modern times dbrda
should be used.
The inertia is named after the dissimilarity index as defined in the
dissimilarity data, or as unknown distance
if such
information is missing. If the largest original dissimilarity was
larger than 4, capscale
handles input similarly as rda
and bases its analysis on variance instead of sum of
squares. Keyword mean
is added to the inertia in these cases,
e.g. with Euclidean and Manhattan distances. Inertia is based on
squared index, and keyword squared
is added to the name of
distance, unless data were square root transformed (argument
sqrt.dist=TRUE
). If an additive constant was used with
argument add
, Lingoes
or Cailliez adjusted
is
added to the the name of inertia, and the value of the constant is
printed.
Jari Oksanen
Anderson, M.J. & Willis, T.J. (2003). Canonical analysis of principal coordinates: a useful method of constrained ordination for ecology. Ecology 84, 511–525.
Gower, J.C. (1985). Properties of Euclidean and non-Euclidean distance matrices. Linear Algebra and its Applications 67, 81–97.
Legendre, P. & Anderson, M. J. (1999). Distance-based redundancy analysis: testing multispecies responses in multifactorial ecological experiments. Ecological Monographs 69, 1–24.
Legendre, P. & Legendre, L. (2012). Numerical Ecology. 3rd English Edition. Elsevier.
McArdle, B.H. & Anderson, M.J. (2001). Fitting multivariate models to community data: a comment on distance-based redundancy analysis. Ecology 82, 290–297.
rda
, cca
, plot.cca
,
anova.cca
, vegdist
,
dist
, cmdscale
, wcmdscale
for underlying and related functions. Function sppscores
can add species scores or replace existing species scores.
The function returns similar result object as rda
(see
cca.object
). This section for rda
gives a
more complete list of functions that can be used to access and
analyse dbRDA results.
data(varespec, varechem) ## dbrda dbrda(varespec ~ N + P + K + Condition(Al), varechem, dist="bray") ## avoid negative eigenvalues with sqrt distances dbrda(varespec ~ N + P + K + Condition(Al), varechem, dist="bray", sqrt.dist = TRUE) ## avoid negative eigenvalues also with Jaccard distances (m <- dbrda(varespec ~ N + P + K + Condition(Al), varechem, dist="jaccard")) ## add species scores sppscores(m) <- wisconsin(varespec) ## pco pco(varespec, dist = "bray", sqrt.dist = TRUE)
data(varespec, varechem) ## dbrda dbrda(varespec ~ N + P + K + Condition(Al), varechem, dist="bray") ## avoid negative eigenvalues with sqrt distances dbrda(varespec ~ N + P + K + Condition(Al), varechem, dist="bray", sqrt.dist = TRUE) ## avoid negative eigenvalues also with Jaccard distances (m <- dbrda(varespec ~ N + P + K + Condition(Al), varechem, dist="jaccard")) ## add species scores sppscores(m) <- wisconsin(varespec) ## pco pco(varespec, dist = "bray", sqrt.dist = TRUE)
Performs detrended correspondence analysis and basic reciprocal averaging or orthogonal correspondence analysis.
decorana(veg, iweigh=0, iresc=4, ira=0, mk=26, short=0, before=NULL, after=NULL) ## S3 method for class 'decorana' plot(x, choices=c(1,2), origin=TRUE, display=c("both","sites","species","none"), cex = 0.7, cols = c(1,2), type, xlim, ylim, ...) ## S3 method for class 'decorana' text(x, display = c("sites", "species"), labels, choices = 1:2, origin = TRUE, select, ...) ## S3 method for class 'decorana' points(x, display = c("sites", "species"), choices=1:2, origin = TRUE, select, ...) ## S3 method for class 'decorana' scores(x, display="sites", choices=1:4, origin=TRUE, tidy=FALSE, ...) downweight(veg, fraction = 5)
decorana(veg, iweigh=0, iresc=4, ira=0, mk=26, short=0, before=NULL, after=NULL) ## S3 method for class 'decorana' plot(x, choices=c(1,2), origin=TRUE, display=c("both","sites","species","none"), cex = 0.7, cols = c(1,2), type, xlim, ylim, ...) ## S3 method for class 'decorana' text(x, display = c("sites", "species"), labels, choices = 1:2, origin = TRUE, select, ...) ## S3 method for class 'decorana' points(x, display = c("sites", "species"), choices=1:2, origin = TRUE, select, ...) ## S3 method for class 'decorana' scores(x, display="sites", choices=1:4, origin=TRUE, tidy=FALSE, ...) downweight(veg, fraction = 5)
veg |
Community data, a matrix-like object. |
iweigh |
Downweighting of rare species (0: no). |
iresc |
Number of rescaling cycles (0: no rescaling). |
ira |
Type of analysis (0: detrended, 1: basic reciprocal averaging). |
mk |
Number of segments in rescaling. |
short |
Shortest gradient to be rescaled. |
before |
Hill's piecewise transformation: values before transformation. |
after |
Hill's piecewise transformation: values after
transformation – these must correspond to values in |
x |
A |
choices |
Axes shown. |
origin |
Use true origin even in detrended correspondence analysis. |
display |
Display only sites, only species, both or neither. |
cex |
Plot character size. |
cols |
Colours used for sites and species. |
type |
Type of plots, partial match to |
labels |
Optional text to be used instead of row names. |
select |
Items to be displayed. This can either be a logical
vector which is |
xlim , ylim
|
the x and y limits (min,max) of the plot. |
fraction |
Abundance fraction where downweighting begins. |
tidy |
Return scores that are compatible with ggplot2:
all scores are in a single |
... |
Other arguments for |
In late 1970s, correspondence analysis became the method of choice for ordination in vegetation science, since it seemed better able to cope with non-linear species responses than principal components analysis. However, even correspondence analysis can produce an arc-shaped configuration of a single gradient. Mark Hill developed detrended correspondence analysis to correct two assumed ‘faults’ in correspondence analysis: curvature of straight gradients and packing of sites at the ends of the gradient.
The curvature is removed by replacing the orthogonalization of axes
with detrending. In orthogonalization successive axes are made
non-correlated, but detrending should remove all systematic dependence
between axes. Detrending is performed using a smoothing window on
mk
segments. The packing of sites at the ends of the gradient
is undone by rescaling the axes after extraction. After rescaling,
the axis is supposed to be scaled by ‘SD’ units, so that the
average width of Gaussian species responses is supposed to be one over
whole axis. Other innovations were the piecewise linear transformation
of species abundances and downweighting of rare species which were
regarded to have an unduly high influence on ordination axes.
It seems that detrending actually works by twisting the ordination
space, so that the results look non-curved in two-dimensional
projections (‘lolly paper effect’). As a result, the points
usually have an easily recognized triangular or diamond shaped
pattern, obviously an artefact of detrending. Rescaling works
differently than commonly presented, too. decorana
does not
use, or even evaluate, the widths of species responses. Instead, it
tries to equalize the weighted standard deviation of species scores on
axis segments (parameter mk
has no effect, since
decorana
finds the segments internally). Function
tolerance
returns this internal criterion and can be
used to assess the success of rescaling.
The plot
method plots species and site scores. Classical
decorana
scaled the axes so that smallest site score was 0 (and
smallest species score was negative), but summary
, plot
and scores
use the true origin, unless origin = FALSE
.
In addition to proper eigenvalues, the function reports
‘decorana values’ in detrended analysis. These ‘decorana
values’ are the values that the legacy code of decorana
returns
as eigenvalues. They are estimated during iteration, and describe the
joint effects of axes and detrending. The ‘decorana values’ are
estimated before rescaling and do not show its effect on
eigenvalues. The proper eigenvalues are estimated after extraction of
the axes and they are the ratio of weighted sum of squares of site and
species scores even in detrended and rescaled solutions. These
eigenvalues are estimated for each axis separately, but they are not
additive, because higher decorana
axes can show effects already
explained by prior axes. ‘Additive eigenvalues’ are cleansed
from the effects of prior axes, and they can be assumed to add up to
total inertia (scaled Chi-square). For proportions and cumulative
proportions explained you can use eigenvals.decorana
.
decorana
returns an object of class "decorana"
, which
has print
, summary
, scores
, plot
,
points
and text
methods, and support functions
eigenvals
, bstick
, screeplot
,
predict
and tolerance
. downweight
is an
independent function that can also be used with other methods than
decorana
.
decorana
uses the central numerical engine of the
original Fortran code (which is in the public domain), or about 1/3 of
the original program. I have tried to implement the original
behaviour, although a great part of preparatory steps were written in
R language, and may differ somewhat from the original code. However,
well-known bugs are corrected and strict criteria used (Oksanen &
Minchin 1997).
Please note that there really is no need for piecewise transformation
or even downweighting within decorana
, since there are more
powerful and extensive alternatives in R, but these options are
included for compliance with the original software. If a different
fraction of abundance is needed in downweighting, function
downweight
must be applied before decorana
. Function
downweight
indeed can be applied prior to correspondence
analysis, and so it can be used together with cca
, too.
Github package natto has an R implementation of
decorana
which allows easier inspection of the
algorithm and also easier development of the function.
vegan 2.6-6 and earlier had a summary
method, but it did
nothing useful and is now defunct. All its former information can be
extracted with scores
or weights.decorana
.
Mark O. Hill wrote the original Fortran code, the R port was by Jari Oksanen.
Hill, M.O. and Gauch, H.G. (1980). Detrended correspondence analysis: an improved ordination technique. Vegetatio 42, 47–58.
Oksanen, J. and Minchin, P.R. (1997). Instability of ordination results under changes in input data order: explanations and remedies. Journal of Vegetation Science 8, 447–454.
For unconstrained ordination, non-metric multidimensional scaling in
monoMDS
may be more robust (see also
metaMDS
). Constrained (or ‘canonical’)
correspondence analysis can be made with cca
.
Orthogonal correspondence analysis can be made with decorana
or
cca
, but the scaling of results vary (and the one in
decorana
corresponds to scaling = "sites"
and hill
= TRUE
in cca
.). See predict.decorana
for adding new points to an ordination.
data(varespec) vare.dca <- decorana(varespec) vare.dca plot(vare.dca) ### the detrending rationale: gaussresp <- function(x,u) exp(-(x-u)^2/2) x <- seq(0,6,length=15) ## The gradient u <- seq(-2,8,len=23) ## The optima pack <- outer(x,u,gaussresp) matplot(x, pack, type="l", main="Species packing") opar <- par(mfrow=c(2,2)) plot(scores(prcomp(pack)), asp=1, type="b", main="PCA") plot(scores(decorana(pack, ira=1)), asp=1, type="b", main="CA") plot(scores(decorana(pack)), asp=1, type="b", main="DCA") plot(scores(cca(pack ~ x), dis="sites"), asp=1, type="b", main="CCA") ### Let's add some noise: noisy <- (0.5 + runif(length(pack)))*pack par(mfrow=c(2,1)) matplot(x, pack, type="l", main="Ideal model") matplot(x, noisy, type="l", main="Noisy model") par(mfrow=c(2,2)) plot(scores(prcomp(noisy)), type="b", main="PCA", asp=1) plot(scores(decorana(noisy, ira=1)), type="b", main="CA", asp=1) plot(scores(decorana(noisy)), type="b", main="DCA", asp=1) plot(scores(cca(noisy ~ x), dis="sites"), asp=1, type="b", main="CCA") par(opar)
data(varespec) vare.dca <- decorana(varespec) vare.dca plot(vare.dca) ### the detrending rationale: gaussresp <- function(x,u) exp(-(x-u)^2/2) x <- seq(0,6,length=15) ## The gradient u <- seq(-2,8,len=23) ## The optima pack <- outer(x,u,gaussresp) matplot(x, pack, type="l", main="Species packing") opar <- par(mfrow=c(2,2)) plot(scores(prcomp(pack)), asp=1, type="b", main="PCA") plot(scores(decorana(pack, ira=1)), asp=1, type="b", main="CA") plot(scores(decorana(pack)), asp=1, type="b", main="DCA") plot(scores(cca(pack ~ x), dis="sites"), asp=1, type="b", main="CCA") ### Let's add some noise: noisy <- (0.5 + runif(length(pack)))*pack par(mfrow=c(2,1)) matplot(x, pack, type="l", main="Ideal model") matplot(x, noisy, type="l", main="Noisy model") par(mfrow=c(2,2)) plot(scores(prcomp(noisy)), type="b", main="PCA", asp=1) plot(scores(decorana(noisy, ira=1)), type="b", main="CA", asp=1) plot(scores(decorana(noisy)), type="b", main="DCA", asp=1) plot(scores(cca(noisy ~ x), dis="sites"), asp=1, type="b", main="CCA") par(opar)
The function provides some popular (and effective) standardization methods for community ecologists.
decostand(x, method, MARGIN, range.global, logbase = 2, na.rm=FALSE, ...) wisconsin(x) decobackstand(x, zap = TRUE)
decostand(x, method, MARGIN, range.global, logbase = 2, na.rm=FALSE, ...) wisconsin(x) decobackstand(x, zap = TRUE)
x |
Community data, a matrix-like object. For
|
method |
Standardization method. See Details for available options. |
MARGIN |
Margin, if default is not acceptable. |
range.global |
Matrix from which the range is found in
|
logbase |
The logarithm base used in |
na.rm |
Ignore missing values in row or column
standardizations. The |
zap |
Make near-zero values exact zeros to avoid negative values and exaggerated estimates of species richness. |
... |
Other arguments to the function (ignored). |
The function offers following standardization methods for community data:
total
: divide by margin total (default MARGIN = 1
).
max
: divide by margin maximum (default MARGIN = 2
).
frequency
: divide by margin total and multiply by the
number of non-zero items, so that the average of non-zero entries is
one (Oksanen 1983; default MARGIN = 2
).
normalize
: make margin sum of squares equal to one (default
MARGIN = 1
).
range
: standardize values into range 0 ... 1 (default
MARGIN = 2
). If all values are constant, they will be
transformed to 0.
rank, rrank
: rank
replaces abundance values by
their increasing ranks leaving zeros unchanged, and rrank
is
similar but uses relative ranks with maximum 1 (default
MARGIN = 1
). Average ranks are used for tied values.
standardize
: scale x
to zero mean and unit variance
(default MARGIN = 2
).
pa
: scale x
to presence/absence scale (0/1).
chi.square
: divide by row sums and square root of
column sums, and adjust for square root of matrix total
(Legendre & Gallagher 2001). When used with the Euclidean
distance, the distances should be similar to the
Chi-square distance used in correspondence analysis. However, the
results from cmdscale
would still differ, since
CA is a weighted ordination method (default MARGIN = 1
).
hellinger
: square root of method = "total"
(Legendre & Gallagher 2001).
log
: logarithmic transformation as suggested by
Anderson et al. (2006): for
, where
is the base of the logarithm; zeros are
left as zeros. Higher bases give less weight to quantities and more
to presences, and
logbase = Inf
gives the presence/absence
scaling. Please note this is not .
Anderson et al. (2006) suggested this for their (strongly) modified
Gower distance (implemented as
method = "altGower"
in
vegdist
), but the standardization can be used
independently of distance indices.
alr
: Additive log ratio ("alr") transformation
(Aitchison 1986) reduces data skewness and compositionality
bias. The transformation assumes positive values, pseudocounts can
be added with the argument pseudocount
. One of the
rows/columns is a reference that can be given by reference
(name of index). The first row/column is used by default
(reference = 1
). Note that this transformation drops one
row or column from the transformed output data. The alr
transformation is defined formally as follows:
where the denominator sample can be chosen
arbitrarily. This transformation is often used with pH and other
chemistry measurenments. It is also commonly used as multinomial
logistic regression. Default
MARGIN = 1
uses row as the
reference
.
clr
: centered log ratio ("clr") transformation proposed by
Aitchison (1986) and it is used to reduce data skewness and compositionality bias.
This transformation has frequent applications in microbial ecology
(see e.g. Gloor et al., 2017). The clr transformation is defined as:
where is a single value, and g(x) is the geometric mean of
.
The method can operate only with positive data;
a common way to deal with zeroes is to add pseudocount
(e.g. the smallest positive value in the data), either by
adding it manually to the input data, or by using the argument
pseudocount
as in
decostand(x, method = "clr", pseudocount = 1)
. Adding
pseudocount will inevitably introduce some bias; see
the rclr
method for one available solution.
rclr
: robust clr ("rclr") is similar to regular clr
(see above) but allows data that contains zeroes. This method
does not use pseudocounts, unlike the standard clr.
The robust clr (rclr) divides the values by geometric mean
of the observed features; zero values are kept as zeroes, and not
taken into account. In high dimensional data,
the geometric mean of rclr approximates the true
geometric mean; see e.g. Martino et al. (2019)
The rclr
transformation is defined formally as follows:
where is a single value, and
is the geometric
mean of sample-wide values
that are positive (> 0).
Standardization, as contrasted to transformation, means that the entries are transformed relative to other entries.
All methods have a default margin. MARGIN=1
means rows (sites
in a normal data set) and MARGIN=2
means columns (species in a
normal data set).
Command wisconsin
is a shortcut to common Wisconsin double
standardization where species (MARGIN=2
) are first standardized
by maxima (max
) and then sites (MARGIN=1
) by
site totals (tot
).
Most standardization methods will give nonsense results with
negative data entries that normally should not occur in the community
data. If there are empty sites or species (or constant with
method = "range"
), many standardization will change these into
NaN
.
Function decobackstand
can be used to transform standardized
data back to original. This is not possible for all standardization
and may not be implemented to all cases where it would be
possible. There are round-off errors and back-transformation is not
exact, and it is wise not to overwrite the original data. With
zap=TRUE
original zeros should be exact.
Returns the standardized data frame, and adds an attribute
"decostand"
giving the name of applied standardization
"method"
and attribute "parameters"
with appropriate
transformation parameters.
Common transformations can be made with standard R functions.
Jari Oksanen, Etienne Laliberté
(method = "log"
), Leo Lahti (alr
,
"clr"
and "rclr"
).
Aitchison, J. The Statistical Analysis of Compositional Data (1986). London, UK: Chapman & Hall.
Anderson, M.J., Ellingsen, K.E. & McArdle, B.H. (2006) Multivariate dispersion as a measure of beta diversity. Ecology Letters 9, 683–693.
Egozcue, J.J., Pawlowsky-Glahn, V., Mateu-Figueras, G., Barcel'o-Vidal, C. (2003) Isometric logratio transformations for compositional data analysis. Mathematical Geology 35, 279–300.
Gloor, G.B., Macklaim, J.M., Pawlowsky-Glahn, V. & Egozcue, J.J. (2017) Microbiome Datasets Are Compositional: And This Is Not Optional. Frontiers in Microbiology 8, 2224.
Legendre, P. & Gallagher, E.D. (2001) Ecologically meaningful transformations for ordination of species data. Oecologia 129, 271–280.
Martino, C., Morton, J.T., Marotz, C.A., Thompson, L.R., Tripathi, A., Knight, R. & Zengler, K. (2019) A novel sparse compositional technique reveals microbial perturbations. mSystems 4, 1.
Oksanen, J. (1983) Ordination of boreal heath-like vegetation with principal component analysis, correspondence analysis and multidimensional scaling. Vegetatio 52, 181–189.
data(varespec) sptrans <- decostand(varespec, "max") apply(sptrans, 2, max) sptrans <- wisconsin(varespec) # CLR transformation for rows, with pseudocount varespec.clr <- decostand(varespec, "clr", pseudocount=1) # ALR transformation for rows, with pseudocount and reference sample varespec.alr <- decostand(varespec, "alr", pseudocount=1, reference=1) ## Chi-square: PCA similar but not identical to CA. ## Use wcmdscale for weighted analysis and identical results. sptrans <- decostand(varespec, "chi.square") plot(procrustes(rda(sptrans), cca(varespec)))
data(varespec) sptrans <- decostand(varespec, "max") apply(sptrans, 2, max) sptrans <- wisconsin(varespec) # CLR transformation for rows, with pseudocount varespec.clr <- decostand(varespec, "clr", pseudocount=1) # ALR transformation for rows, with pseudocount and reference sample varespec.alr <- decostand(varespec, "alr", pseudocount=1, reference=1) ## Chi-square: PCA similar but not identical to CA. ## Use wcmdscale for weighted analysis and identical results. sptrans <- decostand(varespec, "chi.square") plot(procrustes(rda(sptrans), cca(varespec)))
Function designdist
lets you define your own dissimilarities
using terms for shared and total quantities, number of rows and number
of columns. The shared and total quantities can be binary, quadratic
or minimum terms. In binary terms, the shared component is number of
shared species, and totals are numbers of species on sites. The
quadratic terms are cross-products and sums of squares, and minimum
terms are sums of parallel minima and row totals. Function
designdist2
is similar, but finds dissimilarities among two
data sets. Function chaodist
lets you define your own
dissimilarities using terms that are supposed to take into account the
“unseen species” (see Chao et al., 2005 and Details in
vegdist
).
designdist(x, method = "(A+B-2*J)/(A+B)", terms = c("binary", "quadratic", "minimum"), abcd = FALSE, alphagamma = FALSE, name, maxdist) designdist2(x, y, method = "(A+B-2*J)/(A+B)", terms = c("binary", "quadratic", "minimum"), abcd = FALSE, alphagamma = FALSE, name, maxdist) chaodist(x, method = "1 - 2*U*V/(U+V)", name)
designdist(x, method = "(A+B-2*J)/(A+B)", terms = c("binary", "quadratic", "minimum"), abcd = FALSE, alphagamma = FALSE, name, maxdist) designdist2(x, y, method = "(A+B-2*J)/(A+B)", terms = c("binary", "quadratic", "minimum"), abcd = FALSE, alphagamma = FALSE, name, maxdist) chaodist(x, method = "1 - 2*U*V/(U+V)", name)
x |
Input data. |
y |
Another input data set: dissimilarities will be calculated
among rows of |
method |
Equation for your dissimilarities. This can use terms
|
terms |
How shared and total components are found. For vectors
|
abcd |
Use 2x2 contingency table notation for binary data:
|
alphagamma |
Use beta diversity notation with terms
|
name |
The name you want to use for your index. The default is to
combine the |
maxdist |
Theoretical maximum of the dissimilarity, or |
Most popular dissimilarity measures in ecology can be expressed with
the help of terms J
, A
and B
, and some also involve
matrix dimensions N
and P
. Some examples you can define in
designdist
are:
A+B-2*J |
"quadratic" |
squared Euclidean |
A+B-2*J |
"minimum" |
Manhattan |
(A+B-2*J)/(A+B) |
"minimum" |
Bray-Curtis |
(A+B-2*J)/(A+B) |
"binary" |
Sørensen |
(A+B-2*J)/(A+B-J) |
"binary" |
Jaccard |
(A+B-2*J)/(A+B-J) |
"minimum" |
Ružička |
(A+B-2*J)/(A+B-J) |
"quadratic" |
(dis)similarity ratio |
1-J/sqrt(A*B) |
"binary" |
Ochiai |
1-J/sqrt(A*B) |
"quadratic" |
cosine complement |
1-phyper(J-1, A, P-A, B) |
"binary" |
Raup-Crick (but see raupcrick )
|
The function designdist
can implement most dissimilarity
indices in vegdist
or elsewhere, and it can also be
used to implement many other indices, amongst them, most of those
described in Legendre & Legendre (2012). It can also be used to
implement all indices of beta diversity described in Koleff et
al. (2003), but there also is a specific function
betadiver
for the purpose.
If you want to implement binary dissimilarities based on the 2x2
contingency table notation, you can set abcd = TRUE
. In this
notation a = J
, b = A-J
, c = B-J
, d = P-A-B+J
.
This notation is often used instead of the more more
tangible default notation for reasons that are opaque to me.
With alphagamma = TRUE
it is possible to use beta diversity
notation with terms alpha
for average alpha diversity and
gamma
for gamma diversity in two compared sites. The terms
are calculated as alpha = (A+B)/2
, gamma = A+B-J
and
delta = abs(A-B)/2
. Terms A
and B
are also
available and give the alpha diversities of the individual compared
sites. The beta diversity terms may make sense only for binary
terms (so that diversities are expressed in numbers of species), but
they are calculated for quadratic and minimum terms as well (with a
warning).
Function chaodist
is similar to designgist
, but uses
terms U
and V
of Chao et al. (2005). These terms are
supposed to take into account the effects of unseen species. Both
U
and V
are scaled to range . They take
the place of
A
and B
and the product U*V
is used
in the place of J
of designdist
. Function
chaodist
can implement any commonly used Chao et al. (2005)
style dissimilarity:
1 - 2*U*V/(U+V) |
Sørensen type |
1 - U*V/(U+V-U*V) |
Jaccard type |
1 - sqrt(U*V) |
Ochiai type |
(pmin(U,V) - U*V)/pmin(U,V) |
Simpson type |
Function vegdist
implements Jaccard-type Chao distance,
and its documentation contains more complete discussion on the
calculation of the terms.
designdist
returns an object of class dist
.
designdist
does not use compiled code, but it is based on
vectorized R code. The designdist
function can be much
faster than vegdist
, although the latter uses compiled
code. However, designdist
cannot skip missing values and uses
much more memory during calculations.
The use of sum terms can be numerically unstable. In particularly,
when these terms are large, the precision may be lost. The risk is
large when the number of columns is high, and particularly large with
quadratic terms. For precise calculations it is better to use
functions like dist
and vegdist
which are
more robust against numerical problems.
Jari Oksanen
Chao, A., Chazdon, R. L., Colwell, R. K. and Shen, T. (2005) A new statistical approach for assessing similarity of species composition with incidence and abundance data. Ecology Letters 8, 148–159.
Koleff, P., Gaston, K.J. and Lennon, J.J. (2003) Measuring beta diversity for presence–absence data. J. Animal Ecol. 72, 367–382.
Legendre, P. and Legendre, L. (2012) Numerical Ecology. 3rd English ed. Elsevier
vegdist
, betadiver
, dist
,
raupcrick
.
data(BCI) ## Four ways of calculating the same Sørensen dissimilarity d0 <- vegdist(BCI, "bray", binary = TRUE) d1 <- designdist(BCI, "(A+B-2*J)/(A+B)") d2 <- designdist(BCI, "(b+c)/(2*a+b+c)", abcd = TRUE) d3 <- designdist(BCI, "gamma/alpha - 1", alphagamma = TRUE) ## Arrhenius dissimilarity: the value of z in the species-area model ## S = c*A^z when combining two sites of equal areas, where S is the ## number of species, A is the area, and c and z are model parameters. ## The A below is not the area (which cancels out), but number of ## species in one of the sites, as defined in designdist(). dis <- designdist(BCI, "(log(A+B-J)-log(A+B)+log(2))/log(2)") ## This can be used in clustering or ordination... ordiplot(cmdscale(dis)) ## ... or in analysing beta diversity (without gradients) summary(dis)
data(BCI) ## Four ways of calculating the same Sørensen dissimilarity d0 <- vegdist(BCI, "bray", binary = TRUE) d1 <- designdist(BCI, "(A+B-2*J)/(A+B)") d2 <- designdist(BCI, "(b+c)/(2*a+b+c)", abcd = TRUE) d3 <- designdist(BCI, "gamma/alpha - 1", alphagamma = TRUE) ## Arrhenius dissimilarity: the value of z in the species-area model ## S = c*A^z when combining two sites of equal areas, where S is the ## number of species, A is the area, and c and z are model parameters. ## The A below is not the area (which cancels out), but number of ## species in one of the sites, as defined in designdist(). dis <- designdist(BCI, "(log(A+B-J)-log(A+B)+log(2))/log(2)") ## This can be used in clustering or ordination... ordiplot(cmdscale(dis)) ## ... or in analysing beta diversity (without gradients) summary(dis)
The functions extract statistics that resemble deviance and AIC from the
result of constrained correspondence analysis cca
or
redundancy analysis rda
. These functions are rarely
needed directly, but they are called by step
in
automatic model building. Actually, cca
and
rda
do not have AIC
and these functions
are certainly wrong.
## S3 method for class 'cca' deviance(object, ...) ## S3 method for class 'cca' extractAIC(fit, scale = 0, k = 2, ...)
## S3 method for class 'cca' deviance(object, ...) ## S3 method for class 'cca' extractAIC(fit, scale = 0, k = 2, ...)
object |
|
fit |
fitted model from constrained ordination. |
scale |
optional numeric specifying the scale parameter of the model,
see |
k |
numeric specifying the "weight" of the equivalent degrees of
freedom (=: |
... |
further arguments. |
The functions find statistics that
resemble deviance
and AIC
in constrained
ordination. Actually, constrained ordination methods do not have a
log-Likelihood, which means that they cannot have AIC and deviance.
Therefore you should not use these functions, and if you use them, you
should not trust them. If you use these functions, it remains as your
responsibility to check the adequacy of the result.
The deviance of cca
is equal to the Chi-square of
the residual data matrix after fitting the constraints. The deviance
of rda
is defined as the residual sum of squares. The
deviance function of rda
is also used for distance-based RDA
dbrda
. Function extractAIC
mimics
extractAIC.lm
in translating deviance to AIC.
There is little need to call these functions directly. However, they
are called implicitly in step
function used in automatic
selection of constraining variables. You should check the resulting
model with some other criteria, because the statistics used here are
unfounded. In particular, the penalty k
is not properly
defined, and the default k = 2
is not justified
theoretically. If you have only continuous covariates, the step
function will base the model building on magnitude of eigenvalues, and
the value of k
only influences the stopping point (but the
variables with the highest eigenvalues are not necessarily the most
significant in permutation tests in anova.cca
). If you
also have multi-class factors, the value of k
will have a
capricious effect in model building. The step
function
will pass arguments to add1.cca
and
drop1.cca
, and setting test = "permutation"
will provide permutation tests of each deletion and addition which
can help in judging the validity of the model building.
The deviance
functions return “deviance”, and
extractAIC
returns effective degrees of freedom and “AIC”.
These functions are unfounded and untested and they should not be used
directly or implicitly. Moreover, usual caveats in using
step
are very valid.
Jari Oksanen
Godínez-Domínguez, E. & Freire, J. (2003) Information-theoretic approach for selection of spatial and temporal models of community organization. Marine Ecology Progress Series 253, 17–24.
cca
, rda
, anova.cca
,
step
, extractAIC
,
add1.cca
, drop1.cca
.
# The deviance of correspondence analysis equals Chi-square data(dune) data(dune.env) chisq.test(dune) deviance(cca(dune)) # Stepwise selection (forward from an empty model "dune ~ 1") ord <- cca(dune ~ ., dune.env) step(cca(dune ~ 1, dune.env), scope = formula(ord))
# The deviance of correspondence analysis equals Chi-square data(dune) data(dune.env) chisq.test(dune) deviance(cca(dune)) # Stepwise selection (forward from an empty model "dune ~ 1") ord <- cca(dune ~ ., dune.env) step(cca(dune ~ 1, dune.env), scope = formula(ord))
Calculates the Morisita index of dispersion, standardized index values, and the so called clumpedness and uniform indices.
dispindmorisita(x, unique.rm = FALSE, crit = 0.05, na.rm = FALSE)
dispindmorisita(x, unique.rm = FALSE, crit = 0.05, na.rm = FALSE)
x |
community data matrix, with sites (samples) as rows and species as columns. |
unique.rm |
logical, if |
crit |
two-sided p-value used to calculate critical Chi-squared values. |
na.rm |
logical.
Should missing values (including |
The Morisita index of dispersion is defined as (Morisita 1959, 1962):
Imor = n * (sum(xi^2) - sum(xi)) / (sum(xi)^2 - sum(xi))
where is the count of individuals in sample
, and
is the number of samples (
).
has values from 0 to
. In uniform (hyperdispersed)
patterns its value falls between 0 and 1, in clumped patterns it falls
between 1 and
. For increasing sample sizes (i.e. joining
neighbouring quadrats),
goes to
as the
quadrat size approaches clump size. For random patterns,
and counts in the samples follow Poisson
frequency distribution.
The deviation from random expectation (null hypothesis)
can be tested using critical values of the Chi-squared
distribution with degrees of freedom.
Confidence intervals around 1 can be calculated by the clumped
and uniform
indices (Hairston et al. 1971, Krebs
1999) (Chi2Lower and Chi2Upper refers to e.g. 0.025 and 0.975 quantile
values of the Chi-squared distribution with
degrees of
freedom, respectively, for
crit = 0.05
):
Mclu = (Chi2Lower - n + sum(xi)) / (sum(xi) - 1)
Muni = (Chi2Upper - n + sum(xi)) / (sum(xi) - 1)
Smith-Gill (1975) proposed scaling of Morisita index from [0, n]
interval into [-1, 1], and setting up -0.5 and 0.5 values as
confidence limits around random distribution with rescaled value 0. To
rescale the Morisita index, one of the following four equations apply
to calculate the standardized index :
(a) Imor >= Mclu > 1
: Imst = 0.5 + 0.5 (Imor - Mclu) / (n - Mclu)
,
(b) Mclu > Imor >= 1
: Imst = 0.5 (Imor - 1) / (Mclu - 1)
,
(c) 1 > Imor > Muni
: Imst = -0.5 (Imor - 1) / (Muni - 1)
,
(d) 1 > Muni > Imor
: Imst = -0.5 + 0.5 (Imor - Muni) / Muni
.
Returns a data frame with as many rows as the number of columns
in the input data, and with four columns. Columns are: imor
the
unstandardized Morisita index, mclu
the clumpedness index,
muni
the uniform index, imst
the standardized Morisita
index, pchisq
the Chi-squared based probability for the null
hypothesis of random expectation.
A common error found in several papers is that when standardizing
as in the case (b), the denominator is given as Muni - 1
. This
results in a hiatus in the [0, 0.5] interval of the standardized
index. The root of this typo is the book of Krebs (1999), see the Errata
for the book (Page 217,
https://www.zoology.ubc.ca/~krebs/downloads/errors_2nd_printing.pdf).
Péter Sólymos, [email protected]
Morisita, M. 1959. Measuring of the dispersion of individuals and analysis of the distributional patterns. Mem. Fac. Sci. Kyushu Univ. Ser. E 2, 215–235.
Morisita, M. 1962. Id-index, a measure of dispersion of individuals. Res. Popul. Ecol. 4, 1–7.
Smith-Gill, S. J. 1975. Cytophysiological basis of disruptive pigmentary patterns in the leopard frog, Rana pipiens. II. Wild type and mutant cell specific patterns. J. Morphol. 146, 35–54.
Hairston, N. G., Hill, R. and Ritte, U. 1971. The interpretation of aggregation patterns. In: Patil, G. P., Pileou, E. C. and Waters, W. E. eds. Statistical Ecology 1: Spatial Patterns and Statistical Distributions. Penn. State Univ. Press, University Park.
Krebs, C. J. 1999. Ecological Methodology. 2nd ed. Benjamin Cummings Publishers.
data(dune) x <- dispindmorisita(dune) x y <- dispindmorisita(dune, unique.rm = TRUE) y dim(x) ## with unique species dim(y) ## unique species removed
data(dune) x <- dispindmorisita(dune) x y <- dispindmorisita(dune, unique.rm = TRUE) y dim(x) ## with unique species dim(y) ## unique species removed
Transform abundance data downweighting species that are overdispersed to the Poisson error.
dispweight(comm, groups, nsimul = 999, nullmodel = "c0_ind", plimit = 0.05) gdispweight(formula, data, plimit = 0.05) ## S3 method for class 'dispweight' summary(object, ...)
dispweight(comm, groups, nsimul = 999, nullmodel = "c0_ind", plimit = 0.05) gdispweight(formula, data, plimit = 0.05) ## S3 method for class 'dispweight' summary(object, ...)
comm |
Community data matrix. |
groups |
Factor describing the group structure. If missing, all
sites are regarded as belonging to one group. |
nsimul |
Number of simulations. |
nullmodel |
The |
plimit |
Downweight species if their |
formula , data
|
Formula where the left-hand side is the
community data frame and right-hand side gives the explanatory
variables. The explanatory variables are found in the data frame
given in |
object |
Result object from |
... |
Other parameters passed to functions. |
The dispersion index () is calculated as ratio between variance
and expected value for each species. If the species abundances follow
Poisson distribution, expected dispersion is
, and if
, the species is overdispersed. The inverse
can
be used to downweight species abundances. Species are only
downweighted when overdispersion is judged to be statistically
significant (Clarke et al. 2006).
Function dispweight
implements the original procedure of Clarke
et al. (2006). Only one factor can be used to group the sites and to
find the species means. The significance of overdispersion is assessed
freely distributing individuals of each species within factor
levels. This is achieved by using nullmodel
"c0_ind"
(which accords to Clarke et al. 2006), but other
nullmodels can be used, though they may not be meaningful (see
commsim
for alternatives). If a species is absent in
some factor level, the whole level is ignored in calculation of
overdispersion, and the number of degrees of freedom can vary among
species. The reduced number of degrees of freedom is used as a divisor
for overdispersion , and such species have higher dispersion
and hence lower weights in transformation.
Function gdispweight
is a generalized parametric version of
dispweight
. The function is based on glm
with
quasipoisson
error family
. Any
glm
model can be used, including several factors or
continuous covariates. Function gdispweight
uses the same test
statistic as dispweight
(Pearson Chi-square), but it does not
ignore factor levels where species is absent, and the number of
degrees of freedom is equal for all species. Therefore transformation
weights can be higher than in dispweight
. The
gdispweight
function evaluates the significance of
overdispersion parametrically from Chi-square distribution
(pchisq
).
Functions dispweight
and gdispweight
transform data, but
they add information on overdispersion and weights as attributes of
the result. The summary
can be used to extract and print that
information.
Function returns transformed data with the following new attributes:
D |
Dispersion statistic. |
df |
Degrees of freedom for each species. |
p |
|
weights |
weights applied to community data. |
nsimul |
Number of simulations used to assess the |
nullmodel |
The name of |
Eduard Szöcs [email protected] wrote the original
dispweight
, Jari Oksanen significantly modified the code,
provided support functions and developed gdispweight
.
Clarke, K. R., M. G. Chapman, P. J. Somerfield, and H. R. Needham. 2006. Dispersion-based weighting of species counts in assemblage analyses. Marine Ecology Progress Series, 320, 11–27.
data(mite, mite.env) ## dispweight and its summary mite.dw <- with(mite.env, dispweight(mite, Shrub, nsimul = 99)) ## IGNORE_RDIFF_BEGIN summary(mite.dw) ## IGNORE_RDIFF_END ## generalized dispersion weighting mite.dw <- gdispweight(mite ~ Shrub + WatrCont, data = mite.env) rda(mite.dw ~ Shrub + WatrCont, data = mite.env)
data(mite, mite.env) ## dispweight and its summary mite.dw <- with(mite.env, dispweight(mite, Shrub, nsimul = 99)) ## IGNORE_RDIFF_BEGIN summary(mite.dw) ## IGNORE_RDIFF_END ## generalized dispersion weighting mite.dw <- gdispweight(mite ~ Shrub + WatrCont, data = mite.env) rda(mite.dw ~ Shrub + WatrCont, data = mite.env)
Function distconnected
finds groups that are connected
disregarding dissimilarities that are at or above a threshold or
NA
. The function can be used to find groups that can be
ordinated together or transformed by
stepacross
. Function no.shared
returns a logical
dissimilarity object, where TRUE
means that sites have no
species in common. This is a minimal structure for
distconnected
or can be used to set missing values to
dissimilarities.
distconnected(dis, toolong = 1, trace = TRUE) no.shared(x)
distconnected(dis, toolong = 1, trace = TRUE) no.shared(x)
dis |
Dissimilarity data inheriting from class |
toolong |
Shortest dissimilarity regarded as |
trace |
Summarize results of |
x |
Community data. |
Data sets are disconnected if they have sample plots or groups of
sample plots which share no species with other sites or groups of
sites. Such data sets cannot be sensibly ordinated by any
unconstrained method because these subsets cannot be related to each
other. For instance, correspondence analysis will polarize these
subsets with eigenvalue 1. Neither can such dissimilarities be
transformed with stepacross
, because there is no path
between all points, and result will contain NA
s. Function
distconnected
will find such subsets in dissimilarity
matrices. The function will return a grouping vector that can be used
for sub-setting the data. If data are connected, the result vector will
be all s. The connectedness between two points can be defined
either by a threshold
toolong
or using input dissimilarities
with NA
s.
Function no.shared
returns a dist
structure having value
TRUE
when two sites have nothing in common, and value
FALSE
when they have at least one shared species. This is a
minimal structure that can be analysed with distconnected
. The
function can be used to select dissimilarities with no shared species
in indices which do not have a fixed upper limit.
Function distconnected
uses depth-first search
(Sedgewick 1990).
Function distconnected
returns a vector for
observations using integers to identify connected groups. If the data
are connected, values will be all 1
. Function no.shared
returns an object of class dist
.
Jari Oksanen
Sedgewick, R. (1990). Algorithms in C. Addison Wesley.
vegdist
or dist
for getting
dissimilarities, stepacross
for a case where you may need
distconnected
, and for connecting points spantree
.
## There are no disconnected data in vegan, and the following uses an ## extremely low threshold limit for connectedness. This is for ## illustration only, and not a recommended practice. data(dune) dis <- vegdist(dune) gr <- distconnected(dis, toolong=0.4) # Make sites with no shared species as NA in Manhattan dissimilarities dis <- vegdist(dune, "manhattan") is.na(dis) <- no.shared(dune)
## There are no disconnected data in vegan, and the following uses an ## extremely low threshold limit for connectedness. This is for ## illustration only, and not a recommended practice. data(dune) dis <- vegdist(dune) gr <- distconnected(dis, toolong=0.4) # Make sites with no shared species as NA in Manhattan dissimilarities dis <- vegdist(dune, "manhattan") is.na(dis) <- no.shared(dune)
Shannon, Simpson, and Fisher diversity indices and species richness.
diversity(x, index = "shannon", groups, equalize.groups = FALSE, MARGIN = 1, base = exp(1)) simpson.unb(x, inverse = FALSE) fisher.alpha(x, MARGIN = 1, ...) specnumber(x, groups, MARGIN = 1)
diversity(x, index = "shannon", groups, equalize.groups = FALSE, MARGIN = 1, base = exp(1)) simpson.unb(x, inverse = FALSE) fisher.alpha(x, MARGIN = 1, ...) specnumber(x, groups, MARGIN = 1)
x |
Community data, a matrix-like object or a vector. |
index |
Diversity index, one of |
MARGIN |
Margin for which the index is computed. |
base |
The logarithm |
inverse |
Use inverse Simpson similarly as in
|
groups |
A grouping factor: if given, finds the diversity of communities pooled by the groups. |
equalize.groups |
Instead of observed abundances, standardize all communities to unit total. |
... |
Parameters passed to the function. |
Shannon or Shannon–Weaver (or Shannon–Wiener) index is defined as
, where
is the proportional abundance of species
and
is the base of the logarithm. It is most popular to use natural
logarithms, but some argue for base
(which makes sense,
but no real difference).
Both variants of Simpson's index are based on . Choice
simpson
returns and
invsimpson
returns .
simpson.unb
finds unbiased Simpson indices for discrete
samples (Hurlbert 1971, eq. 5). These are less sensitive to sample
size than the basic Simpson indices. The unbiased indices can be only
calculated for data of integer counts.
The diversity
function can find the total (or gamma) diversity
of pooled communities with argument groups
. The average alpha
diversity can be found as the mean of diversities by the same groups,
and their difference or ratio is an estimate of beta diversity (see
Examples). The pooling can be based either on the observed
abundancies, or all communities can be equalized to unit total before
pooling; see Jost (2007) for discussion. Functions
adipart
and multipart
provide canned
alternatives for estimating alpha, beta and gamma diversities in
hierarchical settings.
fisher.alpha
estimates the parameter of
Fisher's logarithmic series (see
fisherfit
).
The estimation is possible only for genuine
counts of individuals.
None of these diversity indices is usable for empty sampling units without any species, but some of the indices can give a numeric value. Filtering out these cases is left for the user.
Function specnumber
finds the number of species. With
MARGIN = 2
, it finds frequencies of species. If groups
is given, finds the total number of species in each group (see
example on finding one kind of beta diversity with this option).
Better stories can be told about Simpson's index than about
Shannon's index, and still grander narratives about
rarefaction (Hurlbert 1971). However, these indices are all very
closely related (Hill 1973), and there is no reason to despise one
more than others (but if you are a graduate student, don't drag me in,
but obey your Professor's orders). In particular, the exponent of the
Shannon index is linearly related to inverse Simpson (Hill 1973)
although the former may be more sensitive to rare species. Moreover,
inverse Simpson is asymptotically equal to rarefied species richness
in sample of two individuals, and Fisher's is very
similar to inverse Simpson.
A vector of diversity indices or numbers of species.
Jari Oksanen and Bob O'Hara (fisher.alpha
).
Fisher, R.A., Corbet, A.S. & Williams, C.B. (1943). The relation between the number of species and the number of individuals in a random sample of animal population. Journal of Animal Ecology 12, 42–58.
Hurlbert, S.H. (1971). The nonconcept of species diversity: a critique and alternative parameters. Ecology 52, 577–586.
Jost, L. (2007) Partitioning diversity into independent alpha and beta components. Ecology 88, 2427–2439.
These functions calculate only some basic indices, but many
others can be derived with them (see Examples). Facilities related to
diversity are discussed in a vegan vignette that can be read
with browseVignettes("vegan")
. Functions renyi
and tsallis
estimate a series of generalized diversity
indices. Function rarefy
finds estimated number of
species for given sample size. Beta diversity can be estimated with
betadiver
. Diversities can be partitioned with
adipart
and multipart
.
data(BCI, BCI.env) H <- diversity(BCI) simp <- diversity(BCI, "simpson") invsimp <- diversity(BCI, "inv") ## Unbiased Simpson unbias.simp <- simpson.unb(BCI) ## Fisher alpha alpha <- fisher.alpha(BCI) ## Plot all pairs(cbind(H, simp, invsimp, unbias.simp, alpha), pch="+", col="blue") ## Species richness (S) and Pielou's evenness (J): S <- specnumber(BCI) ## rowSums(BCI > 0) does the same... J <- H/log(S) ## beta diversity defined as gamma/alpha - 1: ## alpha is the average no. of species in a group, and gamma is the ## total number of species in the group (alpha <- with(BCI.env, tapply(specnumber(BCI), Habitat, mean))) (gamma <- with(BCI.env, specnumber(BCI, Habitat))) gamma/alpha - 1 ## similar calculations with Shannon diversity (alpha <- with(BCI.env, tapply(diversity(BCI), Habitat, mean))) # average (gamma <- with(BCI.env, diversity(BCI, groups=Habitat))) # pooled ## additive beta diversity based on Shannon index gamma-alpha
data(BCI, BCI.env) H <- diversity(BCI) simp <- diversity(BCI, "simpson") invsimp <- diversity(BCI, "inv") ## Unbiased Simpson unbias.simp <- simpson.unb(BCI) ## Fisher alpha alpha <- fisher.alpha(BCI) ## Plot all pairs(cbind(H, simp, invsimp, unbias.simp, alpha), pch="+", col="blue") ## Species richness (S) and Pielou's evenness (J): S <- specnumber(BCI) ## rowSums(BCI > 0) does the same... J <- H/log(S) ## beta diversity defined as gamma/alpha - 1: ## alpha is the average no. of species in a group, and gamma is the ## total number of species in the group (alpha <- with(BCI.env, tapply(specnumber(BCI), Habitat, mean))) (gamma <- with(BCI.env, specnumber(BCI, Habitat))) gamma/alpha - 1 ## similar calculations with Shannon diversity (alpha <- with(BCI.env, tapply(diversity(BCI), Habitat, mean))) # average (gamma <- with(BCI.env, diversity(BCI, groups=Habitat))) # pooled ## additive beta diversity based on Shannon index gamma-alpha
The dune meadow vegetation data, dune
, has cover class values
of 30 species on 20 sites. The corresponding environmental data frame
dune.env
has following entries:
data(dune) data(dune.env)
data(dune) data(dune.env)
dune
is a data frame of observations of 30 species at 20
sites. The species names are abbreviated to 4+4 letters (see
make.cepnames
). The following names are changed from
the original source (Jongman et al. 1987): Leontodon
autumnalis to Scorzoneroides, and Potentilla
palustris to Comarum.
dune.env
is a data frame of 20 observations on the following
5 variables:
a numeric vector of thickness of soil A1 horizon.
an ordered factor with levels: 1
< 2
<
4
< 5
.
a factor with levels: BF
(Biological
farming), HF
(Hobby farming), NM
(Nature
Conservation Management), and SF
(Standard Farming).
an ordered factor of land-use with levels: Hayfield
< Haypastu
< Pasture
.
an ordered factor with levels: 0
< 1
<
2
< 3
< 4
.
Jongman, R.H.G, ter Braak, C.J.F & van Tongeren, O.F.R. (1987). Data Analysis in Community and Landscape Ecology. Pudoc, Wageningen.
data(dune) data(dune.env)
data(dune) data(dune.env)
Classification table of the species in the dune
data
set.
data(dune.taxon) data(dune.phylodis)
data(dune.taxon) data(dune.phylodis)
dune.taxon
is data frame with 30 species (rows) classified into
five taxonomic levels (columns). dune.phylodis
is a
dist
object of estimated coalescence ages extracted from
doi:10.5061/dryad.63q27 (Zanne et al. 2014) using tools in packages
ape and phylobase.
The families and orders are based on APG IV (2016) in vascular plants and on Hill et al. (2006) in mosses. The higher levels (superorder and subclass) are based on Chase & Reveal (2009). Chase & Reveal (2009) treat Angiosperms and mosses as subclasses of class Equisetopsida (land plants), but brylogists have traditionally used much more inflated levels which are adjusted here to match Angiosperm classification.
APG IV [Angiosperm Phylogeny Group] (2016) An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Bot. J. Linnean Soc. 181: 1–20.
Chase, M.W. & Reveal, J. L. (2009) A phylogenetic classification of the land plants to accompany APG III. Bot. J. Linnean Soc. 161: 122–127.
Hill, M.O et al. (2006) An annotated checklist of the mosses of Europe and Macaronesia. J. Bryology 28: 198–267.
Zanne A.E., Tank D.C., Cornwell, W.K., Eastman J.M., Smith, S.A., FitzJohn, R.G., McGlinn, D.J., O’Meara, B.C., Moles, A.T., Reich, P.B., Royer, D.L., Soltis, D.E., Stevens, P.F., Westoby, M., Wright, I.J., Aarssen, L., Bertin, R.I., Calaminus, A., Govaerts, R., Hemmings, F., Leishman, M.R., Oleksyn, J., Soltis, P.S., Swenson, N.G., Warman, L. & Beaulieu, J.M. (2014) Three keys to the radiation of angiosperms into freezing environments. Nature 506: 89–92.
Functions taxondive
, treedive
,
and treedist
use these data sets.
data(dune.taxon) data(dune.phylodis)
data(dune.taxon) data(dune.phylodis)
Function extracts eigenvalues from an object that has them. Many multivariate methods return such objects.
eigenvals(x, ...) ## S3 method for class 'cca' eigenvals(x, model = c("all", "unconstrained", "constrained"), constrained = NULL, ...) ## S3 method for class 'decorana' eigenvals(x, kind = c("additive", "axiswise", "decorana"), ...) ## S3 method for class 'eigenvals' summary(object, ...)
eigenvals(x, ...) ## S3 method for class 'cca' eigenvals(x, model = c("all", "unconstrained", "constrained"), constrained = NULL, ...) ## S3 method for class 'decorana' eigenvals(x, kind = c("additive", "axiswise", "decorana"), ...) ## S3 method for class 'eigenvals' summary(object, ...)
x |
An object from which to extract eigenvalues. |
object |
An |
model |
Which eigenvalues to return for objects that inherit from class
|
constrained |
Return only constrained eigenvalues. Deprecated as of vegan
2.5-0. Use |
kind |
Kind of eigenvalues returned for |
... |
Other arguments to the functions (usually ignored) |
This is a generic function that has methods for cca
,
wcmdscale
, pcnm
, prcomp
,
princomp
, dudi
(of ade4), and
pca
and pco
(of
labdsv) result objects. The default method also
extracts eigenvalues if the result looks like being from
eigen
or svd
. Functions
prcomp
and princomp
contain square roots
of eigenvalues that all called standard deviations, but
eigenvals
function returns their squares. Function
svd
contains singular values, but function
eigenvals
returns their squares. For constrained ordination
methods cca
, rda
and
capscale
the function returns the both constrained and
unconstrained eigenvalues concatenated in one vector, but the partial
component will be ignored. However, with argument
constrained = TRUE
only constrained eigenvalues are returned.
The summary
of eigenvals
result returns eigenvalues,
proportion explained and cumulative proportion explained. The result
object can have some negative eigenvalues (wcmdscale
,
dbrda
, pcnm
) which correspond to
imaginary axes of Euclidean mapping of non-Euclidean distances
(Gower 1985). In these case real axes (corresponding to positive
eigenvalues) will "explain" proportion >1 of total variation, and
negative eigenvalues bring the cumulative proportion to
1. capscale
will only find the positive eigenvalues
and only these are used in finding proportions. For
decorana
the importances and cumulative proportions
are only reported for kind = "additive"
, because other
alternatives do not add up to total inertia of the input data.
An object of class "eigenvals"
, which is a vector of
eigenvalues.
The summary
method returns an object of class
"summary.eigenvals"
, which is a matrix.
Jari Oksanen.
Gower, J. C. (1985). Properties of Euclidean and non-Euclidean distance matrices. Linear Algebra and its Applications 67, 81–97.
eigen
, svd
, prcomp
,
princomp
, cca
, rda
,
capscale
, wcmdscale
,
cca.object
.
data(varespec) data(varechem) mod <- cca(varespec ~ Al + P + K, varechem) ev <- eigenvals(mod) ev summary(ev) ## choose which eignevalues to return eigenvals(mod, model = "unconstrained")
data(varespec) data(varechem) mod <- cca(varespec ~ Al + P + K, varechem) ev <- eigenvals(mod) ev summary(ev) ## choose which eignevalues to return eigenvals(mod, model = "unconstrained")
The function fits environmental vectors or factors onto an
ordination. The projections of points onto vectors have maximum
correlation with corresponding environmental variables, and the
factors show the averages of factor levels. For continuous varaibles
this is equal to fitting a linear trend surface (plane in 2D) for a
variable (see ordisurf
); this trend surface can be
presented by showing its gradient (direction of steepest increase)
using an arrow. The environmental variables are the dependent
variables that are explained by the ordination scores, and each
dependent variable is analysed separately.
## Default S3 method: envfit(ord, env, permutations = 999, strata = NULL, choices=c(1,2), display = "sites", w = weights(ord, display), na.rm = FALSE, ...) ## S3 method for class 'formula' envfit(formula, data, ...) ## S3 method for class 'envfit' plot(x, choices = c(1,2), labels, arrow.mul, at = c(0,0), axis = FALSE, p.max = NULL, col = "blue", bg, add = TRUE, ...) ## S3 method for class 'envfit' scores(x, display, choices, arrow.mul=1, tidy = FALSE, ...) vectorfit(X, P, permutations = 0, strata = NULL, w, ...) factorfit(X, P, permutations = 0, strata = NULL, w, ...)
## Default S3 method: envfit(ord, env, permutations = 999, strata = NULL, choices=c(1,2), display = "sites", w = weights(ord, display), na.rm = FALSE, ...) ## S3 method for class 'formula' envfit(formula, data, ...) ## S3 method for class 'envfit' plot(x, choices = c(1,2), labels, arrow.mul, at = c(0,0), axis = FALSE, p.max = NULL, col = "blue", bg, add = TRUE, ...) ## S3 method for class 'envfit' scores(x, display, choices, arrow.mul=1, tidy = FALSE, ...) vectorfit(X, P, permutations = 0, strata = NULL, w, ...) factorfit(X, P, permutations = 0, strata = NULL, w, ...)
ord |
An ordination object or other structure from which the
ordination |
env |
Data frame, matrix or vector of environmental variables. The variables can be of mixed type (factors, continuous variables) in data frames. |
X |
Matrix or data frame of ordination scores. |
P |
Data frame, matrix or vector of environmental
variable(s). These must be continuous for |
permutations |
a list of control values for the permutations
as returned by the function |
formula , data
|
Model |
na.rm |
Remove points with missing values in ordination scores
or environmental variables. The operation is casewise: the whole
row of data is removed if there is a missing value and
|
x |
A result object from |
choices |
Axes to plotted. |
tidy |
Return scores that are compatible with ggplot2:
all scores are in a single |
labels |
Change plotting labels. The argument should be a list
with elements |
arrow.mul |
Multiplier for vector lengths. The arrows are
automatically scaled similarly as in |
at |
The origin of fitted arrows in the plot. If you plot arrows
in other places then origin, you probably have to specify
|
axis |
Plot axis showing the scaling of fitted arrows. |
p.max |
Maximum estimated |
col |
Colour in plotting. |
bg |
Background colour for labels. If |
add |
Results added to an existing ordination plot. |
strata |
An integer vector or factor specifying the strata for permutation. If supplied, observations are permuted only within the specified strata. |
display |
In fitting functions these are ordinary site scores or
linear combination scores
( |
w |
Weights used in fitting (concerns mainly |
... |
Parameters passed to |
Function envfit
finds vectors or factor averages of
environmental variables. Function plot.envfit
adds these in an
ordination diagram. If X
is a data.frame
,
envfit
uses factorfit
for factor
variables and
vectorfit
for other variables. If X
is a matrix or a
vector, envfit
uses only vectorfit
. Alternatively, the
model can be defined a simplified model formula
, where
the left hand side must be an ordination result object or a matrix of
ordination scores, and right hand
side lists the environmental variables. The formula interface can be
used for easier selection and/or transformation of environmental
variables. Only the main effects will be analysed even if interaction
terms were defined in the formula.
The ordination results are extracted with scores
and
all extra arguments are passed to the scores
. The fitted
models only apply to the results defined when extracting the scores
when using envfit
. For instance, scaling
in
constrained ordination (see scores.rda
,
scores.cca
) must be set in the same way in
envfit
and in the plot
or the ordination results (see
Examples).
The printed output of continuous variables (vectors) gives the
direction cosines which are the coordinates of the heads of unit
length vectors. In plot
these are scaled by their
correlation (square root of the column r2
) so that
“weak” predictors have shorter arrows than “strong”
predictors. You can see the scaled relative lengths using command
scores
. The plot
ted (and scaled) arrows are further
adjusted to the current graph using a constant multiplier: this will
keep the relative r2
-scaled lengths of the arrows but tries
to fill the current plot. You can see the multiplier using
ordiArrowMul(result_of_envfit)
, and set it with the
argument arrow.mul
.
Functions vectorfit
and factorfit
can be called directly.
Function vectorfit
finds directions in the ordination space
towards which the environmental vectors change most rapidly and to
which they have maximal correlations with the ordination
configuration. Function factorfit
finds averages of ordination
scores for factor levels. Function factorfit
treats ordered
and unordered factors similarly.
If permutations
, the significance of fitted vectors
or factors is assessed using permutation of environmental variables.
The goodness of fit statistic is squared correlation coefficient
(
).
For factors this is defined as
, where
and
are within-group and total sums of
squares. See
permutations
for additional details on
permutation tests in Vegan.
User can supply a vector of prior weights w
. If the ordination
object has weights, these will be used. In practise this means that
the row totals are used as weights with cca
or
decorana
results. If you do not like this, but want to
give equal weights to all sites, you should set w = NULL
. The
fitted vectors are similar to biplot arrows in constrained ordination
only when fitted to LC scores (display = "lc"
) and you set
scaling = "species"
(see scores.cca
). The
weighted fitting gives similar results to biplot arrows and class
centroids in cca
.
The lengths of arrows for fitted vectors are automatically adjusted
for the physical size of the plot, and the arrow lengths cannot be
compared across plots. For similar scaling of arrows, you must
explicitly set the arrow.mul
argument in the plot
command; see ordiArrowMul
and
ordiArrowTextXY
.
The results can be accessed with scores.envfit
function which
returns either the fitted vectors scaled by correlation coefficient or
the centroids of the fitted environmental variables, or a named list
of both.
Functions vectorfit
and factorfit
return lists of
classes vectorfit
and factorfit
which have a
print
method. The result object have the following items:
arrows |
Arrow endpoints from |
centroids |
Class centroids from |
r |
Goodness of fit statistic: Squared correlation coefficient |
permutations |
Number of permutations. |
control |
A list of control values for the permutations
as returned by the function |
pvals |
Empirical P-values for each variable. |
Function envfit
returns a list of class envfit
with
results of vectorfit
and envfit
as items.
Function plot.envfit
scales the vectors by correlation.
Fitted vectors have become the method of choice in displaying
environmental variables in ordination. Indeed, they are the optimal
way of presenting environmental variables in Constrained
Correspondence Analysis cca
, since there they are the
linear constraints.
In unconstrained ordination the relation between external variables
and ordination configuration may be less linear, and therefore other
methods than arrows may be more useful. The simplest is to adjust the
plotting symbol sizes (cex
, symbols
) by
environmental variables.
Fancier methods involve smoothing and regression methods that
abound in R, and ordisurf
provides a wrapper for some.
Jari Oksanen. The permutation test derives from the code suggested by Michael Scroggie.
A better alternative to vectors may be ordisurf
.
data(varespec, varechem) library(MASS) ord <- metaMDS(varespec) (fit <- envfit(ord, varechem, perm = 999)) scores(fit, "vectors") plot(ord) plot(fit) plot(fit, p.max = 0.05, col = "red") ## Adding fitted arrows to CCA. We use "lc" scores, and hope ## that arrows are scaled similarly in cca and envfit plots ord <- cca(varespec ~ Al + P + K, varechem) plot(ord, type="p") fit <- envfit(ord, varechem, perm = 999, display = "lc") plot(fit, p.max = 0.05, col = "red") ## 'scaling' must be set similarly in envfit and in ordination plot plot(ord, type = "p", scaling = "sites") fit <- envfit(ord, varechem, perm = 0, display = "lc", scaling = "sites") plot(fit, col = "red") ## Class variables, formula interface, and displaying the ## inter-class variability with ordispider, and semitransparent ## white background for labels (semitransparent colours are not ## supported by all graphics devices) data(dune) data(dune.env) ord <- cca(dune) fit <- envfit(ord ~ Moisture + A1, dune.env, perm = 0) plot(ord, type = "n") with(dune.env, ordispider(ord, Moisture, col="skyblue")) with(dune.env, points(ord, display = "sites", col = as.numeric(Moisture), pch=16)) plot(fit, cex=1.2, axis=TRUE, bg = rgb(1, 1, 1, 0.5)) ## Use shorter labels for factor centroids labels(fit) plot(ord) plot(fit, labels=list(factors = paste("M", c(1,2,4,5), sep = "")), bg = rgb(1,1,0,0.5))
data(varespec, varechem) library(MASS) ord <- metaMDS(varespec) (fit <- envfit(ord, varechem, perm = 999)) scores(fit, "vectors") plot(ord) plot(fit) plot(fit, p.max = 0.05, col = "red") ## Adding fitted arrows to CCA. We use "lc" scores, and hope ## that arrows are scaled similarly in cca and envfit plots ord <- cca(varespec ~ Al + P + K, varechem) plot(ord, type="p") fit <- envfit(ord, varechem, perm = 999, display = "lc") plot(fit, p.max = 0.05, col = "red") ## 'scaling' must be set similarly in envfit and in ordination plot plot(ord, type = "p", scaling = "sites") fit <- envfit(ord, varechem, perm = 0, display = "lc", scaling = "sites") plot(fit, col = "red") ## Class variables, formula interface, and displaying the ## inter-class variability with ordispider, and semitransparent ## white background for labels (semitransparent colours are not ## supported by all graphics devices) data(dune) data(dune.env) ord <- cca(dune) fit <- envfit(ord ~ Moisture + A1, dune.env, perm = 0) plot(ord, type = "n") with(dune.env, ordispider(ord, Moisture, col="skyblue")) with(dune.env, points(ord, display = "sites", col = as.numeric(Moisture), pch=16)) plot(fit, cex=1.2, axis=TRUE, bg = rgb(1, 1, 1, 0.5)) ## Use shorter labels for factor centroids labels(fit) plot(ord) plot(fit, labels=list(factors = paste("M", c(1,2,4,5), sep = "")), bg = rgb(1,1,0,0.5))
The function eventstar
finds the minimum () of the
evenness profile based on the Tsallis entropy. This scale factor
of the entropy represents a specific weighting of species
relative frequencies that leads to minimum evenness of the
community (Mendes et al. 2008).
eventstar(x, qmax = 5)
eventstar(x, qmax = 5)
x |
A community matrix or a numeric vector. |
qmax |
Maximum scale parameter of the Tsallis entropy to be used in
finding the minimum of Tsallis based evenness
in the range |
The function eventstar
finds a characteristic value of the scale
parameter of the Tsallis entropy corresponding to
minimum of the evenness (equitability) profile based on Tsallis entropy.
This value was proposed by Mendes et al. (2008) as
.
The index represents the scale parameter of
the one parameter Tsallis diversity family that leads to
the greatest deviation from the maximum equitability given the relative
abundance vector of a community.
The value of is found by identifying the minimum
of the evenness profile over scaling factor
by
one-dimensional minimization. Because evenness profile is
known to be a convex function, it is guaranteed that underlying
optimize
function will find a unique solution
if it is in the range c(0, qmax)
.
The scale parameter value is used to
find corresponding values of diversity (
),
evenness (
),
and numbers equivalent (
). For calculation
details, see
tsallis
and Examples below.
Mendes et al. (2008) advocated the use of
and corresponding diversity, evenness, and Hill numbers, because
it is a unique value representing the diversity profile, and is
is positively associated with rare species in the community,
thus it is a potentially useful indicator of certain
relative abundance distributions of the communities.
A data frame with columns:
qstar |
scale parameter value |
Estar |
Value of evenness based on normalized Tsallis
entropy at |
Hstar |
Value of Tsallis entropy at |
Dstar |
Value of Tsallis entropy at |
See tsallis
for calculation details.
Values for found by Mendes et al. (2008) ranged
from 0.56 and 1.12 presenting low variability, so an
interval between 0 and 5 should safely encompass
the possibly expected
values in practice,
but profiling the evenness and changing the value of
the
qmax
argument is advised if output values
near the range limits are found.
Eduardo Ribeiro Cunha [email protected] and Heloisa Beatriz Antoniazi Evangelista [email protected], with technical input of Péter Sólymos.
Mendes, R.S., Evangelista, L.R., Thomaz, S.M., Agostinho, A.A. and Gomes, L.C. (2008) A unified index to measure ecological diversity and species rarity. Ecography 31, 450–456.
Jost, L. (2007) Partitioning diversity into independent alpha and beta components. Ecology 88, 2427–2439.
Tsallis, C. (1988) Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phis. 52, 479–487.
Tsallis entropy: tsallis
data(BCI) (x <- eventstar(BCI[1:5,])) ## profiling y <- as.numeric(BCI[10,]) (z <- eventstar(y)) q <- seq(0, 2, 0.05) Eprof <- tsallis(y, scales=q, norm=TRUE) Hprof <- tsallis(y, scales=q) Dprof <- tsallis(y, scales=q, hill=TRUE) opar <- par(mfrow=c(3,1)) plot(q, Eprof, type="l", main="Evenness") abline(v=z$qstar, h=tsallis(y, scales=z$qstar, norm=TRUE), col=2) plot(q, Hprof, type="l", main="Diversity") abline(v=z$qstar, h=tsallis(y, scales=z$qstar), col=2) plot(q, Dprof, type="l", main="Effective number of species") abline(v=z$qstar, h=tsallis(y, scales=z$qstar, hill=TRUE), col=2) par(opar)
data(BCI) (x <- eventstar(BCI[1:5,])) ## profiling y <- as.numeric(BCI[10,]) (z <- eventstar(y)) q <- seq(0, 2, 0.05) Eprof <- tsallis(y, scales=q, norm=TRUE) Hprof <- tsallis(y, scales=q) Dprof <- tsallis(y, scales=q, hill=TRUE) opar <- par(mfrow=c(3,1)) plot(q, Eprof, type="l", main="Evenness") abline(v=z$qstar, h=tsallis(y, scales=z$qstar, norm=TRUE), col=2) plot(q, Hprof, type="l", main="Diversity") abline(v=z$qstar, h=tsallis(y, scales=z$qstar), col=2) plot(q, Dprof, type="l", main="Effective number of species") abline(v=z$qstar, h=tsallis(y, scales=z$qstar, hill=TRUE), col=2) par(opar)
Function fisherfit
fits Fisher's logseries to abundance
data. Function prestonfit
groups species frequencies into
doubling octave classes and fits Preston's lognormal model, and
function prestondistr
fits the truncated lognormal model
without pooling the data into octaves.
fisherfit(x, ...) prestonfit(x, tiesplit = TRUE, ...) prestondistr(x, truncate = -1, ...) ## S3 method for class 'prestonfit' plot(x, xlab = "Frequency", ylab = "Species", bar.col = "skyblue", line.col = "red", lwd = 2, ...) ## S3 method for class 'prestonfit' lines(x, line.col = "red", lwd = 2, ...) veiledspec(x, ...) as.fisher(x, ...) ## S3 method for class 'fisher' plot(x, xlab = "Frequency", ylab = "Species", bar.col = "skyblue", kind = c("bar", "hiplot", "points", "lines"), add = FALSE, ...) as.preston(x, tiesplit = TRUE, ...) ## S3 method for class 'preston' plot(x, xlab = "Frequency", ylab = "Species", bar.col = "skyblue", ...) ## S3 method for class 'preston' lines(x, xadjust = 0.5, ...)
fisherfit(x, ...) prestonfit(x, tiesplit = TRUE, ...) prestondistr(x, truncate = -1, ...) ## S3 method for class 'prestonfit' plot(x, xlab = "Frequency", ylab = "Species", bar.col = "skyblue", line.col = "red", lwd = 2, ...) ## S3 method for class 'prestonfit' lines(x, line.col = "red", lwd = 2, ...) veiledspec(x, ...) as.fisher(x, ...) ## S3 method for class 'fisher' plot(x, xlab = "Frequency", ylab = "Species", bar.col = "skyblue", kind = c("bar", "hiplot", "points", "lines"), add = FALSE, ...) as.preston(x, tiesplit = TRUE, ...) ## S3 method for class 'preston' plot(x, xlab = "Frequency", ylab = "Species", bar.col = "skyblue", ...) ## S3 method for class 'preston' lines(x, xadjust = 0.5, ...)
x |
Community data vector for fitting functions or their result
object for |
tiesplit |
Split frequencies |
truncate |
Truncation point for log-Normal model, in log2
units. Default value |
xlab , ylab
|
Labels for |
bar.col |
Colour of data bars. |
line.col |
Colour of fitted line. |
lwd |
Width of fitted line. |
kind |
Kind of plot to drawn: |
add |
Add to an existing plot. |
xadjust |
Adjustment of horizontal positions in octaves. |
... |
Other parameters passed to functions. Ignored in
|
In Fisher's logarithmic series the expected number of species
with
observed individuals is
(Fisher et al. 1943). The estimation is possible only for
genuine counts of individuals. The parameter
is used as
a diversity index which can be estimated with a separate function
fisher.alpha
. The parameter is taken as a
nuisance parameter which is not estimated separately but taken to be
. Helper function
as.fisher
transforms
abundance data into Fisher frequency table. Diversity will be given
as NA
for communities with one (or zero) species: there is no
reliable way of estimating their diversity, even if the equations
will return a bogus numeric value in some cases.
Preston (1948) was not satisfied with Fisher's model which seemed to
imply infinite species richness, and postulated that rare species is
a diminishing class and most species are in the middle of frequency
scale. This was achieved by collapsing higher frequency classes into
wider and wider “octaves” of doubling class limits: 1, 2, 3–4,
5–8, 9–16 etc. occurrences. It seems that Preston regarded
frequencies 1, 2, 4, etc.. as “tied” between octaves
(Williamson & Gaston 2005). This means that only half of the species
with frequency 1 are shown in the lowest octave, and the rest are
transferred to the second octave. Half of the species from the
second octave are transferred to the higher one as well, but this is
usually not as large a number of species. This practise makes data
look more lognormal by reducing the usually high lowest
octaves. This can be achieved by setting argument tiesplit = TRUE
.
With tiesplit = FALSE
the frequencies are not split,
but all ones are in the lowest octave, all twos in the second, etc.
Williamson & Gaston (2005) discuss alternative definitions in
detail, and they should be consulted for a critical review of
log-Normal model.
Any logseries data will look like lognormal when plotted in
Preston's way. The expected frequency at abundance octave
is defined by
, where
is the location of the mode and
the width,
both in
scale, and
is the expected
number of species at mode. The lognormal model is usually truncated
on the left so that some rare species are not observed. Function
prestonfit
fits the truncated lognormal model as a second
degree log-polynomial to the octave pooled data using Poisson (when
tiesplit = FALSE
) or quasi-Poisson (when tiesplit = TRUE
)
error. Function prestondistr
fits left-truncated
Normal distribution to transformed non-pooled
observations with direct maximization of log-likelihood. Function
prestondistr
is modelled after function
fitdistr
which can be used for alternative
distribution models.
The functions have common print
, plot
and lines
methods. The lines
function adds the fitted curve to the
octave range with line segments showing the location of the mode and
the width (sd) of the response. Function as.preston
transforms abundance data to octaves. Argument tiesplit
will
not influence the fit in prestondistr
, but it will influence
the barplot of the octaves.
The total extrapolated richness from a fitted Preston model can be
found with function veiledspec
. The function accepts results
both from prestonfit
and from prestondistr
. If
veiledspec
is called with a species count vector, it will
internally use prestonfit
. Function specpool
provides alternative ways of estimating the number of unseen
species. In fact, Preston's lognormal model seems to be truncated at
both ends, and this may be the main reason why its result differ
from lognormal models fitted in Rank–Abundance diagrams with
functions rad.lognormal
.
The function prestonfit
returns an object with fitted
coefficients
, and with observed (freq
) and fitted
(fitted
) frequencies, and a string describing the fitting
method
. Function prestondistr
omits the entry
fitted
. The function fisherfit
returns the result of
nlm
, where item estimate
is . The
result object is amended with the
nuisance
parameter and item
fisher
for the observed data from as.fisher
Bob O'Hara and Jari Oksanen.
Fisher, R.A., Corbet, A.S. & Williams, C.B. (1943). The relation between the number of species and the number of individuals in a random sample of animal population. Journal of Animal Ecology 12: 42–58.
Preston, F.W. (1948) The commonness and rarity of species. Ecology 29, 254–283.
Williamson, M. & Gaston, K.J. (2005). The lognormal distribution is not an appropriate null hypothesis for the species–abundance distribution. Journal of Animal Ecology 74, 409–422.
diversity
, fisher.alpha
,
radfit
, specpool
. Function
fitdistr
of MASS package was used as the
model for prestondistr
. Function density
can be used for
smoothed non-parametric estimation of responses, and
qqplot
is an alternative, traditional and more effective
way of studying concordance of observed abundances to any distribution model.
data(BCI) mod <- fisherfit(BCI[5,]) mod # prestonfit seems to need large samples mod.oct <- prestonfit(colSums(BCI)) mod.ll <- prestondistr(colSums(BCI)) mod.oct mod.ll plot(mod.oct) lines(mod.ll, line.col="blue3") # Different ## Smoothed density den <- density(log2(colSums(BCI))) lines(den$x, ncol(BCI)*den$y, lwd=2) # Fairly similar to mod.oct ## Extrapolated richness veiledspec(mod.oct) veiledspec(mod.ll)
data(BCI) mod <- fisherfit(BCI[5,]) mod # prestonfit seems to need large samples mod.oct <- prestonfit(colSums(BCI)) mod.ll <- prestondistr(colSums(BCI)) mod.oct mod.ll plot(mod.oct) lines(mod.ll, line.col="blue3") # Different ## Smoothed density den <- density(log2(colSums(BCI))) lines(den$x, ncol(BCI)*den$y, lwd=2) # Fairly similar to mod.oct ## Extrapolated richness veiledspec(mod.oct) veiledspec(mod.ll)
Functions goodness
and inertcomp
can
be used to assess the goodness of fit for individual sites or
species. Function vif.cca
and alias.cca
can be used to
analyse linear dependencies among constraints and conditions. In
addition, there are some other diagnostic tools (see 'Details').
## S3 method for class 'cca' goodness(object, choices, display = c("species", "sites"), model = c("CCA", "CA"), summarize = FALSE, addprevious = FALSE, ...) inertcomp(object, display = c("species", "sites"), unity = FALSE, proportional = FALSE) spenvcor(object) intersetcor(object) vif.cca(object) ## S3 method for class 'cca' alias(object, names.only = FALSE, ...)
## S3 method for class 'cca' goodness(object, choices, display = c("species", "sites"), model = c("CCA", "CA"), summarize = FALSE, addprevious = FALSE, ...) inertcomp(object, display = c("species", "sites"), unity = FALSE, proportional = FALSE) spenvcor(object) intersetcor(object) vif.cca(object) ## S3 method for class 'cca' alias(object, names.only = FALSE, ...)
object |
|
display |
Display |
choices |
Axes shown. Default is to show all axes of the
|
model |
Show constrained ( |
summarize |
Show only the accumulated total. |
addprevious |
Add the variation explained by previous components
when |
unity |
Scale inertia components to unit sum (sum of all items is 1). |
proportional |
Give the inertia components as proportional for
the corresponding total of the item (sum of each row is 1). This
option takes precedence over |
names.only |
Return only names of aliased variable(s) instead of defining equations. |
... |
Other parameters to the functions. |
Function goodness
gives cumulative proportion of inertia
accounted by species up to chosen axes. The proportions can be
assessed either by species or by sites depending on the argument
display
, but species are not available in distance-based
dbrda
. The function is not implemented for
capscale
.
Function inertcomp
decomposes the inertia into partial,
constrained and unconstrained components for each site or species.
Legendre & De Cáceres (2012) called these inertia
components as local contributions to beta-diversity (LCBD) and
species contributions to beta-diversity (SCBD), and they give these
as relative contributions summing up to unity (argument
unity = TRUE
). For this interpretation, appropriate dissimilarity
measures should be used in dbrda
or appropriate
standardization in rda
(Legendre & De
Cáceres 2012). The function is not implemented for
capscale
.
Function spenvcor
finds the so-called “species –
environment correlation” or (weighted) correlation of
weighted average scores and linear combination scores. This is a bad
measure of goodness of ordination, because it is sensitive to extreme
scores (like correlations are), and very sensitive to overfitting or
using too many constraints. Better models often have poorer
correlations. Function ordispider
can show the same
graphically.
Function intersetcor
finds the so-called “interset
correlation” or (weighted) correlation of weighted averages scores
and constraints. The defined contrasts are used for factor
variables. This is a bad measure since it is a correlation. Further,
it focuses on correlations between single contrasts and single axes
instead of looking at the multivariate relationship. Fitted vectors
(envfit
) provide a better alternative. Biplot scores
(see scores.cca
) are a multivariate alternative for
(weighted) correlation between linear combination scores and
constraints.
Function vif.cca
gives the variance inflation factors for each
constraint or contrast in factor constraints. In partial ordination,
conditioning variables are analysed together with constraints. Variance
inflation is a diagnostic tool to identify useless constraints. A
common rule is that values over 10 indicate redundant
constraints. If later constraints are complete linear combinations of
conditions or previous constraints, they will be completely removed
from the estimation, and no biplot scores or centroids are calculated
for these aliased constraints. A note will be printed with default
output if there are aliased constraints. Function alias
will
give the linear coefficients defining the aliased constraints, or
only their names with argument names.only = TRUE
.
The functions return matrices or vectors as is appropriate.
Jari Oksanen. The vif.cca
relies heavily on the code by
W. N. Venables. alias.cca
is a simplified version of
alias.lm
.
Greenacre, M. J. (1984). Theory and applications of correspondence analysis. Academic Press, London.
Gross, J. (2003). Variance inflation factors. R News 3(1), 13–15.
Legendre, P. & De Cáceres, M. (2012). Beta diversity as the variance of community data: dissimilarity coefficients and partitioning. Ecology Letters 16, 951–963. doi:10.1111/ele.12141
data(dune) data(dune.env) mod <- cca(dune ~ A1 + Management + Condition(Moisture), data=dune.env) goodness(mod, addprevious = TRUE) goodness(mod, addprevious = TRUE, summ = TRUE) # Inertia components inertcomp(mod, prop = TRUE) inertcomp(mod) # vif.cca vif.cca(mod) # Aliased constraints mod <- cca(dune ~ ., dune.env) mod vif.cca(mod) alias(mod) with(dune.env, table(Management, Manure)) # The standard correlations (not recommended) ## IGNORE_RDIFF_BEGIN spenvcor(mod) intersetcor(mod) ## IGNORE_RDIFF_END
data(dune) data(dune.env) mod <- cca(dune ~ A1 + Management + Condition(Moisture), data=dune.env) goodness(mod, addprevious = TRUE) goodness(mod, addprevious = TRUE, summ = TRUE) # Inertia components inertcomp(mod, prop = TRUE) inertcomp(mod) # vif.cca vif.cca(mod) # Aliased constraints mod <- cca(dune ~ ., dune.env) mod vif.cca(mod) alias(mod) with(dune.env, table(Management, Manure)) # The standard correlations (not recommended) ## IGNORE_RDIFF_BEGIN spenvcor(mod) intersetcor(mod) ## IGNORE_RDIFF_END
Function goodness.metaMDS
find goodness of fit measure for
points in nonmetric multidimensional scaling, and function
stressplot
makes a Shepard
diagram.
## S3 method for class 'metaMDS' goodness(object, dis, ...) ## Default S3 method: stressplot(object, dis, pch, p.col = "blue", l.col = "red", lwd = 2, ...)
## S3 method for class 'metaMDS' goodness(object, dis, ...) ## Default S3 method: stressplot(object, dis, pch, p.col = "blue", l.col = "red", lwd = 2, ...)
object |
|
dis |
Dissimilarities. This should not be used with
|
pch |
Plotting character for points. Default is dependent on the number of points. |
p.col , l.col
|
Point and line colours. |
lwd |
Line width. For |
... |
Other parameters to functions, e.g. graphical parameters. |
Function goodness.metaMDS
finds a goodness of fit statistic
for observations (points). This is defined so that sum of squared
values is equal to squared stress. Large values indicate poor fit.
The absolute values of the goodness statistic depend on the
definition of the stress: isoMDS
expresses
stress in percents, and therefore its goodness values are 100 times
higher than those of monoMDS
which expresses the
stress as a proportion.
Function stressplot
draws a Shepard diagram which is a plot
of ordination distances and monotone or linear fit line against
original dissimilarities. In addition, it displays two
correlation-like statistics on the goodness of fit in the graph.
The nonmetric fit is based on stress and defined as
. The “linear fit” is the squared
correlation between fitted values and ordination distances. For
monoMDS
, the “linear fit” and
from “stress type 2” are equal.
Both functions can be used with metaMDS
,
monoMDS
and isoMDS
. The original
dissimilarities should not be given for monoMDS
or
metaMDS
results (the latter tries to reconstruct the
dissimilarities using metaMDSredist
if
isoMDS
was used as its engine). With
isoMDS
the dissimilarities must be given. In
either case, the functions inspect that dissimilarities are
consistent with current ordination, and refuse to analyse
inconsistent dissimilarities. Function goodness.metaMDS
is
generic in vegan, but you must spell its name completely with
isoMDS
which has no class.
Function goodness
returns a vector of values. Function
stressplot
returns invisibly an object with items for
original dissimilarities, ordination distances and fitted values.
Jari Oksanen.
metaMDS
, monoMDS
,
isoMDS
, Shepard
. Similar
diagrams for eigenvector ordinations can be drawn with
stressplot.wcmdscale
, stressplot.cca
.
data(varespec) mod <- metaMDS(varespec) stressplot(mod) gof <- goodness(mod) gof plot(mod, display = "sites", type = "n") points(mod, display = "sites", cex = 2*gof/mean(gof))
data(varespec) mod <- metaMDS(varespec) stressplot(mod) gof <- goodness(mod) gof plot(mod, display = "sites", type = "n") points(mod, display = "sites", cex = 2*gof/mean(gof))
Indicator power calculation of Halme et al. (2009) or the congruence between indicator and target species.
indpower(x, type = 0)
indpower(x, type = 0)
x |
Community data frame or matrix. |
type |
The type of statistic to be returned. See Details for explanation. |
Halme et al. (2009) described an index of indicator power defined as
, where
and
.
is the number of sites,
is the number of shared occurrences of the indicator (
)
and the target (
) species.
and
are number
of occurrences of the indicator and target species. The
type
argument in the function call enables to choose which statistic to
return. type = 0
returns ,
type = 1
returns
,
type = 2
returns .
Total indicator power (TIP) of an indicator species is the column mean
(without its own value, see examples).
Halme et al. (2009) explain how to calculate confidence
intervals for these statistics, see Examples.
A matrix with indicator species as rows and target species as columns (this is indicated by the first letters of the row/column names).
Peter Solymos
Halme, P., Mönkkönen, M., Kotiaho, J. S, Ylisirniö, A-L. 2009. Quantifying the indicator power of an indicator species. Conservation Biology 23: 1008–1016.
data(dune) ## IP values ip <- indpower(dune) ## and TIP values diag(ip) <- NA (TIP <- rowMeans(ip, na.rm=TRUE)) ## p value calculation for a species ## from Halme et al. 2009 ## i is ID for the species i <- 1 fun <- function(x, i) indpower(x)[i,-i] ## 'c0' randomizes species occurrences os <- oecosimu(dune, fun, "c0", i=i, nsimul=99) ## get z values from oecosimu output z <- os$oecosimu$z ## p-value (p <- sum(z) / sqrt(length(z))) ## 'heterogeneity' measure (chi2 <- sum((z - mean(z))^2)) pchisq(chi2, df=length(z)-1) ## Halme et al.'s suggested output out <- c(TIP=TIP[i], significance=p, heterogeneity=chi2, minIP=min(fun(dune, i=i)), varIP=sd(fun(dune, i=i)^2)) out
data(dune) ## IP values ip <- indpower(dune) ## and TIP values diag(ip) <- NA (TIP <- rowMeans(ip, na.rm=TRUE)) ## p value calculation for a species ## from Halme et al. 2009 ## i is ID for the species i <- 1 fun <- function(x, i) indpower(x)[i,-i] ## 'c0' randomizes species occurrences os <- oecosimu(dune, fun, "c0", i=i, nsimul=99) ## get z values from oecosimu output z <- os$oecosimu$z ## p-value (p <- sum(z) / sqrt(length(z))) ## 'heterogeneity' measure (chi2 <- sum((z - mean(z))^2)) pchisq(chi2, df=length(z)-1) ## Halme et al.'s suggested output out <- c(TIP=TIP[i], significance=p, heterogeneity=chi2, minIP=min(fun(dune, i=i)), varIP=sd(fun(dune, i=i)^2)) out
This set of function extracts influence statistics and some other
linear model statistics directly from a constrained ordination result
object from cca
, rda
,
capscale
or dbrda
. The constraints are
linear model functions and these support functions return identical
results as the corresponding linear models (lm
), and you
can use their documentation. The main functions for normal usage are
leverage values (hatvalues
), standardized residuals
(rstandard
), studentized or leave-one-out residuals
(rstudent
), and Cook's distance
(cooks.distance
). In addition, vcov
returns the variance-covariance matrix of coefficients, and its
diagonal values the variances of coefficients. Other functions are
mainly support functions for these, but they can be used directly.
## S3 method for class 'cca' hatvalues(model, ...) ## S3 method for class 'cca' rstandard(model, type = c("response", "canoco"), ...) ## S3 method for class 'cca' rstudent(model, type = c("response", "canoco"), ...) ## S3 method for class 'cca' cooks.distance(model, type = c("response", "canoco"), ...) ## S3 method for class 'cca' sigma(object, type = c("response", "canoco"), ...) ## S3 method for class 'cca' vcov(object, type = "canoco", ...) ## S3 method for class 'cca' SSD(object, type = "canoco", ...) ## S3 method for class 'cca' qr(x, ...) ## S3 method for class 'cca' df.residual(object, ...)
## S3 method for class 'cca' hatvalues(model, ...) ## S3 method for class 'cca' rstandard(model, type = c("response", "canoco"), ...) ## S3 method for class 'cca' rstudent(model, type = c("response", "canoco"), ...) ## S3 method for class 'cca' cooks.distance(model, type = c("response", "canoco"), ...) ## S3 method for class 'cca' sigma(object, type = c("response", "canoco"), ...) ## S3 method for class 'cca' vcov(object, type = "canoco", ...) ## S3 method for class 'cca' SSD(object, type = "canoco", ...) ## S3 method for class 'cca' qr(x, ...) ## S3 method for class 'cca' df.residual(object, ...)
model , object , x
|
A constrained ordination result object. |
type |
Type of statistics used for extracting raw residuals and
residual standard deviation ( |
... |
Other arguments to functions (ignored). |
The vegan algorithm for constrained ordination uses linear model
(or weighted linear model in cca
) to find the fitted
values of dependent community data, and constrained ordination is
based on this fitted response (Legendre & Legendre 2012). The
hatvalues
give the leverage values of these constraints,
and the leverage is independent on the response data. Other influence
statistics (rstandard
, rstudent
,
cooks.distance
) are based on leverage, and on the raw
residuals and residual standard deviation (sigma
). With
type = "response"
the raw residuals are given by the
unconstrained component of the constrained ordination, and influence
statistics are a matrix with dimensions no. of observations times
no. of species. For cca
the statistics are the same as
obtained from the lm
model using Chi-square standardized
species data (see decostand
) as dependent variable, and
row sums of community data as weights, and for rda
the
lm
model uses non-modified community data and no
weights.
The algorithm in the CANOCO software constraints the results during
iteration by performing a linear regression of weighted averages (WA)
scores on constraints and taking the fitted values of this regression
as linear combination (LC) scores (ter Braak 1984). The WA scores are
directly found from species scores, but LC scores are linear
combinations of constraints in the regression. With type =
"canoco"
the raw residuals are the differences of WA and LC scores,
and the residual standard deviation (sigma
) is taken to
be the axis sum of squared WA scores minus one. These quantities have
no relationship to residual component of ordination, but they rather
are methodological artefacts of an algorithm that is not used in
vegan. The result is a matrix with dimensions no. of
observations times no. of constrained axes.
Function vcov
returns the matrix of variances and
covariances of regression coefficients. The diagonal values of this
matrix are the variances, and their square roots give the standard
errors of regression coefficients. The function is based on
SSD
that extracts the sum of squares and crossproducts
of residuals. The residuals are defined similarly as in influence
measures and with each type
they have similar properties and
limitations, and define the dimensions of the result matrix.
Function as.mlm
casts an ordination object to a multiple
linear model of class "mlm"
(see lm
), and similar
statistics can be derived from that modified object as with this set
of functions. However, there are some problems in the R
implementation of the further analysis of multiple linear model
objects. When the results differ, the current set of functions is more
probable to be correct. The use of as.mlm
objects should be
avoided.
Jari Oksanen
Legendre, P. and Legendre, L. (2012) Numerical Ecology. 3rd English ed. Elsevier.
ter Braak, C.J.F. (1984–): CANOCO – a FORTRAN program for canonical community ordination by [partial] [detrended] [canonical] correspondence analysis, principal components analysis and redundancy analysis. TNO Inst. of Applied Computer Sci., Stat. Dept. Wageningen, The Netherlands.
Corresponding lm
methods and
as.mlm.cca
. Function ordiresids
provides
lattice graphics for residuals.
data(varespec, varechem) mod <- cca(varespec ~ Al + P + K, varechem) ## leverage hatvalues(mod) plot(hatvalues(mod), type = "h") ## ordination plot with leverages: points with high leverage have ## similar LC and WA scores plot(mod, type = "n") ordispider(mod) # segment from LC to WA scores points(mod, dis="si", cex=5*hatvalues(mod), pch=21, bg=2) # WA scores text(mod, dis="bp", col=4) ## deviation and influence head(rstandard(mod)) head(cooks.distance(mod)) ## Influence measures from lm y <- decostand(varespec, "chi.square") # needed in cca y1 <- with(y, Cladstel) # take one species for lm lmod1 <- lm(y1 ~ Al + P + K, varechem, weights = rowSums(varespec)) ## numerically identical within 2e-15 all(abs(cooks.distance(lmod1) - cooks.distance(mod)[, "Cladstel"]) < 1e-8) ## t-values of regression coefficients based on type = "canoco" ## residuals coef(mod) coef(mod)/sqrt(diag(vcov(mod, type = "canoco")))
data(varespec, varechem) mod <- cca(varespec ~ Al + P + K, varechem) ## leverage hatvalues(mod) plot(hatvalues(mod), type = "h") ## ordination plot with leverages: points with high leverage have ## similar LC and WA scores plot(mod, type = "n") ordispider(mod) # segment from LC to WA scores points(mod, dis="si", cex=5*hatvalues(mod), pch=21, bg=2) # WA scores text(mod, dis="bp", col=4) ## deviation and influence head(rstandard(mod)) head(cooks.distance(mod)) ## Influence measures from lm y <- decostand(varespec, "chi.square") # needed in cca y1 <- with(y, Cladstel) # take one species for lm lmod1 <- lm(y1 ~ Al + P + K, varechem, weights = rowSums(varespec)) ## numerically identical within 2e-15 all(abs(cooks.distance(lmod1) - cooks.distance(mod)[, "Cladstel"]) < 1e-8) ## t-values of regression coefficients based on type = "canoco" ## residuals coef(mod) coef(mod)/sqrt(diag(vcov(mod, type = "canoco")))
The function performs isometric feature mapping which consists of three simple steps: (1) retain only some of the shortest dissimilarities among objects, (2) estimate all dissimilarities as shortest path distances, and (3) perform metric scaling (Tenenbaum et al. 2000).
isomap(dist, ndim=10, ...) isomapdist(dist, epsilon, k, path = "shortest", fragmentedOK =FALSE, ...) ## S3 method for class 'isomap' summary(object, ...) ## S3 method for class 'isomap' plot(x, net = TRUE, n.col = "gray", type = "points", ...)
isomap(dist, ndim=10, ...) isomapdist(dist, epsilon, k, path = "shortest", fragmentedOK =FALSE, ...) ## S3 method for class 'isomap' summary(object, ...) ## S3 method for class 'isomap' plot(x, net = TRUE, n.col = "gray", type = "points", ...)
dist |
Dissimilarities. |
ndim |
Number of axes in metric scaling (argument |
epsilon |
Shortest dissimilarity retained. |
k |
Number of shortest dissimilarities retained for a point. If
both |
path |
Method used in |
fragmentedOK |
What to do if dissimilarity matrix is
fragmented. If |
x , object
|
An |
net |
Draw the net of retained dissimilarities. |
n.col |
Colour of drawn net segments. This can also be a vector that is recycled for points, and the colour of the net segment is a mixture of joined points. |
type |
Plot observations either as |
... |
Other parameters passed to functions. |
The function isomap
first calls function isomapdist
for
dissimilarity transformation, and then performs metric scaling for the
result. All arguments to isomap
are passed to
isomapdist
. The functions are separate so that the
isompadist
transformation could be easily used with other
functions than simple linear mapping of cmdscale
.
Function isomapdist
retains either dissimilarities equal or shorter to
epsilon
, or if epsilon
is not given, at least k
shortest dissimilarities for a point. Then a complete dissimilarity
matrix is reconstructed using stepacross
using either
flexible shortest paths or extended dissimilarities (for details, see
stepacross
).
De'ath (1999) actually published essentially the same method before
Tenenbaum et al. (2000), and De'ath's function is available in function
xdiss
in non-CRAN package mvpart. The differences are that
isomap
introduced the k
criterion, whereas De'ath only
used epsilon
criterion. In practice, De'ath also retains
higher proportion of dissimilarities than typical isomap
.
The plot
function uses internally ordiplot
,
except that it adds text over net using ordilabel
. The
plot
function passes extra arguments to these functions. In
addition, vegan3d package has function
rgl.isomap
to make dynamic 3D plots that can
be rotated on the screen.
Function isomapdist
returns a dissimilarity object similar to
dist
. Function isomap
returns an object of class
isomap
with plot
and summary
methods. The
plot
function returns invisibly an object of class
ordiplot
. Function scores
can extract
the ordination scores.
Tenenbaum et al. (2000) justify isomap
as a tool of unfolding a
manifold (e.g. a 'Swiss Roll'). Even with a manifold structure, the
sampling must be even and dense so
that dissimilarities along a manifold are shorter than across the
folds. If data do not have such a manifold structure, the results are
very sensitive to parameter values.
Jari Oksanen
De'ath, G. (1999) Extended dissimilarity: a method of robust estimation of ecological distances from high beta diversity data. Plant Ecology 144, 191–199
Tenenbaum, J.B., de Silva, V. & Langford, J.C. (2000) A global network framework for nonlinear dimensionality reduction. Science 290, 2319–2323.
The underlying functions that do the proper work are
stepacross
, distconnected
and
cmdscale
. Function metaMDS
may trigger
stepacross
transformation, but usually only for
longest dissimilarities. The plot
method of vegan
minimum spanning tree function (spantree
) has even
more extreme way of isomapping things.
## The following examples also overlay minimum spanning tree to ## the graphics in red. op <- par(mar=c(4,4,1,1)+0.2, mfrow=c(2,2)) data(BCI) dis <- vegdist(BCI) tr <- spantree(dis) pl <- ordiplot(cmdscale(dis), main="cmdscale") lines(tr, pl, col="red") ord <- isomap(dis, k=3) ord pl <- plot(ord, main="isomap k=3") lines(tr, pl, col="red") pl <- plot(isomap(dis, k=5), main="isomap k=5") lines(tr, pl, col="red") pl <- plot(isomap(dis, epsilon=0.45), main="isomap epsilon=0.45") lines(tr, pl, col="red") par(op) ## colour points and web by the dominant species dom <- apply(BCI, 1, which.max) ## need nine colours, but default palette has only eight op <- palette(c(palette("default"), "sienna")) plot(ord, pch = 16, col = dom, n.col = dom) palette(op)
## The following examples also overlay minimum spanning tree to ## the graphics in red. op <- par(mar=c(4,4,1,1)+0.2, mfrow=c(2,2)) data(BCI) dis <- vegdist(BCI) tr <- spantree(dis) pl <- ordiplot(cmdscale(dis), main="cmdscale") lines(tr, pl, col="red") ord <- isomap(dis, k=3) ord pl <- plot(ord, main="isomap k=3") lines(tr, pl, col="red") pl <- plot(isomap(dis, k=5), main="isomap k=5") lines(tr, pl, col="red") pl <- plot(isomap(dis, epsilon=0.45), main="isomap epsilon=0.45") lines(tr, pl, col="red") par(op) ## colour points and web by the dominant species dom <- apply(BCI, 1, which.max) ## need nine colours, but default palette has only eight op <- palette(c(palette("default"), "sienna")) plot(ord, pch = 16, col = dom, n.col = dom) palette(op)
Function kendall.global
computes and tests the coefficient of
concordance among several judges (variables, species) through a
permutation test.
Function kendall.post
carries out a posteriori tests
of the contributions of individual judges (variables, species) to
the overall concordance of their group through permutation tests.
If several groups of judges are identified in the data table,
coefficients of concordance (kendall.global
) or a posteriori
tests (kendall.post
) will be computed for each group
separately. Use in ecology: to identify significant species
associations.
kendall.global(Y, group, nperm = 999, mult = "holm") kendall.post(Y, group, nperm = 999, mult = "holm")
kendall.global(Y, group, nperm = 999, mult = "holm") kendall.post(Y, group, nperm = 999, mult = "holm")
Y |
Data file (data frame or matrix) containing quantitative or semiquantitative data. Rows are objects and columns are judges (variables). In community ecology, that table is often a site-by-species table. |
group |
A vector defining how judges should be divided into groups. See example below. If groups are not explicitly defined, all judges in the data file will be considered as forming a single group. |
nperm |
Number of permutations to be performed. Default is 999. |
mult |
Correct P-values for multiple testing using the
alternatives described in |
Y
must contain quantitative data. They will be transformed to
ranks within each column before computation of the coefficient of
concordance.
The search for species associations described in Legendre (2005) proceeds in 3 steps:
(1) Correlation analysis of the species. A possible method is to
compute Ward's agglomerative clustering of a matrix of correlations
among the species. In detail: (1.1) compute a Pearson or Spearman
correlation matrix (correl.matrix
) among the species; (1.2)
turn it into a distance matrix: mat.D = as.dist(1-correl.matrix)
;
(1.3) carry out Ward's hierarchical
clustering of that matrix using hclust
:
clust.ward = hclust(mat.D, "ward")
; (1.4) plot the dendrogram:
plot(clust.ward, hang=-1)
; (1.5) cut the dendrogram in two
groups, retrieve the vector of species membership:
group.2 = cutree(clust.ward, k=2)
. (1.6) After steps 2 and 3 below,
you may
have to come back and try divisions of the species into k =
groups.
(2) Compute global tests of significance of the 2 (or more) groups
using the function kendall.global
and the vector defining the
groups. Groups that are not globally significant must be refined or
abandoned.
(3) Compute a posteriori tests of the contribution of individual
species to the concordance of their group using the function
kendall.post
and the vector defining the groups. If some
species have negative values for "Spearman.mean", this means that
these species clearly do not belong to the group, hence that group
is too inclusive. Go back to (1.5) and cut the dendrogram more
finely. The left and right groups can be cut separately,
independently of the levels along the dendrogram; write your own
vector of group membership if cutree
does not produce the
desired groups.
The corrections used for multiple testing are applied to the list of
P-values (P); they take into account the number of tests (k) carried
out simultaneously (number of groups in kendall.global
, or
number of species in kendall.post
). The corrections are
performed using function p.adjust
; see that function
for the description of the correction methods. In addition, there is
Šidák correction which defined as
.
A table containing the following information in rows. The columns
correspond to the groups of "judges" defined in vector "group". When
function Kendall.post
is used, there are as many tables as
the number of predefined groups.
W |
Kendall's coefficient of concordance, W. |
F |
F statistic. F = W*(m-1)/(1-W) where m is the number of judges. |
Prob.F |
Probability associated with the F statistic, computed from the F distribution with nu1 = n-1-(2/m) and nu2 = nu1*(m-1); n is the number of objects. |
Corrected prob.F |
Probabilities associated with F, corrected
using the method selected in parameter |
Chi2 |
Friedman's chi-square statistic (Friedman 1937) used in the permutation test of W. |
Prob.perm |
Permutational probabilities, uncorrected. |
Corrected prob.perm |
Permutational probabilities corrected
using the method selected in parameter |
Spearman.mean |
Mean of the Spearman correlations between the judge under test and all the other judges in the same group. |
W.per.species |
Contribution of the judge under test to the overall concordance statistic for that group. |
F. Guillaume Blanchet, University of Alberta, and Pierre Legendre, Université de Montréal
Friedman, M. 1937. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association 32: 675-701.
Kendall, M. G. and B. Babington Smith. 1939. The problem of m rankings. Annals of Mathematical Statistics 10: 275-287.
Legendre, P. 2005. Species associations: the Kendall coefficient of concordance revisited. Journal of Agricultural, Biological, and Environmental Statistics 10: 226-245.
Legendre, P. 2009. Coefficient of concordance. In: Encyclopedia of Research Design. SAGE Publications (in press).
Siegel, S. and N. J. Castellan, Jr. 1988. Nonparametric statistics for the behavioral sciences. 2nd edition. McGraw-Hill, New York.
cor
, friedman.test
,
hclust
, cutree
, kmeans
,
cascadeKM
.
data(mite) mite.hel <- decostand(mite, "hel") # Reproduce the results shown in Table 2 of Legendre (2005), a single group mite.small <- mite.hel[c(4,9,14,22,31,34,45,53,61,69),c(13:15,23)] kendall.global(mite.small, nperm=49) kendall.post(mite.small, mult="holm", nperm=49) # Reproduce the results shown in Tables 3 and 4 of Legendre (2005), 2 groups group <-c(1,1,2,1,1,1,1,1,2,1,1,1,1,1,1,2,1,2,1,1,1,1,2,1,2,1,1,1,1,1,2,2,2,2,2) kendall.global(mite.hel, group=group, nperm=49) kendall.post(mite.hel, group=group, mult="holm", nperm=49) # NOTE: 'nperm' argument usually needs to be larger than 49. # It was set to this low value for demonstration purposes.
data(mite) mite.hel <- decostand(mite, "hel") # Reproduce the results shown in Table 2 of Legendre (2005), a single group mite.small <- mite.hel[c(4,9,14,22,31,34,45,53,61,69),c(13:15,23)] kendall.global(mite.small, nperm=49) kendall.post(mite.small, mult="holm", nperm=49) # Reproduce the results shown in Tables 3 and 4 of Legendre (2005), 2 groups group <-c(1,1,2,1,1,1,1,1,2,1,1,1,1,1,1,2,1,2,1,1,1,1,2,1,2,1,1,1,1,1,2,2,2,2,2) kendall.global(mite.hel, group=group, nperm=49) kendall.post(mite.hel, group=group, mult="holm", nperm=49) # NOTE: 'nperm' argument usually needs to be larger than 49. # It was set to this low value for demonstration purposes.
Function linestack
plots vertical one-dimensional plots for
numeric vectors. The plots are always labelled, but the labels
are moved vertically to avoid overwriting.
linestack(x, labels, cex = 0.8, side = "right", hoff = 2, air = 1.1, at = 0, add = FALSE, axis = FALSE, ...)
linestack(x, labels, cex = 0.8, side = "right", hoff = 2, air = 1.1, at = 0, add = FALSE, axis = FALSE, ...)
x |
Numeric vector to be plotted. |
labels |
Labels used instead of default (names of |
cex |
Size of the labels. |
side |
Put labels to the |
hoff |
Distance from the vertical axis to the label in units of the width of letter “m”. |
air |
Multiplier to string height to leave empty space between labels. |
at |
Position of plot in horizontal axis. |
add |
Add to an existing plot. |
axis |
Add axis to the plot. |
... |
Other graphical parameters to labels. |
The function returns invisibly the shifted positions of labels in user coordinates.
The function always draws labelled diagrams. If you want to have
unlabelled diagrams, you can use, e.g., plot
,
stripchart
or rug
.
Jari Oksanen with modifications by Gavin L. Simpson
## First DCA axis data(dune) ord <- decorana(dune) linestack(scores(ord, choices=1, display="sp")) linestack(scores(ord, choices=1, display="si"), side="left", add=TRUE) title(main="DCA axis 1") ## Expressions as labels N <- 10 # Number of sites df <- data.frame(Ca = rlnorm(N, 2), NO3 = rlnorm(N, 4), SO4 = rlnorm(N, 10), K = rlnorm(N, 3)) ord <- rda(df, scale = TRUE) ### vector of expressions for labels labs <- expression(Ca^{2+phantom()}, NO[3]^{-phantom()}, SO[4]^{2-phantom()}, K^{+phantom()}) scl <- "sites" linestack(scores(ord, choices = 1, display = "species", scaling = scl), labels = labs, air = 2) linestack(scores(ord, choices = 1, display = "site", scaling = scl), side = "left", add = TRUE) title(main = "PCA axis 1")
## First DCA axis data(dune) ord <- decorana(dune) linestack(scores(ord, choices=1, display="sp")) linestack(scores(ord, choices=1, display="si"), side="left", add=TRUE) title(main="DCA axis 1") ## Expressions as labels N <- 10 # Number of sites df <- data.frame(Ca = rlnorm(N, 2), NO3 = rlnorm(N, 4), SO4 = rlnorm(N, 10), K = rlnorm(N, 3)) ord <- rda(df, scale = TRUE) ### vector of expressions for labels labs <- expression(Ca^{2+phantom()}, NO[3]^{-phantom()}, SO[4]^{2-phantom()}, K^{+phantom()}) scl <- "sites" linestack(scores(ord, choices = 1, display = "species", scaling = scl), labels = labs, air = 2) linestack(scores(ord, choices = 1, display = "site", scaling = scl), side = "left", add = TRUE) title(main = "PCA axis 1")
Function is based on abbreviate
, and will
take given number of characters from the first (genus) and last
(epithet) component of botanical or zoological Latin name and combine
these into one shorter character string. The names will be unique and
more characters will be used if needed. The default usage makes names
with 4+4 characters popularized in Cornell Ecology Programs (CEP) and
often known as CEP names. Abbreviated names are useful in ordination
plots and other graphics to reduce clutter.
make.cepnames(names, minlengths = c(4,4), seconditem = FALSE, uniqgenera = FALSE, named = FALSE, method)
make.cepnames(names, minlengths = c(4,4), seconditem = FALSE, uniqgenera = FALSE, named = FALSE, method)
names |
The names to be abbreviated into a vector abbreviated names. |
minlengths |
The minimum lengths of first and second part of the abbreviation. If abbreviations are not unique, the parts can be longer. |
seconditem |
Take always the second part of the original name to the abbreviated name instead of the last part. |
uniqgenera |
Should the first part of the abbreviation (genus) also be unique. Unique genus can take space from the second part (epithet). |
method |
The |
named |
Should the result vector be named by original
|
Cornell Ecology Programs (CEP) used eight-letter abbreviations
for species and site names. In species, the names were formed by
taking four first letters of the generic name and four first letters
of the specific or subspecific epithet. The current function produces
CEP names as default, but it can also use other lengths. The function
is based on abbreviate
and can produce longer names if
basic names are not unique. If generic name is shorter than specified
minimun length, more characters can be used by the epithet. If
uniqgenera = TRUE
genus can use more characters, and these
reduce the number of characters available for the epithet. The
function drops characters from the end, but with method =
"both.sides"
the function tries to drop characters from other
positions, starting with lower-case wovels, in the final attempt to
abbreviate abbreviations.
Function returns a vector of abbreviated names.
The function does not handle Author names except strictly
two-part names with seconditem = TRUE
. It is often useful to
edit abbreviations manually.
Jari Oksanen
names <- c("Aa maderoi", "Capsella bursa-pastoris", "Taraxacum", "Cladina rangiferina", "Cladonia rangiformis", "Cladonia cornuta", "Cladonia cornuta var. groenlandica", "Rumex acetosa", "Rumex acetosella") make.cepnames(names) make.cepnames(names, uniqgenera = TRUE) make.cepnames(names, method = "both.sides")
names <- c("Aa maderoi", "Capsella bursa-pastoris", "Taraxacum", "Cladina rangiferina", "Cladonia rangiformis", "Cladonia cornuta", "Cladonia cornuta var. groenlandica", "Rumex acetosa", "Rumex acetosella") make.cepnames(names) make.cepnames(names, uniqgenera = TRUE) make.cepnames(names, method = "both.sides")
Function mantel
finds the Mantel statistic as a matrix
correlation between two dissimilarity matrices, and function
mantel.partial
finds the partial Mantel statistic as the
partial matrix correlation between three dissimilarity matrices. The
significance of the statistic is evaluated by permuting rows and
columns of the first dissimilarity matrix.
mantel(xdis, ydis, method="pearson", permutations=999, strata = NULL, na.rm = FALSE, parallel = getOption("mc.cores")) mantel.partial(xdis, ydis, zdis, method = "pearson", permutations = 999, strata = NULL, na.rm = FALSE, parallel = getOption("mc.cores"))
mantel(xdis, ydis, method="pearson", permutations=999, strata = NULL, na.rm = FALSE, parallel = getOption("mc.cores")) mantel.partial(xdis, ydis, zdis, method = "pearson", permutations = 999, strata = NULL, na.rm = FALSE, parallel = getOption("mc.cores"))
xdis , ydis , zdis
|
Dissimilarity matrices or |
method |
Correlation method, as accepted by |
permutations |
a list of control values for the permutations
as returned by the function |
strata |
An integer vector or factor specifying the strata for permutation. If supplied, observations are permuted only within the specified strata. |
na.rm |
Remove missing values in calculation of Mantel correlation. Use this option with care: Permutation tests can be biased, in particular if two matrices had missing values in matching positions. |
parallel |
Number of parallel processes or a predefined socket
cluster. With |
Mantel statistic is simply a correlation between entries of two
dissimilarity matrices (some use cross products, but these are
linearly related). However, the significance cannot be directly
assessed, because there are entries for just
observations. Mantel developed asymptotic test, but here we use
permutations of
rows and columns of dissimilarity
matrix. Only the first matrix (
xdist
) will be permuted, and
the second is kept constant. See permutations
for
additional details on permutation tests in Vegan.
Partial Mantel statistic uses partial correlation
conditioned on the third matrix. Only the first matrix is permuted so
that the correlation structure between second and first matrices is
kept constant. Although mantel.partial
silently accepts other
methods than "pearson"
, partial correlations will probably be
wrong with other methods.
Borcard & Legendre (2012) warn against using partial Mantel test and
recommend instead Mantel correlogram
(mantel.correlog
).
The function uses cor
, which should accept
alternatives pearson
for product moment correlations and
spearman
or kendall
for rank correlations.
The function returns a list of class mantel
with following
components:
Call |
Function call. |
method |
Correlation method used, as returned by
|
statistic |
The Mantel statistic. |
signif |
Empirical significance level from permutations. |
perm |
A vector of permuted values. The distribution of
permuted values can be inspected with |
permutations |
Number of permutations. |
control |
A list of control values for the permutations
as returned by the function |
Jari Oksanen
Borcard, D. & Legendre, P. (2012) Is the Mantel correlogram powerful enough to be useful in ecological analysis? A simulation study. Ecology 93: 1473-1481.
Legendre, P. and Legendre, L. (2012) Numerical Ecology. 3rd English Edition. Elsevier.
## Is vegetation related to environment? data(varespec) data(varechem) veg.dist <- vegdist(varespec) # Bray-Curtis env.dist <- vegdist(scale(varechem), "euclid") mantel(veg.dist, env.dist) mantel(veg.dist, env.dist, method="spear")
## Is vegetation related to environment? data(varespec) data(varechem) veg.dist <- vegdist(varespec) # Bray-Curtis env.dist <- vegdist(scale(varechem), "euclid") mantel(veg.dist, env.dist) mantel(veg.dist, env.dist, method="spear")
Function mantel.correlog
computes a multivariate
Mantel correlogram. Proposed by Sokal (1986) and Oden and Sokal
(1986), the method is also described in Legendre and Legendre (2012,
pp. 819–821) and tested and compared in Borcard and Legendere (2012).
mantel.correlog(D.eco, D.geo=NULL, XY=NULL, n.class=0, break.pts=NULL, cutoff=TRUE, r.type="pearson", nperm=999, mult="holm", progressive=TRUE) ## S3 method for class 'mantel.correlog' plot(x, alpha=0.05, ...)
mantel.correlog(D.eco, D.geo=NULL, XY=NULL, n.class=0, break.pts=NULL, cutoff=TRUE, r.type="pearson", nperm=999, mult="holm", progressive=TRUE) ## S3 method for class 'mantel.correlog' plot(x, alpha=0.05, ...)
D.eco |
An ecological distance matrix, with class
either |
D.geo |
A geographic distance matrix, with class either
|
XY |
A file of Cartesian geographic coordinates of the
points. Default: |
n.class |
Number of classes. If |
break.pts |
Vector containing the break points of the distance
distribution. Provide (n.class+1) breakpoints, that is, a list with
a beginning and an ending point. Default: |
cutoff |
For the second half of the distance classes,
|
r.type |
Type of correlation in calculation of the Mantel
statistic. Default: |
nperm |
Number of permutations for the tests of
significance. Default: |
mult |
Correct P-values for multiple testing. The correction
methods are |
progressive |
Default: |
x |
Output of |
alpha |
Significance level for the points drawn with black
symbols in the correlogram. Default: |
... |
Other parameters passed from other functions. |
A correlogram is a graph in which spatial correlation values
are plotted, on the ordinate, as a function of the geographic distance
classes among the study sites along the abscissa. In a Mantel
correlogram, a Mantel correlation (Mantel 1967) is computed between a
multivariate (e.g. multi-species) distance matrix of the user's choice
and a design matrix representing each of the geographic distance
classes in turn. The Mantel statistic is tested through a
permutational Mantel test performed by vegan
's
mantel
function.
Borcard and Legendre (2012) show that the testing method in the
Mantel correlogram has correct type I error and power, contrary to
the simple and partial Mantel tests so often used by ecologists and
geneticists in spatial analysis (see mantel.partial
).
They also show that the test in Mantel correlograms is the same test
as used by Wagner (2004) in multiscale ordination
(mso
), and that it is closely related to the Geary’s
test in univariate correlograms.
When a correction for multiple testing is applied, more permutations
are necessary than in the no-correction case, to obtain significant
-values in the higher correlogram classes.
The print.mantel.correlog
function prints out the
correlogram. See examples.
mantel.res |
A table with the distance classes as rows and the
class indices, number of distances per class, Mantel statistics
(computed using Pearson's r, Spearman's r, or Kendall's tau), and
p-values as columns. A positive Mantel statistic indicates positive
spatial correlation. An additional column with p-values corrected for
multiple testing is added unless |
n.class |
The n umber of distance classes. |
break.pts |
The break points provided by the user or computed by the program. |
mult |
The name of the correction for multiple testing. No
correction: |
progressive |
A logical ( |
n.tests |
The number of distance classes for which Mantel tests have been computed and tested for significance. |
call |
The function call. |
Pierre Legendre, Université de Montréal
Borcard, D. & P. Legendre. 2012. Is the Mantel correlogram powerful enough to be useful in ecological analysis? A simulation study. Ecology 93: 1473-1481.
Legendre, P. and L. Legendre. 2012. Numerical ecology, 3rd English edition. Elsevier Science BV, Amsterdam.
Mantel, N. 1967. The detection of disease clustering and a generalized regression approach. Cancer Res. 27: 209-220.
Oden, N. L. and R. R. Sokal. 1986. Directional autocorrelation: an extension of spatial correlograms to two dimensions. Syst. Zool. 35: 608-617.
Sokal, R. R. 1986. Spatial data analysis and historical processes. 29-43 in: E. Diday et al. [eds.] Data analysis and informatics, IV. North-Holland, Amsterdam.
Sturges, H. A. 1926. The choice of a class interval. Journal of the American Statistical Association 21: 65–66.
Wagner, H.H. 2004. Direct multi-scale ordination with canonical correspondence analysis. Ecology 85: 342-351.
# Mite data available in "vegan" data(mite) data(mite.xy) mite.hel <- decostand(mite, "hellinger") # Detrend the species data by regression on the site coordinates mite.hel.resid <- resid(lm(as.matrix(mite.hel) ~ ., data=mite.xy)) # Compute the detrended species distance matrix mite.hel.D <- dist(mite.hel.resid) # Compute Mantel correlogram with cutoff, Pearson statistic mite.correlog <- mantel.correlog(mite.hel.D, XY=mite.xy, nperm=49) summary(mite.correlog) mite.correlog # or: print(mite.correlog) # or: print.mantel.correlog(mite.correlog) plot(mite.correlog) # Compute Mantel correlogram without cutoff, Spearman statistic mite.correlog2 <- mantel.correlog(mite.hel.D, XY=mite.xy, cutoff=FALSE, r.type="spearman", nperm=49) summary(mite.correlog2) mite.correlog2 plot(mite.correlog2) # NOTE: 'nperm' argument usually needs to be larger than 49. # It was set to this low value for demonstration purposes.
# Mite data available in "vegan" data(mite) data(mite.xy) mite.hel <- decostand(mite, "hellinger") # Detrend the species data by regression on the site coordinates mite.hel.resid <- resid(lm(as.matrix(mite.hel) ~ ., data=mite.xy)) # Compute the detrended species distance matrix mite.hel.D <- dist(mite.hel.resid) # Compute Mantel correlogram with cutoff, Pearson statistic mite.correlog <- mantel.correlog(mite.hel.D, XY=mite.xy, nperm=49) summary(mite.correlog) mite.correlog # or: print(mite.correlog) # or: print.mantel.correlog(mite.correlog) plot(mite.correlog) # Compute Mantel correlogram without cutoff, Spearman statistic mite.correlog2 <- mantel.correlog(mite.hel.D, XY=mite.xy, cutoff=FALSE, r.type="spearman", nperm=49) summary(mite.correlog2) mite.correlog2 plot(mite.correlog2) # NOTE: 'nperm' argument usually needs to be larger than 49. # It was set to this low value for demonstration purposes.
Add new points to an existing metaMDS
or
monoMDS
ordination.
MDSaddpoints(nmds, dis, neighbours = 5, maxit = 200) dist2xy(dist, pick, type = c("xy", "xx"), invert = FALSE)
MDSaddpoints(nmds, dis, neighbours = 5, maxit = 200) dist2xy(dist, pick, type = c("xy", "xx"), invert = FALSE)
nmds |
Result object from |
dis |
Rectangular non-symmetric dissimilarity matrix among new
points (rows) and old fixed points (columns). Such matrix can be
extracted from complete dissimilarities of both old and new points
with |
neighbours |
Number of nearest points used to get the starting locations for new points. |
maxit |
Maximum number of iterations. |
dist |
Input dissimilarities. |
pick |
Indices (integers) of selected observations or a logical
vector that is |
type |
|
invert |
Invert |
Function provides an interface to monoMDS
Fortran code to add new
points to an existing ordination that will be regarded as fixed. The
function has a similar role as predict
functions with
newdata
in Euclidean ordination (e.g. predict.cca
).
Input data must be a rectangular matrix of distances among new added
points (rows) and all fixed old points (columns). Such matrices can be
extracted from complete dissimilarities with helper function
dist2xy
. Function designdist2
can directly
calculate such rectangular dissimilarity matrices between sampling units
(rows) in two matries. In addition, analogue has distance
function that can calculate dissimilarities among two matrices,
including functions that cannot be specified in
designdist2
.
Great care is needed in preparing the dissimilarities for the input. The dissimilarity index must be exactly the same as in the fixed ordination, and columns must match old fixed points, and rows added new points.
Function return a list of class "nmds"
(there are no other
objects of that type in vegan) with following elements
points |
Coordinates of added new points |
seeds |
Starting coordinates for new points. |
deltastress |
Change of stress with added points. |
iters |
Number of iterations. |
cause |
Cause of termination of iterations. Integer for
convergence criteria in |
## Cross-validation: remove a point when performing NMDS and add as ## a new points data(dune) d <- vegdist(dune) ## remove point 3 from ordination mod3 <- metaMDS(dist2xy(d, 3, "xx", invert = TRUE), trace=0) ## add point 3 to the result MDSaddpoints(mod3, dist2xy(d, 3)) ## Use designdist2 d15 <- designdist(dune[1:15,]) m15 <- metaMDS(d15, trace=0) MDSaddpoints(m15, designdist2(dune[1:15,], dune[16:20,]))
## Cross-validation: remove a point when performing NMDS and add as ## a new points data(dune) d <- vegdist(dune) ## remove point 3 from ordination mod3 <- metaMDS(dist2xy(d, 3, "xx", invert = TRUE), trace=0) ## add point 3 to the result MDSaddpoints(mod3, dist2xy(d, 3)) ## Use designdist2 d15 <- designdist(dune[1:15,]) m15 <- metaMDS(d15, trace=0) MDSaddpoints(m15, designdist2(dune[1:15,], dune[16:20,]))
Function rotates a multidimensional scaling result so
that its first dimension is parallel to an external (environmental
variable). The function can handle the results from
metaMDS
or monoMDS
functions.
MDSrotate(object, vec, na.rm = FALSE, ...)
MDSrotate(object, vec, na.rm = FALSE, ...)
object |
|
vec |
An environmental variable or a matrix of such
variables. The number of variables must be lower than the number
of dimensions, and the solution is rotated to these variables in
the order they appear in the matrix. Alternatively |
na.rm |
Remove missing values from the continuous variable
|
... |
Other arguments (ignored). |
The orientation and rotation are undefined in multidimensional
scaling. Functions metaMDS
and metaMDS
can rotate their solutions to principal components so that the
dispersion of the points is highest on the first dimension. Sometimes
a different rotation is more intuitive, and MDSrotate
allows
rotation of the result so that the first axis is parallel to a given
external variable or two first variables are completely in a
two-dimensional plane etc. If several external variables are supplied,
they are applied in the order they are in the matrix. First axis is
rotated to the first supplied variable, and the second axis to the
second variable. Because variables are usually correlated, the second
variable is not usually aligned with the second axis, but it is
uncorrelated to later dimensions. There must be at least one free
dimension: the number of external variables must be lower than the
number of dimensions, and all used environmental variables are
uncorrelated with that free dimension.
Alternatively the method can rotate to discriminate the levels of a
factor using linear discriminant analysis
(lda
). This is hardly meaningful for
two-dimensional solutions, since all rotations in two dimensions
have the same separation of cluster levels. However, the function
can be useful in finding a two-dimensional projection of clusters
from more than two dimensions. The last dimension will always show
the residual variation, and for dimensions, only
discrimination vectors are used.
Function returns the original ordination result, but with
rotated scores (both site and species if available), and the
pc
attribute of scores set to FALSE
.
Rotation to a factor variable is an experimental feature and may
be removed. The discriminant analysis weights dimensions by their
discriminating power, but MDSrotate
performs a rigid
rotation. Therefore the solution may not be optimal.
Jari Oksanen
data(varespec) data(varechem) mod <- monoMDS(vegdist(varespec)) mod <- with(varechem, MDSrotate(mod, pH)) plot(mod) ef <- envfit(mod ~ pH, varechem, permutations = 0) plot(ef) ordisurf(mod ~ pH, varechem, knots = 1, add = TRUE)
data(varespec) data(varechem) mod <- monoMDS(vegdist(varespec)) mod <- with(varechem, MDSrotate(mod, pH)) plot(mod) ef <- envfit(mod ~ pH, varechem, permutations = 0) plot(ef) ordisurf(mod ~ pH, varechem, knots = 1, add = TRUE)
Function metaMDS
performs Nonmetric
Multidimensional Scaling (NMDS), and tries to find a stable solution
using several random starts. In addition, it standardizes the
scaling in the result, so that the configurations are easier to
interpret, and adds species scores to the site ordination. The
metaMDS
function does not provide actual NMDS, but it calls
another function for the purpose. Currently monoMDS
is
the default choice, and it is also possible to call the
isoMDS
(MASS package).
metaMDS(comm, distance = "bray", k = 2, try = 20, trymax = 20, engine = c("monoMDS", "isoMDS"), autotransform =TRUE, noshare = (engine == "isoMDS"), wascores = TRUE, expand = TRUE, trace = 1, plot = FALSE, previous.best, ...) ## S3 method for class 'metaMDS' plot(x, display = c("sites", "species"), choices = c(1, 2), type = "p", shrink = FALSE, cex = 0.7, ...) ## S3 method for class 'metaMDS' points(x, display = c("sites", "species"), choices = c(1,2), shrink = FALSE, select, cex = 0.7, ...) ## S3 method for class 'metaMDS' text(x, display = c("sites", "species"), labels, choices = c(1,2), shrink = FALSE, select, cex = 0.7, ...) ## S3 method for class 'metaMDS' scores(x, display = c("sites", "species"), shrink = FALSE, choices, tidy = FALSE, ...) metaMDSdist(comm, distance = "bray", autotransform = TRUE, noshare = TRUE, trace = 1, commname, zerodist = "ignore", distfun = vegdist, ...) metaMDSiter(dist, k = 2, try = 20, trymax = 20, trace = 1, plot = FALSE, previous.best, engine = "monoMDS", maxit = 200, parallel = getOption("mc.cores"), ...) initMDS(x, k=2) postMDS(X, dist, pc=TRUE, center=TRUE, halfchange, threshold=0.8, nthreshold=10, plot=FALSE, ...) metaMDSredist(object, ...)
metaMDS(comm, distance = "bray", k = 2, try = 20, trymax = 20, engine = c("monoMDS", "isoMDS"), autotransform =TRUE, noshare = (engine == "isoMDS"), wascores = TRUE, expand = TRUE, trace = 1, plot = FALSE, previous.best, ...) ## S3 method for class 'metaMDS' plot(x, display = c("sites", "species"), choices = c(1, 2), type = "p", shrink = FALSE, cex = 0.7, ...) ## S3 method for class 'metaMDS' points(x, display = c("sites", "species"), choices = c(1,2), shrink = FALSE, select, cex = 0.7, ...) ## S3 method for class 'metaMDS' text(x, display = c("sites", "species"), labels, choices = c(1,2), shrink = FALSE, select, cex = 0.7, ...) ## S3 method for class 'metaMDS' scores(x, display = c("sites", "species"), shrink = FALSE, choices, tidy = FALSE, ...) metaMDSdist(comm, distance = "bray", autotransform = TRUE, noshare = TRUE, trace = 1, commname, zerodist = "ignore", distfun = vegdist, ...) metaMDSiter(dist, k = 2, try = 20, trymax = 20, trace = 1, plot = FALSE, previous.best, engine = "monoMDS", maxit = 200, parallel = getOption("mc.cores"), ...) initMDS(x, k=2) postMDS(X, dist, pc=TRUE, center=TRUE, halfchange, threshold=0.8, nthreshold=10, plot=FALSE, ...) metaMDSredist(object, ...)
comm |
Community data. Alternatively, dissimilarities either as
a |
distance |
Dissimilarity index used in |
k |
Number of dimensions. NB., the number of points |
try , trymax
|
Minimum and maximum numbers of random starts in
search of stable solution. After |
engine |
The function used for MDS. The default is to use the
|
autotransform |
Use simple heuristics for possible data
transformation of typical community data (see below). If you do
not have community data, you should probably set
|
noshare |
Triggering of calculation step-across or extended
dissimilarities with function |
wascores |
Calculate species scores using function
|
expand |
Expand weighted averages of species in
|
trace |
Trace the function; |
plot |
Graphical tracing: plot interim results. You may want to set
|
previous.best |
Start searches from a previous solution. This can
also be a |
x |
|
choices |
Axes shown. |
type |
Plot type: |
display |
Display |
shrink |
Shrink back species scores if they were expanded originally. |
cex |
Character expansion for plotting symbols. |
tidy |
Return scores that are compatible with ggplot2:
all scores are in a single |
labels |
Optional test to be used instead of row names. |
select |
Items to be displayed. This can either be a logical
vector which is |
X |
Configuration from multidimensional scaling. |
commname |
The name of |
zerodist |
Handling of zero dissimilarities: either
|
distfun |
Dissimilarity function. Any function returning a
|
maxit |
Maximum number of iterations in the single NMDS run;
passed to the |
parallel |
Number of parallel processes or a predefined socket
cluster. If you use pre-defined socket clusters (say,
|
dist |
Dissimilarity matrix used in multidimensional scaling. |
pc |
Rotate to principal components. |
center |
Centre the configuration. |
halfchange |
Scale axes to half-change units. This defaults
|
threshold |
Largest dissimilarity used in half-change scaling. If
dissimilarities have a known (or inferred) ceiling, |
nthreshold |
Minimum number of points in half-change scaling. |
object |
A result object from |
... |
Other parameters passed to functions. Function
|
Non-metric Multidimensional Scaling (NMDS) is commonly
regarded as the most robust unconstrained ordination method in
community ecology (Minchin 1987). Function metaMDS
is a
wrapper function that calls several other functions to combine
Minchin's (1987) recommendations into one command. The complete
steps in metaMDS
are:
Transformation: If the data values are larger than common
abundance class scales, the function performs a Wisconsin double
standardization (wisconsin
). If the values look
very large, the function also performs sqrt
transformation. Both of these standardizations are generally found
to improve the results. However, the limits are completely
arbitrary (at present, data maximum 50 triggers sqrt
and triggers
wisconsin
). If you want to
have a full control of the analysis, you should set
autotransform = FALSE
and standardize and transform data
independently. The autotransform
is intended for community
data, and for other data types, you should set
autotransform = FALSE
. This step is perfomed using
metaMDSdist
, and the step is skipped if input were
dissimilarities.
Choice of dissimilarity: For a good result, you should use
dissimilarity indices that have a good rank order relation to
ordering sites along gradients (Faith et al. 1987). The default
is Bray-Curtis dissimilarity, because it often is the test
winner. However, any other dissimilarity index in
vegdist
can be used. Function
rankindex
can be used for finding the test winner
for you data and gradients. The default choice may be bad if you
analyse other than community data, and you should probably select
an appropriate index using argument distance
. This step is
performed using metaMDSdist
, and the step is skipped if
input were dissimilarities.
Step-across dissimilarities: Ordination may be very difficult
if a large proportion of sites have no shared species. In this
case, the results may be improved with stepacross
dissimilarities, or flexible shortest paths among all sites. The
default NMDS engine
is monoMDS
which is able
to break tied values at the maximum dissimilarity, and this often
is sufficient to handle cases with no shared species, and
therefore the default is not to use stepacross
with
monoMDS
. Function isoMDS
does
not handle tied values adequately, and therefore the default is to
use stepacross
always when there are sites with no
shared species with engine = "isoMDS"
. The
stepacross
is triggered by option noshare
. If
you do not like manipulation of original distances, you should set
noshare = FALSE
. This step is skipped if input data were
dissimilarities instead of community data. This step is performed
using metaMDSdist
, and the step is skipped always when
input were dissimilarities.
NMDS with random starts: NMDS easily gets trapped into local
optima, and you must start NMDS several times from random starts
to be confident that you have found the global solution. The
strategy in metaMDS
is to first run NMDS starting with the
metric scaling (cmdscale
which usually finds a good
solution but often close to a local optimum), or use the
previous.best
solution if supplied, and take its solution
as the standard (Run 0
). Then metaMDS
starts NMDS
from several random starts (minimum number is given by try
and maximum number by trymax
). These random starts are
generated by initMDS
. If a solution is better (has a lower
stress) than the previous standard, it is taken as the new
standard. If the solution is better or close to a standard,
metaMDS
compares two solutions using Procrustes analysis
(function procrustes
with option
symmetric = TRUE
). If the solutions are very similar in their
Procrustes rmse
and the largest residual is very small, the
solutions are regarded as repeated and the better one is taken
as the new standard. The conditions are stringent, and you may
have found good and relatively similar solutions although the
function is not yet satisfied. Setting trace = TRUE
will
monitor the final stresses, and plot = TRUE
will display
Procrustes overlay plots from each comparison. This step is
performed using metaMDSiter
. This is the first step
performed if input data (comm
) were dissimilarities. Random
starts can be run with parallel processing (argument
parallel
).
Scaling of the results: metaMDS
will run postMDS
for the final result. Function postMDS
provides the
following ways of “fixing” the indeterminacy of scaling and
orientation of axes in NMDS: Centring moves the origin to the
average of the axes; Principal components rotate the configuration
so that the variance of points is maximized on first dimension
(with function MDSrotate
you can alternatively
rotate the configuration so that the first axis is parallel to an
environmental variable); Half-change scaling scales the
configuration so that one unit means halving of community
similarity from replicate similarity. Half-change scaling is
based on closer dissimilarities where the relation between
ordination distance and community dissimilarity is rather linear
(the limit is set by argument threshold
). If there are
enough points below this threshold (controlled by the parameter
nthreshold
), dissimilarities are regressed on distances.
The intercept of this regression is taken as the replicate
dissimilarity, and half-change is the distance where similarity
halves according to linear regression. Obviously the method is
applicable only for dissimilarity indices scaled to , such as Kulczynski, Bray-Curtis and Canberra indices. If
half-change scaling is not used, the ordination is scaled to the
same range as the original dissimilarities. Half-change scaling is
skipped by default if input were dissimilarities, but can be
turned on with argument
halfchange = TRUE
. NB., The PC
rotation only changes the directions of reference axes, and it
does not influence the configuration or solution in general.
Species scores: Function adds the species scores to the final
solution as weighted averages using function
wascores
with given value of parameter
expand
. The expansion of weighted averages can be undone
with shrink = TRUE
in plot
or scores
functions, and the calculation of species scores can be suppressed
with wascores = FALSE
. This step is skipped if input were
dissimilarities and community data were unavailable. However, the
species scores can be added or replaced with
sppscores
.
Function metaMDS
returns an object of class
metaMDS
. The final site ordination is stored in the item
points
, and species ordination in the item species
,
and the stress in item stress
(NB, the scaling of the stress
depends on the engine
: isoMDS
uses
percents, and monoMDS
proportions in the range ). The other items store the information on the steps taken
and the items returned by the
engine
function. The object has
print
, plot
, points
and text
methods.
Functions metaMDSdist
and metaMDSredist
return
vegdist
objects. Function initMDS
returns a
random configuration which is intended to be used within
isoMDS
only. Functions metaMDSiter
and
postMDS
returns the result of NMDS with updated
configuration.
Non-linear optimization is a hard task, and the best possible solution
(“global optimum”) may not be found from a random starting
configuration. Most software solve this by starting from the result of
metric scaling (cmdscale
). This will probably give a
good result, but not necessarily the “global
optimum”. Vegan does the same, but metaMDS
tries to
verify or improve this first solution (“try 0”) using several
random starts and seeing if the result can be repeated or improved and
the improved solution repeated. If this does not succeed, you get a
message that the result could not be repeated. However, the result
will be at least as good as the usual standard strategy of starting
from metric scaling or it may be improved. You may not need to do
anything after such a message, but you can be satisfied with the
result. If you want to be sure that you probably have a “global
optimum” you may try the following instructions.
With default engine = "monoMDS"
the function will
tabulate the stopping criteria used, so that you can see which
criterion should be made more stringent. The criteria can be given
as arguments to metaMDS
and their current values are
described in monoMDS
. In particular, if you reach
the maximum number of iterations, you should increase the value of
maxit
. You may ask for a larger number of random starts
without losing the old ones giving the previous solution in
argument previous.best
.
In addition to slack convergence criteria and too low number
of random starts, wrong number of dimensions (argument k
)
is the most common reason for not being able to repeat similar
solutions. NMDS is usually run with a low number dimensions
(k=2
or k=3
), and for complex data increasing
k
by one may help. If you run NMDS with much higher number
of dimensions (say, k=10
or more), you should reconsider
what you are doing and drastically reduce k
. For very
heterogeneous data sets with partial disjunctions, it may help to
set stepacross
, but for most data sets the default
weakties = TRUE
is sufficient.
Please note that you can give all arguments of other
metaMDS*
functions and NMDS engine (default
monoMDS
) in your metaMDS
command,and you
should check documentation of these functions for details.
NMDS is often misunderstood and wrong claims of its properties are common on the Web and even in publications. It is often claimed that the NMDS configuration is non-metric which means that you cannot fit environmental variables or species onto that space. This is a false statement. In fact, the result configuration of NMDS is metric, and it can be used like any other ordination result. In NMDS the rank orders of Euclidean distances among points in ordination have a non-metric monotone relationship to any observed dissimilarities. The transfer function from observed dissimilarities to ordination distances is non-metric (Kruskal 1964a, 1964b), but the ordination result configuration is metric and observed dissimilarities can be of any kind (metric or non-metric).
The ordination configuration is usually rotated to principal
components in metaMDS
. The rotation is performed after
finding the result, and it only changes the direction of the
reference axes. The only important feature in the NMDS solution are
the ordination distances, and these do not change in
rotation. Similarly, the rank order of distances does not change in
uniform scaling or centring of configuration of points. You can also
rotate the NMDS solution to external environmental variables with
MDSrotate
. This rotation will also only change the
orientation of axes, but will not change the configuration of points
or distances between points in ordination space.
Function stressplot
displays the method graphically:
it plots the observed dissimilarities against distances in
ordination space, and also shows the non-metric monotone
regression.
metaMDS
uses monoMDS
as its
NMDS engine
from vegan version 2.0-0, when it replaced
the isoMDS
function. You can set argument
engine
to select the old engine.
Function metaMDS
is a simple wrapper for an NMDS engine
(either monoMDS
or isoMDS
) and
some support functions (metaMDSdist
,
stepacross
, metaMDSiter
, initMDS
,
postMDS
, wascores
). You can call these support
functions separately for better control of results. Data
transformation, dissimilarities and possible
stepacross
are made in function metaMDSdist
which returns a dissimilarity result. Iterative search (with
starting values from initMDS
with monoMDS
) is
made in metaMDSiter
. Processing of result configuration is
done in postMDS
, and species scores added by
wascores
. If you want to be more certain of reaching
a global solution, you can compare results from several independent
runs. You can also continue analysis from previous results or from
your own configuration. Function may not save the used
dissimilarity matrix (monoMDS
does), but
metaMDSredist
tries to reconstruct the used dissimilarities
with original data transformation and possible
stepacross
.
The metaMDS
function was designed to be used with community
data. If you have other type of data, you should probably set some
arguments to non-default values: probably at least wascores
,
autotransform
and noshare
should be FALSE
. If
you have negative data entries, metaMDS
will set the previous
to FALSE
with a warning.
Jari Oksanen
Faith, D. P, Minchin, P. R. and Belbin, L. (1987). Compositional dissimilarity as a robust measure of ecological distance. Vegetatio 69, 57–68.
Kruskal, J.B. (1964a). Multidimensional scaling by optimizing goodness-of-fit to a nonmetric hypothesis. Psychometrika 29, 1–28.
Kruskal, J.B. (1964b). Nonmetric multidimensional scaling: a numerical method. Psychometrika 29, 115–129.
Minchin, P.R. (1987). An evaluation of relative robustness of techniques for ecological ordinations. Vegetatio 69, 89–107.
monoMDS
(and isoMDS
),
decostand
, wisconsin
,
vegdist
, rankindex
, stepacross
,
procrustes
, wascores
, sppscores
,
MDSrotate
, ordiplot
, stressplot
.
## The recommended way of running NMDS (Minchin 1987) ## data(dune) ## IGNORE_RDIFF_BEGIN ## Global NMDS using monoMDS sol <- metaMDS(dune) sol plot(sol, type="t") ## Start from previous best solution sol <- metaMDS(dune, previous.best = sol) ## Local NMDS and stress 2 of monoMDS sol2 <- metaMDS(dune, model = "local", stress=2) sol2 ## Use Arrhenius exponent 'z' as a binary dissimilarity measure sol <- metaMDS(dune, distfun = betadiver, distance = "z") sol ## IGNORE_RDIFF_END
## The recommended way of running NMDS (Minchin 1987) ## data(dune) ## IGNORE_RDIFF_BEGIN ## Global NMDS using monoMDS sol <- metaMDS(dune) sol plot(sol, type="t") ## Start from previous best solution sol <- metaMDS(dune, previous.best = sol) ## Local NMDS and stress 2 of monoMDS sol2 <- metaMDS(dune, model = "local", stress=2) sol2 ## Use Arrhenius exponent 'z' as a binary dissimilarity measure sol <- metaMDS(dune, distfun = betadiver, distance = "z") sol ## IGNORE_RDIFF_END
Oribatid mite data. 70 soil cores collected by Daniel Borcard in 1989. See Borcard et al. (1992, 1994) for details.
data(mite) data(mite.env) data(mite.pcnm) data(mite.xy)
data(mite) data(mite.env) data(mite.pcnm) data(mite.xy)
There are three linked data sets: mite
that contains the data
on 35 species of Oribatid mites, mite.env
that contains
environmental data in the same sampling sites, mite.xy
that contains geographic coordinates, and mite.pcnm
that contains 22 PCNM base functions (columns) computed from the geographic
coordinates of the 70 sampling sites (Borcard & Legendre 2002).
The whole sampling area was 2.5 m x 10 m in size.
The fields in the environmental data are:
Substrate density (g/L)
Water content of the substrate (g/L)
Substrate type, factor with levels Sphagn1,
Sphagn2 Sphagn3 Sphagn Litter Barepeat Interface
Shrub density, an ordered factor with levels 1
<
2
< 3
Microtopography, a factor with levels Blanket
and Hummock
Pierre Legendre
Borcard, D., P. Legendre and P. Drapeau. 1992. Partialling out the spatial component of ecological variation. Ecology 73: 1045-1055.
Borcard, D. and P. Legendre. 1994. Environmental control and spatial structure in ecological communities: an example using Oribatid mites (Acari, Oribatei). Environmental and Ecological Statistics 1: 37-61.
Borcard, D. and P. Legendre. 2002. All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecological Modelling 153: 51-68.
data(mite)
data(mite)
Function implements Kruskal's (1964a,b) non-metric multidimensional scaling (NMDS) using monotone regression and primary (“weak”) treatment of ties. In addition to traditional global NMDS, the function implements local NMDS, linear and hybrid multidimensional scaling.
monoMDS(dist, y, k = 2, model = c("global", "local", "linear", "hybrid"), threshold = 0.8, maxit = 200, weakties = TRUE, stress = 1, scaling = TRUE, pc = TRUE, smin = 1e-4, sfgrmin = 1e-7, sratmax=0.999999, ...) ## S3 method for class 'monoMDS' scores(x, display = "sites", shrink = FALSE, choices, tidy = FALSE, ...) ## S3 method for class 'monoMDS' plot(x, display = "sites", choices = c(1,2), type = "t", ...) ## S3 method for class 'monoMDS' points(x, choices = c(1,2), select, ...) ## S3 method for class 'monoMDS' text(x, labels, choices = c(1,2), select, ...)
monoMDS(dist, y, k = 2, model = c("global", "local", "linear", "hybrid"), threshold = 0.8, maxit = 200, weakties = TRUE, stress = 1, scaling = TRUE, pc = TRUE, smin = 1e-4, sfgrmin = 1e-7, sratmax=0.999999, ...) ## S3 method for class 'monoMDS' scores(x, display = "sites", shrink = FALSE, choices, tidy = FALSE, ...) ## S3 method for class 'monoMDS' plot(x, display = "sites", choices = c(1,2), type = "t", ...) ## S3 method for class 'monoMDS' points(x, choices = c(1,2), select, ...) ## S3 method for class 'monoMDS' text(x, labels, choices = c(1,2), select, ...)
dist |
Input dissimilarities. |
y |
Starting configuration. A random configuration will be generated if this is missing. |
k |
Number of dimensions. NB., the number of points |
model |
MDS model: |
threshold |
Dissimilarity below which linear regression is used alternately with monotone regression. |
maxit |
Maximum number of iterations. |
weakties |
Use primary or weak tie treatment, where equal
observed dissimilarities are allowed to have different fitted
values. if |
stress |
Use stress type 1 or 2 (see Details). |
scaling |
Scale final scores to unit root mean squares. |
pc |
Rotate final scores to principal components. |
smin , sfgrmin , sratmax
|
Convergence criteria: iterations stop
when stress drops below |
x |
A |
display |
Kind of scores. Normally there are only scores for
|
shrink |
Shrink back species scores if they were expanded in
|
tidy |
Return scores that are compatible with ggplot2:
all scores are in a single |
choices |
Dimensions returned or plotted. The default |
type |
The type of the plot: |
select |
Items to be displayed. This can either be a logical
vector which is |
labels |
Labels to be use used instead of row names. |
... |
Other parameters to the functions (ignored in
|
There are several versions of non-metric multidimensional
scaling in R, but monoMDS
offers the following unique
combination of features:
“Weak” treatment of ties (Kruskal 1964a,b), where tied dissimilarities can be broken in monotone regression. This is especially important for cases where compared sites share no species and dissimilarities are tied to their maximum value of one. Breaking ties allows these points to be at different distances and can help in recovering very long coenoclines (gradients). Functions in the smacof package also hav adequate tie treatment.
Handles missing values in a meaningful way.
Offers “local” and “hybrid” scaling in addition to usual “global” NMDS (see below).
Uses fast compiled code (isoMDS
of the
MASS package also uses compiled code).
Function monoMDS
uses Kruskal's (1964b) original monotone
regression to minimize the stress. There are two alternatives of
stress: Kruskal's (1964a,b) original or “stress 1” and an
alternative version or “stress 2” (Sibson 1972). Both of
these stresses can be expressed with a general formula
where are distances among points in ordination configuration,
are the fitted ordination distances, and
are the ordination distances under null model. For
“stress 1”
, and for “stress 2”
or mean distances. “Stress 2”
can be expressed as
,
where
is squared correlation between fitted values and
ordination distances, and so related to the “linear fit” of
stressplot
.
Function monoMDS
can fit several alternative NMDS variants that
can be selected with argument model
. The default model =
"global"
fits global NMDS, or Kruskal's (1964a,b) original NMDS
similar to isoMDS
(MASS). Alternative
model = "local"
fits local NMDS where independent monotone
regression is used for each point (Sibson 1972). Alternative
model = "linear"
fits a linear MDS. This fits a linear
regression instead of monotone, and is not identical to metric scaling
or principal coordinates analysis (cmdscale
) that
performs an eigenvector decomposition of dissimilarities (Gower
1966). Alternative model = "hybrid"
implements hybrid MDS that
uses monotone regression for all points and linear regression for
dissimilarities below or at a threshold
dissimilarity in
alternating steps (Faith et al. 1987). Function
stressplot
can be used to display the kind of regression
in each model
.
Scaling, orientation and direction of the axes is arbitrary.
However, the function always centres the axes, and the default
scaling
is to scale the configuration of unit root mean
square and to rotate the axes (argument pc
) to principal
components so that the first dimension shows the major variation.
It is possible to rotate the solution so that the first axis is
parallel to a given environmental variable using function
MDSrotate
.
Returns an object of class "monoMDS"
. The final scores
are returned in item points
(function scores
extracts
these results), and the stress in item stress
. In addition,
there is a large number of other items (but these may change without
notice in the future releases). There are no species scores, but these
can be added with sppscores
function.
NMDS is iterative, and the function stops when any of its
convergence criteria is met. There is actually no criterion of
assured convergence, and any solution can be a local optimum. You
should compare several random starts (or use monoMDS
via
metaMDS
) to assess if the solutions is likely a global
optimum.
The stopping criteria are:
maxit
: Maximum number of iterations. Reaching this
criterion means that solutions was almost certainly not found,
and maxit
should be increased.
smin
: Minimum stress. If stress is nearly zero,
the fit is almost perfect. Usually this means that data set is
too small for the requested analysis, and there may be several
different solutions that are almost as perfect. You should reduce
the number of dimensions (k
), get more data (more
observations) or use some other method, such as metric scaling
(cmdscale
, wcmdscale
).
sratmax
:Change in stress. Values close to one mean almost unchanged stress. This may mean a solution, but it can also signal stranding on suboptimal solution with flat stress surface.
sfgrmin
:Minimum scale factor. Values close to zero mean almost unchanged configuration. This may mean a solution, but will also happen in local optima.
This is the default NMDS function used in
metaMDS
. Function metaMDS
adds support
functions so that NMDS can be run like recommended by Minchin
(1987).
Peter R. Michin (Fortran core) and Jari Oksanen (R interface).
Faith, D.P., Minchin, P.R and Belbin, L. 1987. Compositional dissimilarity as a robust measure of ecological distance. Vegetatio 69, 57–68.
Gower, J.C. (1966). Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53, 325–328.
Kruskal, J.B. 1964a. Multidimensional scaling by optimizing goodness-of-fit to a nonmetric hypothesis. Psychometrika 29, 1–28.
Kruskal, J.B. 1964b. Nonmetric multidimensional scaling: a numerical method. Psychometrika 29, 115–129.
Minchin, P.R. 1987. An evaluation of relative robustness of techniques for ecological ordinations. Vegetatio 69, 89–107.
Sibson, R. 1972. Order invariant methods for data analysis. Journal of the Royal Statistical Society B 34, 311–349.
metaMDS
for the vegan way of running NMDS,
and isoMDS
and smacof for some
alternative implementations of NMDS.
data(dune) dis <- vegdist(dune) m <- monoMDS(dis, model = "loc") m plot(m)
data(dune) dis <- vegdist(dune) m <- monoMDS(dis, model = "loc") m plot(m)
Mitchell-Olds & Shaw test concerns the location of the highest (hump)
or lowest (pit) value of a quadratic curve at given points. Typically,
it is used to study whether the quadratic hump or pit is located
within a studied interval. The current test is generalized so that it
applies generalized linear models (glm
) with link
function instead of simple quadratic curve. The test was popularized
in ecology for the analysis of humped species richness patterns
(Mittelbach et al. 2001), but it is more general. With logarithmic
link function, the quadratic response defines the Gaussian response
model of ecological gradients (ter Braak & Looman 1986), and the test
can be used for inspecting the location of Gaussian optimum within a
given range of the gradient. It can also be used to replace Tokeshi's
test of “bimodal” species frequency distribution.
MOStest(x, y, interval, ...) ## S3 method for class 'MOStest' plot(x, which = c(1,2,3,6), ...) fieller.MOStest(object, level = 0.95) ## S3 method for class 'MOStest' profile(fitted, alpha = 0.01, maxsteps = 10, del = zmax/5, ...) ## S3 method for class 'MOStest' confint(object, parm = 1, level = 0.95, ...)
MOStest(x, y, interval, ...) ## S3 method for class 'MOStest' plot(x, which = c(1,2,3,6), ...) fieller.MOStest(object, level = 0.95) ## S3 method for class 'MOStest' profile(fitted, alpha = 0.01, maxsteps = 10, del = zmax/5, ...) ## S3 method for class 'MOStest' confint(object, parm = 1, level = 0.95, ...)
x |
The independent variable or plotting object in |
y |
The dependent variable. |
interval |
The two points at which the test statistic is
evaluated. If missing, the extremes of |
which |
Subset of plots produced. Values |
object , fitted
|
A result object from |
level |
The confidence level required. |
alpha |
Maximum significance level allowed. |
maxsteps |
Maximum number of steps in the profile. |
del |
A step length parameter for the profile (see code). |
parm |
Ignored. |
... |
Other variables passed to functions. Function
|
The function fits a quadratic curve with given
family
and link function. If , this defines a unimodal curve with highest point at
(ter Braak & Looman 1986). If
, the
parabola has a minimum at
and the response is sometimes
called “bimodal”. The null hypothesis is that the extreme
point
is located within the interval given by points
and
. If the extreme point
is exactly at
, then
on shifted axis
. In the
test, origin of
x
is shifted to the values and
, and the test statistic is based on the differences of
deviances between the original model and model where the origin is
forced to the given location using the standard
anova.glm
function (Oksanen et al. 2001).
Mitchell-Olds & Shaw (1987) used the first degree coefficient with
its significance as estimated by the summary.glm
function. This give identical results with Normal error, but for
other error distributions it is preferable to use the test based on
differences in deviances in fitted models.
The test is often presented as a general test for the location of the hump, but it really is dependent on the quadratic fitted curve. If the hump is of different form than quadratic, the test may be insignificant.
Because of strong assumptions in the test, you should use the support
functions to inspect the fit. Function plot(..., which=1)
displays the data points, fitted quadratic model, and its approximate
95% confidence intervals (2 times SE). Function plot
with
which = 2
displays the approximate confidence interval of
the polynomial coefficients, together with two lines indicating the
combinations of the coefficients that produce the evaluated points of
x
. Moreover, the cross-hair shows the approximate confidence
intervals for the polynomial coefficients ignoring their
correlations. Higher values of which
produce corresponding
graphs from plot.lm
. That is, you must add 2 to the
value of which
in plot.lm
.
Function fieller.MOStest
approximates the confidence limits
of the location of the extreme point (hump or pit) using Fieller's
theorem following ter Braak & Looman (1986). The test is based on
quasideviance except if the family
is poisson
or binomial
. Function profile
evaluates the profile
deviance of the fitted model, and confint
finds the profile
based confidence limits following Oksanen et al. (2001).
The test is typically used in assessing the significance of diversity hump against productivity gradient (Mittelbach et al. 2001). It also can be used for the location of the pit (deepest points) instead of the Tokeshi test. Further, it can be used to test the location of the the Gaussian optimum in ecological gradient analysis (ter Braak & Looman 1986, Oksanen et al. 2001).
The function is based on glm
, and it returns the result
of object of glm
amended with the result of the test. The new
items in the MOStest
are:
isHump |
|
isBracketed |
|
hump |
Sorted vector of location of the hump or the pit and the points where the test was evaluated. |
coefficients |
Table of test statistics and their significances. |
Function fieller.MOStest
is based on package optgrad in
the Ecological Archives
(https://figshare.com/articles/dataset/Full_Archive/3521975)
accompanying Oksanen et al. (2001). The Ecological Archive package
optgrad also contains profile deviance method for the location
of the hump or pit, but the current implementation of profile
and confint
rather follow the example of
profile.glm
and confint.glm
in
the MASS package.
Jari Oksanen
Mitchell-Olds, T. & Shaw, R.G. 1987. Regression analysis of natural selection: statistical inference and biological interpretation. Evolution 41, 1149–1161.
Mittelbach, G.C. Steiner, C.F., Scheiner, S.M., Gross, K.L., Reynolds, H.L., Waide, R.B., Willig, R.M., Dodson, S.I. & Gough, L. 2001. What is the observed relationship between species richness and productivity? Ecology 82, 2381–2396.
Oksanen, J., Läärä, E., Tolonen, K. & Warner, B.G. 2001. Confidence intervals for the optimum in the Gaussian response function. Ecology 82, 1191–1197.
ter Braak, C.J.F & Looman, C.W.N 1986. Weighted averaging, logistic regression and the Gaussian response model. Vegetatio 65, 3–11.
The no-interaction model can be fitted with humpfit
.
## The Al-Mufti data analysed in humpfit(): mass <- c(140,230,310,310,400,510,610,670,860,900,1050,1160,1900,2480) spno <- c(1, 4, 3, 9, 18, 30, 20, 14, 3, 2, 3, 2, 5, 2) mod <- MOStest(mass, spno) ## Insignificant mod ## ... but inadequate shape of the curve op <- par(mfrow=c(2,2), mar=c(4,4,1,1)+.1) plot(mod) ## Looks rather like log-link with Poisson error and logarithmic biomass mod <- MOStest(log(mass), spno, family=quasipoisson) mod plot(mod) par(op) ## Confidence Limits fieller.MOStest(mod) confint(mod) plot(profile(mod))
## The Al-Mufti data analysed in humpfit(): mass <- c(140,230,310,310,400,510,610,670,860,900,1050,1160,1900,2480) spno <- c(1, 4, 3, 9, 18, 30, 20, 14, 3, 2, 3, 2, 5, 2) mod <- MOStest(mass, spno) ## Insignificant mod ## ... but inadequate shape of the curve op <- par(mfrow=c(2,2), mar=c(4,4,1,1)+.1) plot(mod) ## Looks rather like log-link with Poisson error and logarithmic biomass mod <- MOStest(log(mass), spno, family=quasipoisson) mod plot(mod) par(op) ## Confidence Limits fieller.MOStest(mod) confint(mod) plot(profile(mod))
Multiple Response Permutation Procedure (MRPP) provides a
test of whether there is a significant difference between two or more
groups of sampling units. Function meandist
finds the mean within
and between block dissimilarities.
mrpp(dat, grouping, permutations = 999, distance = "euclidean", weight.type = 1, strata = NULL, parallel = getOption("mc.cores")) meandist(dist, grouping, ...) ## S3 method for class 'meandist' summary(object, ...) ## S3 method for class 'meandist' plot(x, kind = c("dendrogram", "histogram"), cluster = "average", ylim, axes = TRUE, ...)
mrpp(dat, grouping, permutations = 999, distance = "euclidean", weight.type = 1, strata