A wrapper around various PCoA-based analyses implemented in vegan. The wrapper can handle groups and metadata. PCoA is carried out sample-wise. The na.action is set to na.omit, however envfit cannot deal with missing values, therefore if metadata are provided, they should be free of missing values.

seqPCoA(
  abundances,
  reference = NULL,
  rarefyRef = FALSE,
  addToRefStepwise = FALSE,
  refName = "ref",
  metadata = NULL,
  groupAttrib = "",
  groups = c(),
  groupColors = NULL,
  colors = c(),
  clusters = c(),
  labels = c(),
  sizes = c(),
  size.legend = "",
  time = c(),
  hiddenTaxa = c(),
  hiddenSamples = c(),
  dis = "bray",
  rda = FALSE,
  scale = FALSE,
  doScree = FALSE,
  topTaxa = 10,
  topMetadata = 10,
  arrowFactor = 0.5,
  metadataFactor = 1,
  centroidFactor = 1,
  taxonColor = "brown",
  metadataColor = "blue",
  hideGroupLegend = FALSE,
  drawEllipse = FALSE,
  ellipseOnClusters = FALSE,
  ellipseColorMap = NULL,
  clusterQualityIndex = "silhouette",
  groupDispersion = FALSE,
  xlim = NULL,
  ylim = NULL,
  permut = 1000,
  env.permut = 1000,
  pAdjMethod = "BH",
  qvalThreshold = 0.05,
  ellipseConf = 0.95,
  dimensions = c(1, 2),
  ...
)

Arguments

abundances

a matrix with taxa as rows and samples as columns

reference

an optional reference data set on which abundances are mapped; data are merged by matching row names (nonmatching ones are kept as sum); cannot be combined with rda or topMetadata (topMetadata needs to be set to zero)

rarefyRef

rarefy abundance and reference samples to the minimum total count found in any of the samples; recommended when the total counts differ

addToRefStepwise

compute ordination coordinates for each sample added to the reference separately; cannot be combined with metadata, rda or drawEllipse

refName

group name for reference samples

metadata

an optional data frame with metadata items as columns, where samples are in the same order as in abundances and data types (factor vs numeric) are supposed to be correct; if provided and rda is FALSE, envfit is carried out

groupAttrib

optional: the name of a metadata item that refers to a vector that provides for each sample its group membership

groups

an optional vector that provides for each sample its group membership and which is overridden by groupAttrib, if provided

groupColors

an optional map of predefined colors for groups that matches names in groups (which should be strings); adds a color legend to the plot; if reference is provided, refName is added if absent

colors

an optional vector of colors with entries in the same order as samples to be colored; it overrides groupColors if provided

clusters

an optional vector that provides for each sample its cluster membership (cluster membership is visualized through shape, up to 10 different shapes are possible)

labels

an optional vector that provides for each sample a label to display

sizes

a vector of numeric values that will be displayed as varying sample sizes (sizes will be shifted into positive range if necessary and scaled between 0.5 and 2.5)

size.legend

a string displayed as a legend for size

time

an optional vector with as many time points as samples, adds arrows between consecutive time points (time points should be in ascending order)

hiddenTaxa

an optional vector with names of taxa to be hidden (they will be taken into account for PCoA/RDA/envfit, but are not displayed among top co-varying taxa)

hiddenSamples

an optional vector with indices of samples to be hidden (they will be taken into account for PCoA/RDA/envfit, but are not displayed)

dis

dissimilarity or distance supported by vegan's vegdist function (if set to cor, a PCA is carried out using vegan's function rda with scale set to true)

rda

carry out an RDA instead of a PCoA using vegan's capscale function

scale

scale numeric metadata (subtract the mean and divide by standard deviation)

doScree

do a Scree plot

topTaxa

if larger than zero: show the top N taxa most strongly covarying with principal components as arrows in the PCoA if they are significant according to a permutation test

topMetadata

if larger than zero, metadata provided and rda false: show the top N most significant numeric metadata as arrows and the top N most significant factor metadata as text in the PCoA

arrowFactor

the length of taxon arrows (determined by scaled covariance) is multiplied with this factor

metadataFactor

the length of numeric metadata arrows (determined by Pearson correlation) is multiplied with this factor

centroidFactor

centroid positions (representing categoric metadata) are multiplied with this factor

taxonColor

the color of the taxon arrows and text

metadataColor

the color of the metadata arrows and text

hideGroupLegend

do not show the group legend

drawEllipse

if groups or groupAttrib given, draw polygons encapsulating groups using vegan's ordiellipse function (kind is sd, conf given via ellipseConf); print adonis R2 and p-value (permutation number given via env.permut)

ellipseOnClusters

draw ellipse around cluster members (indicated by shape) instead of group members (indicated by color)

ellipseColorMap

color ellipses according to this color list; entries have to be group names or, if ellipseOnClusters is true, cluster names; overrides groupColors for ellipses

clusterQualityIndex

if groups or groupAttrib given, report cluster quality according to silhouette function or any criterium supported by package clusterCrit (default: silhouette, set to none to disable computation of cluster quality)

groupDispersion

if groups or groupAttrib given, report Tukey's HSD test on differences in group dispersions (avg distance of group members to centroid) and do boxplot (wraps vegan's betadisper)

xlim

range shown on the x axis, by default the minimum and maximum of the first selected component

ylim

range shown on the y axis, by default the minimum and maximum of the second selected component

permut

number of permutations for top-covarying taxa; if NA, NULL or smaller than 1, no permutation test is carried out

env.permut

number of permutations for envfit, if drawEllipse is true, for adonis

pAdjMethod

method for multiple testing correction supported by p.adjust for top-covarying taxon and envfit p-values

qvalThreshold

threshold on multiple-testing corrected top-covarying taxon and envfit p-values

ellipseConf

confidence limit for drawEllipse

dimensions

the principal components used for plotting, by default the first and second

\dots

Additional arguments passed to plot()

Details

When a reference and groups are provided and the number of group memberships does not equal the number of samples in the combined abundance table, groups are automatically extended such that reference samples are assigned to a single group with name refName, which is colored in gray. The color vector is likewise extended if provided. If a clusters, time and/or labels vector is provided together with a referene, it has to refer to both data sets. Samples in the abundance matrix are appended after the reference samples, so cluster memberships, time points and/or labels have to be provided in the same order. Different total counts in abundances and reference samples may bias the result, therefore rarefyRef allows to rarefy both to the same total count after matching. RarefyRef will rarefy all samples to the lowest total count found in any sample. Rows with zero counts after rarefaction are removed. When topTaxa is set larger zero, significant top-varying taxa are shown. The permutation test is carried out by shuffling the selected number of top-covarying taxa. Multiple testing correction on parameter-free p-values is then only applied to these top-covarying taxa. The strength of covariance is determined as the norm of the vectors resulting from multiplying the standardized eigen vectors with the taxa. In contrast, envfit p-values are computed for all metadata and multiple-testing correction is consequently applied to all metadata provided, though only the selected number of most significant metadata are shown. Thus, topTaxa ranks taxa by covariance with significance as a filter, whereas topMetadata ranks metadata by significance. The number in the axis label brackets refers to the proportion of variance explained as computed with vegan's eigenvals function. Note that ordination values in plots are not scaled (equivalent to plot.cca(scaling=0)). If groups are provided, drawEllipse can be enabled to draw ellipses with vegan's ordiellipse and to run vegan's adonis (PERMANOVA) to assess whether group composition differs significantly. Adonis is sensitive to differences in within-group variation. To assess whether within-group variation differs, betadisper can be carried out by enabling groupDispersion. In addition, if groups are provided, seqPCoA computes cluster quality indices to assess group separation with package clusterCrit. Concerning interpretation of cluster quality indices: The silhouette index ranges between -1 and 1, a large Dunn's, Silhouette or Calinski-Harabasz index and a small Davies-Bouldin or C-index indicate well-defined clusters, respectively.

Examples

data("ibd_taxa") data("ibd_metadata") ibd_metadata=assignMetadataTypes(ibd_metadata,categoric=c("SRA_metagenome_name","Diagnosis"))
#> [1] "Metadata SRA_metagenome_name is categoric" #> [1] "Metadata Age is numeric" #> [1] "Metadata Diagnosis is categoric" #> [1] "Metadata Fecal.Calprotectin is numeric" #> [1] "Metadata antibiotic is binary" #> [1] "Metadata immunosuppressant is binary" #> [1] "Metadata mesalamine is binary" #> [1] "Metadata steroids is binary"
seqPCoA(ibd_taxa,groups=as.vector(ibd_metadata$Diagnosis),topTaxa=30, drawEllipse=TRUE)
#> [1] "Adonis to test for significant difference in group compositions" #> [1] "Adonis R2: 0.0604, p-value: 0.001" #> [1] "Cluster quality index silhouette" #> [1] -0.02240243 #> [1] "Among the top 30 covarying taxa, 2 are significant." #> [1] "Subdoligranulum_unclassified" #> [1] "Escherichia_unclassified"
# remove 65 samples with missing calprotectin measurements or other missing values in the metadata na.indices=unique(which(is.na(ibd_metadata),arr.ind=TRUE)[,1]) indices.to.keep=setdiff(1:nrow(ibd_metadata),na.indices) ibd_metadata=ibd_metadata[indices.to.keep,] ibd_taxa=ibd_taxa[,indices.to.keep] seqPCoA(ibd_taxa,metadata=ibd_metadata,groups=as.vector(ibd_metadata$Diagnosis),topTaxa=30)
#> [1] "1 significant numeric metadata found, in order of significance:" #> [1] "Fecal.Calprotectin" #> [1] "4 significant categoric metadata found, in order of significance:" #> [1] "Diagnosis" #> [1] "antibiotic" #> [1] "immunosuppressant" #> [1] "steroids"
#> [1] "Cluster quality index silhouette" #> [1] -0.01835481 #> [1] "Among the top 30 covarying taxa, 4 are significant." #> [1] "Faecalibacterium_prausnitzii" #> [1] "Ruminococcus_bromii" #> [1] "Streptococcus_thermophilus" #> [1] "Bacteroides_uniformis"