--- title: "mvabund" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{mvabund} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.align='center' ) ``` This vignette takes you through the main functions in `mvabund` to help you get started! We recommend reading the [manuscript associated with the package](https://besjournals.onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2012.00190.x) and taking a look at `Other Resources` in our README. ### First things first Let's load the package and get our hands on the `Tasmania` data set to look at the effects of disturbance treatment on invertebrate abundances. Note that the `Tasmania` data set is a list object. We will only look at the `copepods` data frame for this walk-through. The `copepods` data frame can be accessed using `Tasmania$copepods` or the `attach()` function which will make the contents of `Tasmania` searchable. ```{r setup} library(mvabund) data(Tasmania) attach(Tasmania) skimr::skim(copepods) # Great function to get an overview of the data ``` ### Visualise the multivariate data We first need to turn our data into a `mvabund` object so functions for this package and work with the data ```{r} copepod_abund <- mvabund(copepods) ``` Now lets take a look at abundance for each species across our treatment sites (Disturbed vs. Undisturbed). Observations were collected using a spatially blocked design where researchers took four samples at each block (2 per treatment). We can set the colour (`col`) of the points to represent that four sampling blocks ```{r, fig.height=6, fig.width= 7} plot(copepod_abund~treatment, col = block) ``` ### Fitting Predictive Models It was hypothesised that the abundance of *Ameira* and *Ectinosoma* was reduced in Disturbed sites, whereas the abundance of *Mictyricola* may have increased. Lets test this hypothesis using the `manyglm()` function. This function fits a generalised linear model for each species. We specified `family = "negative.binomial"` as count data tends to follow a negative binomial distribution. Other distributions are available too! See `?manyglm()` ```{r} cope.nb <- manyglm(copepods ~ treatment*block, family = "negative.binomial") ``` ### Checking Model Assumptions Before we look at the model output, we should check on the model residuals. What we want to see is *little pattern* as this implies that our choice of negative binomial distribution is appropriate. ```{r, fig.height=5, fig.width= 6} plot(cope.nb) ``` Now, lets proceed to check on the *mean-variance relationship*. We want to to see if the *mean-variance relationship* of our data adheres to that of a negative binomial distribution which tends to be quadratic rather than linear. The `meanvar.plot()` function plots the sample variance against the sample mean for each species within each factor level of (`tr.block`). A quadratic relationship seems appropriate for our sample mean and variance. ```{r, fig.height=5, fig.width= 6} meanvar.plot(copepods~tr.block, col = treatment) ``` ### Hypothesis Testing To test whether `treatment` and `block` had an effect on the abundances of copepods we can use the `anova()` function. This function returns a Analysis of Deviance table which tests the significance of each model term. Setting `p.uni = "adjusted"` allows for our p-values to be adjusted for multiple testing of different species. ```{r, c} anova(cope.nb, p.uni = "adjusted") ``` The test statistics in this table are by default calculated by summing the change in deviance across all responses. You could use a different type of test statistic by changing the `test` and `cor.type` arguments. We can see that there is a **significant effect of the treatment factor** meaning that treatment has a significant multiplicative effect on mean abundance. The interaction between blocks and treatments is **not significant**, meaning that the multiplicative treatment effect is consistent across blocks. If you do not have a specific hypothesis in mind that you want to test, and are instead interested in which model terms are statistically significant, then the `summary()` function will come in handy. However results aren’t quite as trustworthy as for `anova()`. The reason is that re-samples are taken under the alternative hypothesis for `summary()`, where there is a greater chance of fitted values being zero, especially for rarer taxa (e.g. if there is a treatment combination in which a taxon is never present). Abundances don’t re-sample well if their predicted mean is zero. ```{r} summary(cope.nb) ``` If obtaining predicted values from the model is the goal, you may use the `predict()` function. Note that `type = response` will produce values on the scale of the response variable (i.e. counts) ```{r} predict(cope.nb, type = "response") ```