---
title: "mvabund"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{mvabund}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align='center'
  )
```

This vignette takes you through the main functions in `mvabund` to help you get started! We recommend reading the [manuscript associated with the package](https://besjournals.onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2012.00190.x) and taking a look at `Other Resources` in our README.

### First things first

Let's load the package and get our hands on the `Tasmania` data set to look at the effects of disturbance treatment on invertebrate abundances. Note that the `Tasmania` data set is a list object. We will only look at the `copepods` data frame for this walk-through. The `copepods` data frame can be accessed using `Tasmania$copepods` or the `attach()` function which will make the contents of `Tasmania` searchable. 

```{r setup}
library(mvabund)

data(Tasmania) 
attach(Tasmania)

skimr::skim(copepods) # Great function to get an overview of the data
```

### Visualise the multivariate data 

We first need to turn our data into a `mvabund` object so functions for this package and work with the data

```{r}
copepod_abund <- mvabund(copepods)
```

Now lets take a look at abundance for each species across our treatment sites (Disturbed vs. Undisturbed). Observations were collected using a spatially blocked design where researchers took four samples at each block (2 per treatment). We can set the colour (`col`) of the points to represent that four sampling blocks

```{r, fig.height=6, fig.width= 7}
plot(copepod_abund~treatment, col = block)
```

### Fitting Predictive Models

It was hypothesised that the abundance of *Ameira* and *Ectinosoma* was reduced in Disturbed sites, whereas the abundance of *Mictyricola* may have increased. Lets test this hypothesis using the `manyglm()` function. This function fits a generalised linear model for each species. We specified `family =  "negative.binomial"` as count data tends to follow a negative binomial distribution. Other distributions are available too! See `?manyglm()`

```{r}
cope.nb <- manyglm(copepods ~ treatment*block, family =  "negative.binomial")
```

### Checking Model Assumptions

Before we look at the model output, we should check on the model residuals. What we want to see is *little pattern* as this implies that our choice of negative binomial distribution is appropriate. 

```{r, fig.height=5, fig.width= 6}
plot(cope.nb)
```

Now, lets proceed to check on the *mean-variance relationship*. We want to to see if the *mean-variance relationship* of our data adheres to that of a negative binomial distribution which tends to be quadratic rather than linear. The `meanvar.plot()` function plots the sample variance against the sample mean for each species within each factor level of (`tr.block`). A quadratic relationship seems appropriate for our sample mean and variance.

```{r, fig.height=5, fig.width= 6}
meanvar.plot(copepods~tr.block, col = treatment)
```

### Hypothesis Testing

To test whether `treatment` and `block` had an effect on the abundances of copepods we can use the `anova()` function. This function returns a Analysis of Deviance table which tests the significance of each model term. Setting `p.uni = "adjusted"` allows for our p-values to be adjusted for multiple testing of different species. 

```{r, c}
anova(cope.nb, p.uni = "adjusted")
```
The test statistics in this table are by default calculated by summing the change in deviance across all responses.  You could use a different type of test statistic by changing the `test` and `cor.type` arguments. We can see that there is a **significant effect of the treatment factor** meaning that treatment has a significant multiplicative effect on mean abundance. The interaction between blocks and treatments is **not significant**, meaning that the multiplicative treatment effect is consistent across blocks. 

If you do not have a specific hypothesis in mind that you want to test, and are instead interested in which model terms are statistically significant, then the `summary()` function will come in handy. However  results aren’t quite as trustworthy as for `anova()`. The reason is that re-samples are taken under the alternative hypothesis for `summary()`, where there is a greater chance of fitted values being zero, especially for rarer taxa (e.g. if there is a treatment combination in which a taxon is never present). Abundances don’t re-sample well if their predicted mean is zero.

```{r}
summary(cope.nb) 
```

If obtaining predicted values from the model is the goal, you may use the `predict()` function. Note that `type = response` will produce values on the scale of the response variable (i.e. counts)

```{r}
predict(cope.nb, type = "response") 
```