This vignette takes you through a case study where offsets are used during analysis with mvabund. We recommend first reading the "Analysing discrete data" chapter from the Ecostats textbook.
Under some circumstances, we may be tempted to standardise or average our abundance counts across replicates before analysing them. Often this is done to account for differences in sampling intensity (different sized plots, different numbers of replicates), but we caution against doing so! Standardising or averaging counts can change the distribution of your data and affect the mean-variance relationship. Instead you should use an offset, a term specifically designed to account for known sources of variation among replicates.
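To see the difference in practice, here is a minimal sketch (not part of the case study; it uses simulated counts and made-up names such as effort and treat) contrasting the tempting averaging approach with an offset model:

set.seed(1)
effort <- sample(4:6, 30, replace = TRUE)  # e.g. number of traps per site
treat <- rep(c("control", "regen"), each = 15)
y <- rnbinom(30, mu = effort * ifelse(treat == "regen", 2, 1), size = 1)  # counts scale with effort
avg_fit <- lm(y / effort ~ treat)  # tempting, but the averages are no longer counts
offset_fit <- glm(y ~ treat + offset(log(effort)), family = poisson)  # raw counts, effort as offset

The offset model keeps the response as raw counts, so count-appropriate mean-variance assumptions still apply.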
Let's take a look at a case study that uses an offset in mvabund. Anthony wants to evaluate how well earthworm numbers were doing after some bush regeneration efforts. Here are some worm counts from pitfall traps at each of 10 sites (where C = control, R = bush regen). Notice there were only 4 pitfall traps at one site!
| Treatment  | C | R | R | R | C | R | R  | R | R  | R |
|------------|---|---|---|---|---|---|----|---|----|---|
| Count      | 0 | 3 | 1 | 3 | 1 | 2 | 12 | 1 | 18 | 0 |
| # pitfalls | 5 | 5 | 5 | 5 | 5 | 5 | 5  | 4 | 5  | 5 |
Anthony would like to model the number of worms (Haplotaxida) from the reveg dataset as a function of treatment (bush regeneration) while accounting for variation in sampling effort.
Let's load mvabund and the dataset first:
library(mvabund)
library(ecostats)
data(reveg)
attach(reveg)
skimr::skim(abund$Haplotaxida) # Great function to get an overview of the data
|                         |                   |
|:------------------------|:------------------|
| Name                    | abund$Haplotaxida |
| Number of rows          | 10                |
| Number of columns       | 1                 |
| _______________________ |                   |
| Column type frequency:  |                   |
| numeric                 | 1                 |
| ________________________ |                  |
| Group variables         | None              |

**Variable type: numeric**

| skim_variable | n_missing | complete_rate | mean |   sd | p0 | p25 | p50 | p75 | p100 | hist  |
|:--------------|----------:|--------------:|-----:|-----:|---:|----:|----:|----:|-----:|:------|
| data          |         0 |             1 |  4.1 | 6.01 |  0 |   1 | 1.5 |   3 |   18 | ▇▁▁▁▁ |
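Before fitting the model it is also worth eyeballing the sampling effort itself. A quick check (assuming treatment and pitfalls are available from the attached reveg data, as used in the model below) might look like:

table(treatment, pitfalls)  # cross-tabulate treatment against number of pitfall traps per site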
Now we will use the manyglm function. We specify family = "negative.binomial", as count data often follow a negative binomial distribution. We have also included an offset term to deal with the fact that sampling effort varied across sites, with 5 pitfall traps at most sites but just 4 at one site. Offsets are most commonly used in a log-linear model, where an offset is a predictor known to have an exactly proportional effect on the response. The offset in Anthony's model is log(pitfalls). By including this term, our model changes from predicting mean abundance to modelling mean abundance per pitfall trap. This is what you would be trying to do if you averaged the counts across pitfalls prior to analysing them, but using an offset achieves this without messing with the data and its mean-variance relationship.
worms_offset <- manyglm(Haplotaxida~treatment+offset(log(pitfalls)), family="negative.binomial", data=abund)
anova(worms_offset)
#> Time elapsed: 0 hr 0 min 0 sec
#> Analysis of Deviance Table
#>
#> Model: Haplotaxida ~ treatment + offset(log(pitfalls))
#>
#> Multivariate test:
#> Res.Df Df.diff Dev Pr(>Dev)
#> (Intercept) 9
#> treatment 8 1 2.889 0.134
#> Arguments: P-value calculated using 999 iterations via PIT-trap resampling.
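Because the offset enters the linear predictor with its coefficient fixed at 1, the remaining coefficients are on the log scale of mean abundance per pitfall trap. A sketch of how you might inspect them (output not shown here; coef is the standard extractor for manyglm fits):

coef(worms_offset)       # coefficients on the log(per-trap mean abundance) scale
exp(coef(worms_offset))  # back-transform to multiplicative effects per trap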