---
title: "Chapter 6 -- Mixed effect models -- Exercise solutions and Code Boxes"
author: "David Warton"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Chapter 6 -- Mixed effect models -- Exercise solutions and Code Boxes}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```


## Exercise 6.1: Effects of water pollution on subtidal marine micro-invertebrates

*What factors are there? Fixed or random?*

He samples in seven estuaries along the New South Wales coast (three of which are "Pristine", four are "Modified"), and in each estuary, he takes 4-7 samples and counts the creepy crawlies therein.

*Modification* is a factor, taking levels "Pristine" and "Modified"

*Estuary* is a factor, taking seven levels. If these were sampled randomly, and we want to make inferences across all estuaries on the New South Wales coast, it oculd be treated as a random factor.

The 4-7 samples at each estuary are the replicates, so they shouldn't be added to the model,  variation in replicates will enter through the error term.


## code for Fig 6.1
```{r fig6.1, fig.width=6, fig.height=4}
library(ecostats)
data(estuaries)
plot(Total~Estuary,data=estuaries,col=c(4,2,2,4,2,4,2))
legend("bottomleft",legend=c("Modified","Pristine"),col=c(4,2),pch=15,pt.cex=2)
```


## Code Box 6.1: Fitting a linear mixed model to the estuary data of Exercise 6.1

```{r box6.1}
library(ecostats)
data(estuaries)
library(lme4)
ft_estu = lmer(Total~Mod+(1|Estuary),data=estuaries)
summary(ft_estu)
```

There is some evidence of an effect of `Mod`, since the estimated coefficient is more than double its standard error (so a 95\% confidence interval would not cover zero). The effect appears to be a decrease in total abundance in pristine estuaries.


## Code Box 6.2: Residual plots from a mixed model for Exercise 6.1

```{r code62, warning=FALSE, fig.width=8, fig.height=4}
par(mfrow=c(1,2),mar=c(3,3,1,1),mgp=c(1.75,0.75,0))
ft_estu = lmer(Total~Mod+(1|Estuary),data=estuaries)
scatter.smooth(residuals(ft_estu)~fitted(ft_estu),
    xlab="Fitted values",ylab="Residuals")
abline(h=0,col="red")
scatter.smooth(residuals(ft_estu)~predict(ft_estu,re.form=NA),
    xlab="Fitted values (no random effects)",ylab="Residuals")
abline(h=0,col="red")
```

## Code Box 6.3: Using `anova` to compare mixed effects models for the estuary data
```{r code63}
ft_estu = lmer(Total~Mod+(1|Estuary),data=estuaries,REML=F)
ft_estuInt = lmer(Total~(1|Estuary),data=estuaries,REML=F)
anova(ft_estuInt,ft_estu)
```

There is some evidence of an effect of modification.


## Code Box 6.4: Confidence intervals for parameters from a mixed effects model for the estuary data
```{r code64}
confint(ft_estu)
```


## Code Box 6.5: Prediction intervals for random effects terms in a mixed effects model
```{r code65, fig.width=5, fig.height=4}
rft=ranef(ft_estu,condVar=T)
library(lattice)
dotplot(rft)
```

## Exercise 6.2: Fitting random effects with different variances

```{r ex62}
estuaries$isMod = as.numeric(estuaries$Mod=="Modified")
estuaries$isPri = as.numeric(estuaries$Mod!="Modified")
ft_estuDiff = lmer(Total~Mod+(0+isMod|Estuary)+(0+isPri|Estuary),data=estuaries,REML=F)
summary(ft_estuDiff)
BIC(ft_estu,ft_estuDiff)
```

BIC suggests that we didn't need a different variance term for each level of `Mod`. (It also estimated the cross-estuary variance to be zero for modified estuaries, leading to a warning in the output.)


## Exercise 6.3: Bird exclusion and biological control
```{r ex63}
data(aphidsBACI)
str(aphidsBACI)
```

OK so we are looking for a `Treatment:Time` interaction, but to account for repeated measures of each plot, we want a random effect for `Plot` in the model.

```{r ex63mod}
ft_aphids=lmer(logcount~Treatment*Time+(1|Plot),data=aphidsBACI)
ft_aphidNull=lmer(logcount~Time+(1|Plot),data=aphidsBACI)
anova(ft_aphidNull,ft_aphids)
```

Which gives us marginal evidence of an effect of bird exclusion of aphid counts.

Compare this to what we got when adding `Plot` as a fixed effect:
```{r ex63an}
lm_aphids=lm(logcount~Plot+Treatment*Time,data=aphidsBACI)
anova(lm_aphids)
```

Interestingly, this $P$-value is a lot smaller.

```{r ex63summ}
summary(lm_aphids)
summary(ft_aphids)
```

We get the same point estimate for the `Treatment:Time` effect, but the standard error is slightly smaller in the random effects model.


## Exercise 6.4: Estuary data in different zones

```{r ex64 plot, fig.width=8, fig.height=4}
data(estuaryZone)
cols=c("blue","red","lightblue","pink")
plot(Total~interaction(Estuary,Zone),data=estuaryZone,col=cols[c(1,2,2,1,2,1,2,3,4,4,3,4,3,4)])
legend("bottomright",legend=c("Mod-Inner","Prist-Inner","Mod-Outer","Pris-Outer"),col=cols,pch=15,pt.cex=2)
```

It looks like there is an effect of Modification, not sure if there is an interaction (the effect seems more striking in Outer than Inner zones)

```{r ex64 lme, fig.width=8, fig.height=4}
par(mfrow=c(1,2),mar=c(3,3,1,1),mgp=c(1.75,0.75,0))
library(lme4)
lme_MZ = lmer(Total~Zone*Mod + (Zone|Estuary), data=estuaryZone )

scatter.smooth(residuals(lme_MZ)~fitted(lme_MZ),
    xlab="Fitted values",ylab="Residuals")
abline(h=0,col="red")
scatter.smooth(residuals(lme_MZ)~predict(lme_MZ,re.form=NA),
    xlab="Fitted values (no random effects)",ylab="Residuals")
abline(h=0,col="red")

```

There is a suggestion of less total abundance as fitted values increase, which is super-weird. But it's not too alarming...


```{r ex64 anova}
lme_MplusZ = lmer(Total~Zone+Mod + (Zone|Estuary), data=estuaryZone )
anova(lme_MplusZ,lme_MZ)
```

No evidence of an interaction between Zone and Modification. Testing for a `Mod` main effect:

```{r ex64 mod}
lme_Z = lmer(Total~Zone + (Zone|Estuary), data=estuaryZone )
anova(lme_Z,lme_MplusZ)
```

There is strong evidence that total abundance is different between Modified and Pristine estuaries. The boxplot suggests abundance is higher in Modified estuaries, and looking at the data, this appears to be mostly due to high counts of `Balanus.variegatus`, especially in outer modified zones.


## Code Box 6.6: Using the parametric bootstrap to compute the standard error of the `Mod` fixed effect in Exercise 6.1.

```{r box6.6, message=FALSE}
nBoot=500
bStat=rep(NA,nBoot)
ft_estu = lmer(Total~Mod+(1|Estuary),data=estuaries)
for(iBoot in 1:nBoot)
{
   estuaries$TotalSim=unlist(simulate(ft_estu))
   ft_i = lmer(TotalSim~Mod+(1|Estuary),data=estuaries)
   bStat[iBoot] = fixef(ft_i)[2]
}
sd(bStat) #standard error of Mod effect
```

And if we compare this to the standard error from `summary`:
```{r summft_estu}
summary(ft_estu)
```

we see the estimated standard error is `r summary(ft_estu)$coef[2,2]`, which is pretty close to the value we got by simulation.


## Code Box 6.7: A parametric bootstrap to test for an effect of Estuary in Exercise 6.1.

```{r box67, message=FALSE}
ft_noestu = lm(Total~Mod,data=estuaries)
library(ecostats)
anovaPB(ft_noestu,ft_estu,n.sim=99)
```

So we have no evidence of an `Estuary` effect.


## Exercise 6.6: Accurate inferences about the estuary data

_Use the parametric bootstrap to get a formal test for a `zone:mod` interaction._

We can just run the old analysis and change from `anova` to `anovaPB`:

```{r ex66 interaction, message=FALSE}
lme_MZ = lmer(Total~Zone*Mod + (Zone|Estuary), data=estuaryZone, REML=FALSE )
lme_MplusZ = lmer(Total~Zone+Mod + (Zone|Estuary), data=estuaryZone, REML=FALSE )
anovaPB(lme_MplusZ,lme_MZ,n.sim=99)
```

There is no evidence of an interaction.

(Ignore the warnings in the output -- this is random stuff that was thrown up in bootstrap resamples that didn't get a good fit.)

_How do results compare to those from when you were using the `anova` function?_

Results are similar to what we saw before. The only thing that is different is the $P$-value, but it is very similar (suggesting there was no need for a parametric bootstrap here!).

_This would all have been so much easier if there wasn't a random effect in the model... do we really need `Estuary` in there?_

```{r ex66 estuary, message=FALSE}
lme_MZ = lmer(Total~Zone*Mod + (Zone|Estuary), data=estuaryZone, REML=FALSE )
lme_MZnoest = lm(Total~Zone+Mod, data=estuaryZone)
anovaPB(lme_MZnoest,lme_MZ,n.sim=99)
```

We have no evidence of an `Estuary` effect either!