So far in ANOVA we have treated all categorical variables as the same
There are actually two types of categorical variables
We need to deal with random effects differently, but it can be tricky at first to tell them apart
ANOVA model: \[Y_{ij}=\mu + A_i + \epsilon_{ij}\]
Fixed effects (\(A_i\)): affect the mean of groups in a meaningful way
Mixed Model: \[Y_{ij}=\mu + A_i + U_i + \epsilon_{ij}\]
Random effects (\(U_i\)): structure the variance, but not in an additive way like fixed effects. Think of these as additional, structured error terms
Fixed Effects: levels are deliberately chosen and repeatable; we want to estimate their effect on the mean (e.g. treatment, sex, parity)
Random Effects: levels are a sample from a larger population; we want to account for the variance they introduce, not estimate the particular levels (e.g. individual, social group)
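A minimal simulation can make the two roles concrete. This is a sketch with made-up names and values (nothing below comes from the lecture data): the fixed effect shifts group means by a constant, while the random effect adds cluster-level noise that is shared within a cluster.

```r
# Simulate the mixed model Y_ij = mu + A_i + U_i + eps_ij (illustrative values)
set.seed(42)
mu <- 10                        # grand mean
A  <- c(ctrl = 0, treat = 2)    # fixed effect: shifts group means by a fixed amount
U  <- rnorm(5, sd = 3)          # random effect: one draw per cluster, shared within it

dat   <- expand.grid(cluster = 1:5, trt = names(A), rep = 1:20)
dat$y <- mu + A[as.character(dat$trt)] + U[dat$cluster] + rnorm(nrow(dat), sd = 1)

# The fixed effect appears as a stable difference between the trt group means;
# the random effect appears as extra variance shared within clusters
tapply(dat$y, dat$trt, mean)
tapply(dat$y, dat$cluster, mean)
```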
Question: do aggression rates differ by parity status?
You follow a group of 10 habituated monkeys for 2 years. Each day you watch each monkey for an hour, and record the number of agonistic encounters, the social group to which individuals belong, and the parity state of the individual. In the end you have 7300 observations (10 monkeys \(\times\) 365 days in a year \(\times\) 2 years).
| Individual | Parity | AggEncounters | SocialGroup |
|---|---|---|---|
| Pat | nulliparous | 4 | GoldStars |
| Cindy | parous | 3 | BlueStars |
| Pat | parous | 7 | GoldStars |
| Louise | parous | 2 | GoldStars |
| … | … | … | … |
You have a single response variable: AggEncounters
Three predictor variables: Parity, Individual, SocialGroup
Which are fixed and which are random effects?
Almost always, you choose LMM because you have one or more fixed effects, but you also have pseudo-replication
Recall from our discussion of ANOVA that independent observations in a treatment group are called replicates
Pseudoreplication occurs when something masquerading as a replicate of the treatment actually isn’t independent
This situation is common, and is easy to identify with a bit of practice.
There are several major kinds of pseudo-replication; the most important for us arise from repeated measurements on the same individuals and from individuals grouped together (e.g. in the same social group)
Challenge: give me some examples of instances where data may be pseudoreplicated
Recall our dataset of 7300 observations of parity and aggressive encounter rate, measured daily for 2 years for 10 individuals.
This dataset is massively pseudo-replicated. Why?
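One way to make the answer concrete is to count rows per individual. Since the `primatedata` object doesn't exist, this sketch fabricates a small stand-in (3 monkeys instead of 10; all values invented) just to show the check:

```r
# Spotting pseudo-replication: many rows, few independent subjects
set.seed(1)
primatedata <- data.frame(
  Individual    = rep(c("Pat", "Cindy", "Louise"), each = 730),
  AggEncounters = rpois(3 * 730, lambda = 3)
)
table(primatedata$Individual)            # 730 rows per monkey
nrow(primatedata)                        # 2190 "observations" ...
length(unique(primatedata$Individual))   # ... but only 3 independent subjects
```

Each daily observation of the same monkey masquerades as a replicate, but repeated measures of one individual are not independent.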
Crossed effects are the more common situation. Crossed effects occur when all levels of one effect appear within all the levels of a second effect. For example, Parity and SocialGroup are crossed: both parous and nulliparous individuals occur in every social group.
Nested effects occur when levels of one effect only occur within a single level of a second effect.
In our primate dataset, Individual is nested within SocialGroup, e.g. Louise and Pat only occur within the GoldStars, and they may have similarities due to being in the same group that we want to account for.
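A cross-tabulation is a quick way to check whether two factors are nested or crossed. This is a toy stand-in for the primate dataset (rows invented for illustration):

```r
# Nested vs crossed: inspect the contingency table of the two factors
d <- data.frame(
  Individual  = c("Pat", "Pat", "Louise", "Cindy", "Cindy"),
  SocialGroup = c("GoldStars", "GoldStars", "GoldStars", "BlueStars", "BlueStars"),
  Parity      = c("nulliparous", "parous", "parous", "parous", "nulliparous")
)
# Nested: each Individual appears in exactly one SocialGroup,
# so each row of this table has a single non-zero entry
table(d$Individual, d$SocialGroup)
# Crossed: each Parity level can occur in every SocialGroup
table(d$Parity, d$SocialGroup)
```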
lme4 package
Question: with our primate data, do aggression rates differ by parity status?
lme4 package
Models are specified much like normal linear models, using a formula
Random effects are specified as: (1|randomEffect)
The 1 stands in for the intercept. Effectively, you are saying: “allow each level of randomEffect to have its own independent intercept”
Fixed effects are specified just like you always have in standard linear models
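The common random-effect patterns can be written out as formula objects, which are plain R and run without lme4 installed (y, x, g, g1, g2 are placeholder names):

```r
# lme4 random-effect formula patterns (placeholder variable names)
f_int    <- y ~ x + (1 | g)      # each level of g gets its own intercept
f_slope  <- y ~ x + (x | g)      # each level of g gets its own intercept AND slope
f_nested <- y ~ x + (1 | g1/g2)  # g2 nested within g1 (most general first)
all.vars(f_nested)               # "y" "x" "g1" "g2"
```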
lme4 package
Question: with our primate data, do aggression rates differ by parity status?
Note: this is fake data, the primatedata object doesn’t exist. I just want to show you the model formula.
library(lme4)
lmer(AggEncounters ~ Parity + (1|SocialGroup/Individual), data=primatedata)
Notice: Parity is a fixed effect. The nested random effects are given after the | symbol, ordered from most general to most specific.
library(lme4)
head(sleepstudy, 2)

##   Reaction Days Subject
## 1 249.5600    0     308
## 2 258.7047    1     308
Question: does the number of days a subject has gone without sleep increase their reaction time?
lme4 package
One option is to fit a model with a random effect term to allow each subject to have its own intercept:
library(lme4)
randIntercept <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)
summary(randIntercept)

## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ Days + (1 | Subject)
##    Data: sleepstudy
##
## REML criterion at convergence: 1786.5
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -3.2257 -0.5529  0.0109  0.5188  4.2506
##
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Subject  (Intercept) 1378.2   37.12
##  Residual              960.5   30.99
## Number of obs: 180, groups:  Subject, 18
##
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 251.4051     9.7467   25.79
## Days         10.4673     0.8042   13.02
##
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.371
Another option is to fit a model allowing both the slope and the intercept to vary for each subject:
randSlopeInt <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
summary(randSlopeInt)
One tool is the Akaike Information Criterion (AIC)
Quantifies goodness of fit, while penalizing for model complexity.
AIC(randIntercept)
## [1] 1794.465
AIC(randSlopeInt)
## [1] 1755.628
The lower the AIC, the better the balance of fit and complexity; here the random-slope model is preferred
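Under the hood, AIC is just \(-2\log L + 2k\): reward for fit, penalty for each estimated parameter. A base-R check on a simple linear model (simulated data, names made up here):

```r
# Verify AIC = -2*logLik + 2*k for an lm fit
set.seed(2)
x <- rnorm(50)
y <- 2 * x + rnorm(50)
m <- lm(y ~ x)
k <- length(coef(m)) + 1   # parameters: intercept, slope, plus residual variance
by_hand <- -2 * as.numeric(logLik(m)) + 2 * k
c(by_hand = by_hand, built_in = AIC(m))   # the two values agree
```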
brains <- read.table("https://stats.are-awesome.com/datasets/Isler_et_al_brains.txt", header=TRUE, sep="\t")
library(dplyr)
brains <- brains %>%
select(Species, ECV..cc., Body.mass..g., Wild.captive) %>%
filter(Wild.captive %in% c("Wild", "Captive"))
head(brains)
##        Species ECV..cc. Body.mass..g. Wild.captive
## 1 nigroviridis    59.07            NA         Wild
## 2     belzebul    60.00            NA         Wild
## 3     belzebul    51.00            NA         Wild
## 4     belzebul    54.00            NA         Wild
## 5     belzebul    55.50            NA         Wild
## 6     belzebul    54.00            NA         Wild
Simplest model just uses body mass
mod1 <- lm(log(ECV..cc.) ~ log(Body.mass..g.), data=brains)
Slightly more complicated model includes wild versus captive
mod2 <- lm(log(ECV..cc.) ~ log(Body.mass..g.) + Wild.captive, data=brains)
Which model is better? In this simple case, we can look at significance of individual terms, and \(R^2\), but in LMM we don’t have this option.
anova() function - new use for an old friend
Recall that anova() doesn't do analysis of variance; it creates an ANOVA table from a model
You can also compare two models with anova(), testing whether the more complex model fits the data significantly better
The models are compared with a likelihood-ratio test
Only valid for models that are nested: i.e., one is a subset of the other
simpler <- lm(resp ~ fac1 + fac2)
morecomplex <- lm(resp ~ fac1 + fac2 + fac3)
anova(simpler, morecomplex)
Note: a more complex nested model ALWAYS fits the data at least as well, but the likelihood-ratio test asks whether the improvement is significant
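A runnable version of the comparison above, on simulated data where fac3 has no real effect (all objects here are invented for illustration):

```r
# Nested-model comparison: RSS always drops, the F test asks if the drop matters
set.seed(7)
n    <- 100
fac1 <- gl(2, n / 2)          # a factor with a real effect
fac2 <- sample(gl(4, n / 4))  # a second factor, randomized
fac3 <- rnorm(n)              # a predictor with NO real effect
resp <- 1 + 0.5 * (fac1 == "2") + rnorm(n)

simpler     <- lm(resp ~ fac1 + fac2)
morecomplex <- lm(resp ~ fac1 + fac2 + fac3)

# The bigger model's residual sum of squares is never larger ...
c(simpler = deviance(simpler), morecomplex = deviance(morecomplex))
# ... but the test should find no significant improvement here
anova(simpler, morecomplex)
```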
anova(mod1, mod2)
## Analysis of Variance Table
##
## Model 1: log(ECV..cc.) ~ log(Body.mass..g.)
## Model 2: log(ECV..cc.) ~ log(Body.mass..g.) + Wild.captive
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)
## 1   1993 227.02
## 2   1992 225.44  1    1.5808 13.968 0.0001912 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Like any other linear model, a basic assumption of linear mixed models is a normally distributed error term
However, the structure of data may make this assumption invalid (e.g. binary data or count data)
Two very common types of non-normal error structures are binomial (for binary or proportion data) and Poisson (for count data)
glmer(response ~ fixedEffect + (1|randomEffect), family="poisson")
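As a base-R analogue, glm() fits these same two error families without the random-effect term; glmer() simply extends this to mixed models. A sketch with simulated data (all names and values invented):

```r
# Poisson and binomial error structures in base R's glm()
set.seed(3)
x     <- rnorm(60)
count <- rpois(60, lambda = exp(0.3 + 0.5 * x))   # Poisson: counts
hit   <- rbinom(60, size = 1, prob = plogis(x))   # binomial: 0/1 outcomes
coef(glm(count ~ x, family = poisson))    # log link: coefficients on log scale
coef(glm(hit ~ x, family = binomial))     # logit link: coefficients are log-odds
```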