Linear Mixed Models

Categorical Explanatory Variables

So far in ANOVA we have treated all categorical variables the same

There are actually two types of categorical variables

  • fixed effects
  • random effects

We need to deal with random effects differently, but it can be tricky at first to tell them apart

Telling them apart

ANOVA model: \[Y_{ij}=\mu + A_i + \epsilon_{ij}\]

Fixed effects: (\(A_i\)) affect the mean of groups in a meaningful way

Mixed Model: \[Y_{ij}=\mu + A_i + U_i + \epsilon_{ij}\]

Random effects: (\(U_i\)) structure the variance rather than shifting group means in a meaningful way. Think of these as additional, structured error terms

Tricks for telling them apart

Fixed Effects:

  • meaningful factor labels (e.g., chimp vs. gorilla)
  • small number of possible levels… you have data for all of them
  • affect only the mean of the \(y\) variable
  • effect predictable in different studies (e.g., gorillas predictably larger than chimps)

Random Effects:

  • uninformative factor labels (e.g., site A)
  • very large range of possible values…you have data for a random sample of them
  • affect only the variance of the \(y\) variable
  • effect unpredictable or meaningless across studies

Telling them apart - Example

Question: do aggression rates differ by parity status?

You follow a group of 10 habituated monkeys for 2 years. Each day you watch each monkey for an hour, and record the number of agonistic encounters, the social group to which the individual belongs, and the parity status of the individual. In the end you have 7300 observations (10 monkeys \(\times\) 365 days in a year \(\times\) 2 years).

Individual   Parity        AggEncounters   SocialGroup
Pat          nulliparous   4               GoldStars
Cindy        parous        3               BlueStars
Pat          parous        7               GoldStars
Louise       parous        2               GoldStars

Do aggression rates differ by parity state?

Individual   Parity        AggEncounters   SocialGroup
Pat          nulliparous   4               GoldStars
Cindy        parous        3               BlueStars
Pat          parous        7               GoldStars
Louise       parous        2               GoldStars

You have a single response variable: AggEncounters

Three predictor variables: Parity, Individual, SocialGroup

Which are fixed and which are random effects?
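Before answering, it can help to see the table as an actual data frame. A minimal sketch in base R, using the four fake example rows above (the object name primatedata matches the one used in the model example later in these notes):

```r
# Fake data: the four example rows from the table, as a data frame
primatedata <- data.frame(
  Individual    = c("Pat", "Cindy", "Pat", "Louise"),
  Parity        = c("nulliparous", "parous", "parous", "parous"),
  AggEncounters = c(4, 3, 7, 2),
  SocialGroup   = c("GoldStars", "BlueStars", "GoldStars", "GoldStars")
)

str(primatedata)
```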

When should you choose LMM?

Almost always, you choose LMM because you have one or more fixed effects, but you also have pseudo-replication

Pseudo-replication

Recall from our discussion of ANOVA that independent observations in a treatment group are called replicates

Pseudoreplication occurs when something masquerading as a replicate of the treatment actually isn’t independent

This situation is common, and is easy to identify with a bit of practice.

Pseudo-replication

There are several major kinds of pseudo-replication, most important for us are:

  • temporal pseudoreplication - typically involves repeated measures on same individual
  • spatial pseudoreplication - typically involves observations that may be similar due to spatial association

Challenge: give me some examples of instances where data may be pseudoreplicated

Pseudo-replication

Individual   Parity        AggEncounters   SocialGroup
Pat          nulliparous   4               GoldStars
Cindy        parous        3               BlueStars
Pat          parous        7               GoldStars
Louise       parous        2               GoldStars

Recall our dataset of 7300 observations of parity and aggressive encounter rate, measured daily for 2 years for 10 individuals.

This dataset is massively pseudo-replicated… why?

Nested versus crossed random effects

Crossed effects are the more common situation. Effects are crossed when all levels of one effect appear within all levels of a second effect. For example, Parity and SocialGroup are crossed: both parity states occur in both social groups.

Nested effects occur when levels of one effect only occur within a single level of a second effect.

In our primate dataset, Individual is nested within SocialGroup, e.g. Louise and Pat only occur within the GoldStars, and they may have similarities due to being in the same group that we want to account for.
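A quick way to check for nesting is to cross-tabulate the two factors: with nested effects, each row has counts in only one column. A minimal sketch with the four fake example rows:

```r
# Toy data (fake, from the example table)
d <- data.frame(
  Individual  = c("Pat", "Cindy", "Pat", "Louise"),
  SocialGroup = c("GoldStars", "BlueStars", "GoldStars", "GoldStars")
)

# Each Individual appears in exactly one SocialGroup => nested
tab <- table(d$Individual, d$SocialGroup)
tab
rowSums(tab > 0)  # all 1s indicates nesting
```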

Steps in Linear Mixed Modelling (LMM)

  • decide on the structure of your model (which are fixed and which are random effects)
  • identify any nesting structure
  • specify the formula correctly

Fake example with primate data and lme4 package

Question: with our primate data, do individuals of different parity status differ in aggression rates?

Individual   Parity        AggEncounters   SocialGroup
Pat          nulliparous   4               GoldStars
Cindy        parous        3               BlueStars
Pat          parous        7               GoldStars
Louise       parous        2               GoldStars

LMM in R with lme4 package

Models are specified much like normal linear models, using a formula

Random effects are specified as: (1|randomEffect)

The 1 stands in for the intercept. Effectively, you are saying: “allow each level of randomEffect to have its own independent intercept”

Fixed effects are specified just like you always have in standard linear models
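To see what "its own intercept" means, here is a base-R simulation (with made-up numbers): each group's observations vary around a group-specific intercept shift, just like the \(U_i\) term in the mixed model:

```r
# Simulate mu + U_i + error: each group gets its own intercept shift (U_i)
set.seed(1)
groups <- rep(c("A", "B", "C"), each = 50)
U <- c(A = -5, B = 0, C = 5)              # group-level intercept shifts
y <- 10 + U[groups] + rnorm(150, sd = 1)  # overall intercept mu = 10

tapply(y, groups, mean)  # group means sit near 5, 10, and 15
```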

LMM in R with lme4 package

Question: with our primate data, do individuals of different parity status differ in aggression rates?

Note: this is fake data, the primatedata object doesn’t exist. I just want to show you the model formula.

library(lme4)
lmer(AggEncounters ~ Parity + (1|SocialGroup/Individual),  data=primatedata)

Notice: Parity is a fixed effect. The nested random effects are given after the | symbol, listed from most general to most specific.

An example with real data

library(lme4)
head(sleepstudy, 2)
##   Reaction Days Subject
## 1 249.5600    0     308
## 2 258.7047    1     308

Question: does the number of days a subject has gone without sleep increase their reaction time?

LMM in R with lme4 package

One option is to fit a model with a random effect term to allow each subject to have its own intercept:

library(lme4)
randIntercept <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)
summary(randIntercept)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ Days + (1 | Subject)
##    Data: sleepstudy
## 
## REML criterion at convergence: 1786.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2257 -0.5529  0.0109  0.5188  4.2506 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Subject  (Intercept) 1378.2   37.12   
##  Residual              960.5   30.99   
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 251.4051     9.7467   25.79
## Days         10.4673     0.8042   13.02
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.371

LMM in R

Another option is to fit a model allowing both the slope and the intercept to vary for each subject:

randSlopeInt <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
summary(randSlopeInt)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ Days + (Days | Subject)
##    Data: sleepstudy
## 
## REML criterion at convergence: 1743.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9536 -0.4634  0.0231  0.4634  5.1793 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  Subject  (Intercept) 612.10   24.741       
##           Days         35.07    5.922   0.07
##  Residual             654.94   25.592       
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  251.405      6.825  36.838
## Days          10.467      1.546   6.771
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.138

LMM in R - Is my model any good?

One tool is the Akaike Information Criterion (AIC)

Quantifies goodness of fit, while penalizing for model complexity.

AIC(randIntercept)
## [1] 1794.465
AIC(randSlopeInt)
## [1] 1755.628
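AIC itself is simple arithmetic: \(-2 \times\) log-likelihood \(+ 2 \times\) (number of estimated parameters). A sketch using a plain lm on the built-in cars data:

```r
# AIC by hand: -2*logLik + 2*k, where k counts all estimated parameters
m <- lm(dist ~ speed, data = cars)
k <- attr(logLik(m), "df")  # intercept, slope, residual variance = 3
aic_by_hand <- -2 * as.numeric(logLik(m)) + 2 * k
all.equal(aic_by_hand, AIC(m))  # TRUE
```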

AIC is like golf

The lower the score, the better

Model Simplification, in LMM, or in standard linear models

“Everything should be kept as simple as possible, but no simpler.”

  • Albert Einstein (probably never said this…but still)

Example - Isler et al. (2008)

brains <- read.table("https://stats.are-awesome.com/datasets/Isler_et_al_brains.txt", header=TRUE, sep="\t")
library(dplyr)
brains <- brains %>% 
  select(Species, ECV..cc., Body.mass..g., Wild.captive) %>%
  filter(Wild.captive %in% c("Wild", "Captive"))

head(brains)
##        Species ECV..cc. Body.mass..g. Wild.captive
## 1 nigroviridis    59.07            NA         Wild
## 2     belzebul    60.00            NA         Wild
## 3     belzebul    51.00            NA         Wild
## 4     belzebul    54.00            NA         Wild
## 5     belzebul    55.50            NA         Wild
## 6     belzebul    54.00            NA         Wild

Example - Isler et al. (2008)

Simplest model just uses body mass

mod1 <- lm(log(ECV..cc.) ~ log(Body.mass..g.), data=brains)

Slightly more complicated model includes wild versus captive

mod2 <- lm(log(ECV..cc.) ~ log(Body.mass..g.) + Wild.captive, data=brains)

Which model is better? In this simple case, we can look at significance of individual terms, and \(R^2\), but in LMM we don’t have this option.

anova() function - new use for old friend

Recall that anova() doesn't perform an analysis of variance; it creates an ANOVA table from a fitted model

You can also compare two models with anova() to test whether the more complex model fits significantly better

The models are compared with a likelihood-ratio test

likelihood ratio test

Only valid for models that are nested: i.e., one is a subset of the other

simpler  <-    lm(resp ~ fac1 + fac2)
morecomplex <- lm(resp ~ fac1 + fac2 + fac3)
anova(simpler, morecomplex)

Note: more complex models ALWAYS fit the data better, but likelihood ratio test asks if this difference is significant
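A runnable version of this comparison, using the built-in mtcars data (the predictors here are just illustrative):

```r
# Nested lm comparison: does adding hp significantly improve the fit?
simpler     <- lm(mpg ~ wt, data = mtcars)
morecomplex <- lm(mpg ~ wt + hp, data = mtcars)
anova(simpler, morecomplex)
```

For lm fits, anova() compares the models with an F test; for lmer fits, anova() refits the models with ML and compares them with a chi-square likelihood-ratio test.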

Example - Isler et al. (2008)

anova(mod1, mod2)
## Analysis of Variance Table
## 
## Model 1: log(ECV..cc.) ~ log(Body.mass..g.)
## Model 2: log(ECV..cc.) ~ log(Body.mass..g.) + Wild.captive
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1   1993 227.02                                  
## 2   1992 225.44  1    1.5808 13.968 0.0001912 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Generalized Linear Mixed Models

Like any other linear model, a basic assumption of linear mixed models is a normally distributed error term

However, the structure of data may make this assumption invalid (e.g. binary data or count data)

Two very common types of non-normal error structures are:

  • Poisson - for count data
  • binomial - for binary (e.g. presence/absence) or proportion data
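The same two error structures appear in ordinary (non-mixed) GLMs; a minimal base-R sketch with made-up data:

```r
# Made-up count data: Poisson GLM with the default log link
counts <- c(2, 4, 3, 7, 6, 9, 8, 12)
x <- 1:8
pois_mod <- glm(counts ~ x, family = poisson)

# Made-up binary data: binomial GLM with the default logit link
present <- c(0, 0, 1, 0, 1, 1, 1, 1)
bin_mod <- glm(present ~ x, family = binomial)

coef(pois_mod)["x"]  # positive: counts increase with x on the log scale
```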

GLMM in R

Generalized mixed models are fit with glmer() rather than lmer() (lmer() assumes a normal error and does not take a family argument):

glmer(response ~ fixedEffect + (1|randomEffect), family=poisson)