
Why is having too much power A BAD THING?
| The Truth | Retain \(H_0\) | Reject \(H_0\) |
|---|---|---|
| \(H_0\) True | Correct! | Type I error (\(\alpha\)) |
| \(H_0\) False | Type II error (\(\beta\)) | Correct! |
In a power simulation, the data are generated so that \(H_0\) is false by construction, so every non-significant p value is a Type II error.
Across a batch of simulations, the estimated power is \(1 - \beta\), where \(\beta\) is the proportion of non-significant results.
A common target is a power of 0.8
Remember, this is a balancing act! For a fixed design, \(\alpha\) and \(\beta\) are inversely related: gaining power by relaxing the significance threshold \(\alpha\) increases the chance of committing a Type I error.
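To see where \(\alpha\) comes from, here is a quick sketch (not part of the original code) that simulates many experiments where \(H_0\) really is true: the proportion of "significant" results should land near \(\alpha = 0.05\).

```r
# Simulate 2000 two-sample t tests where there is truly NO group difference.
# Every rejection here is a Type I error, so the rejection rate estimates alpha.
set.seed(1)
pvals <- replicate(2000, t.test(rnorm(10), rnorm(10))$p.value)
mean(pvals < 0.05)  # should be close to 0.05
```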
doANOVA <- function(sampleSize=20, grandMean=50, errorSD=5, meanDiff=3, showPlot=TRUE) {
  library(ggplot2)
  # random noise around the grand mean
  myError <- rnorm(sampleSize, mean = 0, sd = errorSD)
  # labels alternate X, Y, X, Y, ... so the recycled c(-meanDiff/2, meanDiff/2)
  # subtracts half the difference from groupX and adds half to groupY
  groups <- rep(c("groupX", "groupY"), sampleSize/2)
  y <- grandMean + c(-meanDiff/2, meanDiff/2) + myError
  if(showPlot) print(qplot(x=groups, y=y, geom="boxplot", fill=groups))
  myModel <- lm(y ~ groups)
  # simple way to get the p value: build the ANOVA table and pull out
  # row 1, column 5 ("Pr(>F)")
  p <- anova(myModel)[1, 5]
  return(p)
}
doANOVA(sampleSize=20, errorSD=5, meanDiff=3)
## [1] 0.1553212
calculatePower <- function(n){
  # run 500 simulated experiments at sample size n and store each p value
  results <- numeric(500)
  for(i in 1:500){
    results[i] <- doANOVA(sampleSize=n, errorSD=5, meanDiff=3, showPlot=FALSE)
  }
  # beta = proportion of simulations that failed to reject H0
  beta <- sum(results > 0.05) / length(results)
  return(1 - beta)
}
calculatePower(n=20)
## [1] 0.252
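Since a two-group ANOVA is equivalent to a two-sample t test, we can sanity-check this simulated estimate against base R's `power.t.test()`, which computes power analytically. Note that its `n` is per group, so `n = 10` matches `sampleSize = 20` above.

```r
# Analytic cross-check: exact power of a two-sample t test with 10 per group,
# a true mean difference of 3, and an error SD of 5
power.t.test(n = 10, delta = 3, sd = 5, sig.level = 0.05)$power
```

The analytic value should be close to the simulated 0.252, up to simulation noise.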
We want to detect our effect at least 80% of the time.
So we can start with our sample size of 20, which we know is too low, and increase it in steps until the estimated power reaches 0.8
currentPower <- 0
sampleSize <- 20
while(currentPower < 0.8){
  currentPower <- calculatePower(n=sampleSize)
  # the increment runs even after the final, successful check, so sampleSize
  # ends up 4 larger than the n at which currentPower was estimated
  sampleSize <- sampleSize + 4
}
currentPower
## [1] 0.814
sampleSize
## [1] 96
Note that the loop increments sampleSize one last time after the target is reached, so the power of 0.814 was actually estimated at \(n = 92\), not 96.
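As a final cross-check (again not part of the original code), `power.t.test()` can solve for the sample size directly: leave `n` unspecified and supply the target power instead.

```r
# Solve for n given the target power; power.t.test() reports n PER GROUP,
# so double it for the total sample size used in the simulation above
power.t.test(delta = 3, sd = 5, sig.level = 0.05, power = 0.8)$n
```

The analytic answer (roughly 45 per group, i.e. about 90 in total) should land in the same ballpark as the simulated result.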