Enjoy R

How the probability of type II error varies

A type II error is what we might call a miss.

When we test a hypothesis, our default is to retain the null hypothesis unless the data give strong evidence that we should reject it.

When H0 is false at the population level but we still retain it, we make a type II error: we fail to detect in the population something different from what the null states.

The probability of making this mistake is usually called beta. Other things being equal, it decreases as either the sample size or the probability of a type I error (alpha) increases.
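To make the definition concrete, beta can be approximated by simulation: draw many samples from a population where H0 is false and count how often the test still retains H0. The snippet below is only a sketch under assumed values (mean 50 under H0, true mean 51, standard deviation 2.5, n = 10, alpha = 5%, a two-sided z-test with known standard deviation); the seed and the number of replications are arbitrary choices.

```r
# Illustrative values, chosen here as assumptions
mu0 <- 50; mu1 <- 51; sigma <- 2.5; n <- 10; alpha <- 0.05

set.seed(1)                     # arbitrary seed, for reproducibility only
se <- sigma / sqrt(n)           # standard error of the sample mean
z_crit <- qnorm(1 - alpha / 2)  # two-sided critical value

# Sample means of 20000 samples drawn from a population with true mean mu1
xbar <- replicate(20000, mean(rnorm(n, mean = mu1, sd = sigma)))

# H0 is retained when the sample mean lies within z_crit standard errors of mu0
retained <- abs(xbar - mu0) / se <= z_crit

# Share of misses: an empirical estimate of beta
beta_hat <- mean(retained)
print(beta_hat)
```

The share of retained tests should land close to the analytical beta for the same settings.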

For a given alpha, the larger the sample size, the smaller the standard error, and the narrower the acceptance region we compute around the parameter value under H0. That is why we reject more easily and beta decreases.

For a given sample size, beta decreases if we increase alpha, but in that case we accept the trade-off of a higher probability of rejecting H0 when it is actually true at the population level.
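Both effects can be checked numerically. The following is a minimal sketch, assuming a two-sided z-test with known standard deviation; `beta_z` is a hypothetical helper written for this illustration, and the values used (mean 50 under H0, true mean 51, standard deviation 2.5) are assumptions.

```r
# Hypothetical helper: probability that the sample mean falls inside the
# acceptance region around mu0 when the true mean is mu1
beta_z <- function(mu1, mu0, sigma, n, alpha) {
    se <- sigma / sqrt(n)
    z <- qnorm(1 - alpha / 2)
    pnorm(mu0 + z * se, mean = mu1, sd = se) - pnorm(mu0 - z * se, mean = mu1, sd = se)
}

# Fixed alpha = 5%, growing sample size
beta_by_n <- sapply(c(10, 50, 100), function(n) beta_z(51, 50, 2.5, n, 0.05))

# Fixed n = 10, growing alpha
beta_by_alpha <- sapply(c(0.01, 0.05, 0.10), function(a) beta_z(51, 50, 2.5, 10, a))

print(round(beta_by_n, 3))
print(round(beta_by_alpha, 3))
```

Both printed vectors should be decreasing. As a sanity check, when the true mean equals the mean under H0 the helper returns 1 - alpha, the probability of correctly retaining a true null.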

Let’s see with a plot how beta varies across three sample sizes (10, 50 and 100 observations) and two values of alpha (1% and 10%).

In this particular case, we assume a mean of 50 under H0 and compute beta for means under H1 between 46 and 54. The standard deviation is taken to be 2.5.
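For reference, the quantity plotted can be written in closed form. This reading assumes a two-sided test on the mean with known standard deviation, which is what the betaFunction code computes; the symbols are introduced here for illustration: \Phi is the standard normal distribution function, z_{1-\alpha/2} its upper \alpha/2 quantile, \mu_0 the mean under H0 and \mu_1 the mean under H1.

```latex
\beta(\mu_1) =
  \Phi\!\left(\frac{\mu_0 + z_{1-\alpha/2}\,\sigma/\sqrt{n} - \mu_1}{\sigma/\sqrt{n}}\right)
  - \Phi\!\left(\frac{\mu_0 - z_{1-\alpha/2}\,\sigma/\sqrt{n} - \mu_1}{\sigma/\sqrt{n}}\right)
```

At \mu_1 = \mu_0 the expression reduces to 1 - \alpha, so each curve peaks at the mean under H0 and falls off as \mu_1 moves away from it.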


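# Required packages: reshape2 for melt(), ggplot2 for plotting, ggthemes for theme_solarized_2()
library(reshape2)
library(ggplot2)
library(ggthemes)

# Probability of a type II error for a two-sided test on the mean with known sigma
betaFunction <- function(mu1, mu0, sigma, n, alpha = 0.05) {
    # half-width of the acceptance region around mu0
    delta <- qnorm(alpha/2, lower.tail = FALSE) * sigma/sqrt(n)
    lb <- mu0 - delta
    ub <- mu0 + delta
    # probability that the sample mean falls inside [lb, ub] when the true mean is mu1
    pnorm(ub, mu1, sigma/sqrt(n)) - pnorm(lb, mu1, sigma/sqrt(n))
}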

seq1 <- seq(46, 54, by = 0.01)

alphas <- c(0.01, 0.1)

samplesizes <- c(10, 50, 100)

listsalpha <- lapply(alphas, function(x) sapply(samplesizes, function(n) betaFunction(seq1, 
    50, 2.5, n, x)))

# label each column with its sample size
for (i in seq_along(listsalpha)) {
    colnames(listsalpha[[i]]) <- samplesizes
}

# add the grid of means under H1 and the alpha level to each matrix
for (i in seq_along(listsalpha)) {
    listsalpha[[i]] <- data.frame(mu1 = seq1, listsalpha[[i]], alpha = alphas[i])
}

dfWide <- do.call(rbind, listsalpha)

dfLong <- melt(dfWide, id.vars = c("alpha", "mu1"), value.name = "beta", variable.name = "samplesize")

# data.frame() prefixed the numeric column names with "X"; strip it and order the levels numerically
dfLong$samplesize <- factor(sub("^X", "", dfLong$samplesize), levels = samplesizes, ordered = TRUE)

dfLong$alpha <- factor(dfLong$alpha)

ggplot(dfLong, aes(x = mu1, y = beta)) +
    geom_path(aes(color = samplesize, linetype = alpha, size = alpha)) +
    labs(title = "Variations in type II error", x = "Mean under H1") +
    scale_size_discrete(range = c(1, 1.1)) +
    theme_solarized_2(light = FALSE)


