# Enjoy R

### How the probability of type II error varies

The *type II error* is what we may call a *miss*. When we test, our usual preference is to keep the null hypothesis, unless the data we have give strong evidence that we should reject it.

When *H0* is false at population level but we still keep it, we end up making a *type II error*, because we *miss* something in the population that differs from what the null states.

The probability of making this mistake is usually called *beta*. For a given distance between the true mean and the mean under *H0*, it is a decreasing function of both the sample size and the probability of a *type I error (alpha)*.
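To make *beta* concrete, here is a minimal Monte Carlo sketch. It assumes a two-sided z-test with a known standard deviation. It reuses the post's mean of 50 under *H0* and standard deviation of 2.5. The true mean of 51, n = 10 and alpha = 0.05 are illustrative choices that do not come from the post. The sketch counts how often samples drawn under *H1* fail to reject *H0*, then compares that share with the closed-form probability.

```r
# Monte Carlo sketch (assumptions: two-sided z-test, sigma known;
# mu1, n and alpha below are illustrative values)
set.seed(123)
mu0 <- 50      # mean under H0
mu1 <- 51      # true mean, so H0 is false
sigma <- 2.5   # known population standard deviation
n <- 10
alpha <- 0.05
nsim <- 10000

se <- sigma / sqrt(n)
zcrit <- qnorm(1 - alpha / 2)

# For each simulated sample: TRUE when H0 is (wrongly) kept, i.e. a "miss"
kept <- replicate(nsim, {
  xbar <- mean(rnorm(n, mean = mu1, sd = sigma))
  abs(xbar - mu0) / se <= zcrit
})
betaSim <- mean(kept)

# Closed form: probability that the sample mean, N(mu1, se^2) under H1,
# lands inside the acceptance interval built around mu0
betaExact <- pnorm(mu0 + zcrit * se, mu1, se) - pnorm(mu0 - zcrit * se, mu1, se)
c(simulated = betaSim, exact = betaExact)
```

The simulated share should sit close to the exact value. Any gap comes only from simulation noise.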

For a given *alpha*, the larger the sample size, the smaller the standard error, and the narrower the acceptance interval computed around the mean under *H0*. That's why we will reject more easily, and *beta* will decrease.
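A short sketch of the sample-size effect follows. It makes the same assumptions: a known sigma and a two-sided z-test. The true mean of 51 and alpha = 0.05 are again illustrative choices. As n grows, the standard error and the acceptance interval shrink, and *beta* at the fixed alternative falls.

```r
# Sketch: effect of n on the acceptance interval and on beta
# (assumptions: known sigma, two-sided z-test; mu1 and alpha are illustrative)
mu0 <- 50; mu1 <- 51; sigma <- 2.5; alpha <- 0.05
n <- c(10, 50, 100)
se <- sigma / sqrt(n)
halfWidth <- qnorm(1 - alpha / 2) * se   # half-width of [mu0 - d, mu0 + d]
betaByN <- pnorm(mu0 + halfWidth, mu1, se) - pnorm(mu0 - halfWidth, mu1, se)
data.frame(n = n, se = se, lower = mu0 - halfWidth,
           upper = mu0 + halfWidth, beta = betaByN)
```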

For a given sample size, *beta* will of course decrease if we increase *alpha*, because a larger *alpha* lowers the critical value and narrows the acceptance interval. In that case, however, we accept the compromise of a higher probability of rejecting *H0* when it is actually true at population level.
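The *alpha* side of the trade-off can be sketched the same way. This example fixes n at 50 and uses the same illustrative true mean of 51. A larger *alpha* gives a smaller critical value and a lower *beta*. Under the known-sigma z-test assumed here, the false-rejection rate when *H0* holds is exactly *alpha*.

```r
# Sketch: alpha/beta trade-off for a fixed sample size
# (assumptions: known sigma, two-sided z-test; mu1 and n are illustrative)
mu0 <- 50; mu1 <- 51; sigma <- 2.5; n <- 50
alpha <- c(0.01, 0.05, 0.10)
se <- sigma / sqrt(n)
zcrit <- qnorm(1 - alpha / 2)            # critical value shrinks as alpha grows
betaByAlpha <- pnorm(mu0 + zcrit * se, mu1, se) - pnorm(mu0 - zcrit * se, mu1, se)
data.frame(alpha = alpha, zcrit = zcrit, beta = betaByAlpha, power = 1 - betaByAlpha)
```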

Let’s see with a plot how *beta* varies for three different sample sizes (10, 50 and 100 observations) and two different values of *alpha* (1% and 10%).

In this particular case, we assume a mean of 50 under *H0* and compute *beta* for means under *H1* between 46 and 54. The standard deviation is 2.5 and is treated as known.
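Under these assumptions (a two-sided z-test with sigma known), the curve can be written in closed form. The sample mean leads us to keep *H0* when it falls inside the acceptance interval around the mean under *H0*. Under *H1* it is distributed as N(mu1, sigma^2/n). Here the symbol Phi denotes the standard normal CDF and z_{1-alpha/2} the upper critical value. This gives the following expression, which is the quantity `betaFunction` evaluates in the code below.

$$
\beta(\mu_1) = \Pr\left(\mu_0 - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu_0 + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right)
= \Phi\left(\frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}} + z_{1-\alpha/2}\right) - \Phi\left(\frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}} - z_{1-\alpha/2}\right)
$$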

```r
library(ggplot2)   # ggplot(), geom_path(), labs(), scales
library(reshape2)  # melt()
library(ggthemes)  # theme_solarized_2()

# Probability of a type II error for a two-sided z-test on the mean (sigma known):
# the probability that the sample mean, N(mu1, sigma^2 / n) under H1,
# falls inside the acceptance interval built around mu0.
betaFunction <- function(mu1, mu0, sigma, n, alpha = 0.05) {
  se <- sigma / sqrt(n)
  delta <- qnorm(alpha / 2, lower.tail = FALSE) * se  # half-width of the interval
  lb <- mu0 - delta
  ub <- mu0 + delta
  pnorm(ub, mu1, se) - pnorm(lb, mu1, se)
}

seq1 <- seq(46, 54, by = 0.01)   # means under H1
alphas <- c(0.01, 0.1)
samplesizes <- c(10, 50, 100)

# One data frame per alpha, with one beta column per sample size
listsalpha <- lapply(alphas, function(a) {
  betas <- sapply(samplesizes, function(n) betaFunction(seq1, 50, 2.5, n, a))
  colnames(betas) <- samplesizes
  data.frame(mu1 = seq1, betas, alpha = a, check.names = FALSE)
})

dfWide <- do.call(rbind, listsalpha)
dfLong <- melt(dfWide, id.vars = c("alpha", "mu1"), value.name = "beta",
               variable.name = "samplesize")
dfLong$samplesize <- ordered(dfLong$samplesize, levels = as.character(samplesizes))
dfLong$alpha <- factor(dfLong$alpha)

# The linewidth aesthetic requires ggplot2 >= 3.4.0 (older versions used size)
ggplot(dfLong, aes(x = mu1, y = beta)) +
  geom_path(aes(color = samplesize, linetype = alpha, linewidth = alpha)) +
  labs(title = "Variations in type II error", x = "Mean under H1") +
  scale_linewidth_manual(values = c(1, 1.1)) +
  theme_solarized_2(light = FALSE)
```
