# Enjoy R

### Find a sample size for a T test

When we want to test if a sample mean is significantly different from a certain known population’s mean, we usually use a T test with the following statistic:

```
Tn = (sample.mean - mean) / (sample.sd / sqrt(n))
```

which is compared with the critical value of Student’s t distribution with n - 1 degrees of freedom.

The null hypothesis is that there is no difference between the two means, i.e. that the sample comes from a population with the given mean.
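As a concrete illustration, here is a minimal sketch of the test for one fixed sample size. It assumes a small sample of eight observations and a population mean of 8; the variable names (`x`, `mu`, `t.stat`, `t.crit`) are chosen here for illustration only.

```r
# Illustrative sample and hypothesised population mean (names are assumptions)
x  <- c(3, 8, 9.29, 9.4, 6.3, 6.2, 6, 5.91)
mu <- 8
n  <- length(x)

# Test statistic: (sample mean - mean) / (sample sd / sqrt(n))
t.stat <- (mean(x) - mu) / (sd(x) / sqrt(n))

# Two-sided critical value at alpha = 0.05 with n - 1 degrees of freedom
t.crit <- qt(1 - 0.05 / 2, df = n - 1)

# TRUE means the difference is not significant at this sample size
abs(t.stat) < t.crit

# The built-in t.test() computes the same statistic
unname(t.test(x, mu = mu)$statistic)
```

With only eight observations the statistic stays below the critical value, so the null hypothesis is not rejected.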

So far this looks simple, but things get more complicated if we want to know the smallest sample size at which we could reject the null hypothesis at a fixed significance level alpha, given the observed mean and standard deviation.

The question could be:

*If the mean is 8, my sample mean is 6.7625 and my sample standard deviation is 2.1, what is the smallest sample size at which the two means would be considered significantly different?*

The following function answers it:

```
t.samplesize <- function(mean, sample.mean, sample.sd, x = NULL, two.sided = TRUE,
                         prob = 0.05) {
  # prob is the type I error probability (alpha)
  if (prob < 0 || prob > 1)
    stop("type I error probability must lie in [0, 1]")
  # quantile level of the critical value
  if (two.sided) prob <- 1 - prob / 2 else prob <- 1 - prob
  # if a sample is supplied, estimate mean and standard deviation from it
  if (!is.null(x)) {
    sample.mean <- mean(x)
    sample.sd <- sd(x)
  }
  # start from n = 2 and increase n by one while the test statistic is
  # smaller than the critical value of Student's t; stop at n = 3000
  n <- 2
  while (abs(sample.mean - mean) / (sample.sd / sqrt(n)) < qt(prob, n - 1) &&
         n < 3000) n <- n + 1
  # at n = 3000 the sample is large enough to use the normal distribution
  # for the test statistic
  if (n == 3000)
    n <- ceiling((qnorm(prob) * sample.sd / (sample.mean - mean))^2)
  # return the sample size
  return(n)
}
```

So, the solution is:

`t.samplesize(mean = 8, sample.mean = 6.7625, sample.sd = 2.1, two.sided = TRUE, prob = 0.05)`

`## [1] 14`

So we need at least 14 observations to reject the null hypothesis, assuming the sample mean and standard deviation stay the same.
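The same search can also be written without an explicit loop. The sketch below is not the function above but a hypothetical variant, `t.samplesize.vec`, that evaluates the statistic and the critical value for every n from 2 to an assumed upper bound `n.max` and returns the first n at which the statistic reaches the critical value. Unlike the original, it returns `NA` instead of falling back to the normal approximation.

```r
# Hypothetical helper: same search as the loop, evaluated for all n at once
t.samplesize.vec <- function(mean, sample.mean, sample.sd,
                             two.sided = TRUE, prob = 0.05, n.max = 3000) {
  n <- 2:n.max
  # quantile level of the critical value
  p <- if (two.sided) 1 - prob / 2 else 1 - prob
  t.stat <- abs(sample.mean - mean) / (sample.sd / sqrt(n))
  t.crit <- qt(p, df = n - 1)
  # indices of the sample sizes at which the null hypothesis is rejected
  hit <- which(t.stat >= t.crit)
  if (length(hit) == 0) return(NA_integer_)
  n[hit[1]]
}

t.samplesize.vec(mean = 8, sample.mean = 6.7625, sample.sd = 2.1)
## [1] 14
```

Because the statistic grows with sqrt(n) while the critical value shrinks as n increases, the first hit is also the smallest sample size that rejects.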

The function also accepts the sample itself as input:

`sample.vector <- c(3, 8, 9.29, 9.4, 6.3, 6.2, 6, 5.91)`

`t.samplesize(mean = 8, x = sample.vector, two.sided = TRUE, prob = 0.05)`

`## [1] 14`

Let’s check whether the function works (if the output is *FALSE*, we reject the null hypothesis).

`abs((6.7625 - 8)/(2.1/sqrt(13))) < abs(qt(0.05/2, 13 - 1)) # n=13`

`## [1] TRUE`

`abs((6.7625 - 8)/(2.1/sqrt(14))) < abs(qt(0.05/2, 14 - 1)) # n=14`

`## [1] FALSE`
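An equivalent check compares the two-sided p-value with alpha instead of comparing the statistic with the critical value. This is a small sketch with a hypothetical helper, `p.value`, whose defaults are the numbers from the example above.

```r
# Hypothetical helper: two-sided p-value of the t statistic for a given n
p.value <- function(n, mean = 8, sample.mean = 6.7625, sample.sd = 2.1) {
  t.stat <- (sample.mean - mean) / (sample.sd / sqrt(n))
  2 * pt(-abs(t.stat), df = n - 1)
}

p.value(13) > 0.05  # TRUE: not significant with n = 13
p.value(14) > 0.05  # FALSE: significant with n = 14
```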
