Enjoy R

Find a sample size for a T test

When we want to test whether a sample mean differs significantly from a known population mean, we usually use a one-sample t test with the following statistic:

Tn = (sample.mean - mean) / (sample.sd / sqrt(n))

which has to be compared with a critical value of Student’s t distribution with n - 1 degrees of freedom, n being the sample size.
The null hypothesis is that the sample comes from a population with the known mean, i.e. that the two means do not differ.
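As a quick illustration, here is a minimal sketch of the statistic and of the two-sided decision at alpha = 0.05, using the numbers from the example discussed below with n = 14. The helper name `t.stat` is hypothetical and not part of any package; `qt` and `pt` are base R’s quantile and distribution functions for Student’s t.

```r
# Hypothetical helper: one-sample t statistic from summary values
t.stat <- function(mean, sample.mean, sample.sd, n) {
  (sample.mean - mean) / (sample.sd / sqrt(n))
}

n <- 14
tn <- t.stat(mean = 8, sample.mean = 6.7625, sample.sd = 2.1, n = n)

# two-sided critical value of Student's t with n - 1 d.o.f., alpha = 0.05
crit <- qt(1 - 0.05 / 2, df = n - 1)

# two-sided p-value of the observed statistic
p.value <- 2 * pt(-abs(tn), df = n - 1)

abs(tn) >= crit  # TRUE: H0 is rejected at alpha = 0.05
p.value < 0.05   # TRUE: the same decision based on the p-value
```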

So far, it looks very simple, but things get more complicated if we want to know the smallest sample size that lets us reject the null hypothesis at a fixed significance level alpha.
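In symbols, writing μ for the known mean, x̄ and s for the sample mean and standard deviation, and α for the significance level, the quantity sought is the smallest n that satisfies the two-sided rejection condition, assuming x̄ and s do not change as n grows:

```latex
n^{*} = \min\left\{\, n \ge 2 \;:\; \frac{|\bar{x} - \mu|}{s / \sqrt{n}} \ge t_{1-\alpha/2,\; n-1} \right\}
```

For a one-sided test, t_{1-α} replaces t_{1-α/2}. Since the left-hand side grows with √n while the t quantile shrinks as the degrees of freedom increase, every n larger than n* also satisfies the condition. For large n the t quantile approaches the normal quantile z_{1-α/2}, and solving the inequality for n gives the approximation n ≈ (z_{1-α/2} s / (x̄ - μ))², rounded up.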

The question could be:
If the population mean is 8, my sample mean is 6.7625 and my sample standard deviation is 2.1, what is the minimum sample size at which the two means would be considered significantly different, assuming the sample mean and standard deviation stay the same?
The following function answers it:

t.samplesize <- function(mean, sample.mean, sample.sd, x = NULL, two.sided = TRUE, 
    prob = 0.05) {

    # prob is the type I error probability (alpha): it must lie in [0, 1]
    if (prob < 0 || prob > 1) 
        stop("type I error probability must be included in [0, 1]")

    # quantile of the critical value: 1 - alpha/2 (two-sided) or 1 - alpha
    if (two.sided) 
        prob <- 1 - prob/2 else prob <- 1 - prob

    # if the sample itself is given, use its mean and standard deviation
    if (!is.null(x)) {

        sample.mean <- mean(x)
        sample.sd <- sd(x)

    }

    # with equal means the statistic is always 0, so no sample size can
    # reject the null hypothesis (and the normal formula would divide by 0)
    if (sample.mean == mean) 
        stop("sample mean equals the population mean")

    n <- 2
    # starting from n=2

    while (abs(sample.mean - mean)/(sample.sd/sqrt(n)) < qt(prob, n - 1) && n < 
        3000) n <- n + 1
    # increasing n by one unit each loop while the test statistic is smaller
    # than Student's t: when this condition is no longer satisfied, or when n
    # reaches 3000, the loop is interrupted.

    if (n == 3000) 
        n <- ceiling((qnorm(prob) * sample.sd/(sample.mean - mean))^2)
    # a sample size of 3000 observations is large enough to use the normal
    # distribution for the test statistic.

    return(n)
    # return the sample size
}
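The loop can also be replaced by a vectorized search. The sketch below, with the hypothetical name `t.samplesize.vec` and an assumed upper bound `n.max`, computes the statistic and the critical value for every candidate n at once and returns the first n at which the statistic reaches the critical value. Unlike the function above, it takes only summary values and returns NA instead of falling back to the normal approximation when no n up to `n.max` works.

```r
# Hypothetical vectorized variant of t.samplesize (summary values only)
t.samplesize.vec <- function(mean, sample.mean, sample.sd, two.sided = TRUE,
                             prob = 0.05, n.max = 3000) {
  if (prob <= 0 || prob >= 1)
    stop("type I error probability must be in (0, 1)")
  # quantile of the critical value: 1 - alpha/2 (two-sided) or 1 - alpha
  q <- if (two.sided) 1 - prob / 2 else 1 - prob
  n <- 2:n.max
  # test statistic and critical value for every candidate sample size
  stat <- abs(sample.mean - mean) / (sample.sd / sqrt(n))
  hit <- which(stat >= qt(q, df = n - 1))
  # first sample size that rejects H0, or NA if none up to n.max does
  if (length(hit) == 0) NA_integer_ else n[hit[1]]
}

t.samplesize.vec(mean = 8, sample.mean = 6.7625, sample.sd = 2.1)  # 14
t.samplesize.vec(mean = 8, sample.mean = 6.7625, sample.sd = 2.1,
                 two.sided = FALSE)                                 # 10
```

The one-sided call needs fewer observations because its critical value, qt(0.95, n - 1), is lower than the two-sided qt(0.975, n - 1).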

So, the solution is:

t.samplesize(mean = 8, sample.mean = 6.7625, sample.sd = 2.1, two.sided = TRUE, 
    prob = 0.05)

## [1] 14

So, if the sample mean and standard deviation stayed at 6.7625 and 2.1, we would need at least 14 observations to reject the null hypothesis.
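For comparison, the normal approximation that the function falls back on after 3000 iterations gives a smaller answer for this example, because at small n the normal quantile is lower than Student’s t quantile. A quick sketch with the same numbers:

```r
# normal-approximation sample size, as in the fallback branch of t.samplesize
alpha <- 0.05
z <- qnorm(1 - alpha / 2)  # two-sided normal quantile, about 1.96
n.normal <- ceiling((z * 2.1 / (6.7625 - 8))^2)
n.normal  # 12, fewer than the 14 found with Student's t
```

This is why the function searches with Student’s t first and uses the normal formula only for very large n, where the two quantiles nearly coincide.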

The function also accepts the sample itself as input, through the argument x:

sample.vector <- c(3, 8, 9.29, 9.4, 6.3, 6.2, 6, 5.91)

t.samplesize(mean = 8, x = sample.vector, two.sided = TRUE, prob = 0.05)

## [1] 14
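The result is the same because this sample has the same mean and, to two decimals, the same standard deviation used before. A quick check with base R’s `mean` and `sd`:

```r
sample.vector <- c(3, 8, 9.29, 9.4, 6.3, 6.2, 6, 5.91)

mean(sample.vector)  # 6.7625
sd(sample.vector)    # about 2.1 (2.10003)
```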

Let’s check whether the function works (an output of FALSE means the test statistic is not smaller than the critical value, so we reject the null hypothesis).

abs((6.7625 - 8)/(2.1/sqrt(13))) < abs(qt(0.05/2, 13 - 1))  # n=13

## [1] TRUE


abs((6.7625 - 8)/(2.1/sqrt(14))) < abs(qt(0.05/2, 14 - 1))  # n=14

## [1] FALSE
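As a further cross-check, base R’s `t.test` can be run on artificial samples built to have mean 6.7625 and standard deviation 2.1 exactly. The helper `make.sample` is hypothetical: it standardizes random normal draws with `scale` and rescales them, so the random seed does not affect the result.

```r
# Hypothetical helper: a sample of size n with mean m and standard deviation s
make.sample <- function(n, m, s) {
  z <- as.vector(scale(rnorm(n)))  # centered and scaled: mean 0, sd 1
  m + s * z
}

set.seed(1)
p13 <- t.test(make.sample(13, 6.7625, 2.1), mu = 8)$p.value
p14 <- t.test(make.sample(14, 6.7625, 2.1), mu = 8)$p.value

p13 > 0.05  # TRUE: with 13 observations H0 is not rejected
p14 < 0.05  # TRUE: with 14 observations H0 is rejected
```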