The source code of this post is right here.
The t-test is one of the basic tests used in biology (and perhaps most other scientific disciplines). However, it is not always interpreted correctly. Hopefully, this small demo will help you understand what the t-test is and what question it answers.
Let’s generate a sample.
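A minimal sketch of this step in R. The seed, the sample size of 10, and the mean of 0.5 and standard deviation of 1 are illustrative assumptions, not values taken from the original post.

```r
# Illustrative values only: seed, size, mean and sd are assumptions
set.seed(42)
n <- 10
x <- rnorm(n, mean = 0.5, sd = 1)  # one sample from a normal distribution
x
```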
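Now the question is whether the mean of the sample is different from some value. Typically the question is (or can be reduced to) whether the sample mean is different from zero. The statistic that measures how far the sample mean lies from the NULL mean, in units of the standard error of the mean, is the so-called t-statistic. \[ t = \frac{\bar{x}-\mu}{\frac{s}{\sqrt{N}}} \] Here \(\bar{x}\) is the sample mean, \(\mu\) is the NULL mean, \(s\) is the sample standard deviation and \(N\) is the sample size.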
Let’s compute the t-statistic assuming the NULL mean value = 0.
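The formula translates almost directly into R. This is a sketch under assumptions: the sample `x` is regenerated here with an arbitrary seed and parameters so the snippet runs on its own, and the names `mu0` and `t_stat` are made up for illustration.

```r
set.seed(42)
x <- rnorm(10, mean = 0.5, sd = 1)   # assumed sample, illustrative values

mu0 <- 0                                               # NULL mean
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))  # (x_bar - mu) / (s / sqrt(N))
t_stat
```

Note that `sd()` uses the N - 1 denominator, which is the sample standard deviation the formula calls for.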
Our doubt, namely that the mean of the sample is actually no different from the NULL value, can now be posed in a more quantitative way. That is:
What is the probability of obtaining a t-statistic as extreme as, or more extreme than, the observed one for a sample drawn from a population whose mean equals the NULL mean?
Let’s calculate this probability directly by generating a large number of random samples drawn from a distribution with mean equal to zero.
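As a first step we’ll generate random samples from a distribution whose mean equals the NULL mean. Each sample has the same size as the tested sample, because the distribution of the t-statistic depends on N. The actual value of the standard deviation at this point does not matter, since the t-statistic is scaled by the sample standard deviation anyway.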
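One possible way to do this in R is to store one simulated sample per matrix row. The sample size of 10, the 10,000 repetitions, the seed and the name `null_samples` are all assumptions made for this sketch.

```r
set.seed(1)
n     <- 10      # size of each sample, assumed equal to the tested sample's size
n_sim <- 10000   # number of simulated samples (assumption)

# Each row is one random sample from a normal distribution with mean 0 (the NULL mean)
null_samples <- matrix(rnorm(n * n_sim, mean = 0, sd = 1),
                       nrow = n_sim, ncol = n)
dim(null_samples)
```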
Calculating t-statistics for the generated random samples.
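A sketch of this step, assuming the simulated samples sit in the rows of a matrix. The helper `t_of` and the other names are hypothetical, and the samples are regenerated here so the snippet runs on its own.

```r
set.seed(1)
n <- 10
n_sim <- 10000
null_samples <- matrix(rnorm(n * n_sim), nrow = n_sim)  # assumed NULL samples, mean 0

# Hypothetical helper: t-statistic of a vector against the NULL mean mu0
t_of <- function(v, mu0 = 0) (mean(v) - mu0) / (sd(v) / sqrt(length(v)))

null_t <- apply(null_samples, 1, t_of)  # one t-statistic per row
summary(null_t)
```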
How often is the t-statistic from the random samples more extreme than the t-statistic of the tested sample?
First let’s make it clear why we are concerned with both higher and lower deviations. It is because we do not say specifically whether the sample mean is higher than or lower than the NULL mean. Our sceptical statement is just that it is different. The difference can go both ways, higher and lower. Thus we count deviations on both tails.
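Counting on both tails amounts to comparing absolute values. The sketch below runs the whole simulation end to end under assumed values (seed, sample parameters, 10,000 repetitions); `t_of`, `t_obs`, `null_t` and `p_sim` are illustrative names.

```r
set.seed(42)
x <- rnorm(10, mean = 0.5, sd = 1)   # assumed tested sample

# Hypothetical helper: t-statistic of a vector against the NULL mean mu0
t_of <- function(v, mu0 = 0) (mean(v) - mu0) / (sd(v) / sqrt(length(v)))
t_obs <- t_of(x)

n_sim <- 10000
# t-statistics of samples of the same size drawn with mean equal to the NULL mean
null_t <- replicate(n_sim, t_of(rnorm(length(x))))

# Fraction of simulated t-statistics at least as extreme as the observed one, on either tail
p_sim <- mean(abs(null_t) >= abs(t_obs))
p_sim
```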
The actual t-test p-value is:
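For reference, a sketch of the call in R, again on an assumed sample with illustrative parameters. `t.test` is two-sided by default, which matches counting deviations on both tails.

```r
set.seed(42)
x <- rnorm(10, mean = 0.5, sd = 1)   # assumed tested sample

res <- t.test(x, mu = 0)   # one-sample, two-sided test against the NULL mean 0
res$p.value

# The same value from the t distribution with N - 1 degrees of freedom
2 * pt(-abs(unname(res$statistic)), df = length(x) - 1)
```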
The directly calculated probability of obtaining the same or a more extreme t-statistic and the p-value from R’s t.test are quite close!