6.10 – t distribution

Introduction

Student’s t distribution is a sampling distribution for values sampled from a normally distributed population when σ, the standard deviation, and μ, the mean of the population, are not known. When sample size is large and we know σ, we would use the Z-score to evaluate probabilities of the sample mean. The t distribution applies when σ is not known and sample size is small (e.g., less than 30, per the rule of thirty).

According to Wikipedia and sources therein, Student was the pseudonym of William Sealy Gosset, who came up with the t-test and t-distribution.

The equation of the t-test is

    \begin{align*} t = \frac{\bar{X} - \mu}{s_{\bar{X}}} \end{align*}

where the difference between “X bar,” the sample mean, and μ, the population mean, is divided by the standard error of the mean, s_{\bar{X}}, defined in Chapter 3.2 and again in Chapter 3.3. This formulation of the t-test is called the one sample t-test (Chapter 8.5). We call the result of this calculation the test statistic, t. We evaluate how often a test statistic of that value or greater will occur by applying the t distribution function.
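As a minimal sketch of the calculation just described, the Python snippet below computes the one sample t statistic from a small sample. The function name `one_sample_t`, the data values, and the hypothesized mean of 5.0 are made up for illustration; the standard error is taken as the sample standard deviation divided by the square root of the sample size.

```python
import math
import statistics

def one_sample_t(sample, mu):
    """Return (t, df) for a one sample t-test of the sample mean against mu."""
    n = len(sample)
    mean = statistics.mean(sample)      # "X bar", the sample mean
    s = statistics.stdev(sample)        # sample standard deviation (n - 1 denominator)
    se = s / math.sqrt(n)               # standard error of the mean
    return (mean - mu) / se, n - 1      # test statistic and degrees of freedom

# Hypothetical data, invented only to illustrate the arithmetic
sample = [4.8, 5.6, 5.1, 6.2, 5.3]
t_stat, df = one_sample_t(sample, mu=5.0)
print(f"t = {t_stat:.3f}, df = {df}")
```

The degrees of freedom returned alongside t (n − 1) identify which t distribution the statistic should be compared against.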

There are many t distributions, actually, one for every degree of freedom. Like the normal distribution, the t distribution is symmetrical about a mean of zero. At low degrees of freedom, however, it is leptokurtic: compared with the standard normal curve, more of the probability lies in the tails. As degrees of freedom increase, the tails thin and the t distribution becomes increasingly like the normal distribution.
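One way to see this convergence numerically is to evaluate the density of the t distribution at the center (x = 0) and out in the tail (x = 3) for several degrees of freedom, and compare with the standard normal density. The sketch below uses only Python's standard library; the helper names `t_pdf` and `normal_pdf` and the choice of x = 3 as a "tail" point are assumptions made for illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # Work on the log scale so large df do not overflow the gamma function
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def normal_pdf(x):
    """Density of the standard normal distribution (mean 0, sd 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

for df in (1, 5, 20, 100, 10000):
    print(f"df = {df:>5}: f(0) = {t_pdf(0, df):.4f}, f(3) = {t_pdf(3, df):.4f}")
print(f"normal    : f(0) = {normal_pdf(0):.4f}, f(3) = {normal_pdf(3):.4f}")
```

At low df the density at x = 3 is noticeably higher than the normal density there, and both columns approach the normal values as df grows.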

Relationship between t distribution and standard normal curve

First, here is our standard normal plot, mean = 0, standard deviation = 1.

[Figure: standard normal curve, mean = 0, s = 1]

Next, here’s the t distribution for five degrees of freedom.

[Figure: t distribution, df = 5]

Let’s see what happens to the shape of the t distribution as we increase the degrees of freedom through df = 5, 10, 20, 50, 1000, and 10000. The last graphic is the standard normal curve again.

[Figure: animated GIF of the t distribution, from df = 5 to 10,000, plus the standard normal curve]

By convention in the Null Hypothesis Significance Testing protocol (NHST), we compare the test statistic to a critical value. The critical value is defined as the value of the test statistic beyond which the tail area of the distribution equals the Type I error rate, which is typically set to 5%. We introduced the logic of the NHST approach in Chapter 6.9 with the chi-square distribution. Again, this is just an introduction; we teach it now as a sort of mechanical understanding to develop. The justification for this approach to testing of statistical significance is developed in Chapter 8.

Table of critical values of the t distribution for df = 1 to 5, one tail (upper)

df α = 0.05 α = 0.025 α = 0.01
1 6.314 12.706 31.821
2 2.920 4.303 6.965
3 2.353 3.182 4.541
4 2.132 2.776 3.747
5 2.015 2.571 3.365

See Appendix 20.3 for a complete table of critical values of the t distribution.
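The critical values in a t table can be approximated numerically. The sketch below is one possible approach, not the method used to build Appendix 20.3: it integrates the t density with Simpson's rule to get the upper-tail probability, then searches by bisection for the value of t whose upper-tail area equals α. The function names (`t_pdf`, `upper_tail`, `critical_value`) and the integration step size are assumptions chosen for illustration, and only Python's standard library is used.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, step=0.01):
    """P(T >= t) for t >= 0: by symmetry, 0.5 minus the area from 0 to t."""
    if t == 0:
        return 0.5
    n = 2 * max(50, math.ceil(t / (2 * step)))   # even number of Simpson intervals
    h = t / n
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def critical_value(alpha, df):
    """Upper-tail critical value: the t for which P(T >= t) equals alpha."""
    lo, hi = 0.0, 1.0
    while upper_tail(hi, df) > alpha:    # widen the bracket until it holds the answer
        hi *= 2
    for _ in range(40):                  # bisection: halve the bracket each pass
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in range(1, 6):
    row = [critical_value(a, df) for a in (0.05, 0.025, 0.01)]
    print(df, " ".join(f"{v:.3f}" for v in row))
```

The same `upper_tail` function gives the probability of a test statistic at least as large as the one observed, which is the quantity compared against the Type I error rate. In practice a statistics package would be used instead of hand-rolled integration; the point here is only to show where the tabled numbers come from.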

Questions

  1. What happens to the shape of the t distribution as degrees of freedom are increased from 1 to 5 to 20 to 100?

Be able to answer these questions using the t table (Appendix 20.3) or using Rcmdr.

  1. For probability α = 5%, what is the critical value of the t distribution (upper tail) for 1 degree of freedom? For 5 df? For 20 df? For 30 df?
  2. The value of the t test statistic is given as 12. With 3 degrees of freedom, what is the approximate probability of this value or greater from the t distribution?
