6.10 – t distribution

Introduction

Student’s t distribution is a sampling distribution that applies when values are sampled from a normally distributed population, but σ, the standard deviation, and μ, the mean, of the population are not known. When sample size is large and we know σ, we would use the Z-score to evaluate probabilities of the sample mean. The t distribution applies when σ is not known, so that it must be estimated from the sample, and sample size is small (e.g., less than 30, per the rule of thirty).

Note 1: According to Wikipedia and sources therein, Student was the pseudonym of William Sealy Gosset, who came up with the t-test and t-distribution.

The equation of the t-test is

    \begin{align*} t = \frac{\bar{X} - \mu}{s_{\bar{X}}} \end{align*}

where the difference between “X bar,” the sample mean, and μ, the population mean, is divided by the standard error of the mean, s_{\bar{X}}, defined in Chapter 3.2 and again in Chapter 3.3. This formulation of the t-test is called the one sample t-test (Chapter 8.5). We call the result of this calculation the test statistic, t. We evaluate how often a test statistic of that value or greater will occur by applying the t distribution function.
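As a minimal sketch of the calculation, the Python snippet below computes the one sample t statistic from raw data. It assumes the usual definition of the standard error of the mean, the sample standard deviation divided by the square root of the sample size. The function name `one_sample_t` and the data values are hypothetical, invented for illustration only.

```python
import math
import statistics

def one_sample_t(sample, mu):
    """Return (t, df) for a one sample t-test of the sample mean against mu."""
    n = len(sample)
    x_bar = statistics.mean(sample)      # sample mean, "X bar"
    s = statistics.stdev(sample)         # sample standard deviation (n - 1 denominator)
    se = s / math.sqrt(n)                # standard error of the mean
    return (x_bar - mu) / se, n - 1

# Hypothetical data, invented for illustration only
sample = [4.8, 5.6, 5.1, 6.0, 5.5]
t_stat, df = one_sample_t(sample, mu=5.0)
print(f"t = {t_stat:.3f}, df = {df}")
```

The degrees of freedom returned alongside t (sample size minus one) determine which t distribution the statistic is evaluated against.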

There are many t distributions; in fact, there is one for every degree of freedom. Like the standard normal distribution, the t distribution is symmetrical about zero. But at low degrees of freedom it is leptokurtic: it has heavier tails, and correspondingly less density near the center, than the standard normal curve. As degrees of freedom increase, the tails thin and the t distribution becomes increasingly like the standard normal distribution.
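One way to check how the shape depends on degrees of freedom is to evaluate the density at the center (t = 0) and out in the tail (t = 3) and compare with the standard normal. The sketch below writes out the standard formula for the t density by hand. The helper name `t_pdf` is hypothetical, and only the Python standard library is assumed.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # log of the normalizing constant Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2))
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

z = NormalDist()  # standard normal: mean 0, standard deviation 1
for df in (1, 5, 20, 100):
    print(f"df = {df:>3}  f(0) = {t_pdf(0, df):.4f}  f(3) = {t_pdf(3, df):.4f}")
print(f"normal    f(0) = {z.pdf(0):.4f}  f(3) = {z.pdf(3):.4f}")
```

As df grows, the density at zero should rise toward the normal value while the density at t = 3 falls toward it.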

Relationship between t distribution and standard normal curve

First, here is our standard normal plot, mean = 0, standard deviation = 1 (Fig 1).

Figure 1. Density plot of standard normal distribution

Next, here’s the t-distribution for five degrees of freedom (Fig 2).

Figure 2. Density plot of t-distribution for five degrees of freedom.

Let’s see what happens to the shape of the t-distribution as we increase the degrees of freedom through df = 5, 10, 20, 50, 1000, and 10000 (Fig 3). The last graphic in the series is the standard normal curve again.

Figure 3. Animated GIF of density plot t distribution, from df = 5 to 10,000 plus standard normal curve.

By convention in the Null Hypothesis Significance Testing (NHST) protocol, we compare the test statistic to a critical value. The critical value is the value of the test statistic that cuts off a tail area of the distribution equal to the Type I error rate, which is typically set to 5%. We introduced the logic of the NHST approach in Chapter 6.9 with the chi-square distribution. Again, this is just an introduction; we teach it now to develop a mechanical understanding. The justification for this approach to testing statistical significance is developed in Chapter 8.

Table of critical values of the t distribution for df = 1 to 5, one tail (upper)

df    α = 0.05    α = 0.025    α = 0.01
1     6.314       12.706       31.821
2     2.920       4.303        6.965
3     2.353       3.182        4.541
4     2.132       2.776        3.747
5     2.015       2.571        3.365

See Appendix 20.3 for a complete table of critical values of the t distribution.
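Critical values like those tabulated can also be approximated numerically. The sketch below is one possible approach, not the method used to build the table or the appendix. It integrates the t density with Simpson's rule to get an upper tail probability, then finds the critical value by bisection. The function names `t_upper_tail` and `t_critical`, the search bound `hi`, and the example test statistic are all hypothetical choices made for illustration.

```python
import math

def t_upper_tail(t, df, steps=1000):
    """Approximate P(T >= t) for Student's t with df degrees of freedom."""
    # Substituting x = sqrt(df) * tan(theta) turns the area under the density
    # between 0 and |t| into k * integral of cos(theta)**(df - 1) d(theta),
    # a smooth integrand on a bounded interval that suits Simpson's rule.
    k = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    upper = math.atan(abs(t) / math.sqrt(df))
    h = upper / steps                              # steps must be even
    total = 1.0 + math.cos(upper) ** (df - 1)      # the two endpoints
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    tail = 0.5 - k * total * h / 3                 # uses symmetry about zero
    return tail if t >= 0 else 1.0 - tail

def t_critical(alpha, df, hi=1000.0):
    """Upper tail critical value by bisection; assumes the answer lies below hi."""
    lo = 0.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid        # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

# Approximate the critical values for df = 1 to 5
for df in range(1, 6):
    row = "  ".join(f"{t_critical(a, df):7.3f}" for a in (0.05, 0.025, 0.01))
    print(df, row)

# Hypothetical test statistic, invented for illustration only
t_stat, df, alpha = 2.5, 4, 0.05
print("reject H0" if t_stat > t_critical(alpha, df) else "fail to reject H0")
```

The last two lines apply the comparison described above: the null hypothesis is rejected only when the test statistic exceeds the critical value at the chosen Type I error rate. In practice a statistics package would supply both quantities directly.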

Questions

  1. What happens to the shape of the t distribution as degrees of freedom are increased from 1 to 5 to 20 to 100?

Be able to answer these questions using the t table (Appendix 20.3) or using Rcmdr.

  1. For probability α = 5%, what is the critical value of the t distribution (upper tail) for 1 degree of freedom? For 5 df? For 20 df? For 30 df?
  2. The value of the t test statistic is given as 12. With 3 degrees of freedom, what is the approximate probability of this value or greater from the t distribution?
