6.11 – F distribution
Introduction
The F distribution is the probability distribution associated with the F statistic, named in honor of R. A. Fisher, and it serves as the null distribution of the ANOVA test statistic. The F distribution arises as the ratio of two chi-square distributed variables, each divided by its degrees of freedom, v1 and v2 for the numerator and denominator, respectively.
We can, for illustration purposes, define the F statistic as a ratio of two sample variances, F = s1²/s2², with the numerator variance over the denominator variance.
The F statistic has two sets of degrees of freedom, one for the numerator and one for the denominator. The formula for the F distribution itself is quite complicated, and in general we do not use the F distribution in a way that involves parameter estimation; rather, it is used to evaluate the statistical significance of the F statistic. Therefore, we present only a few graphs and a table of critical values to illustrate the distribution.
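As a quick sketch of this ratio-of-variances idea, here are two hypothetical samples drawn in R (the sample sizes and seed are illustrative assumptions, not from the text):

```r
# Two hypothetical samples of 11 observations each,
# giving v1 = 10 and v2 = 10 degrees of freedom
set.seed(1)
x <- rnorm(11)
y <- rnorm(11)

F_stat <- var(x) / var(y)   # F statistic as a ratio of two sample variances

# How often would a ratio this large or larger occur by chance?
pf(F_stat, df1 = 10, df2 = 10, lower.tail = FALSE)
```

The same ratio is what `var.test(x, y)` reports as its test statistic.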
We call the result of this calculation the F test statistic, and we evaluate how often that value or greater would occur by applying the F distribution function. A few graphs give a sense of what the distribution looks like for varying v1, with v2 held at ten degrees of freedom (Fig. 1).
Figure 1. Animated GIF plot of F distribution value for range of degrees of freedom.
By convention in the Null Hypothesis Significance Testing (NHST) protocol, we compare the test statistic to a critical value. The critical value is defined as the value of the test statistic that occurs at the Type I error rate, which is typically set to 5%, per our presentations in Chapters 6.7, 6.9, and 6.10. The justification for the NHST approach to testing statistical significance is developed in Chapter 8.
Table of Critical values of the F distribution, one tail (upper)
Degrees of freedom v1 = 1 – 4, v2 = 10
| F(v1, 10) | α = 0.05 | α = 0.025 | α = 0.01 |
|-----------|----------|-----------|----------|
| 1 | 4.964 | 6.937 | 10.044 |
| 2 | 4.103 | 5.456 | 7.559 |
| 3 | 3.708 | 4.826 | 6.552 |
| 4 | 3.478 | 4.468 | 5.994 |
For the complete F table, see Appendix 20.4.
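The tabled critical values can be reproduced in R with `qf()`; for example, the v1 = 1, v2 = 10 row:

```r
# Upper-tail critical values of F with df1 = 1, df2 = 10
qf(c(0.05, 0.025, 0.01), df1 = 1, df2 = 10, lower.tail = FALSE)
# approximately 4.964, 6.937, 10.044 -- matching the v1 = 1 row of the table
```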
χ2, t and F distributions are related
The χ2, t, and F distributions are all indexed by their degrees of freedom, and with some algebra these three distributions can be shown to be related to each other. The probabilities tabled for the chi-square distribution can be obtained as a special case of the F distribution.
Some interesting relationships between the F distribution and other distributions can be shown. By definition, the F distribution is built on a ratio of chi-square distributed variables, which indicates the relationship between these two kinds of continuous probability distributions. However, the F distribution is related to other distributions as well. For example, for the case of v1 = 1 and v2 equal to any value, F(1, v2) = t², where t refers to the t distribution with v2 degrees of freedom.
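This relationship is easy to verify in R. Note that the two-tailed t cutoff at α = 0.05 corresponds to the one-tailed (upper) F cutoff:

```r
v2 <- 10  # any denominator degrees of freedom will do
qt(0.975, df = v2)^2            # squared two-tailed t critical value
qf(0.95, df1 = 1, df2 = v2)     # one-tailed F critical value: same number
```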
Questions
- What happens to the shape of the F distribution as degrees of freedom are increased from 1 to 5 to 20 to 100?
- In Rcmdr, which option do you select to get the critical value for df1 = 1 and df2 = 20 at alpha = 5%?
A. F quantiles
B. F probabilities
C. Plot of F distribution
D. Sample from F distribution
Be able to answer these questions using the F table (Appendix 20.4) or using Rcmdr.
- For probability α = 5% and numerator degrees of freedom equal to 1, what is the critical value of the F distribution (upper tail) for denominator degrees of freedom equal to 1? For 5 df? For 20 df? For 30 df?
- The value of the F test statistic is given as 12. With 3 degrees of freedom for the numerator and 10 degrees of freedom for the denominator, what is the approximate probability of this value, or greater, from the F distribution?
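These questions can also be checked at the R prompt with `qf()` and `pf()` (the command-line equivalents of the Rcmdr menu items):

```r
# Critical values (upper tail, alpha = 0.05) for v1 = 1 and
# denominator degrees of freedom 1, 5, 20, and 30
qf(0.05, df1 = 1, df2 = c(1, 5, 20, 30), lower.tail = FALSE)

# Probability of an F test statistic of 12 or greater with 3 and 10 df
pf(12, df1 = 3, df2 = 10, lower.tail = FALSE)
```

Note that the critical value shrinks as the denominator degrees of freedom grow.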
Chapter 6 contents
- Introduction
- Some preliminaries
- Ratios and proportions
- Combinations and permutations
- Types of probability
- Discrete probability distributions
- Continuous distributions
- Normal distribution and the normal deviate (Z)
- Moments
- Chi-square distribution
- t distribution
- F distribution
- References and suggested readings
6.9 – Chi-square distribution
Introduction
As noted earlier, the normal deviate or Z score can be viewed as randomly sampled from the standard normal distribution. The chi-square distribution describes the probability distribution of a sum of squared standardized normal deviates, with degrees of freedom, df, equal to the number of independent deviates in the sum (the number of independent pieces of information needed to calculate the estimate; see Ch. 8). We will use the chi-square distribution to test statistical significance of categorical variables in goodness of fit tests and contingency table problems.
The equation of the chi-square is

χ² = Σ (fi − f̂i)² / f̂i

where the sum runs over the i = 1 to k groups or categories, fi is the observed frequency, and f̂i (fi "hat") is the expected frequency for the ith category. We call the result of this calculation the chi-square test statistic. We evaluate how often that value or greater of a test statistic will occur by applying the chi-square distribution function. Graphs show the chi-square distribution for degrees of freedom equal to 1 – 5, 10, and 20 (Fig. 1).
Figure 1. Animated GIF of plots of chi-square distribution over range of degrees of freedom.
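The chi-square calculation is a sum over categories; a minimal sketch in R, assuming a hypothetical goodness-of-fit example of 55 heads and 45 tails against a fair-coin expectation of 50/50:

```r
f    <- c(55, 45)   # observed frequencies
fhat <- c(50, 50)   # expected frequencies under the null
chi2 <- sum((f - fhat)^2 / fhat)
chi2   # 1, with k - 1 = 1 degree of freedom
```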
Note that the distribution is asymmetric, particularly at low degrees of freedom. Thus tests using the chi-square are one-tailed (Fig. 2).
Figure 2. The test of the chi-square is typically one-tailed. In this case, probability of values greater than the critical value.
By convention in the Null Hypothesis Significance Testing (NHST) protocol, we compare the test statistic to a critical value. The critical value is defined as the value of the test statistic — the cutoff boundary between statistical significance and insignificance — that occurs at the Type I error rate, which is typically set to 5%. The interpretation of the result is as follows: after calculating a test statistic, we can judge the significance of the results relative to the null hypothesis expectation. If our test statistic is greater than the critical value, then the p-value of our result is less than 5% (R will report an exact p-value for the test statistic). You are not expected to be able to follow this logic just yet — rather, we teach it now as a sort of mechanical understanding to develop in the NHST tradition. The justification for this approach to testing statistical significance is developed in Chapter 8. A portion of the critical values of the chi-square distribution is shown in Figure 3.
Figure 3. Portion of the table of some critical values of chi-square distribution, one tailed (right-tailed or “upper” portion of distribution).
See Appendix for a complete chi-square table.
Example
Professor Hermon Bumpus of Brown University in Providence, Rhode Island, received 136 House Sparrows (Passer domesticus) after a severe winter storm on 1 February 1898. The birds were collected from the ground; 72 of the birds survived, 64 did not (Table 1). Bumpus made several measures of morphology on the birds, and the data set has served as a classic example of natural selection (Chicago Field Museum). We'll look at this data set again when we introduce Linear Regression.
Table 1. Survival statistics of Bumpus House Sparrows

| Sex | Yes | No |
|--------|----|----|
| Female | 21 | 28 |
| Male | 51 | 36 |
Was there a survival difference between male and female House Sparrows? This is a classic contingency table analysis, something we will discuss at length in Chapter 9. For now, we report that the chi-square test statistic for this test was 3.1264 and the test had one degree of freedom. What is the critical value of the chi-square distribution at 5% and one degree of freedom? Typically we would simply use R to look this up:
qchisq(c(0.05), df=1, lower.tail=FALSE)
But we can also get this from the table of critical values (Fig. 4). Simply select the row based on the degrees of freedom for the test then scan to the column with the appropriate significance level, again, typically 5% (0.05).
Figure 4. Portion of the chi-square distribution which shows how to find critical value of the chi-square distribution.
For 1 degree of freedom at 5% significance, the critical value is 3.841. Back to our hypothesis: did male and female survival differ in the Bumpus data set? Following the NHST logic, if the test statistic value had been greater than the critical value (3.841), we would reject the null hypothesis. For this example, we conclude no statistical difference between male and female survival because the test statistic (3.1264) was smaller than the critical value. How likely are these results due to chance? That's where the p-value comes in. The p-value for our test statistic falls between 5% and 10%, because 2.706 < 3.1264 < 3.841. To get the actual p-value of our test statistic we would need to use R.
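The 3.1264 reported above can be reproduced in R directly from Table 1 with `chisq.test()`, turning off the Yates continuity correction (`correct = FALSE`) so that the plain Pearson chi-square statistic is returned:

```r
# Bumpus survival table (Table 1): rows are sex, columns are survival
bumpus <- matrix(c(21, 28,
                   51, 36),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Sex = c("Female", "Male"),
                                 Survived = c("Yes", "No")))
chisq.test(bumpus, correct = FALSE)
# X-squared is approximately 3.1264 with df = 1, p approximately 0.077
```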
R code
Given a chi-square test statistic, you can use R to calculate the probability of that value, or greater, under the null hypothesis. At the R prompt:
pchisq(c(3.1264), df=1, lower.tail=FALSE)
And R output
[1] 0.07703368
Because we are using R Commander, simply select the command by following the menu options.
Rcmdr: Distributions → Continuous distributions → Chi-squared distribution → Chi-squared probabilities …
Enter the chi-square value and degrees of freedom (Fig. 5).
Figure 5. Screenshot of input box in Rcmdr for Chi-square probability values.
Questions
- What happens to the shape of the chi-square distribution as degrees of freedom are increased from 1 to 5 to 20 to 100?
Be able to answer these questions using the chi-square table (Appendix 20.2) or using Rcmdr.
- For probability α = 5%, what is the critical value of the chi-square distribution (upper tail)?
- The value of the chi-square test statistic is given as 12. With 3 degrees of freedom, what is the approximate probability of this value, or greater from the chi-square distribution?
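Both questions can be checked at the R prompt (the first question does not state the degrees of freedom, so df = 1 is assumed here purely for illustration):

```r
# Critical value of chi-square, upper tail, alpha = 0.05, df = 1 (assumed)
qchisq(0.05, df = 1, lower.tail = FALSE)   # approximately 3.841

# Probability of a chi-square test statistic of 12 or greater, df = 3
pchisq(12, df = 3, lower.tail = FALSE)     # approximately 0.0074
```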