6.8 – Moments

Introduction

Moments are used to describe the shape of a distribution. For those of you who remember your calculus, moments were discussed as a method to find the center of mass, or balancing point (Herman and Strang 2018). For distributions, the moments that describe center and shape follow from expected values of the random variable.

Note 1: The expected value of a random variable is calculated by multiplying each possible outcome in the sample space by its probability, then adding up all of those products. From probability theory, it is the weighted average of the outcomes of a random variable. A simpler way to think of the expected value: if one were to guess the height of a person, the expected value is the average height of the population from which the person would be selected.
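As a small illustration of Note 1, the weighted average can be sketched in R. The fair six-sided die is an example chosen here for illustration, not a data set from this chapter.

```r
# Expected value as a probability-weighted sum.
# Illustrative example: a fair six-sided die.
outcomes <- 1:6
probs <- rep(1 / 6, 6)          # each face is equally likely

expected_value <- sum(outcomes * probs)
expected_value                  # 3.5

# The average of many simulated rolls should land close to the expected value.
set.seed(1)
rolls <- sample(outcomes, 10000, replace = TRUE)
mean(rolls)
```

The simulated mean will not equal 3.5 exactly, but it should approach it as the number of rolls grows.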

Four moments apply for describing the shape of a distribution. The 1st moment describes the middle, the 2nd describes the spread from the middle, the 3rd describes symmetry about the middle, and the 4th describes the shape of the peak and tails: whether peaked and sharp (leptokurtic) or broad and flattened (platykurtic).

Equations for the moments

Over the years, several equations have been proposed to estimate skewness and kurtosis. The formulas given below are just one example from the list (Joanes and Gill 1998).

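Pearson’s standardized moments:

$$\tilde{\mu}_k = E\left[\left(\frac{X - \mu}{\sigma}\right)^k\right]$$

where E is the expected value of the random variable X, μ is the population mean, σ is the population standard deviation, and k is the order of the moment. The expected value concept follows from the rules of probability: basically, it is the average of a large number, n, of observations of X.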

Four moments can be used to describe the shape of a distribution.

1st moment, μ (mean): population mean; see 3.1 – Measures of Central Tendency

2nd moment, σ² (variance): population variance; see 3.2 – Measures of dispersion

3rd moment, $g_1$ (skew):

$$g_1 = \frac{m_3}{m_2^{3/2}}, \qquad m_r = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^r$$

4th moment, $g_2$ (kurtosis):

$$g_2 = \frac{m_4}{m_2^{2}} - 3, \qquad m_r = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^r$$
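The 3rd and 4th moments can also be computed by hand in base R. This is a minimal sketch that assumes the Type 1 definitions described by Joanes and Gill (1998): skewness is m3 / m2^(3/2) and kurtosis is m4 / m2^2 - 3, where m_r is the average of the r-th powers of the deviations from the sample mean. The function names and the small vector x are made up for illustration.

```r
# r-th sample central moment: average of (x - mean)^r
central_moment <- function(x, r) mean((x - mean(x))^r)

# Type 1 skewness and (excess) kurtosis, assuming the definitions
# in Joanes and Gill (1998); function names are illustrative.
skewness_type1 <- function(x) central_moment(x, 3) / central_moment(x, 2)^(3 / 2)
kurtosis_type1 <- function(x) central_moment(x, 4) / central_moment(x, 2)^2 - 3

x <- c(1, 2, 3, 4, 5)   # small symmetric example
skewness_type1(x)       # 0: symmetric about the mean
kurtosis_type1(x)       # -1.3: flatter than a normal distribution
```

If the e1071 package is installed, `e1071::skewness(x, type = 1)` and `e1071::kurtosis(x, type = 1)` should return the same values.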

Estimating moments in R and R Commander

Figure 1. Histogram of finishing times in minutes for 1307 runners at the 2016 Banana 5K.

In R Commander, we select Statistics → Summaries → Numerical summaries…, which brings up a dialog box. First, select the variable, in this case Minutes, from the Data tab (not shown). Next, click on the Statistics tab to choose options (Fig. 2).

Figure 2. Rcmdr Numerical summaries Statistics tab.

For estimates of the moments, check Mean, Standard Deviation, Skewness, and Kurtosis. Note that Rcmdr gives you the choice among three different Types of skewness and kurtosis. Type 1 uses the equations provided on this page, corresponding to definitions dating back to the 1940s. Type 2 is the default and corresponds to the equations used by other professional statistics packages (SAS, SPSS). For large sample sizes, the different types will tend to agree. Caution applies to smaller data sets, where the different types may disagree (Joanes and Gill 1998).

Large sample size, n = 1307

Type 1

    mean       sd  skewness    kurtosis    n
34.42999 10.31437 0.6159258 -0.01593882 1307

Type 2

    mean       sd  skewness    kurtosis    n
34.42999 10.31437 0.6166337 -0.01139521 1307

Type 3

    mean       sd skewness    kurtosis    n
34.42999 10.31437 0.615219 -0.02050335 1307
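The three types are simple rescalings of one another that depend only on n. As a sketch, assuming the conversion formulas given by Joanes and Gill (1998), the Type 2 and Type 3 values can be recovered from the Type 1 values and sample size reported above. The variable names are illustrative.

```r
# Type 1 estimates and sample size reported above for the Banana 5K data
n  <- 1307
g1 <- 0.6159258     # Type 1 skewness
g2 <- -0.01593882   # Type 1 (excess) kurtosis

# Type 2 (the SAS and SPSS versions), per Joanes and Gill (1998)
G1 <- g1 * sqrt(n * (n - 1)) / (n - 2)
G2 <- ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

# Type 3, per Joanes and Gill (1998)
b1 <- g1 * ((n - 1) / n)^(3 / 2)
b2 <- (g2 + 3) * (1 - 1 / n)^2 - 3

round(c(G1 = G1, G2 = G2, b1 = b1, b2 = b2), 7)
```

Because every correction factor approaches 1 as n grows, the three types converge for large samples, which is why the values above differ only in the third or fourth decimal place.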

Small sample size

To test the claim about sample size and the moment statistics, draw a random sample of 30 from the larger data set, sampling without replacement:

sample.banana <- data.frame(sample(banana5K$Minutes, 30, replace = FALSE))

I forgot to specify a new variable name, so R used the whole command as the variable name. I could go back and fix my function call, or simply rename the variable as follows:

names(sample.banana)[c(1)] <- c("Minutes")
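Alternatively, the column can be named when the data frame is created, which avoids the rename step. The sketch below uses a simulated stand-in for banana5K so that it runs on its own; the simulated values are placeholders, not the race data.

```r
# Simulated stand-in for the banana5K data frame (hypothetical values,
# used only so this sketch is self-contained)
set.seed(123)
banana5K <- data.frame(Minutes = rnorm(1307, mean = 34.4, sd = 10.3))

# Name the column at creation time, so no rename is needed afterwards
sample.banana <- data.frame(
  Minutes = sample(banana5K$Minutes, 30, replace = FALSE)
)

names(sample.banana)   # "Minutes"
nrow(sample.banana)    # 30
```

Calling set.seed() before sample() makes the random draw reproducible; a different seed, or no seed, gives a different sample of 30.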

The distribution of the random sample is shown in Fig. 3.

Figure 3. Histogram of finishing times in minutes for a random sample of 30 drawn from 1307 runners at the 2016 Banana 5K.

Repeat Numerical summaries on small data set, n = 30

Type 1

    mean       sd  skewness  kurtosis  n
33.16667 10.00373 0.5538637 0.5024438 30

Type 2

    mean       sd  skewness  kurtosis  n
33.16667 10.00373 0.5834511 0.8276415 30

Type 3

    mean       sd  skewness  kurtosis  n
33.16667 10.00373 0.5264025 0.2728392 30

Conclusion: We can compare the consistency of the estimators by calculating the coefficient of variation (standard deviation divided by the absolute value of the mean) across the three types. The three types of skewness estimators differed by only about 0.1% and 5% for the large and small sample sizes, respectively. In contrast, the three types of kurtosis estimators differed by 29% and 52% for the large and small sample sizes, respectively.
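The coefficient of variation comparison can be sketched in R from the Type 1, 2, and 3 values reported above. The helper name cv_percent is made up for illustration, and the absolute value of the mean is used as the denominator (an assumption made here so that negative kurtosis values give a positive percentage).

```r
# Coefficient of variation (%) across the three estimator types.
# cv_percent is an illustrative helper, not a built-in R function.
cv_percent <- function(x) 100 * sd(x) / abs(mean(x))

# Type 1, Type 2, Type 3 values as reported above
skew_large <- c(0.6159258, 0.6166337, 0.615219)
skew_small <- c(0.5538637, 0.5834511, 0.5264025)
kurt_large <- c(-0.01593882, -0.01139521, -0.02050335)
kurt_small <- c(0.5024438, 0.8276415, 0.2728392)

cv <- c(
  skew_large = cv_percent(skew_large),
  skew_small = cv_percent(skew_small),
  kurt_large = cv_percent(kurt_large),
  kurt_small = cv_percent(kurt_small)
)
round(cv, 1)
```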

Questions

  1. Explore the consistency of skewness and kurtosis estimates by calculating and comparing coefficient of variation estimates. R Commander provides a nice way to draw randomly from various defined distributions. Draw two data sets, one of 15 (small) and one of 1000 (large), from the chi-square distribution (1 degree of freedom) and from a minimum of one other continuous distribution.

For example, to draw a random sample of 1000 from the chi-square distribution, use Rcmdr: Distributions → Continuous distributions → Chi-squared distribution → Sample from chi-squared distribution…

Enter name for the variable, enter degrees of freedom (e.g., 1), number of samples (e.g., 1000), and number of observations (variables, columns). Leave Sample means checked under data sets.

Rcmdr command to select sample from a distribution

This results in a new data set. Get “moments” from Numerical summaries and calculate the coefficients of variation. Which moments are the most consistent regardless of the kind of distribution?

  2. Make histograms for each of your created data sets. Describe what you see about the shape of the plotted distributions.
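For those who prefer the command line to the Rcmdr menus, the chi-square samples for these questions can be drawn with base R. This is a sketch only: the object names, the column name obs, and the seed are arbitrary choices made here, and Rcmdr will generate its own names and code.

```r
# Draw small and large samples from a chi-square distribution, df = 1.
# Object and column names are illustrative.
set.seed(2016)                                   # arbitrary seed, for reproducibility
chisq.small <- data.frame(obs = rchisq(15, df = 1))
chisq.large <- data.frame(obs = rchisq(1000, df = 1))

# Quick checks on what was drawn
nrow(chisq.small)
nrow(chisq.large)
summary(chisq.large$obs)
```

A histogram of either sample can then be made with, for example, `hist(chisq.large$obs)`, and the same pattern works for other continuous distributions, such as `rnorm()` or `rexp()`.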
