8.3 – Sampling distribution and hypothesis testing
Introduction
Understanding the relationship between sampling distributions, probability distributions, and hypothesis testing is the crucial concept in the NHST (Null Hypothesis Significance Testing) approach to inferential statistics. Many introductory textbooks are excellent here. I will add to their discussion, perhaps with a different approach, but the important points to take from the lecture and text are as follows.
Our motivation in conducting research often culminates in the ability (or inability) to make claims like:
- “Total cholesterol greater than 185 mg/dl increases risk of coronary artery disease.”
- “Average height of US men aged 20 is 70 inches (1.78 m).”
- “Species of amphibians are disappearing at unprecedented rates.”
Lurking beneath these statements of “fact” for populations (just what IS the population for #1, for #2, and for #3?) is the understanding that not ALL members of the population were recorded.
How do we go from our sample to the population we are interested in? Put another way — How good is our sample? We’ve talked about how “biostatistics” can be generalized as sets of procedures you use to make inferences about what’s happening in populations. These procedures include:
- Have an interesting question
- Experimental design (Observational study? Experimental study?)
- Sampling from populations (Random? Haphazard?)
- Hypotheses: HO and HA
- Estimate parameters (characterize the population)
- Tests of hypotheses (inferences)
We have control of each of these — we choose what to study, we design experiments to test our hypotheses…We have already introduced these topics (Chapters 6 – 8).
We also obtain estimates of parameters, and inferential statistics applies to how we report our descriptive statistics (Chapter 3). Estimates of parameters like the sample mean and sample standard deviation can be assessed for accuracy and precision (e.g., confidence intervals).
Sampling distribution
Imagine drawing a sample of 30 from a population, calculating the sample mean for a variable (e.g., systolic blood pressure), then calculating a second sample mean after drawing a new sample of 30 from the same population. Repeat, accumulating one estimate of the mean each time, over and over again. What will be the shape of this distribution of sample means? The Central Limit Theorem (CLT) states that the shape will be approximately a normal distribution, regardless of whether or not the population distribution was normal, as long as the sample size is large and the population variance is finite. A related result, the Law of Large Numbers, states that the sample mean converges on the population mean as the sample size increases. We alluded to these concepts when we introduced discrete and continuous distributions (Chapter 6).
It’s this result from theoretical statistics that allows us to calculate the probability of an event from a sample without actually carrying out repeated sampling or measuring the entire population.
A worked example
To demonstrate the CLT we want R to help us generate many samples from a particular distribution and calculate the same statistic on each sample. We could make a for loop, but the replicate() function provides a simpler framework. We’ll sample from the chi-square distribution. You should extend this example to other distributions on your own, see Question 5 below.
Note 1: This example is much simpler if you enter and run the code in the script window, adjusting code directly as needed. If you wish to try to run this through Rcmdr, you’ll need to take a number of steps, and you will likely need to adjust the code and rerun anyway. Some of the steps in Rcmdr would be: Distributions → Continuous distributions → Chi-squared distribution → Sample from chi-square distribution…, then running Numerical summaries and saving the output to an object (e.g., out), extracting the values from the object (e.g., out$Table; confirm by running the command str(out), where str() is an R utility to display the structure of an object), then testing the object for normality with Rcmdr: Statistics → Test of normality, select Shapiro-Wilk, etc. In other words, sometimes a GUI is a good idea, but in many cases, work with the script!
Generate x replicate samples (e.g., x = 10, 100, 1000, one million) of 30 each from a chi-square distribution with one degree of freedom, test the distribution of sample means against the null hypothesis that it is normally distributed (e.g., Shapiro-Wilk test, see Chapter 13.3), then make a histogram (Chapter 4.2), like Figure 1 or Figure 2.
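# normalityTest() is available when Rcmdr is loaded (package RcmdrMisc)
x.10 <- replicate(10, {
  my.mean <- rchisq(30, 1)   # one sample of 30 from chi-square, df = 1
  mean(my.mean)              # the statistic kept from each replicate
})
normalityTest(~x.10, test="shapiro.test")
hist(x.10, col="orange")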
Result from R
Shapiro-Wilk normality test
data: x.10
W = 0.87016, p-value = 0.1004

Figure 1. means of ten replicate samples drawn at random from chi-square distribution, df = 1.
Modify the code to draw 100 samples and we get Figure 2.

Figure 2. Means of 100 replicate samples drawn at random from chi-square distribution, df = 1. Results from Shapiro-Wilk test: W = 0.97426, p-value = 0.04721.
And finally, modify the code to draw one million samples and we get Figure 3.

Figure 3. Means of one million replicate samples drawn at random from chi-square distribution, df = 1. The normality test will fail to run because shapiro.test() accepts at most 5000 observations.
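The three runs above can be collected into a single loop. The following is only a sketch in base R: the helper name sample.means is hypothetical, the seed is arbitrary, and testing a random subset of 5000 means is my own workaround for the size limit of shapiro.test(), not a procedure from the text.

```r
set.seed(8)  # arbitrary seed, used only so the run is reproducible

# Hypothetical helper: means of n.rep samples, each of size n, from chi-square(df)
sample.means <- function(n.rep, n = 30, df = 1) {
  replicate(n.rep, mean(rchisq(n, df)))
}

for (n.rep in c(10, 100, 1000, 10000)) {
  x <- sample.means(n.rep)
  # shapiro.test() accepts at most 5000 values, so test a random subset if needed
  x.test <- if (length(x) > 5000) sample(x, 5000) else x
  sw <- shapiro.test(x.test)
  cat("replicates:", n.rep,
      " mean of means:", round(mean(x), 3),
      " W:", round(sw$statistic, 4), "\n")
}
```

The mean of a chi-square distribution with one degree of freedom is 1, so the mean of the sample means should settle near 1 as the number of replicates grows.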
How to apply sampling distribution to hypothesis testing
First, a reminder of some definitions.
Estimate = a value calculated from a sample, such as the sample mean. We will (almost) always concern ourselves with how good our sample mean is relative to the population mean, the thing we really want but can only hope to estimate.
Accuracy = how close to the true value is our measure?
Precision = how repeatable is our measure?
How can we tell if we have a good estimate? We want an estimate with an evaluation for accuracy and for precision. The sampling error provides an assessment of precision, whereas the confidence interval provides a statement of accuracy. We need an estimate of the sampling error for the statistic.
Sample standard error of the mean
We introduced the standard error of the mean in section 3.4 of Chapter 3. Everything we measure can have a corresponding statement about how precise our estimate is (its sampling error)! First, we begin by asking, “how accurate is the mean that we estimate from a sample of a population?” How do we answer this? We could prove it in the mathematical sense of proof (and people have and do) OR we can use the computer to help. We’ll try this approach in a minute.
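What we will show relates to the standard error of the mean (SEM), or $s_{\bar{X}}$, whose equation is shown below.
\begin{align*} s_{\bar{X}} = \sqrt{\frac{s^{2}}{n}} \end{align*}
or equivalently, from the standard deviation we have
\begin{align*} s_{\bar{X}} = \frac{s}{\sqrt{n}} \end{align*}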
Note that the SEM takes the variance, divides through by the sample size, and then takes the square root. In general, then, the larger the sample size, the smaller the “error” around the mean. As we work through the different statistical tests, t-tests, analysis of variance, and related, you will notice that the test statistic is calculated as a ratio between a difference or comparison divided by some form of an error measurement. This is to remind you that “everything is variable.”
A note on standard deviation (SD) and standard error of the mean (SEM): SD estimates the variability of a sample of individual observations (the $X_i$ values), whereas SEM estimates the variability of a sample of means.
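As a quick check that the two forms of the SEM agree, here is a minimal sketch in base R. The data are simulated; the blood pressure mean and standard deviation used to generate them are made-up values for illustration only.

```r
set.seed(34)  # arbitrary seed for reproducibility

# Hypothetical sample: 30 simulated systolic blood pressure readings
x <- rnorm(30, mean = 120, sd = 15)

n <- length(x)
s <- sd(x)                              # sample standard deviation
sem.from.variance <- sqrt(var(x) / n)   # variance over n, then square root
sem.from.sd <- s / sqrt(n)              # standard deviation over root n

c(SD = s, SEM.variance = sem.from.variance, SEM.sd = sem.from.sd)
```

The two SEM values are identical, and both are smaller than the SD by a factor of the square root of n.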
Let’s return to our thought problem and see how to demonstrate a solution. First, what is the population? Second, can we get the true population mean?
One way, a direct (but impossible?) approach would be to measure it — get all of the individuals in a population and measure them, then calculate the population mean. Then, we could compare our original sample mean against the true mean and see how close it was. This can be accomplished in some limited cases. For example, the USA conducts a census of her population every ten years, a procedure which costs billions of dollars. We can then compare samples from the different states or counties to the USA mean. And these statistics are indeed available via the census.gov website. But even the census uses sampling — individuals are randomly selected to answer more questions and from this sample trends in the population are inferred.
So, sampling from populations is the way to go for most questions we will encounter. The procedures we will use to show how a sample mean relates to the population mean are general and may be used to show how any estimate of a variable (sample mean and sample standard deviation, etc.), relates to properties of a parameter. We’ll get to the other issues, but for now, think about sample size.
Sampling from populations is necessary and inevitable, and, to a certain extent, under your control. But how many individuals do we need? The quick answer is for me to direct your attention to the equation for the SEM. Can you see in that ratio the secret to obtaining more precise estimates? There are many ways to approach this question, but let’s use the tools from last time, those based on properties of a normal distribution.
If we can view the sampling as having come from a population at least approximately normally distributed for our variable, then we can now examine empirically the effect of different sample sizes on the estimate of the mean.
A hint: variability is important!
From one population we obtain two samples, A and B. Sample sizes are
![]()
Assume for now that we know the true mean (μ) and standard deviation (σ) for the population. Note. This is one of the points of why we use computer simulation so much to teach statistics — it allows us to specify what the truth is, then see how our statistical tools work or how our assumptions affect our statistically based conclusions.
\begin{align*} \mu = 47, \quad \sigma = 12.0 \end{align*}
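The thought experiment can be run as a simulation. This is a sketch under stated assumptions: the sample sizes 9 and 50 and the values μ = 47 and σ = 12 are borrowed from the questions at the end of this section, and I assume they also describe samples A and B; the number of replicates (10000) and the seed are arbitrary.

```r
set.seed(83)  # arbitrary seed for reproducibility

mu <- 47      # assumed true population mean (value used in the questions)
sigma <- 12   # assumed true population standard deviation
n.A <- 9      # assumed size of sample A
n.B <- 50     # assumed size of sample B

# Draw many samples of each size and keep the mean of each one
means.A <- replicate(10000, mean(rnorm(n.A, mu, sigma)))
means.B <- replicate(10000, mean(rnorm(n.B, mu, sigma)))

# Compare the observed spread of the sample means with sigma / sqrt(n)
rbind(A = c(observed = sd(means.A), expected = sigma / sqrt(n.A)),
      B = c(observed = sd(means.B), expected = sigma / sqrt(n.B)))
```

Because we set the truth ourselves, we can see directly that the means from the larger sample cluster more tightly around μ.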
Confidence intervals
Reliability is another word for precision. We define a confidence interval as a statistic to report the reliability of our estimated statistic. We introduced confidence intervals in Chapter 3.4. At least in principle, confidence intervals can be calculated for all statistics (mean, variance, etc.) and for all data types. Confidence intervals define a lower limit, L, and an upper limit, U, and with them you are making a statement that you are “95% certain” that the true value (parameter value) is between these two limits.
We previously reported how to calculate approximate confidence intervals for proportions and for NNT: simply multiply the standard error estimate by 2. Here we introduce an improved approximate calculation of the 95% confidence interval for the sample mean
\begin{align*} \bar{X} \pm Z \cdot s_{\bar{X}} \end{align*}
where Z is the critical value you would look up from the table of the normal distribution. For a 95% confidence interval, 100% - 95% = 5%, and we divide 5% by two: 2.5% of the distribution lies below the lower limit and 2.5% lies above the upper limit. We look up the table and find that Z for 0.025 is 1.96, and that is the value we would plug into our equation above. For large sample sizes, you can get a pretty decent estimate of the confidence interval by replacing 1.96 with “2.”
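A short sketch of the approximate 95% confidence interval for a mean in base R. The sample is simulated with made-up parameter values, and qnorm() stands in for the table lookup.

```r
set.seed(68)  # arbitrary seed for reproducibility

# Hypothetical sample of 50 observations
x <- rnorm(50, mean = 47, sd = 12)

x.bar <- mean(x)
sem <- sd(x) / sqrt(length(x))   # standard error of the mean
z <- qnorm(0.975)                # leaves 2.5% in the upper tail; about 1.96

ci.z <- c(lower = x.bar - z * sem, upper = x.bar + z * sem)
ci.2 <- c(lower = x.bar - 2 * sem, upper = x.bar + 2 * sem)  # the "times 2" shortcut

round(rbind(ci.z, ci.2), 2)
```

The shortcut interval is slightly wider than the one based on 1.96, which is why it works as a quick approximation.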
Questions
1. What is the probability of having a sample mean greater than 50 (mean > 50) for a sample of n = 9 ?
We’ll use a slight modification of the Z-score equation we introduced in Chapter 6.6. Previously we referred to the distribution of individual observations (the $X_i$ values) and how likely a particular observation would be. Here, instead, we use the Z score with the standard normal distribution (aka Z-distribution) to work out how likely an estimated sample mean is, given the population parameters μ and σ. Recall the Z score
\begin{align*} Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \end{align*}
We have everything we need except the SEM, which we can calculate by dividing the standard deviation by the square root of the sample size.
For $\bar{X} = 50$, σ = 12.0 (given above), μ = 47, and n = 9, plug in the values:
\begin{align*} \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{9}} = 4 \end{align*}
Therefore, after applying the equation for the Z score, $Z = (50 - 47)/4 = 0.75$. This corresponds to how far the sample mean is from the standard normal mean of zero.
Look up $Z = 0.75$ from the table of the normal distribution. The answer is 0.2266, which is the probability that $Z$ is EQUAL to or GREATER than 0.75, which is what we wanted. Translated, this implies that, given the level of variability in the sample, 22.66% of your sample means would be greater than 50! We write: $P(\bar{X} > 50) = 0.2266$.
Some care needs to be taken when reading these tables — make sure you understand how the direction (less than, greater than) away from the mean is tabulated.
2. Instead of greater, how would you get the probability less than 50?
Total area under the curve is 1 (100%), so subtract: $P(\bar{X} < 50) = 1 - 0.2266 = 0.7734$.
I recommend that you do these by hand first, then check your answers. You’ll need to be able to do this for exams.
Here’s how to use Rcmdr to do this kind of problem.
Rcmdr: Distributions → Continuous distributions → Normal distribution → Normal probabilities …

Figure 5. Screenshot Rcmdr menu to get normal probability.
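Here’s the answer from Rcmdr
pnorm(c(50), mean=47, sd=12, lower.tail=TRUE)
[1] 0.5987063
Note that sd=12 is the population standard deviation, so this result is the probability that a single observation is less than 50. For the sample mean with n = 9, enter the SEM (sd=4) instead; the result, 0.7734, agrees with the answer to question 2.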
3. Now, try a larger sample size. For $n = 50$, what is the probability of having a sample mean greater than 50 ($\bar{X} > 50$)?
$\bar{X} = 50$, μ = 47, σ = 12, n = 50 and
\begin{align*} \sigma_{\bar{X}} = \frac{12}{\sqrt{50}} = 1.697 \end{align*}
Therefore, after applying the equation for the Z score, $Z = (50 - 47)/1.697 = 1.77$. Look up $Z = 1.77$ (Normal table, subtract answer from 1) and we get $1 - 0.9616 = 0.0384$.
Or 3.84% of your sample means would be greater than 50! We write: $P(\bar{X} > 50) = 0.0384$.
Said another way: if the TRUE MEAN IS 47 and you have a sample size of 50 ($n = 50$), then there is only a 3.84% chance of obtaining a sample mean greater than 50.
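The calculations in questions 1 through 3 can be checked in R. The following is a minimal sketch; the helper name p.mean.greater is hypothetical (not from the text), and it assumes a normal population with known μ and σ, so that the sample mean has standard error σ/√n.

```r
# Hypothetical helper: probability that a sample mean exceeds x.bar,
# assuming a normal population with known mean mu and standard deviation sigma
p.mean.greater <- function(x.bar, mu, sigma, n) {
  sem <- sigma / sqrt(n)          # standard error of the mean
  z <- (x.bar - mu) / sem         # Z score for the sample mean
  pnorm(z, lower.tail = FALSE)    # upper-tail area of the standard normal
}

p.n9  <- p.mean.greater(50, mu = 47, sigma = 12, n = 9)
p.n50 <- p.mean.greater(50, mu = 47, sigma = 12, n = 50)
round(c(n9 = p.n9, n50 = p.n50), 4)
```

For n = 9 the result rounds to 0.2266. For n = 50 it is about 0.0385, slightly different from the table value of 0.0384 because the table lookup rounds Z to 1.77. Changing the sigma argument is one way to explore the effect of smaller variability.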
4. What happens if the variability is smaller? Change σ from 12 to 6 then repeat questions 1 and 3.
5. Repeat the demonstration of Central Limit Theorem and Law of Large Numbers for discrete distributions
- binomial distribution. Replace rchisq() with rbinom(n, size, prob) in the replicate() function example. See Chapter 6.5
- poisson distribution. Replace rchisq() with rpois(n, lambda) in the replicate() function example. See Chapter 6.5
6.5 – Discrete probability distributions
Binomial distribution
Discrete refers to particular outcomes — data that take on only specific separate values. Discrete data types include all of the categorical types we introduced earlier, including binary, ordinal, and nominal.
The binomial probability distribution is a discrete distribution for the number of successes, k, in a sequence of n independent trials, where the outcome of each trial can take on only one of two possible outcomes. For cases of 0 or 1, yes or no, “heads” or “tails,” male or female, pass or fail, we talk about the binomial distribution, because the outcomes are discrete and there can be only two possible (binary) outcomes.
Note 1: The biological definition of sex is binary: whether a sexually reproducing organism produces male (sperm) or female (ovum) gametes (cf discussion in Goymann et al 2022). For the argument that sex is a continuum held by some biomedical and social scientists, see Ainsworth 2015.
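The mathematical function of the binomial is written as
\begin{align*} Pr\left[X = k\right] = \binom{n}{k} p^{k} \left(1 - p\right)^{n - k} \end{align*}
where the binomial coefficient is given by
\begin{align*} \binom{n}{k} = \frac{n!}{k!\left(n - k\right)!} \end{align*}
and X is the number of successes; the binomial coefficient counts the number of ways to choose k successes from n observations.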
Consider an example.
We have to define what we mean by success. For coin toss, this might be the number of heads.
The mean for the binomial is given simply as
\begin{align*} \mu_{X} = np \end{align*}
where X is “Heads” (the category of successes for our example), n is the number of trials, and p corresponds to the probability the selected event occurs, in this case, “Heads.”
The variance of the binomial distribution is given by
\begin{align*} \sigma^{2}_{X} = np\left(1 - p\right) \end{align*}
Here’s a plot of the binomial probability distribution for a probability of success of 2% and number of trials, n, equal to 20 (Fig 1).

Figure 1. Plot generated with KMggplot2 Rcmdr plugin.
Here’s the R code
Create the trials, 1 through 20, then create an object to hold the number of trials
nSize <- 1:20
Size <- length(nSize); Size
R returns
[1] 20
Assign the probability value to an object
prob <- 0.02
Next, calculate the mean, mu, and the variance, var, for the binomial with prob = 0.02 and number of trials Size = 20
mu <- Size*prob
var <- Size*prob*(1-prob)
Print the mean and variance; let’s assign them to an object then print the object
stats <- c(mu, var); stats
And R returns
[1] 0.400 0.392
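The theoretical mean and variance can also be compared against simulated data. This is a sketch only: the number of simulated experiments (100000) and the seed are arbitrary choices of mine.

```r
set.seed(65)  # arbitrary seed for reproducibility

Size <- 20    # number of trials per experiment
prob <- 0.02  # probability of success on each trial

# Each value is the number of successes in one experiment of 20 trials
x <- rbinom(100000, size = Size, prob = prob)

rbind(theory    = c(mean = Size * prob, variance = Size * prob * (1 - prob)),
      simulated = c(mean = mean(x),     variance = var(x)))
```

The simulated values should land close to 0.400 and 0.392, and they move closer as the number of simulated experiments increases.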
And here’s a real-world example. Twinning in humans is rare. In Hawaiʻi in the 1990’s the rate of twin births (monozygotic and dizygotic) was about 20 for every 1000 births or 2%. “Success” here then is twin births.

Figure 2. Example of binomial-like distribution: reported twins born in Hawaiʻi.
Interestingly, rates of twins have since increased in Hawaiʻi (31 out of 1000 births) and in the United States overall (33 out of 1000 births) (Table 2, NCHS Data Brief No. 80, 2012). Data for Fig 2 were for year 2009.
Out of ten births, what is the probability of two twin births in Hawaiʻi?
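\begin{align*} Pr\left[X = 2\right] = \binom{10}{2} \left(0.031\right)^{2} \left(1 - 0.031\right)^{8} = 0.0336 \end{align*}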
You can solve this with your calculator (yikes!), or take advantage of online calculators (GraphPad QuickCalcs), or use R and Rcmdr.
In R, simply type at the prompt
dbinom(2,10,0.031)
[1] 0.03361446
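To connect dbinom() back to the binomial formula, the same probability can be computed term by term. A minimal sketch in base R, using choose() for the binomial coefficient:

```r
n <- 10     # number of births (trials)
k <- 2      # number of twin births (successes)
p <- 0.031  # probability of a twin birth

# Binomial coefficient times p^k times (1 - p)^(n - k)
by.hand <- choose(n, k) * p^k * (1 - p)^(n - k)
built.in <- dbinom(k, size = n, prob = p)

c(by.hand = by.hand, built.in = built.in)
```

Both values agree with the result reported above, 0.03361446.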
Try in R Commander.
Rcmdr → Distributions → Discrete distributions → Binomial distribution → Binomial probabilities …

Figure 3. Rcmdr menu to get binomial probability.
Note that I used p = 0.033, the rate for the entire USA. Here’s the output
> .Table <- data.frame(Pr=dbinom(0:10, size=10, prob=0.033))
> rownames(.Table) <- 0:10
> .Table
             Pr
0  7.149320e-01
1  2.439789e-01
2  3.746728e-02 ← Answer, 0.0375 or 3.75%
3  3.409639e-03
4  2.036263e-04
5  8.338782e-06
6  2.371422e-07
7  4.624430e-09
8  5.918028e-11
9  4.487991e-13
10 1.531579e-15
And here is the output for our example from Hawaiʻi (p = 0.031).
> .Table <- data.frame(Pr=dbinom(0:10, size=10, prob=0.031))
> rownames(.Table) <- 0:10
> .Table
             Pr
0  7.298570e-01
1  2.334940e-01
2  3.361446e-02 ← Answer, 0.0336 or 3.36%
3  2.867694e-03
4  1.605494e-04
5  6.163507e-06
6  1.643178e-07
7  3.003893e-09
8  3.603741e-11
9  2.561999e-13
10 8.196283e-16
We use the binomial distribution as the foundation for the binomial test, ie, the test of an observed proportion against an expected population level proportion in a Bernoulli trial.
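As a brief illustration of the binomial test, base R provides binom.test(). The counts below are hypothetical (invented for illustration, not data from Hawaiʻi); the expected proportion of 0.031 is the twinning rate quoted earlier.

```r
# Hypothetical data: 6 twin births observed among 100 births
observed.twins <- 6
total.births <- 100
expected.rate <- 0.031   # population-level proportion to test against

result <- binom.test(x = observed.twins, n = total.births, p = expected.rate)

result$estimate   # observed proportion of twin births
result$p.value    # two-sided P-value under the binomial null
result$conf.int   # exact (Clopper-Pearson) 95% confidence interval
```

The P-value here is calculated directly from binomial probabilities, the same quantities that dbinom() returns.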
Hypergeometric distribution
The binomial distribution is used for cases of sampling with replacement from a population. When sampling without replacement is done, the hypergeometric distribution is used. It describes the number of successes, k, in a sequence of n draws, made without replacement, from a fixed population.
Note 2: A fixed population in epidemiology refers to a group of individuals, defined by a set of characteristics. Examples include individuals selected because they all share a common life event, eg, giving birth. The population is “fixed” because, once an individual is included in the population, they remain a member of the population even if their characteristics change during the duration of the study. The opposite of a fixed population is a dynamic or open population. Definitions from Chapter 2 in Aschengrau and Seage (2003).
This sampling scheme means that each draw is no longer independent — with each draw you decrease the remaining number of observations and thus change the proportion.
The P-value from the hypergeometric distribution is often used to study if genes from a list of Gene Ontology terms (eg, biological process functional terms like “cell cycle,” or “apoptosis”) are enriched in the sets of genes with expression differences across the treatment groups (cf, discussion in Cao and Zhang 2014). The R package gprofiler2, which links to a Cloud service at g:Profiler, can be used to do gene enrichment analyses with stunning visualizations (eg, Manhattan plot) suitable for publication.
The mathematical function of the hypergeometric is written as
\begin{align*} Pr\left[X = k\right] = \frac{\binom{K}{k}\binom{N - K}{n - k}}{\binom{N}{n}} \end{align*}
where N is the population size, K is the number of successes in that population, and n and k are defined as above. Let’s apply this to the twinning problem.
In 2009, 2200 women gave birth in Hawaiʻi County, Hawaiʻi. Out of 10 births, what is the probability of 2 twin births in Hawaiʻi?
Assuming the “risk” of twinning is the same as in the rest of the USA, we expect about 72 successes in this population (0.033 * 2200 = 72.6; 72 is used in the calculations that follow).
Here’s the graph (Fig 4).

Figure 4. Plot of hypergeometric distribution twinning Hawaiʻi.
where the X-axis values show the number of events with successes (twin births). Taking bin 2 (we wanted to know about the probability of 2 out of ten), we can draw a line back to the Y-axis to get our probability; it looks like roughly 5%. Plot drawn with KMggplot2.
To get the actual probability,
Rcmdr → Distributions → Discrete distributions → Hypergeometric distribution → Hypergeometric probabilities …

Figure 5. Rcmdr menu to get hypergeometric probability.
where m is the number of successes, n is the number of “failures,” and k is the number of trials.
> .Table
           Pr
0 0.716453457
1 0.243438645
2 0.036688041 ← Answer, 0.0367 or 3.67%
3 0.003228871
The reference to white and black balls and urns is a device described by Bernoulli himself and has been used by others ever since to discuss probability problems (called the urn problem), and so I apply it here to be consistent. The urn contains a number of white (x) and a number of black (y) balls mixed together. One ball is drawn randomly from the urn: what color is it? The ball is then either returned to the urn (replacement) or left out (without replacement), as in the hypergeometric problem, and the selection process is repeated.
Besides applications in gambling and balls-in-urns problems, this distribution is the basis for many tests of gene enrichment from microarray analyses. The hypergeometric forms the basis of the Fisher Exact test (see Chapter 9.5).
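The hypergeometric probability for the twinning example can be reproduced at the R prompt with dhyper(), whose arguments m, n, and k match the Rcmdr dialog. In this sketch the names N, K, and n.draws are my own labels for the symbols in the formula, and K = 72 is the number of expected twin births used in the example.

```r
N <- 2200       # population size: births in Hawaiʻi County in 2009
K <- 72         # successes in the population (expected twin births)
n.draws <- 10   # number of births sampled without replacement
k <- 2          # number of twin births of interest

# Formula: ways to pick k successes and (n.draws - k) failures,
# divided by all ways to pick n.draws births from N
by.hand <- choose(K, k) * choose(N - K, n.draws - k) / choose(N, n.draws)

# dhyper(x, m, n, k): m = successes, n = failures, k = number drawn
built.in <- dhyper(k, m = K, n = N - K, k = n.draws)

c(by.hand = by.hand, built.in = built.in)
```

Both agree with the table value for 2 twin births, 0.036688041, which is close to the binomial answer because 10 draws barely deplete a population of 2200.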
Discrete uniform distribution
For discrete cases of “1,” “2,” “3,” “4,” “5,” or “6,” on the single toss of a fair die, we can talk about the discrete uniform distribution because all possible outcomes are equally likely. If you are branded as a “card-counter” in Las Vegas, all you’ve done is reached an understanding of the uniform distribution of card suits!
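A fair die is easy to simulate with sample(). This is a sketch; the number of rolls (60000) and the seed are arbitrary.

```r
set.seed(66)  # arbitrary seed for reproducibility

# Roll a fair six-sided die many times
rolls <- sample(1:6, size = 60000, replace = TRUE)

# Observed proportion of each face versus the uniform expectation of 1/6
observed <- as.vector(table(factor(rolls, levels = 1:6))) / length(rolls)
expected <- rep(1 / 6, 6)

round(rbind(observed, expected), 3)
```

Each observed proportion should be close to 1/6 (about 0.167), with small differences due to sampling.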
One biological example would be the fate of the meiotic products of a primary oocyte in the human (mammal) female: three out of four will become polar bodies, eventually reabsorbed, whereas one in four will develop into a secondary oocyte (egg); the uniform distribution has to do with the counts of the products, in that each of the four products has the same (apparently) chance (25%) of becoming the egg (Edwards and Beard 1997).
Analyses in biological research often use a uniform distribution as a baseline or null model (Gotelli and Ulrich 2011). For DNA sequencing, the uniform distribution provides a baseline for assessing coverage bias, eg, GC bias (Ross et al 2013). The uniform distribution underlies the Jukes–Cantor evolutionary model. The Jukes–Cantor model assumes equal substitution rates among all four nucleotides (A, T, C, G) and equal base frequencies (Felsenstein 2003).
The uniform distribution exists also for continuous data types, where every value within a specified interval has equal probability density. For example, in DNA shearing during initial library prep for sequencing reactions, breakpoints may be modeled as uniformly distributed along the fragment length. Sequence-dependent cleavage bias is concluded when certain DNA fragments or sequences are generated or detected more (or less) frequently than would be expected by chance (Meyer and Liu 2014).
Non-uniform distributions reveal cleavage bias.
Poisson distribution
An extension from the binomial case is that, rather than following success or failure, you may have the following scenario. Consider a wind-dispersed seed released from a plant. If we mark up the area around the plant in grids, we could then count the number of seeds within each grid. Most grids will have no seeds, some grids will have one seed, a few grids may have two seeds, etc. Multiple seeds in a grid is a rare event. The graph might look like

Figure 6. Example, poisson-like graph: the number of wind-dispersed seeds within each grid.
In molecular biology, use of Poisson distribution is common. In digital PCR for example, a DNA sample is diluted and divided into thousands of small reaction volumes or partitions. Thus, each partition contains either zero molecules, one molecule, or more than one molecule, where the loading of molecules into partitions is assumed to be independent, random, and rare per partition. In sequencing, Poisson models are used to describe random sampling of molecules and reads, especially when coverage is generated by non-deterministic processes like shotgun sequencing.
The Poisson has interesting properties, one being that the expected mean, λ, is equal to the variance. Lambda is, with respect to the Poisson distribution, the average number of times an event happens over a specific period or area.
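The property that the mean equals the variance can be seen in simulated counts. This is a sketch; λ = 1, the number of simulated grids (100000), and the seed are arbitrary choices.

```r
set.seed(210)  # arbitrary seed for reproducibility

lambda <- 1    # average number of seeds per grid (assumed for illustration)

# Simulated seed counts for 100000 grids
x <- rpois(100000, lambda)

c(mean = mean(x), variance = var(x))

# Proportion of grids with 0, 1, 2, ... seeds
round(table(x) / length(x), 4)
```

Both the mean and the variance should be close to λ, and most simulated grids hold zero or one seed.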
An equation is
\begin{align*} Pr\left[X\right] = \frac{\mu^{X} e^{-\mu}}{X!} \end{align*}
where μ is the mean (or could substitute with variance!), e is the base of the natural logarithm, and X is the number of successes you are interested in. For example, if μ = 1, what is the probability of observing a grid with five seeds? Simple enough to do this by hand, but let’s use Rcmdr instead. Here’s the graph (Fig 7) from Rcmdr (KMggplot2 plugin).

Figure 7. The probability of observing a grid with five seeds, poisson μ = 1 (ggplot2).
and for the actual probability we have from R
dpois(5, lambda = 1)
[1] 0.003065662
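The by-hand calculation is short enough to write out in R and compare with dpois(). A minimal sketch in base R:

```r
mu <- 1   # mean number of seeds per grid
X <- 5    # number of seeds we are asking about

# Poisson probability: mu^X times e^(-mu), divided by X factorial
by.hand <- mu^X * exp(-mu) / factorial(X)
built.in <- dpois(X, lambda = mu)

c(by.hand = by.hand, built.in = built.in)
```

Both give 0.003065662, or about 0.31%.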
Rcmdr → Distributions → Discrete distributions → Poisson distribution → Poisson probabilities … (Fig 8)
The only thing to enter is the mean (some call μ lambda with symbol λ).

Figure 8. Rcmdr menu, poisson probability.
Here’s the output from R for intervals 0, 1, 2, 3, …, 6 (Rcmdr just enters this range for you!).
> .Table <- data.frame(Pr=dpois(0:6, lambda = 1))
> rownames(.Table) <- 0:6
> .Table
            Pr
0 0.3678794412
1 0.3678794412
2 0.1839397206
3 0.0613132402
4 0.0153283100
5 0.0030656620 ← Answer, 0.00307 or 0.307%
6 0.0005109437
Next — Continuous distributions
And finally, for ratio (continuous) scale data, which can take on any value, we can express the probability over a range of values as a continuous function, with the normal distribution being one of the most important examples (there are others, like the F-distribution). Many statistical procedures assume that the data we use can be viewed as having come from a “normally distributed population.” See Chapter 6.6.
Questions
1.2. Quarterback sacks by game for the NFL team Seahawks, years 2011 through 2022 are summarized below (data extracted from https://www.pro-football-reference.com/ ).
| Sacks | How many games? |
| --- | --- |
| 0 | 25 |
| 1 | 46 |
| 2 | 49 |
| 3 | 39 |
| 4 | 25 |
| 5 | 14 |
| 6 | 8 |
| 7 | 2 |
| 8 | 1 |
| 9 | 0 |
a) Assuming a Poisson distribution, what are the mean (lambda) and variance?
b) The table covers a total of 112 games. How many sacks (events) were observed?
c) What is the probability of the Seahawks getting zero sacks in a game (in 2022, a season was 17 games; prior years a season was 16 games)?