15.2 – Wilcoxon Rank Sum Test

Introduction

The Wilcoxon rank sum test is also called the two-sample Wilcoxon test. It is equivalent to another nonparametric test, the Mann-Whitney test, which was derived independently. We get the Wilcoxon test statistic in Rcmdr through the Statistics submenu.

Rcmdr: Statistics → Nonparametric tests → Two-sample Wilcoxon Test

I’ll show you the test with an example. We’ll use the same data set introduced in chapter 10.3, body mass (g) for four geckos (Hemidactylus frenatus, Fig. 1) and four green anole lizards (Anolis carolinensis, Fig. 2).

Figure 1. Female common house gecko, Hemidactylus frenatus, central Oahu, M. Dohm 2018.

Figure 2. Male Anolis carolinensis, ʻAkaka Falls, Hawaiʻi, M. Dohm 2018.

Wilcoxon test, worked example

Geckos: 3.186, 2.427, 4.031, 1.995
Anoles: 5.515, 5.659, 6.739, 3.184

Note 1. This test in Rcmdr requires that the data are in a stacked worksheet (one column for the group label, one column for body mass) and not in an unstacked worksheet with one column per group. If you need help with worksheet format, then see Part07 in Mike’s Workbook for Biostatistics.
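For readers working at the R prompt rather than through the Rcmdr menus, a stacked data frame might be built as in the following sketch. The names LizardStacked, Mass, and Lizard follow the Rcmdr output shown later in this section; the group labels "Gecko" and "Anolis" are assumptions made for illustration.

```r
# Body mass (g) from the worked example
geckos <- c(3.186, 2.427, 4.031, 1.995)
anoles <- c(5.515, 5.659, 6.739, 3.184)

# Stacked (long) format: one row per lizard, with one column for the
# group label and one column for the measurement
LizardStacked <- data.frame(
  Lizard = factor(rep(c("Gecko", "Anolis"), each = 4)),  # labels are assumed
  Mass   = c(geckos, anoles)
)

# Two-sided Wilcoxon rank sum test on the stacked data
result <- wilcox.test(Mass ~ Lizard, alternative = "two.sided",
                      data = LizardStacked)
print(result)
```

Because the grouping variable is a single column, the formula Mass ~ Lizard can tell R which observations belong to which group; an unstacked, two-column layout has no such column.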

We choose from the Rcmdr Nonparametric statistics menu the Two sample Wilcoxon test (Fig. 3), then a two-tailed test of the null hypothesis (Fig. 4) and elect to use the defaults for the tests and calculations of P-values.

Figure 3. Screenshot Rcmdr menu 2 sample Wilcoxon test. Options are selected by clicking on “Options” tab (see Fig. 4)

Figure 4. Screenshot of options tab Rcmdr menu 2 sample Wilcoxon test. Keep defaults to run the “Wilcoxon test.”

Don’t forget to stack the data. Rcmdr won’t produce an error message if the data set is in the unstacked, improper conformation. Instead, Rcmdr menu options will not be available. For example, Fig. 5 shows a Two-sample Wilcoxon test… dimmed from view, not available for selection.

Figure 5. Screenshot of Rcmdr menu. Note Two-sample Wilcoxon test… not available.

The results of the test, copied from the Output window, are shown below.

wilcox.test(Mass ~ Lizard, alternative="two.sided", data=LizardStacked)

Wilcoxon rank sum test
data: Mass by Lizard
W = 14, p-value = 0.1143
alternative hypothesis: true location shift is not equal to 0

The calculation of the Wilcoxon test statistic (W) is straightforward: rank all of the observations together, then sum the ranks within a group. Obtaining the P-value of the test of the null is a bit more involved, as the exact P-value depends on enumerating all possible ways the ranks could be divided between the two groups. For us, R will do nicely with the details, and we just need to check the P-value.
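As a rough illustration of the rank-sum calculation, the sketch below reproduces W and the exact P-value by hand. It assumes the anoles are treated as the first group (which matches alphabetical ordering of the labels "Anolis" and "Gecko") and uses the base R function pwilcox() for the null distribution of W; the variable names are ours.

```r
geckos <- c(3.186, 2.427, 4.031, 1.995)
anoles <- c(5.515, 5.659, 6.739, 3.184)
n_a <- length(anoles)
n_g <- length(geckos)

# Rank all eight observations together (1 = smallest)
ranks <- rank(c(anoles, geckos))

# Sum the ranks of the first group, then subtract the smallest value
# that sum could possibly take, n_a * (n_a + 1) / 2
rank_sum_anoles <- sum(ranks[seq_len(n_a)])   # 24
W <- rank_sum_anoles - n_a * (n_a + 1) / 2    # 14

# Exact two-sided P-value: W lies above its null mean (n_a * n_g / 2 = 8),
# so double the upper-tail probability P(W >= 14)
p_upper <- pwilcox(W - 1, n_a, n_g, lower.tail = FALSE)
p_value <- min(1, 2 * p_upper)                # 0.1143
c(W = W, p = p_value)
```

With four observations per group there are only 70 ways to divide the eight ranks, and 4 of them give W of 14 or more, hence 2 × 4/70 = 0.1143.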

Here, we see that the medians are 5.6 g for the Anolis, and 2.8 g for the geckos. The associated P-value is 0.1143. Thus, we fail to reject the null hypothesis and conclude that there was no difference in median body mass.

Note 2. This is the same general conclusion we got when we ran an independent t-test on this data set: no difference between day one and day two.

Questions

1. Conduct an independent t-test on the Lizard body mass data.
   - Make a box plot to display the two groups and describe the middle and variability.
   - Compare the results of the tests of hypothesis. Do they agree with the Wilcoxon test? If not, list possible reasons why the two tests disagree.
2. Using the dataset below, test the null hypothesis using the independent t-test, Welch’s test, and the nonparametric Wilcoxon test.
   - Make a box plot to display the two groups and describe the middle and variability.
   - Compare the results of the tests of hypothesis. Do they agree with the Wilcoxon test? If not, list possible reasons why the tests disagree.

Data set

var1	var2
5.84	5.93
5.72	5.95
5.75	6.02
5.78	5.81
5.81	6.16
5.81	5.95
5.73	6.09
5.77	5.89
5.76	5.99
5.86	5.60
5.84	6.16
5.83	6.16
5.80	6.06
5.78	6.07
5.89	5.66
5.83	6.14
5.79	5.99
5.84	6.15
5.90	5.81
5.86	6.20

10.1 – Compare two independent sample means

Introduction

We introduced the concept of comparing a sample statistic (mean) against a population parameter (Chapter 6.7, Normal deviate), and the one-sample t-test, which compares a sample mean against a specified mean (eg, from published data or from theory, Chapter 8.5).

Consider now a basic experimental design, the randomized control trial, or RCT (Fig 1), introduced in Chapter 2.4.

Figure 1. A two group Randomized Control Trial.

Subjects are randomly selected from the population of interest and then, once recruited, randomly assigned to one of two treatment groups. Importantly, subjects belong to one treatment arm only: no subject simultaneously receives the treatment and the control. This is in contrast to the paired design, in which subjects receive both treatments (see Chapter 10.3).

In inferential statistics about an experiment, we are more likely trying to test if sample means are different. For example,

Two species are grown in a common garden; do they differ in growth rate?
Do human subjects given a new treatment have better outcomes compared to those receiving a control treatment (eg, placebo)?

The equivalent null hypothesis is that two samples are pulled from the same population. We write the null hypothesis as

$\begin{align*} H_{0}\ :\ \bar{X}_{1}=\bar{X}_{2} \end{align*}$

and the corresponding alternative hypothesis, H_A, then must be

$\begin{align*} H_{A}:\bar{X}_{1}\neq \bar{X}_{2} \end{align*}$

Question: Is this a one tailed or two-tailed hypothesis?

Answer: Two-tailed (review Chapter 8.4)

Note 1: In this day and age, there’s really no compelling reason to learn the t-test. First, it is just a special case of the one-way ANOVA and, therefore, a special case of the general linear model. Struggling to learn R commands? Well, one solution would be just to learn the general linear model approach, the R function lm() (OK, don’t get too excited: lm() has many options and details). Second, few experiments or observational studies are likely to have only two groups; carrying out a series of t-tests, taking all groups two at a time, or “pairwise,” while tempting, actually violates a whole bunch of basic statistical rules (discussed in Chapter 12.1). It will also make statisticians go crazy when they see it. That said, if your experiment has but two groups, then by all means, the t-test is a choice. The t-test is also a statistical test that you have likely already used before, so we present the discussion here to build on what you may already have learned. We also present the independent t-test as a vehicle.

Worked example

We introduce the two-sample t-test, or better, the independent sample t-test.

$\begin{align*} t=\frac{\bar{X}_{1}-\bar{X}_{2}}{s_{\bar{X}_{1}-\bar{X}_{2}}} \end{align*}$

where the numerator is the difference between the two sample means and the denominator is the standard error of the difference between the two sample means. The formula for the standard error of the difference is

$\begin{align*} s_{\bar{X}_{1}-\bar{X}_{2}}=\sqrt{ \frac{s_1^2}{n_{1}}+\frac{s_2^2}{n_{2}}} \end{align*}$

The choice of independent sample over two-sample is best because it emphasizes that the two groups (the two samples) must comprise independent sampling units. This is a pretty straightforward requirement: say you have randomly assigned twenty individuals to two groups, a control group (n = 10) and a treatment group (n = 10). Individuals are either in the control group or they are in the treatment group; they cannot simultaneously appear in both groups.

We will work our way through this test by example. For starters, let’s use the same lizard dataset (see Example data set, below), four body mass recordings (grams) each for house geckos (Hemidactylus frenatus, Fig 2) and the Carolina anole (Anolis carolinensis, Fig 3), two of many lizard species introduced to Hawaiʻi.

Figure 2. Male Hemidactylus frenatus, central Oahu, M. Dohm.

Figure 3. Male Anolis carolinensis, ʻAkaka Falls, Big Island of Hawaiʻi, M. Dohm.

Example data set

Geckos: 3.186, 2.427, 4.031, 1.995

Anoles: 5.515, 5.659, 6.739, 3.184

Question: How would you go about creating a data frame with the values in long form (stacked worksheet), including a label variable and the body mass?

Note 2: The independent sample t-test in Rcmdr requires a stacked worksheet, not the unstacked, two-column worksheet implied by the layout of the data above. If you need help with worksheet format, then see Part07 in Mike’s Workbook for Biostatistics.

Answer: At the R prompt, type

Geckos <- c(3.186, 2.427, 4.031, 1.995)   #create two vectors
Anoles <- c(5.515, 5.659, 6.739, 3.184)
bmass <- c(Geckos, Anoles)    #combine the two vectors into a single vector holding all of the body mass records
species <- c("gecko", "gecko", "gecko", "gecko", "anole", "anole", "anole", "anole")   #create your label variable
lizards <- data.frame(species, bmass)  #create your data frame
lizards   #print your data frame

  species bmass
1   gecko 3.186
2   gecko 2.427
3   gecko 4.031
4   gecko 1.995
5   anole 5.515
6   anole 5.659
7   anole 6.739
8   anole 3.184

Note also that you can enter data into the Data editor by creating the data frame first then adding values. To edit the data frame “lizards” type fix(lizards) at the R prompt, then close the data frame when you have added or changed values as needed.

As always, begin with an exploration of the data, including a graph (Fig 4).

Figure 4. Box plot of lizard body mass.

We can see already that there’s greater spread of data for the Anoles compared to the Geckos, and that the median values differ. Small sample sizes can be a problem for analyses, as we can have only reduced confidence in our conclusions. However, we press on for the sake of demonstration.
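A box plot along these lines can be drawn at the R prompt with the base function boxplot(). The sketch below assumes the lizards data frame with columns species and bmass created earlier in this section; the axis labels are our choice.

```r
# Rebuild the data frame so the example is self-contained
species <- rep(c("gecko", "anole"), each = 4)
bmass <- c(3.186, 2.427, 4.031, 1.995, 5.515, 5.659, 6.739, 3.184)
lizards <- data.frame(species, bmass)

# Draw the box plot; boxplot() also returns the plotted summary values
bp <- boxplot(bmass ~ species, data = lizards,
              xlab = "Species", ylab = "Body mass (g)")

# Row 3 of bp$stats holds the medians; rows 2 and 4 hold the hinges,
# so their difference is a rough measure of spread
medians <- setNames(bp$stats[3, ], bp$names)
spread  <- setNames(bp$stats[4, ] - bp$stats[2, ], bp$names)
medians
spread
```

Inspecting the returned values alongside the plot is a quick way to put numbers on what the eye sees: here the anole box is both higher and wider than the gecko box.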

Let’s test the null hypothesis, H₀, i.e., the two species of lizards have the same mean body mass.

Rcmdr: Statistics → Means → Independent-samples t-test…

In this next image I posted the Rcmdr menu popup for the Independent Samples t-test (Fig 5). Later versions of Rcmdr split the settings for this command into two tabs; the first tab allows for the selection of the variables and setting the hypotheses whereas the second tab, labeled Options, permits additional choices. The default selections need your attention: to actually conduct the t-test you need to answer “Yes” to the question, “Assume equal variances?” (Fig 6); the default answer, “No,” runs the Welch test instead.

Figure 5. Rcmdr menu for Independent sample t-test.

Select the Options tab (Fig 6) to select null hypothesis and to select the t-test and not the Welch-test (which is the default, i.e., No to the prompt “Assume equal variances?”).

Figure 6. Rcmdr Options menu for Independent sample t-test.
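The “Assume equal variances?” choice corresponds to the var.equal argument of t.test(). As a sketch, assuming the lizards data frame with columns species and bmass built earlier in this section, the two versions of the test can be run side by side:

```r
species <- rep(c("gecko", "anole"), each = 4)
bmass <- c(3.186, 2.427, 4.031, 1.995, 5.515, 5.659, 6.739, 3.184)
lizards <- data.frame(species, bmass)

# Independent sample t-test: equal variances assumed ("Yes" in Rcmdr)
student <- t.test(bmass ~ species, data = lizards, var.equal = TRUE)

# Welch test: var.equal = FALSE is the default ("No" in Rcmdr)
welch <- t.test(bmass ~ species, data = lizards)

# Compare the degrees of freedom and P-values from the two tests
c(student = unname(student$parameter), welch = unname(welch$parameter))
c(student = student$p.value, welch = welch$p.value)
```

With equal group sizes the two tests return the same t statistic; they differ in the degrees of freedom, which the Welch test adjusts downward when the sample variances are unequal.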

Let’s look at the results and break down the parts of the test.

t.test(Body.mass~Lizard, alternative='two.sided', conf.level=.95, 
+ var.equal=TRUE, data=lizards)

Two Sample t-test

data: Body.mass by Lizard 
t = 2.7117, df = 6, p-value = 0.03503
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
0.2308685 4.4981315 
sample estimates:
mean in group Anolis mean in group Gecko 
5.27425 2.90975
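The test statistic in this output can be checked against the equations given at the start of the worked example. The following is a sketch with variable names of our choosing; note that the unpooled standard error formula shown earlier gives the same value as the pooled (equal variance) formula here only because the two groups have the same sample size.

```r
geckos <- c(3.186, 2.427, 4.031, 1.995)
anoles <- c(5.515, 5.659, 6.739, 3.184)

# Numerator: difference between the two sample means
diff_means <- mean(anoles) - mean(geckos)

# Denominator: standard error of the difference between the means
se_diff <- sqrt(var(anoles) / length(anoles) + var(geckos) / length(geckos))

t_stat <- diff_means / se_diff                  # compare with t in the output
df <- length(anoles) + length(geckos) - 2       # compare with df in the output
p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)  # two-tailed P-value

round(c(t = t_stat, df = df, p = p_value), 4)
```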

Consider the R session output above and answer the following questions.

Questions for the worked example

Which lizard group had the greater mean value, Anolis or Gecko?
What are the assumptions necessary for you to use the independent sample t-test?
What does “two-sided” mean?
What was the null hypothesis?
Was this a one-tailed or two-tailed test of the null hypothesis?
What is the value of the test statistic?
How many degrees of freedom?
What is the critical value for this test?
What is the value of the lower limit of the 95% confidence interval?
What is the value of the lower limit of the 99% confidence interval?
True or False. If the null hypothesis is accepted, then zero is a value included in the 95% confidence interval.
Do you accept the null hypothesis? Explain your selection.

Try another example

DNA damage, changes in the chemical structure of nucleotide bases or breakage of the DNA chains, occurs in cells under many circumstances. The comet assay, or single-cell gel electrophoresis, is one method for visualizing and measuring DNA strand breaks in cells. Exposed cells are mixed with a low-melting temperature agarose and placed onto a microscope slide. The cells are then lysed with an alkaline detergent and high salts. When current is applied across the slide, undamaged DNA remains in the nucleus, whereas damaged DNA extends towards the anode to form a comet-like tail, with imaging assisted by including a fluorescent dye like Sybr-Green. Examples of comets are shown below (Fig 7).

Figure 7. Comet examples. A Intact cell, no DNA damage, B Cell with some DNA damage, a slight tail to the right is evident, C Cell with significant DNA damage, a large tail is evident. M. Dohm.

In an experiment, immortalized lung epithelial cells were exposed to dilute copper solutions for 30 minutes then washed with PBS. The comet assay was applied to these cells and for comparison, to cells without copper exposure but otherwise treated the same way (controls). The data are available at the bottom of this page (see Comet assay data set).

Again, you should begin all analyses with an exploration of the data, including a graph (Fig 8).

Figure 8. Boxplot of comet tail lengths for cells with and without (control) exposure to copper in the cell medium for 30 minutes.
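For those who prefer the R prompt, the sketch below shows one way the comet analysis might be set up, using the values from the Comet assay data set at the bottom of this page. The data frame name comet is our choice, and var.equal=TRUE is assumed because the reported test is the equal variance t-test.

```r
# Comet tail lengths, copied from the Comet assay data set
control <- c(17.856139, 16.52125, 14.925449, 14.029174, 13.332945,
             8.811185, 14.701654, 9.261025, 21.779311, 6.180284,
             9.201752, 5.54472, 6.717885, 2.625092, 7.191583,
             5.392866, 11.284813, 15.441254, 17.857176, 4.250956)
copper <- c(53.214287, 38.92857, 18.928572, 30, 28.928572,
            15.357142, 17.857143, 17.5, 21.071428, 29.285715,
            28.214285, 16.785715, 21.071428, 37.5, 38.214287,
            17.857143, 29.642857, 11.071428, 35, 49.285713)

# Stacked data frame: one row per cell
comet <- data.frame(
  Treatment = rep(c("Control", "Copper"), each = 20),
  CometTail = c(control, copper)
)

# Explore first, then test
boxplot(CometTail ~ Treatment, data = comet, ylab = "Comet tail length")
comet_test <- t.test(CometTail ~ Treatment, data = comet, var.equal = TRUE)
comet_test
```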

Let’s look at the R output for the t-test analysis.

    Two Sample t-test

data:  CometTail by Treatment
t = -5.8502, df = 38, p-value = 9.139e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -22.39865 -10.88213
sample estimates:
mean in group Control  mean in group Copper 
             11.14533              27.78571

Consider the R session output above and answer the following questions.

Questions for Comet assay data set

Which cell group had the greater mean value, Copper-exposed or Control-exposed cells?
What are the assumptions necessary for you to use the independent sample t-test?
What does “two-sided” mean?
What was the null hypothesis?
Was this a one-tailed or two-tailed test of the null hypothesis?
What is the value of the test statistic?
How many degrees of freedom?
What is the critical value for this test?
What is the value of the lower limit of the 95% confidence interval?
What is the value of the lower limit of the 99% confidence interval?
True or False. If the null hypothesis is accepted, then zero is a value included in the 95% confidence interval.
Do you accept the null hypothesis? Explain your selection.

T test from summary statistics

In some cases you may only have access to summary statistics for the data, eg, the means and the standard deviations. We can use the equations of the t test to write a simple function, where the user provides the known means, standard deviations, and sample sizes. For example, create a simple function with readline for user input.

myTtest <- function() {
   mnx <- as.numeric(readline(prompt="Enter mean of x: "))
   stdevx <- as.numeric(readline(prompt="Enter sd of x: "))
   nx <- as.numeric(readline(prompt="Enter n of x: "))
   mny <- as.numeric(readline(prompt="Enter mean of y: "))
   stdevy <- as.numeric(readline("Enter sd of y: "))
   ny <- as.numeric(readline(prompt="Enter n of y: "))
   myTvalue <- abs(((mnx-mny)-0)/sqrt(((stdevx^2)/nx)+(stdevy^2)/ny))
   myDF <- as.integer(nx+ny-2)
   myPvalue <- pt(myTvalue,myDF,lower.tail=FALSE)*2
   myResults <- c(myTvalue, myDF, myPvalue)
   report <- c("T-test: ", "df: ", "two-tailed p-value: ")
   cat(sprintf("%s %3.3f, ", report, myResults))
}

then run the function by typing myTtest() at the R prompt and entering the means, standard deviations, and sample sizes when prompted.

myTtest()
Enter mean of x: 2.91
Enter sd of x: .895
Enter n of x: 4
Enter mean of y: 5.27
Enter sd of y: 1.497
Enter n of y: 4
T-test: 2.706, df: 6.000, two-tailed p-value: 0.035
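Because readline() only works in an interactive session, a version that takes the summary statistics as arguments may be easier to reuse in scripts. This is a sketch of the same calculation; the function name t_from_summary and its argument names are hypothetical.

```r
# Two-sample t-test from summary statistics (same formulas as myTtest)
t_from_summary <- function(mean_x, sd_x, n_x, mean_y, sd_y, n_y) {
  se_diff <- sqrt(sd_x^2 / n_x + sd_y^2 / n_y)   # standard error of the difference
  t_value <- abs(mean_x - mean_y) / se_diff      # absolute value of t
  df <- n_x + n_y - 2                            # degrees of freedom
  p_value <- 2 * pt(t_value, df, lower.tail = FALSE)  # two-tailed P-value
  c(t = t_value, df = df, p = p_value)
}

# Lizard summary statistics from the example above
res <- t_from_summary(mean_x = 2.91, sd_x = 0.895, n_x = 4,
                      mean_y = 5.27, sd_y = 1.497, n_y = 4)
round(res, 3)
```

Called with the lizard values, the function returns the same t, df, and p-value as the interactive version.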

P-values from confidence intervals

While we expect certain reporting criteria for published studies, it is not uncommon that one or more elements are missing. For example, while means plus or minus standard deviations for two groups may be reported, the confidence interval of the difference may not be reported, or an inexact p-value is reported, like “< 0.05,” but we “need” an exact p-value for our meta-analysis. A little math, and these missing statistics can be calculated.

We can use the lizard study as an example; while all of the expected elements were reported, including the p-value (0.03503), let’s say the p value was reported as “statistically significant at p < 0.05.”

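We need

the difference between the two sample means, 5.27425 − 2.90975 = 2.3645,

the lower and upper limits of the 95% confidence interval, 0.2309 and 4.498, respectively,

lower <- 0.2309; upper <- 4.498

and the critical value from the t distribution, df = 6 (2 tailed, therefore 0.05/2). With a little R code we have

critVal <- qt(c(.025), df=6, lower.tail=FALSE)   # 2.447

The standard error can then be calculated as

SEliz <- (upper - lower)/(2*critVal)   # 0.872

the test statistic as

testLiz <- 2.3645/SEliz   # 2.712

and then the p-value from the t distribution,

2*pt(testLiz, df=6, lower.tail=FALSE)   # 0.035

which recovers the result from R (p = 0.03503). A quick approximation that treats the test statistic as a normal deviate, exp(-.717*testLiz-0.416*testLiz^2), returns 0.0067 here; the discrepancy is due to the small sample size, because with only 6 degrees of freedom the t distribution has much heavier tails than the normal distribution.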

Questions

Don’t forget to work through the Questions for the Comet tail data set (above).
Microsoft Excel, LibreOffice Calc, and Google sheets spreadsheet software all include t-test functions and return the p-value. Consider two variables big (100, 110, 120, 100, 110, 210, 200) and small (0,1,1,2,0,1,0). (Note — these two groups are obviously very different, calculating a t-test on their difference is silly, just for this question.) If formatting is set to the default two decimal places for Number cell category, the p-value will return as “0.00.” How should you report the p-value in this case?

Comet assay data set

Treatment	CometTail
Control	17.856139
Control	16.52125
Control	14.925449
Control	14.029174
Control	13.332945
Control	8.811185
Control	14.701654
Control	9.261025
Control	21.779311
Control	6.180284
Control	9.201752
Control	5.54472
Control	6.717885
Control	2.625092
Control	7.191583
Control	5.392866
Control	11.284813
Control	15.441254
Control	17.857176
Control	4.250956
Copper	53.214287
Copper	38.92857
Copper	18.928572
Copper	30
Copper	28.928572
Copper	15.357142
Copper	17.857143
Copper	17.5
Copper	21.071428
Copper	29.285715
Copper	28.214285
Copper	16.785715
Copper	21.071428
Copper	37.5
Copper	38.214287
Copper	17.857143
Copper	29.642857
Copper	11.071428
Copper	35
Copper	49.285713

8.5 – One sample t-test

Introduction.

We’re now talking about the traditional, classical two group comparison involving continuous data types. Thus begins your introduction to parametric statistics. One sample tests involve questions like, what proportion of people would we expect to be shorter or taller than two standard deviations from the mean? This type of question assumes a population, and we use properties of the normal distribution to answer it. Such tests are called parametric tests because they assume that the data have been sampled from a particular probability distribution.

However, when we start asking questions about a sample statistic (e.g., the sample mean), we cannot use the normal distribution directly, i.e., we cannot use Z and the normal table as we did before (Chapter 6.7). This is because we do not know the population standard deviation and therefore must use an estimate of the variation (s) to calculate the standard error of the mean.

With the introduction of the t-statistic, we’re now into full inferential statistics-mode. What we do have are estimates of these parameters. The t-test — aka Student’s t-test — was developed for the purpose of testing sample means when the true population parameters are not known.

Note 1: It’s called Student’s t-test after the pseudonym used by William Gosset.

The equation of the one sample t-test. Note the resemblance in form with the Z-score!

$\begin{align*} t=\frac{\bar{X}-\mu}{s_{\bar{X}}} \end{align*}$

where $s_{\bar{X}}$ is the sample standard error of the sample mean (SEM).

For example, consider the weight change of mice given a hormone (leptin) or placebo. The observed mean change was $\bar{X} = 5$ g, but under the null hypothesis, the mean change is “really” zero ( $\mu= 0$ ). How unlikely is our value of 5 g?

Note 2: Did you catch how I snuck in “placebo” and mice? Do you think the concept of placebo is appropriate for research with mice, or should we simply refer to it as a control treatment? See Ch5.4 – Clinical trials for review.

Speaking of null hypotheses, can you say (or write) the null and alternative hypotheses in this example? How about in symbolic form?

We want to know if our sample mean could have been obtained by chance alone from a population where the true change in weight was zero.

$s = 3, n = 20$

and

$s_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{3}{\sqrt{20}} = 0.6708$

and we take these values and plug them into our equation of the t-test

$t = \frac{(5 - 0)}{0.6708} = 7.45$

Then recall that Degrees of Freedom are DF = n – 1 so we have DF = 20 – 1 = 19 for the one sample t-test. And the Critical Value is found in the appropriate table of critical values for the t distribution (Fig 1)

Figure 1. Table of a portion of the Critical values of the t distribution. Red selections highlight critical value for t-test at α = 5% and df = 19.

Note 3: See our table of critical values of t distribution.

Or, and better, use R

qt(c(0.025), df=19, lower.tail=FALSE)

where qt() is the function call to find the t-score of the p^th percentile (cf 3.3 – Measures of dispersion) of the Student t distribution. For a two-tailed test, we recall that α = 0.05 is split between the tails: 0.025 in the lower tail and 0.025 in the upper tail.

In this example we would be willing to reject the Null Hypothesis if there was a positive OR a negative change in weight.

This was an example of a “two-tailed test” which is “2-tail” or α₍₂₎ in Table of critical values of the t distribution.

Critical Value for α₍₂₎ = 0.05, df = 19, = 2.093

Do we accept or reject the Null Hypothesis?
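The hand calculation for the mouse example can be laid out in a few lines of R. This is a sketch using only the summary values given above (mean change 5 g, s = 3, n = 20); the variable names are ours.

```r
x_bar <- 5    # sample mean weight change (g)
mu0   <- 0    # mean change under the null hypothesis
s     <- 3    # sample standard deviation
n     <- 20   # sample size

sem    <- s / sqrt(n)           # standard error of the mean, 0.6708
t_stat <- (x_bar - mu0) / sem   # test statistic, 7.45
df     <- n - 1                 # degrees of freedom, 19

# Two-tailed critical value at alpha = 0.05
t_crit <- qt(0.025, df = df, lower.tail = FALSE)   # 2.093

# Reject the null hypothesis if |t| exceeds the critical value
reject <- abs(t_stat) > t_crit
c(sem = sem, t = t_stat, critical = t_crit)
reject
```

Because 7.45 is far larger than 2.093, the comparison returns TRUE and the null hypothesis of no weight change would be rejected.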

A typical inference workflow.

Note the general form of how the statistical test is processed, a form which actually applies to any statistical inference test.

Identify the type of data
State the null hypothesis (2 tailed? 1 tailed?)
Select the test statistic (t-test) and determine its properties
Calculate the test statistic (the value of the result of the t-test)
Find degrees of freedom
For the DF, get the critical value
Compare critical value to test statistic
Do we accept or reject the null hypothesis?

And then we ask, given the results of the test of inference, What is the biological interpretation? Statistical significance is not necessarily evidence of biological importance. In addition to statistical significance, the magnitude of the difference, the effect size, is important as part of interpreting results from an experiment. Statistical significance depends at least in part on sample size: the larger the sample size, the smaller the standard error of the mean; therefore even small differences may be statistically significant, yet biologically unimportant. Effect size is discussed in Ch9.1 – Chi-square test: Goodness of fit, Ch11.4 – Two sample effect size and Ch12.5 – Effect size for ANOVA.
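To see how sample size alone can drive statistical significance, consider the sketch below. The numbers are hypothetical, chosen only for illustration: a fixed difference of 0.1 units from the null value and a standard deviation of 1, evaluated at increasing sample sizes.

```r
effect <- 0.1                    # hypothetical difference from the null value
s      <- 1                      # hypothetical sample standard deviation
n      <- c(10, 100, 1000, 10000)

sem     <- s / sqrt(n)           # standard error shrinks as n grows
t_stat  <- effect / sem          # so the same difference gives a larger t
p_value <- 2 * pt(t_stat, df = n - 1, lower.tail = FALSE)

data.frame(n, sem, t_stat, p_value)
```

The difference never changes, yet the p-value falls steadily as n increases; whether a difference of 0.1 matters biologically is a separate question that the p-value cannot answer.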

R Code.

Let’s try a one-sample t-test. Consider the following data set: body mass of four geckos and four anole lizards (Dohm unpublished data).

For starters, let’s say that you have reason to believe that the true mean for all small lizards is 5 grams (g).

Geckos: 3.186, 2.427, 4.031, 1.995
Anoles: 5.515, 5.659, 6.739, 3.184

Get the data into R (Rcmdr)

By now you should be able to load this data in one of several ways. If you haven’t already entered the data, check out Part 07. Working with your own data in Mike’s Workbook for Biostatistics.

Once we have our data.frame, proceed to carry out the statistical test.

To get the one-sample t-test in Rcmdr, click on Statistics → Means → Single-sample t-test… Because there is only one numerical variable, Body.mass, that is the only one that shows up in the Variable (pick one) window (Fig 2).

Figure 2. Screenshot Rcmdr single-sample t-test menu.

Type in the value 5.0 in the Null hypothesis: mu = box.

Question 1: Quick! Can you write, in plain old English, the statistical null hypothesis???

Answer 1: For example: The mean body mass of the lizards is equal to 5 g.

Click OK

The results go to the Output Window.

t.test(lizards$Body.mass, alternative='two.sided', mu=5.0, conf.level=.95)

One Sample t-test

data: lizards$Body.mass
t = -1.5079, df = 7, p-value = 0.1753
alternative hypothesis: true mean is not equal to 5
95 percent confidence interval:
2.668108 5.515892
sample estimates:
mean of x
4.092

end of R output

Let’s identify the parts of the R output from the one sample t-test. R reports the name of the test and identifies

The dataset$variable used (lizards$Body.mass). The data set was called “lizards” and the variable was “Body.mass”. R uses the dollar sign ($) to denote the dataset and variable within the data set.
The value of the t test statistic was (t = -1.5079). It is negative because the sample mean was less than the population mean — you should be able to verify this!
The degrees of freedom, df = 7
The p-value = 0.1753
95% confidence interval of the population mean; lower limit = 2.668108, upper limit = 5.515892
The sample mean = 4.092

Take a step back and review.

Let’s make sure we “get” the logic of the hypothesis testing we have just completed.

Consider the one-sample t-test.

$\begin{align*} t=\frac{\bar{X}-\mu}{s_{\bar{X}}} \end{align*}$

Step 1. Define H_O and H_A. The null hypothesis might be that a sample mean, $\bar{X}$ , is equal to μ = 5.

$H_{0} : \mu = 5$

The alternate is that the sample mean is not equal to 5.

$H_{A} : \mu \neq 5$

Where did the value 5 come from? It could be a value from the literature (does the new sample differ from values obtained in another lab?). The point is that the value is known in advance, before the experiment is conducted, and that makes it a one-sample t-test.

One tailed hypothesis or two?

We introduced you to the idea of “tails of a test” (Ch08.4). As you should recall, a null/alternative hypothesis for a two-tailed test may be written as

Null hypothesis

$\begin{align*} H_{0} : \bar{X}=\mu \end{align*}$

versus the alternative hypothesis

$\begin{align*} H_{A} : \bar{X} \ne \mu \end{align*}$

where $\bar{X}$ is the sample mean and $\mu$ is the population mean.

Alternatively, we can write one-tailed tests of null/alternative hypothesis

$\begin{align*} H_{0} : \bar{X} \geq \mu \end{align*}$

for the null hypothesis versus the alternative hypothesis

$\begin{align*} H_{A} :\bar{X} < \mu \end{align*}$

Question 2: Are all possible outcomes of the one-tailed test covered by these two hypotheses?

Answer 2: Yes

Question 3: What was the SEM for this problem?

Answer 3: It would be the sample standard deviation divided by the square root of the sample size.
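As a check on Answer 3, the standard error and the test statistic for the lizard example can be computed directly. This is a sketch; bm is simply a vector holding the eight body mass values.

```r
# Body mass (g) for the four geckos and four anoles
bm <- c(3.186, 2.427, 4.031, 1.995, 5.515, 5.659, 6.739, 3.184)

# SEM: sample standard deviation divided by the square root of n
sem <- sd(bm) / sqrt(length(bm))   # about 0.602

# One-sample t statistic against the hypothesized mean of 5 g
t_stat <- (mean(bm) - 5) / sem     # -1.5079, as in the R output

c(mean = mean(bm), sem = sem, t = t_stat)
```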

Step 2. Decide how certain you wish to be (with what probability) that the sample mean is different from μ. As stated previously, in biology, we say that we are willing to be incorrect 5% of the time (Cowles and Davis 1982; Cohen 1994). This means that, when the null hypothesis is true, we will correctly fail to reject it 100% – 5% = 95% of the time. (This is not statistical power, which is the probability of correctly rejecting a null hypothesis that is false.) We do this by setting the Type I error to be 5% (alpha, α = 0.05). The Type I error is the chance that we will reject a null hypothesis when the true condition in the population we sampled was actually “no difference.”

Step 3. Carry out the calculation of the test statistic. In other words, get the value of t from the equation above by hand, or, if using R (yes!) simply identify the test statistic value from the R output after conducting the one sample t test.

Step 4. Evaluate the result of the test. If the absolute value of the test statistic is greater than the critical value for the test, then the P-value is less than α; you conclude that a result this extreme is unlikely to have come from that population, and you therefore reject the null hypothesis.

Question 4: What is the critical value for a one-sample t-test with df = 7?

Answer 4: From R, we get ±2.365 for the two-tailed test. R code was qt(c(.025), df=7, lower.tail=FALSE)

Hint: you need the table or, better, use R

Rcmdr: Distributions → Continuous distributions → t distributions → t quantiles

You also need to know three additional things to answer this question.

You need to know alpha (α), which we have said generally is set at 5%.
You also need to know the degrees of freedom (DF) for the test. For a one sample t-test, DF = n – 1, where n is the sample size.
You also must know whether your test is one or two-tailed.
You then use the t-distribution (the tables of the t-distribution at the back of your book) to obtain the critical value. Note that if you use R, the actual p-value is returned.
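The three ingredients listed above (alpha, degrees of freedom, and the number of tails) map directly onto the arguments of qt(). A brief sketch for the lizard example, with variable names of our choosing:

```r
alpha <- 0.05   # Type I error rate
df    <- 7      # n - 1 for the eight lizards

# Two-tailed test: alpha is split between the two tails
crit_two_tailed <- qt(alpha / 2, df = df, lower.tail = FALSE)   # 2.365

# One-tailed test: all of alpha sits in one tail
crit_one_tailed <- qt(alpha, df = df, lower.tail = FALSE)

c(two_tailed = crit_two_tailed, one_tailed = crit_one_tailed)
```

The one-tailed critical value is the smaller of the two, which is why the choice of one or two tails has to be made before looking at the data.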

Why learn the equations when I can just do this in R?

Rcmdr does this for you as soon as you click OK. Rcmdr returns the value of the test statistic and the p-value. R does not show you the critical value, but instead returns the probability of obtaining a test statistic at least as extreme as yours if the null hypothesis is true. So why learn the equations? The simple answer is that in order to understand the R output properly you need to know where each item of the output for a particular test comes from and how to interpret it. Thus, the best way is to have the equations available and to understand the algorithmic approach to statistical inference.

And, this is as good a time as any to show you how to skip the Rcmdr GUI and go straight to R.

First, create your variables. At the R prompt enter the first variable

liz <- c("G","G","G","G","A","A","A","A")

and then create the second variable

bm <- c(3.186,2.427,4.031,1.995,5.515,5.659,6.739,3.184)

Next, create a data frame. Think of a data frame as another word for worksheet.

lizz <- data.frame(liz,bm)

Verify that entries are correct. At the R prompt type “lizz” without the quotes and you should see

lizz
 liz bm
1 G 3.186
2 G 2.427
3 G 4.031
4 G 1.995
5 A 5.515
6 A 5.659
7 A 6.739
8 A 3.184

End of R output

Carry out the t-test by typing at the R prompt the following

t.test(lizz$bm, alternative='two.sided', mu=5, conf.level=.95)

And, like the Rcmdr output we have for the one-sample t-test the following R output

One Sample t-test

data: lizz$bm
t = -1.5079, df = 7, p-value = 0.1753
alternative hypothesis: true mean is not equal to 5
95 percent confidence interval:
2.668108 5.515892
sample estimates:
mean of x
4.092

End of R output

which, as you probably guessed, is the same as what we got from Rcmdr.

Question 5: From the R output of the one sample t-test, what was the value of the test statistic?

-1.5079
7
0.1753
2.668108
5.515892
4.092

Answer 5: -1.5079

Note 4: BI311 students — On an exam you will be given portions of statistical tables and output from R. Thus you should be able to evaluate statistical inference questions by completing the missing information. For example, if I give you a test statistic value, whether the test is one- or two-tailed, degrees of freedom, and the Type I error rate alpha, you should know that you would need to find the critical value from the appropriate statistical table. On the other hand, if I give you R output, you should know that the p-value and whether it is less than the Type I error rate of alpha would be all that you need to answer the question.

Why fall back on statistical tables? Think of this as a basic skill. In statistics and for some statistical tests, Rcmdr and other software may not provide the information needed to decide that your test statistic is large, and a table in a statistics book is the best way to evaluate the test.

For now, double check Rcmdr by looking up the critical value from the t-table.

Check critical value against our test statistic

Df = 8 – 1 = 7

The test is two-tailed, therefore α(2)

α = 0.05 (note that the two-tailed critical value is 2.365). T was equal to 1.51 (since the t-distribution is symmetrical, we can ignore the negative sign), which is smaller than 2.365, and so we would agree with Rcmdr: we cannot reject the null hypothesis.

Question 6: From the R output of the one sample t-test, what was the P-value?

-1.5079
7
0.1753
2.668108
5.515892

Answer 6: 0.1753

Question 7: We would reject the null hypothesis

False
True

Answer 7: False — p-value, 17.5%, is greater than Type I error of 5%.
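The two routes to a decision used in this section, comparing the test statistic to the critical value and comparing the p-value to alpha, should always agree. The sketch below checks this for the lizard data; the variable names are ours.

```r
bm <- c(3.186, 2.427, 4.031, 1.995, 5.515, 5.659, 6.739, 3.184)
alpha <- 0.05

res    <- t.test(bm, alternative = "two.sided", mu = 5, conf.level = 0.95)
t_stat <- unname(res$statistic)   # -1.5079
df     <- unname(res$parameter)   # 7

# Route 1: compare |t| with the two-tailed critical value
t_crit <- qt(alpha / 2, df = df, lower.tail = FALSE)   # 2.365
reject_by_critical_value <- abs(t_stat) > t_crit

# Route 2: compare the p-value with alpha
reject_by_p_value <- res$p.value < alpha

c(critical_value = reject_by_critical_value, p_value = reject_by_p_value)
```

Both comparisons return FALSE, in agreement with Answer 7: we cannot reject the null hypothesis.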

Questions

Seven questions, with answers, were provided for you within the text in this chapter. Here’s one more, but without answers.

8. Here’s a small data set for you to try your hand at the one-sample t-test and Rcmdr. The dataset contains cell counts, five counts of the numbers of beads in a liquid with an automated cell counter (Scepter, Millipore USA). The true value is 200,000 beads per milliliter fluid; the manufacturer claims that the Scepter is accurate within 15%. Does the data conform to the expectations of the manufacturer? Write a hypothesis then test your hypothesis with the one-sample t-test. Here’s the data.

scepter
258900
230300
107700
152000
136400
