16.4 – Spearman and other correlations
Introduction
The Pearson product moment correlation is used to describe the degree of linear association between two variables. There are many types of correlation estimators in addition to the familiar product moment correlation, r.
Spearman rank correlation
If you take the ranks for X1 and the ranks for X2, the correlation of the ranks is called the Spearman rank correlation, rs. Spearman correlation is a nonparametric statistic. Like the product moment correlation, it can take values between -1 and +1.
For variables X1 and X2, the rank order correlation may be calculated on the ranks as

$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n\left(n^2 - 1\right)}$$

where di is the difference between the ranks of X1 and X2 for each experimental unit, and n is the number of pairs. This formula assumes that there are no tied ranks; if there are ties, use the equation for the product moment correlation instead (but applied to the ranks).
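As a minimal sketch with hypothetical numbers, the formula can be verified against R's built-in estimate:

x <- c(2.1, 3.4, 1.8, 5.0, 4.2)        # hypothetical measurements
y <- c(10, 15, 9, 17, 22)
d <- rank(x) - rank(y)                  # rank differences, one per pair
n <- length(x)
1 - (6 * sum(d^2)) / (n * (n^2 - 1))    # rs by the formula, 0.9
cor(x, y, method = "spearman")          # same answer, 0.9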
R Commander has an option to calculate the Spearman rank correlation simply by selecting the check box in the correlation submenu. However, if the data set is small, it may be easier to just run the correlation in the script window.
Our example for the product moment correlation was between Drosophila fly wing length and wing area (Table 1).
Table 1. Fly wing lengths and area, units mm and mm2, respectively (Dohm pers obs.)
Obs | Student | Length | Area |
---|---|---|---|
1 | S01 | 1.524 | 0.446 |
2 | S01 | 2.202 | 0.876 |
3 | S01 | 1.52 | 0.39 |
4 | S01 | 1.62 | 0.51 |
5 | S01 | 1.71 | 0.736 |
6 | S03 | 1.551 | 0.453 |
7 | S03 | 2.228 | 0.882 |
8 | S03 | 1.46 | 0.394 |
9 | S03 | 1.659 | 0.503 |
10 | S03 | 1.719 | 0.535 |
11 | S05 | 1.534 | 0.441 |
12 | S05 | 2.223 | 0.889 |
13 | S05 | 1.49 | 0.391 |
14 | S05 | 1.633 | 0.514 |
15 | S05 | 1.704 | 0.546 |
16 | S08 | 1.551 | 0.453 |
17 | S08 | 2.228 | 0.882 |
18 | S08 | 1.468 | 0.394 |
19 | S08 | 1.659 | 0.503 |
20 | S08 | 1.719 | 0.535 |
Repeated observations were collected by image analysis (ImageJ) by four students (S01, S03, S05, S08) from wings fixed to glass slides.
Here’s the scatterplot of the ranks of fly wing length and fly wing area (Fig. 1).
Figure 1. Ranks of Drosophila wing area (mm2) by ranks of wing length (mm).
The Spearman rank correlation, a nonparametric alternative to the product moment correlation, can be obtained directly. The Spearman correlation involves ranking the data, i.e., converting from ratio scale to ordinal scale, then applying the same formula used for the product moment correlation to the ranked data. The Spearman correlation would be the choice for testing association between two ordinal-scale variables. It is also appropriate in lieu of the parametric product moment correlation when the statistical assumptions, e.g., the normality assumption, are not met.
R code
For the Spearman rank correlation, at the R prompt type
cor.test(Area, Length, alternative="two.sided", method="spearman")

R returns with

        Spearman's rank correlation rho

data:  Area and Length
S = 58.699, p-value = 5.139e-11
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.9558658
Alternatively, to calculate either correlation, use R Commander.
Rcmdr: Statistics → Summaries→ Correlation test
Example
BM = c(29,29,29,32,32,35,36,38,38,38,40)
Matings = c(0,2,4,4,2,6,3,3,5,8,6)
cor.test(BM, Matings, method="spearman")

Warning in cor.test.default(BM, Matings, method = "spearman") :
  Cannot compute exact p-value with ties

        Spearman's rank correlation rho

data:  BM and Matings
S = 77.7888, p-value = 0.03163
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.6464143
cor.test(BM, Matings, method="pearson")

        Pearson's product-moment correlation

data:  BM and Matings
t = 2.6728, df = 9, p-value = 0.02551
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.1087245 0.9042515
sample estimates:
      cor 
0.6652136
Other correlations
Kendall’s tau
Another nonparametric correlation is Kendall’s tau (τ). Rank the X1 values, then rank the X2 values. For each pair of observations, count whether the two rankings agree in order (a concordant pair, nc) or disagree in order (a discordant pair, nd). Kendall’s tau is then

$$\tau = \frac{n_c - n_d}{n(n-1)/2}$$

where n is the number of pairs of observations.
Note 1: The denominator is our familiar number of pairwise comparisons if we take k = n.
We introduced concordant and discordant pairs when we presented McNemar’s test and cross-classified experimental design in Chapter 9.6.
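Before the worked example, here is a minimal sketch (hypothetical ranks) that counts concordant and discordant pairs by brute force and checks the result against R's built-in estimate:

x <- c(1, 2, 3, 4, 5)              # ranks of X1 (hypothetical)
y <- c(2, 1, 4, 3, 5)              # ranks of X2
n <- length(x)
nc <- 0; nd <- 0
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    s <- sign(x[j] - x[i]) * sign(y[j] - y[i])
    if (s > 0) nc <- nc + 1        # pair ordered the same way on both variables
    if (s < 0) nd <- nd + 1        # pair ordered in opposite directions
  }
}
(nc - nd) / (n * (n - 1) / 2)      # tau = 0.6
cor(x, y, method = "kendall")      # same answer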
Example: Judging of Science Fair posters
What is the agreement between two judges, A and B, who evaluated the same science fair posters? Posters were evaluated on whether the student’s project was hypothesis-based; judges used a Likert-like scale: Strongly disagree (1), Somewhat disagree (2), Neutral (3), Somewhat agree (4), Strongly agree (5).
Table 2. Two judges evaluated six posters for evidence of hypothesis-based project
Poster | Judge.A | Judge.B |
---|---|---|
1 | 5 | 4 |
2 | 2 | 3 |
3 | 4 | 2 |
4 | 3 | 1 |
5 | 2 | 1 |
6 | 4 | 3 |
A concordant pair is a pair of posters ordered the same way by both judges, while a discordant pair is a pair ordered one way by one judge and the opposite way by the other. For example, posters 1 and 5 form a concordant pair: Judge A scored poster 1 higher than poster 5 (5 vs. 2), and so did Judge B (4 vs. 1).
In R, it is simple to get this correlation directly by invoking the cor.test function and specifying method="kendall". The function takes two numeric vectors of equal length; the cbind function can be used to bind the two vectors into a matrix if you want to keep them together (note the vectors need to have the same number of observations). If the data set is small, it is easiest to just enter the data directly in the script window of R Commander.
A = c(2,2,3,4,4,5)
B = c(1,3,1,2,3,4)
m = cbind(A,B)
cor.test(A, B, method="kendall")

Cannot compute exact p-value with ties

        Kendall's rank correlation tau

data:  A and B
z = 1.4113, p-value = 0.1581
alternative hypothesis: true tau is not equal to 0
sample estimates:
      tau 
0.5384615
End of R output
There were ties in this data set (note the warning about the exact p-value), but for comparison we can also run the product moment correlation.
cor.test(A, B, method="pearson")

        Pearson's product-moment correlation

data:  A and B
t = 1.4649, df = 4, p-value = 0.2168
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.4239715  0.9478976
sample estimates:
      cor 
0.5909091
End of R output
Tetrachoric and Polychoric correlation
Tetrachoric correlation is used for binomial outcomes (yes, no); polychoric correlation is used for ordinal categorical data like the Likert scale. Introduced by Karl Pearson, these are commonly applied correlation estimates for item analysis in psychometric research. Psychometrics, a sub-discipline within psychology and now a significant part of education research, is about evaluating assessment tools. Both functions are provided by the R package psych.
R code: Tetrachoric correlation
tetrachoric(x,y=NULL,correct=.5,smooth=TRUE,global=TRUE,weight=NULL,na.rm=TRUE,
delete=TRUE)
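As a minimal usage sketch (the counts are hypothetical), tetrachoric() accepts a 2 × 2 table of frequencies for two dichotomous items:

library(psych)
tab <- matrix(c(40, 10, 20, 30), nrow = 2)   # hypothetical 2 x 2 table of counts
tetrachoric(tab)                             # tetrachoric rho plus the thresholds (tau)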
R code: Polychoric correlation
polychoric(x,smooth=TRUE,global=TRUE,polycor=FALSE,ML=FALSE, std.err=FALSE,
weight=NULL,correct=.5,progress=TRUE,na.rm=TRUE, delete=TRUE)
Polyserial correlation
R package polycor. Used to estimate the linear association between a ratio scale variable and an ordinal variable.
R code: Polyserial correlation
polyserial(x,y)
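A minimal sketch with simulated (hypothetical) data:

library(polycor)
set.seed(42)
x <- rnorm(200)                                        # continuous (ratio scale) variable
y <- cut(x + rnorm(200), breaks = 3, labels = FALSE)   # ordinal variable with 3 levels
polyserial(x, y)                                       # polyserial correlation estimate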
Biserial correlation would be a special case of the polyserial correlation, where the ordinal variable is replaced by a dichotomous (binomial) variable.
R code: Biserial correlation
biserial(x,y)
Intra-class correlation coefficient
Both the ICC and the product moment correlation, r, which we introduced in Chapter 16.1, are measures of the strength of linear association between two ratio scale variables (Jinyuan et al. 2016). However, the ICC is more appropriate for association between repeated measures of the same quantity, e.g., repeated measures of running speed. In contrast, the product moment correlation can be used to describe the association between any two variables: between repeated measures of running speed, but also between, say, running speed and maximum jumping height. The ICC is used when quantitative measures are organized into paired groups, e.g., before and after on the same subjects, or cross-classified designs. The ICC was introduced in Chapter 12.3 as part of the discussion of repeated measures and random effects, and it is used extensively to assess the reliability of a measurement instrument (Shrout and Fleiss 1979; McGraw and Wong 1996).
Example. Data from Table 2. First, enter the scores from Table 2 as a data frame; the object name myJudge is arbitrary.
Judge.A <- c(5, 2, 4, 3, 2, 4)
Judge.B <- c(4, 3, 2, 1, 1, 3)
myJudge <- data.frame(Judge.A, Judge.B)
library(psych)
ICC(myJudge, lmer=FALSE)
R output follows
Intraclass correlation coefficients
type ICC F df1 df2 p lower bound upper bound
Single_raters_absolute ICC1 0.40 2.3 5 6 0.166 -0.306 0.84
Single_random_raters ICC2 0.46 3.9 5 5 0.081 -0.093 0.85
Single_fixed_raters ICC3 0.59 3.9 5 5 0.081 -0.130 0.90
Average_raters_absolute ICC1k 0.57 2.3 5 6 0.166 -0.880 0.91
Average_random_raters ICC2k 0.63 3.9 5 5 0.081 -0.205 0.92
Average_fixed_raters ICC3k 0.74 3.9 5 5 0.081 -0.299 0.95
Number of subjects = 6 Number of Judges = 2
See the help file for a discussion of the other 4 McGraw and Wong estimates
Lots of output, lots of “ICC”. However, rather than explaining each entry, reflect on the type and review the data. Were the posters evaluated repeatedly? Posters were evaluated twice, but only once per judge, so there is a repeated design with respect to the posters. Had the judges been randomly selected from a population of all possible judges? No evidence was provided to suggest this, so judges were a fixed factor (see Chapter 12.3 and Chapter 14.3).
The six ICC estimates reported by R follow the discussion in Shrout and Fleiss (1979), and our description fits their Case 3: “Each target is rated by each of the same k judges, who are the only judges of interest” (p. 421). Thus, we report the ICC for a single fixed rater, ICC3 = 0.59. Note that with p = 0.081 we would fail to reject the null hypothesis of no association between the judges’ evaluations.
Questions
See Homework 9, Mike’s Workbook for biostatistics
16.1 – Product moment correlation
Introduction
A correlation is used to describe the direction and magnitude of linear association between two variables. There are many types of correlations, some based on ranks, but the one most commonly used is the product-moment correlation (r). The Pearson product-moment correlation is used to describe association between continuous, ratio-scale variables, where “Pearson” is in honor of Karl Pearson (b. 1857 – d. 1936).
There are many other correlations, including Spearman’s and Kendall’s tau (τ) (Chapter 16.4) and ICC, the intraclass correlation (Chapter 12.3 and Chapter 16.4).
The product moment correlation is appropriate for variables of the same kind — for example, two measures of size, like the correlation between body weight and brain weight.
Spearman’s and Kendall’s tau correlation are nonparametric and would be alternatives to the product moment correlation. The intraclass correlation, or ICC, is a parametric estimate suitable for repeat measures of the same variable.
The correlation coefficient
The sample correlation coefficient is

$$r = \frac{\mathrm{COV}(X,Y)}{s_X s_Y} = \frac{\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{s_X s_Y}$$

The numerator is the sum of products (scaled by the degrees of freedom, n – 1) and it quantifies how the deviates from the X and Y means covary, or change together. The numerator is known as the “covariance.”
The denominator includes the standard deviations of X and Y; thus, the correlation coefficient is the standardized covariance.
The product moment correlation, r, is an estimate of the population correlation, ρ (pronounced “rho”), the true relationship between the two variables:

$$\rho = \frac{\mathrm{COV}(X,Y)}{\sigma_X \sigma_Y}$$

where COV(X,Y) refers to the covariance between X and Y, and σX and σY are the population standard deviations.
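A quick sketch in R (the numbers are hypothetical) confirms that the correlation is simply the covariance standardized by the two standard deviations:

x <- c(1.2, 2.3, 3.1, 4.8, 5.0)    # hypothetical data
y <- c(2.0, 2.9, 3.8, 5.1, 4.7)
cov(x, y) / (sd(x) * sd(y))        # standardized covariance
cor(x, y)                          # identical result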
Effect size
Estimates for r range from -1 to +1; the correlation coefficient has no units. A value of 0 describes the case of no statistical correlation, i.e., no linear association between the two variables. Usually, this is taken as the null hypothesis for correlation — “No correlation between two variables,” with the alternative hypothesis (2-tailed) — “There is a correlation between two variables.”
Like mean difference effect size, we can report the strength of correlation between two variables. Consider the magnitude and not the direction. Like Cohen’s effect size:
Absolute value | Magnitude of association |
---|---|
0.10 | small, weak |
0.30 | moderate |
> 0.50 | strong, large |
Note that one should not interpret a “strong, large” correlation as evidence that the association is necessarily real. See Ch16.c for more on spurious correlations.
Standard error of the correlation
An approximate standard error for r can be obtained using this simple formula:

$$s_r = \sqrt{\frac{1 - r^2}{n - 2}}$$
This standard error can be used for significance testing with the t-test. See below.
Confidence interval
Like all situations in which an estimate is made, you should report the confidence interval for r. The standard error approximation is appropriate when the null hypothesis is ρ = 0, because the joint distribution is approximately normal. However, as the estimate approaches the limits of the closed interval [-1, +1], the distribution becomes increasingly skewed.
The approximate confidence interval for the correlation is based on Fisher’s z-transformation. We use this transformation to stabilize the variance over the range of possible values of the correlation and, therefore, better meet the assumptions of parametric tests based on the normal distribution.
The transform is given by the equation

$$z = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)$$

where ln is the natural logarithm. In the R language we get the natural log with log(x), where x is a variable we wish to transform.
Equivalently, z can be rewritten as

$$z = \tanh^{-1}(r)$$

the inverse hyperbolic tangent function. In the R language this function is called with atanh(r) at the R prompt.
The standard error for z is about

$$s_z = \frac{1}{\sqrt{n - 3}}$$
We take z to be the estimate of the population value zeta, ζ. We take the sampling distribution of z to be approximately normal, and thus we may use the normal table to generate the 95% confidence interval for zeta: z ± 1.96 sz. The limits are then back-transformed with the hyperbolic tangent, tanh(), to return them to the correlation scale.
Why 1.96? We want a 95% confidence interval, so at Type I α = 0.05 we want 0.025 in each of the two tails of the normal distribution (see Appendix 20.1). The upper 0.025 tail begins at +1.96 and the lower 0.025 tail at -1.96 (α = 0.05/2 tails = 0.025 per tail).
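Here is a minimal sketch of the calculation in R, with a hypothetical r and n:

r <- 0.70; n <- 30                 # hypothetical estimate and sample size
z <- atanh(r)                      # Fisher's z-transform
se <- 1 / sqrt(n - 3)              # approximate standard error of z
ci_z <- z + c(-1.96, 1.96) * se    # 95% CI on the z scale
tanh(ci_z)                         # back-transform to the correlation scale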
Significance testing
Significance testing of correlations is straightforward, with the noted caveat about the need to transform in cases where the estimate is close to ±1. For the typical test of the null hypothesis that the correlation equals 0, the t distribution can be used (i.e., it’s a t-test):

$$t = \frac{r}{s_r} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

which has degrees of freedom, DF = n – 2.
Use the t-table critical values to test the null hypothesis involving the product moment correlation (e.g., Appendix 20.3; for the Spearman rank correlation, see Table G, p. 686 in Whitlock & Schluter).
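A minimal sketch of the hand calculation, with a hypothetical r and n, checked against R's t distribution functions:

r <- 0.70; n <- 30                                  # hypothetical values
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)           # test statistic
2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE) # two-tailed p-value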
Alternatively (and preferred), we’ll just use R and Rcmdr’s facilities without explanation; the t distribution works OK as long as the correlations are not close to ±1, in which case other things need to be done, and this is also true if you want to calculate a confidence interval for the correlation.
You are sufficiently skilled at this point to evaluate whether a correlation is statistically significantly different from zero: just check whether the associated p-value is less than or greater than alpha (usually set at 5%). A test of whether or not the correlation, r1, is equal to some value, r2, other than zero is also possible. For an approximate test, replace zero in the above test statistic calculation with the value of r2, and calculate the standard error of the difference. Note that use of the t-test here is approximate; unless the correlations are small in magnitude, the Fisher z-transformation approach will be less biased, where the test statistic now is

$$Z = \frac{z_1 - z_2}{s_{z_1 - z_2}}$$

with z1 and z2 the transformed correlations, and the standard error of the difference is

$$s_{z_1 - z_2} = \frac{1}{\sqrt{n - 3}}$$

Look up the critical value of Z from the normal table.
R code
To calculate correlations in R and Rcmdr, have ratio-scale data ready in the columns of an R data frame. We’ll introduce the commands with an example data set from my genetics laboratory course.
Question. What is the estimate of the product moment correlation between Drosophila fly wing length and area?
Data (thanks to some of my genetics students!)
Area <- c(0.446, 0.876, 0.390, 0.510, 0.736, 0.453, 0.882, 0.394, 0.503, 0.535,
          0.441, 0.889, 0.391, 0.514, 0.546, 0.453, 0.882, 0.394, 0.503, 0.535)
Length <- c(1.524, 2.202, 1.520, 1.620, 1.710, 1.551, 2.228, 1.460, 1.659, 1.719,
            1.534, 2.223, 1.490, 1.633, 1.704, 1.551, 2.228, 1.468, 1.659, 1.719)
Create your data frame, e.g.,
FlyWings <- data.frame(Area, Length)
And here’s the scatterplot (Fig. 1). We can clearly see that wing length and wing area are positively correlated, with one outlier.
Figure 1. Scatterplot of Drosophila wing area by wing length
The R command for correlation is simply cor(x,y). This gives the “pearson” product moment correlation, the default. To specify other correlations, use method = "kendall" or method = "spearman" (see Chapter 16.4).
Question. What are the Pearson, Spearman, and Kendall’s tau estimates for the correlation between fly Wing Length and Wing Area?
At the R prompt, type
cor(Length, Area)
[1] 0.9693334
cor(Length, Area, method="kendall")
[1] 0.8248008
cor(Length, Area, method="spearman")
[1] 0.9558658
Note that we entered Length first. On your own, confirm that the order of entry does not change the correlation estimate. To both estimate and test the significance of the correlation between wing area and wing length, at the R prompt type
cor.test(Area, Length, alternative="two.sided", method="pearson")
R returns with
        Pearson's product-moment correlation

data:  Area and Length
t = 16.735, df = 18, p-value = 2.038e-12
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.9225336 0.9880360
sample estimates:
      cor 
0.9693334
Alternatively, to calculate and test the correlation, use R Commander, Rcmdr: Statistics → Summaries→ Correlation test
R’s cor.test uses Fisher’s z-transformation; note how poorly the approximation works in this example if we instead use the approximate standard error. The estimated correlation was 0.97, thus the approximate standard error was 0.058. The confidence interval (t distribution, α = 0.05/2 and 18 degrees of freedom) was between 0.848 and 1.091, which is wider than the z-transform result and returns an out-of-bounds upper limit (a correlation cannot exceed +1).
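To reproduce that approximate interval, a sketch of the calculation just described:

r <- 0.9693334; n <- 20
se_r <- sqrt((1 - r^2) / (n - 2))            # approximate standard error, 0.058
r + c(-1, 1) * qt(0.975, df = n - 2) * se_r  # 0.848 to 1.091; upper limit out of bounds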
Alternative packages to base R provide more flexibility and access to additional approaches to significance testing of correlations (Goertzen and Cribbie 2010). For example, z_cor_test() from the TOSTER package:
z_cor_test(Area, Length)

        Pearson's product-moment correlation

data:  Area and Length
z = 8.5808, N = 20, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.9225336 0.9880360
sample estimates:
      cor 
0.9693334
To confirm, obtain the p-value for z = 8.5808, two-tailed, with
> 2*pnorm(c(8.5808), mean=0, sd=1, lower.tail=FALSE) [1] 9.421557e-18
Note the difference: Fisher’s z is used for hypothesis testing, while cor.test and z_cor_test return the same confidence intervals. A bootstrap resampling version (see Chapter 19.2) is also available:
boot_cor_test(Area, Length)

        Bootstrapped Pearson's product-moment correlation

data:  Area and Length
N = 20, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.8854133 0.9984744
sample estimates:
      cor 
0.9693334
The z-transform confidence interval would be preferred over the bootstrap confidence interval because it is narrower.
Assumptions of the product-moment correlation
Interestingly enough, there are no assumptions for estimating a statistic. You can always calculate an estimate, although of course this does not mean that you have selected the best calculation to describe the phenomenon in question; it just means that assumptions are not applicable to estimation. Whether it is the sample mean or the correlation, it is important to appreciate that you can always calculate it, even if it is not appropriate!
Statistical assumptions and those technical hypotheses we evaluate apply to statistical inference: being able to correctly interpret a test of statistical significance for a correlation estimate depends on how well the assumptions are met. The most important assumption for a null hypothesis test of correlation is that samples were obtained from a “bivariate normal distribution.” It is generally sufficient to test normality of the variables one at a time (univariate normality), but the student should be aware that testing the bivariate normality assumption can be done directly (e.g., Doornik and Hansen 2008).
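A minimal univariate check for the fly wing data (variables defined above), using the Shapiro-Wilk test from base R:

shapiro.test(Length)   # univariate normality, one variable at a time
shapiro.test(Area)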
Testing two independent correlations
Extending from a null hypothesis that the correlation is equal to zero to one that the correlation equals a particular value should not be a stretch for you. For example, since we use the t-test to evaluate the null hypothesis that the correlation is equal to zero, you should be able to make the connection that, like the two sample t-test, we can extend the test of correlation to any value. However, using the t-test without considering the need to stabilize the variance would be a mistake.
When two correlations come from independent samples, we can test whether or not the two correlations are equal. Rather than use the t-test, however, we use a modification of Fisher’s z-transformation. Calculate z for each correlation separately, then use the following equation to obtain Z:

$$Z = \frac{z_1 - z_2}{\sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}}}$$

We then look up Z in our table of the standard normal distribution (Appendix 20.1, or better, use the normal distribution functions in Rcmdr) and obtain the p-value of the test of the hypothesis that the two correlations are equal.
Example. Two independent correlations are r1 = 0.2 and r2 = 0.34. The sample size for group 1 was 14 and for group 2 was 21. Test the hypothesis that the two correlations are equal. Using R as a calculator, here’s what we might write in the R script window and the resulting output. It doesn’t matter which correlation we set as r1 or r2, so I prefer to calculate the absolute value of Z and then get the probability from the normal table for values greater than or equal to |Z| (i.e., the upper tail).
z1 = atanh(0.2)
z2 = atanh(0.34)
n1 = 14
n2 = 21
Z = abs((z1 - z2)/sqrt((1/(n1 - 3)) + (1/(n2 - 3))))
Z
[1] 0.3954983
From the normal distribution table we get a p-value of 0.3462 for the upper tail. Because this p-value is not less than our typical Type I error rate of 0.05, we conclude that the two correlations are not in fact significantly different.
Rcmdr: Distributions → Continuous distributions → Normal distribution → Normal probabilities…
pnorm(c(0.3954983), mean=0, sd=1, lower.tail=FALSE)
R returns
[1] 0.3462376
To make this two-tailed, of course, all we have to do is multiply the one-tailed p-value by two; in this case the two-tailed p-value = 0.69247.
Write a function in R
There’s nothing wrong with running the calculations as written. But R allows users to write their own functions. Here’s one possible function we could write to test two independent correlations. Write the R function in the script window.
test2Corr = function(r1, r2, n1, n2) {
  z1 = atanh(r1); z2 = atanh(r2)                        # Fisher z-transform each correlation
  Z = abs((z1 - z2)/sqrt((1/(n1 - 3)) + (1/(n2 - 3))))  # test statistic
  pnorm(Z, mean = 0, sd = 1, lower.tail = FALSE)        # one-tailed p-value
}
After submitting the function, we then invoke the function by typing at the R prompt
p = test2Corr(0.2,0.34,14,21); p
Again, R returns the one-tailed p-value
[1] 0.3462376
Unsurprisingly, these simple functions are often available in an R package. In this case, the psych package provides a function called r.test(), which will accommodate the test of the equality hypothesis of two independent correlations. Assuming that the psych package has been installed, at the R prompt we type

require(psych)
r.test(14, .2, .34, n2=21, twotailed=TRUE)
And R returns
Correlation tests
Call:r.test(n = 14, r12 = 0.20, r34 = 0.34, n2 = 21, twotailed = TRUE)
Test of difference between two independent correlations
z value 0.4 with probability 0.69
More on testing correlations
The discussion above, as stated, assumes the two correlations are independent of each other. In ecological studies, for example, it is not uncommon to have estimates of correlations among multiple pairs of traits. Thus, our approach would not be appropriate for testing the difference between the body mass/ornament size and body mass/swimming performance correlations of male killifish, because both correlations include the same variable (body mass).
Note: This example was selected from Sowersby et al 2022 — I’m not implying here that the authors analyzed their results incorrectly, I just like the example of traits to illustrate the point!
The r.test() function in the psych package also accommodates this case of dependent (overlapping) correlations; see also the cocor package (Diedenhofen and Musch 2015).
Questions
- True or False. It is relatively easy to move from the estimation of one correlation between two continuous variables to the estimation of multiple pairwise (“2 at a time”) correlations among many variables. For k = the number of variables, there are k(k-1)/2 unique correlations. However, one should be concerned about the multiple comparisons problem, as introduced in ANOVA, when one tests for the statistical significance of many correlations.
- True or False. Generally, the null hypothesis of a test of a correlation is Ho: r = 0, although in practice, one could test a null of r = any value.
- Return to the fly wing example. What was the estimate of the value of the product moment correlation? The Spearman Rank correlation? The Kendall’s tau?
- OK, you have three correlation estimates for a test of the same null hypothesis, i.e., that the correlation between Length and Area is zero. Which estimate is the best estimate?
- Apply the Fisher z-transformation to the estimated correlation. What did you get?
- For the fly wing example, what were the degrees of freedom?
- For the fly wing example, calculate the approximate standard error of the product moment correlation.
- Return one last time to the fly wing example. What was the value of the lower limit of the 95% confidence interval for the estimate of the product moment correlation? And the value of the upper limit?
- Assume that another group of students (n = 15) made measurements on fly wings and the correlation was 0.86. Are the correlations for the two groups of students equal? Obtain the probability using the Z calculation and R (Chapter 6.7) or the normal table.