14.8 – More on the linear model in Rcmdr

Introduction
R code: How to analyze multifactorial ANOVA problems
How would we analyze our snake experiment?
Contrast coding
Another example
Nested ANOVA
Repeatability and ANOVA
Questions
Quiz
Chapter 14 contents

Introduction

During the last lectures, we could have used the Two-Way ANOVA command in R or Rcmdr

R code: How to analyze multifactorial ANOVA problems

Rcmdr: Statistics → Means → Multiway ANOVA

To analyze our two factor data sets. As long as the design meets the following conditions, by all means use this command because it is simple and precisely correct.

Both factors are fixed, not random.
Each level of first factor is crossed with each level of the second factor.
No missing data (the design is fully replicated and balanced).

If any of the three points do not fit your two-way design, then you’ll need a different, more general and powerful ANOVA procedure in R and Rcmdr to analyze these types of designs. You’ll need the lm() function (Fig. 1).

Rcmdr: Statistics → Fit models → Linear model…

Menus to seelct linear model function, lm()

Figure 1. Linear model menu in Rcmdr, version 2.7.0

Some R basics with the lm() function, the general linear model.

$\begin{align*} Y\sim model \end{align*}$

Response $Y$ is specified by linear predictor(s), either factors or covariates (ratio-scale predictor variables). We communicate to R what the model is by using operators. The four more commonly used operators are

$+$ the basic way to include the model terms, i.e., the mode predictors

$:$ which is interpreted as the interaction of all the variables and the factors in the term

$*$ is interpreted as factor crossing

Not shown, but $\%in\%$ indicates the term on the left is nested within the term on the right.

A few examples: we’ll have three factors, A, B, and C. For our one-way ANOVA, the model specification would be

$\begin{align*} Y\sim A \end{align*}$

For our crossed, balanced two-way ANOVA, the model specification would be

$\begin{align*} Y\sim A + B + A:B \end{align*}$

or equivalently

$\begin{align*} Y\sim A*B \end{align*}$

And for our block ANOVA problem?

Click here to get entire List of model commands in R.

How would we analyze our snake experiment?

Table 1. The snake data set

Snake	Source	flick
1	dH2O	3
2	dH2O	0
3	dH2O	0
4	dH2O	5
5	dH2O	1
6	dH2O	2
1	fish	6
2	fish	22
3	fish	12
4	fish	24
5	fish	16
6	fish	16
1	worm	6
2	worm	22
3	worm	12
4	worm	24
5	worm	16
6	worm	16

We have two factors, but one factor is a Block (repeated measures on individuals). We need to tell R and Rcmdr what our model is. We’ll return to talk about models next time, a very important topic!! For now, think of a model as adding the independent variables together to predict the response variable.

In our Snake example, it’s a two-way ANOVA, but one factor is individual snake, the other is a treatment, and we have repeat measures, so there cannot be an interaction.

We tell R and Rcmdr which columns contain the Response, and under Model, we enter the columns with the two factors.

repeat measure linear model

Figure 2. Menu of linear model with repeat measures model, Rcmdr, version 2.7.0.

You must also tell R and Rcmdr which factors in the model (if any) are random to get the correct F statistics. Almost without exception, blocking factors are always treated as Random

The output looks like this (see below). More complicated, true (which means more information!), but things marked in red we’ve seen before.

LinearModel.2 <- lm(flick ~ Snake +Source, data=L16SnakeTaste)

summary(LinearModel.2)

Call:
lm(formula = flick ~ Snake + Source, data = L16SnakeTaste)

Residuals:
Min 1Q Median 3Q Max 
-5.2222 -0.7222 0.0278 1.5694 7.4444

Coefficients:
             Estimate Std. Error t value Pr(>|t|) 
(Intercept)    -4.444      2.523  -1.762 0.10862 
Snake[T.2]      9.667      3.090   3.128 0.01072 * 
Snake[T.3]      3.000      3.090   0.971 0.35451 
Snake[T.4]     12.667      3.090   4.099 0.00215 ** 
Snake[T.5]      6.000      3.090   1.942 0.08085 . 
Snake[T.6]      6.333      3.090   2.050 0.06755 . 
Source[T.fish] 14.167      2.185   6.484 0.0000704 ***
Source[T.worm] 14.167      2.185   6.484 0.0000704 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.784 on 10 degrees of freedom
Multiple R-squared: 0.8858, Adjusted R-squared: 0.8058 
F-statistic: 11.08 on 7 and 10 DF, p-value: 0.0005337

Contrast coding

When a factor (categorical variable) is used in a linear model (lm() or aov()), R must convert the factor into one or more numeric predictor variables. Contrast coding tells us how to represent the factor in the model, how to interpret the coefficients associated with the factor, how the hypothesis tests of the coefficients are handled, and whether or not Type III sums of squares tests are valid. R can handle contrast coding in a couple of ways, either as a dummy contrast (each group if the factor is contrasted to a baseline) or effects coding, where group means are compared to the grand mean. The command summary(lm()) reports contrast coding that was used in the linear model. Every kind of contrast leads to the same fitted values and same overall F-test, but coefficient estimates and t-tests change.

Note 1: Type I sums of squares tests are affected by the order of the predictors in the model, not by type of contrast coding used. R Commander defaults to Type II tests, which are generally unaffected by coding contrasts used. In contrast, Type III tests are affected by the type of coding contrasts used — Type III tests require sum-to-zero (effects) contrasts.

For the ANOVA table, we call up the command via

Rcmdr: Models → Hypothesis tests → ANOVA table… (Fig. 3).

Rcmdr menu: Model --> Hypothesis testing --> ANOVA table

Figure 3. Rcmdr: Models → Hypothesis tests → ANOVA table… Rcmdr, version 2.7.0

Confirm that the model object is active (in this case, the object was LinearModel.2), accept the defaults about types of tests and marginality, and submit OK. The output is

Anova(LinearModel.2, type="II")
Anova Table (Type II tests)

Response: flick
          Sum Sq Df F value Pr(>F) 
Snake     307.61  5  4.2956 0.02396 * 
Source    802.78  2 28.0256 0.00007954 ***
Residuals 143.22 10 
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

End R output

You do not need to know all of the output; all of that output is there for a reason, of course, but for now, here’s what R and Rcmdr has to say (from the help menu):

The sequential sums of squares is the added sums of squares given that prior terms are in the model. These values depend upon the model order. The adjusted sums of squares are the sums of squares given that all other terms are in the model. These values do not depend upon the model order.

You should try our snake example again, but this time, remove the tongue flick response to the dH2O for the first snake — a missing value. (just type into the cell NA).

How would you use GLM to analyze a simple two-way ANOVA, a fully crossed, fully replicated (“balanced”) design? We could use the Two-way ANOVA command in R and Rcmdr, or we could use lm(). Try it with the data set from random vs nonrandom lecture.

Another example

Below, you see how the model is entered. To let R know that I wish R and Rcmdr to test the interaction, I need to add a Model term for that source of variation. I accomplish this by typing “Diet*Drug” (without the quotes).

Figure 4. Crossed, balanced design. Linear model menu, Rcmdr, version 1.9.2

After clicking OK, the following output from the lm function is returned. How does this compare to output from the two-way ANOVA command in R and Rcmdr?

You should try both and compare!

Rcmdr: Models → Hypothesis tests → ANOVA table

Anova(LinearModel.17, type="II")
Anova Table (Type II tests)
Response: chol_randomized
             Sum Sq  Df F value   Pr(>F) 
diet         141.08   2  1.3061 0.28745 
drug         351.88   2  3.2577 0.05403 .
diet:drug    235.50   4  1.0901 0.38124 
Residuals   1458.19  27

End R output

Nested ANOVA

The nested ANOVA may be analyzed in multiple ways in R and Rcmdr, but I prefer the lm() function because it is the most general. For Nested ANOVA, we can also use lm(). Here’s where it gets a little tricky. Put in the Response variable (Chol), then click in the box for model: Select both factors, then type in / after the factor that’s nesting factor. For our nested model example (14.5 – Nested designs), Manufacturer Source was nested within Drug.

Table 3. Nested design example data set from Chapter 14 – Nested design

Drug	Source	Chol
1	1	202.6
1	1	207.8
1	1	190.2
1	1	211.7
1	1	201.5
1	2	189.3
1	2	198.5
1	2	208.4
1	2	205.3
1	2	210
2	3	212.3
2	3	204.4
2	3	221.6
2	3	209.2
2	3	222.1
2	4	203.6
2	4	209.8
2	4	204.1
2	4	201.8
2	4	202.6
3	5	189.1
3	5	219.9
3	5	196
3	5	205.3
3	5	204
3	6	194.7
3	6	192.8
3	6	226.5
3	6	200.9
3	6	219.7

Note 2: If you were working with a CROSSED model, then you would enter the two factors and indicate the interaction by typing Drug*Source (if these are the two factors involved in the interaction).

Figure 5. Nested design, linear model menu, Rcmdr, version 1.9.2

Fortunately, R and Rcmdr’s help system is quite extensive here, so when in doubt, check the help box…

Output from the linear model for the Nested Example looks like the one below.

The General Linear Model function in R and therefore Rcmdr returns information about our design plus Sums of Squares, Mean squares, and P-values. R and Rcmdr default’s to use of sequential evaluation of effects. Adjusted evaluation is useful for when you have a covariate (like body size or another confounding variable) that should be evaluated first before the factors are evaluated. We will use the sequential analysis.

Rcmdr: Models → Hypothesis tests → ANOVA table

Anova(LinearModel.15, type="II")
Anova Table (Type II tests)
Response: Chol
             Sum Sq Df F value Pr(>F)
Drug         225.14  2  1.1743 0.3262
Drug:Source  269.27  3  0.9363 0.4385
Residuals   2300.61 24

End R output

Repeatability and ANOVA

We need to tell Rcmdr how to structure the error term; you need the data frame to be arranged so

aovRes <- aov(dH2O ~ Source + Error(Source/Subject), data=SnakeTaste)
#Print the results
aovRes
Anova Table (Type II tests)
Response: dH2O
            Sum Sq Df F value   Pr(>F)    
Subject     307.61  5  4.2956 0.02396 *  
Source      802.78  2 28.0256 0.00007954 ***
Residuals   143.22 10                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#What components are available in the aovRes object?
names(aovRes)
[1] "Sum Sq"  "Df"      "F value" "Pr(>F)"

How do I extract the “F value” for Subjects?

str(aovRes)

Questions

1. What are three learning objectives from this page?
2. Be able to distinguish use of lm() and aov() and the summary output with respect to testing of linear models.

Quiz Chapter 14.8

Data sets

Table 1. The snake data set

Snake	Source	flick
1	dH2O	3
2	dH2O	0
3	dH2O	0
4	dH2O	5
5	dH2O	1
6	dH2O	2
1	fish	6
2	fish	22
3	fish	12
4	fish	24
5	fish	16
6	fish	16
1	worm	6
2	worm	22
3	worm	12
4	worm	24
5	worm	16
6	worm	16

14.8 – More on the linear model in Rcmdr

Introduction

R code: How to analyze multifactorial ANOVA problems

How would we analyze our snake experiment?

Contrast coding

Another example

Nested ANOVA

Repeatability and ANOVA

Questions

Data sets

Chapter 14 contents