9 – Inferences, Categorical Data

Introduction

No doubt you have already been introduced to chi-square (\chi^{2}) tests (pronounced “kai-square”), particularly if you’ve had a genetics class, but perhaps you were not told why you were using the \chi^{2} test as opposed to some other test, for example, the t-test, ANOVA, or linear regression.

Chi-square analyses are used when the outcome variable is discrete (i.e., categorical or qualitative). When you can count the number of “yes” or “no” outcomes from an experiment, you are dealing with a \chi^{2} problem. In contrast, continuous (i.e., quantitative) outcome variables call for the t-test (for two groups) or ANOVA-like procedures (for two or more groups). Chi-square tests can be applied when you have two or more treatment groups.

Two kinds of chi-square analyses

(1) We ask about the “fit” of our data against predictions from theory. This is the typical chi-square that students have been exposed to in biology lab. If outcomes of an experiment can be measured against predictions from some theory, then this is a goodness of fit (gof) \chi^{2}. Goodness of fit is introduced in section 9.1.

Note: The fit concept in statistics is simply how well a statistical model explains the data. As we go forward, this concept will appear frequently.
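To make the goodness of fit idea concrete, here is a minimal sketch in Python. It assumes the usual statistic \chi^{2} = \sum (O - E)^{2} / E, where O is an observed count and E is the count expected from theory. The counts (290 and 110) and the 3:1 ratio are hypothetical values chosen for illustration, and the function name chi_square_gof is our own, not part of any library.

```python
import math


def chi_square_gof(observed, expected_ratio):
    """Return (chi-square statistic, degrees of freedom) for a goodness of fit test."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    # Theory supplies the expected counts: scale the ratio to the sample size.
    expected = [total * r / ratio_sum for r in expected_ratio]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df


# Hypothetical cross: 290 dominant and 110 recessive offspring,
# compared against a 3:1 ratio predicted by theory (assumed values).
gof_chi2, gof_df = chi_square_gof([290, 110], expected_ratio=[3, 1])

# With df = 1 only, the upper-tail p-value has a closed form via erfc.
gof_p = math.erfc(math.sqrt(gof_chi2 / 2))

print(f"chi-square = {gof_chi2:.3f}, df = {gof_df}, p = {gof_p:.3f}")
```

With these made-up counts the statistic is small relative to its degrees of freedom, so the data would be judged consistent with the 3:1 prediction. The erfc shortcut applies only when df = 1; with more categories you would look up the chi-square distribution instead.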

(2) We ask whether the outcomes of an experiment are associated with a treatment. These are called contingency table problems, and they will be the subject of the next lecture. The important distinction here is that there exists no outside source of information (“theory”) available to make predictions about what we would expect. Contingency tables are introduced in section 9.2.
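A short Python sketch may help show how a contingency table problem differs: with no theory to lean on, the expected counts are computed from the table's own row and column totals. The 2 × 2 counts below are hypothetical, the function name chi_square_contingency is our own, and no continuity correction is applied.

```python
import math


def chi_square_contingency(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if outcome and treatment were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical experiment: rows are treatment and control,
# columns are counts of "yes" and "no" outcomes (assumed values).
counts = [[30, 20],
          [20, 30]]
table_chi2, table_df = chi_square_contingency(counts)

# With df = 1 only, the upper-tail p-value has a closed form via erfc.
table_p = math.erfc(math.sqrt(table_chi2 / 2))

print(f"chi-square = {table_chi2:.3f}, df = {table_df}, p = {table_p:.4f}")
```

Here every expected count is 25, because both row totals and both column totals equal 50 out of 100. The only outside information used is the data table itself.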


Chapter 9 contents