6 – Probability, Distributions
Introduction
Probability is a measure of how likely an event is to occur. An important concept to appreciate is that in many cases, like R.A. Fisher’s Lady tasting tea example, we can count in advance all possible outcomes of an experiment. On the other hand, for many more experiments, we cannot count all possible outcomes of the sample space, either because they are too numerous or simply unknowable. In such cases, applying theoretical probability distributions allows us to circumvent the countability problem. Whereas empirical probability distributions are frequency counts of observations, theoretical probability distributions are based on mathematical formulas.
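As a minimal sketch of the countable case, the Python code below enumerates every possible outcome of the Lady tasting tea experiment and compares the counted probabilities with a formula. It assumes Fisher's classic design of 8 cups, 4 with milk poured first, which is not spelled out in this chapter; the variable names are illustrative only, and the formula used is the hypergeometric probability.

```python
from itertools import combinations
from math import comb

CUPS = 8        # assumption: 8 cups in total (Fisher's classic design)
MILK_FIRST = 4  # assumption: 4 of the cups had milk poured first

# Label the truly milk-first cups 0..3; the lady must pick 4 cups.
truth = set(range(MILK_FIRST))
guesses = list(combinations(range(CUPS), MILK_FIRST))
n_outcomes = len(guesses)  # size of the sample space

# Count outcomes by the number of cups identified correctly.
counts = {k: 0 for k in range(MILK_FIRST + 1)}
for guess in guesses:
    counts[len(truth & set(guess))] += 1

# Probabilities obtained by counting outcomes directly.
counted = {k: c / n_outcomes for k, c in counts.items()}

# The same probabilities from a formula (hypergeometric).
formula = {
    k: comb(MILK_FIRST, k) * comb(CUPS - MILK_FIRST, MILK_FIRST - k)
    / comb(CUPS, MILK_FIRST)
    for k in range(MILK_FIRST + 1)
}

for k in counts:
    print(k, counts[k], round(counted[k], 4), round(formula[k], 4))
```

Under these assumptions there are 70 equally likely outcomes, and only one of them has all four cups correct, so guessing alone succeeds with probability 1/70. Counting and the formula agree here because the sample space is small enough to list; the formula is what remains usable when listing is not practical.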
Probability distributions are key to the null hypothesis significance testing framework for statistical inference. Given assumptions about the data, probability distributions are used to evaluate how probable the observed data are under the null hypothesis, e.g., no difference between the means of a control group and a treatment group. Much of classical inferential statistics, especially the kind one finds in introductory courses like ours, is built on probability distributions. ANOVA, t-tests, linear regression, etc., are parametric tests and assume errors are distributed according to a particular type of distribution, the normal or Gaussian distribution.
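To make the idea concrete, here is a small Python sketch of how a theoretical distribution turns a test statistic into a probability under the null hypothesis. It assumes the statistic is a standard normal deviate (Z); the function name `two_sided_p` and the value 1.96 are chosen purely for illustration and do not come from any data set in this book.

```python
from statistics import NormalDist

def two_sided_p(z_value):
    """Two-sided tail probability for a standard normal deviate.

    Assumes the test statistic follows a standard normal
    distribution when the null hypothesis is true.
    """
    standard_normal = NormalDist(mu=0, sigma=1)
    upper_tail = 1 - standard_normal.cdf(abs(z_value))
    return 2 * upper_tail

# Hypothetical test statistic, for illustration only.
p_value = two_sided_p(1.96)
print(round(p_value, 3))
```

A deviate of 1.96 leaves about 5% of the standard normal distribution in the two tails combined, which is where the familiar 0.05 cutoff for Z comes from. Other tests follow the same logic with a different reference distribution, such as t, F, or chi-square.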
Depending on the data type, there are many classes of probability distributions. For a discrete random variable, a probability distribution is a list of the probabilities of each possible outcome in an entire population. In contrast, probability density functions are used for continuous random variables. This chapter begins with the basics of probability, then gently introduces discrete and continuous probability distributions. In the other sections of this chapter we describe several probability density functions. Emphasis is placed on the normal distribution, which underlies most parametric statistics.
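The contrast between a discrete probability distribution and a probability density function can be sketched in a few lines of Python. The example assumes a binomial variable (10 trials, success probability 0.5) for the discrete case and normal distributions for the continuous case; these particular parameter values are arbitrary choices for illustration.

```python
from math import comb
from statistics import NormalDist

# Discrete case: a binomial distribution is a list of probabilities,
# one per possible outcome, and the list sums to 1.
n_trials, p_success = 10, 0.5  # illustrative values
binomial_probs = [
    comb(n_trials, k) * p_success**k * (1 - p_success) ** (n_trials - k)
    for k in range(n_trials + 1)
]
total_probability = sum(binomial_probs)

# Continuous case: the density is the height of the curve, not a
# probability. Probabilities are areas under the curve, obtained here
# as differences of the cumulative distribution function.
standard_normal = NormalDist(mu=0, sigma=1)
density_at_mean = standard_normal.pdf(0)
prob_within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

# A density can exceed 1 when the distribution is narrow,
# which a probability never can.
narrow_normal = NormalDist(mu=0, sigma=0.1)
narrow_density_at_mean = narrow_normal.pdf(0)

print(round(total_probability, 6))
print(round(density_at_mean, 4), round(prob_within_one_sd, 4))
print(round(narrow_density_at_mean, 4))
```

The binomial probabilities sum to 1, whereas the normal density at a single point is only a curve height: roughly 0.40 for the standard normal and nearly 4 for the narrow one. About 68% of the area under the standard normal curve lies within one standard deviation of the mean.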
Homework to go with this topic
Homework 3: Distributions & Probability in Mike’s Workbook for Biostatistics.
Quizzes in this chapter
The subchapters contain a total of 88 questions, a mix of true/false and multiple choice formats.
Chapter 6 contents
- Introduction
- Some preliminaries
- Ratios and proportions
- Combinations and permutations
- Types of probability
- Discrete probability distributions
- Continuous distributions
- Normal distribution and the normal deviate (Z)
- Moments
- Chi-square distribution
- t distribution
- The F distribution
- References and suggested readings