0.2 – List of figures
1.1 – A quick look at R and R Commander
Figure 1. R.app icon shown on a MacBook dock.
Figure 2. The R GUI on a macOS system; red arrow points to the R prompt.
Figure 3. Screenshot of drop down menu RGUI, create new script, Windows 10
Figure 4. Screenshot of portion of R Script editor, Windows 11. A simple R command is visible.
Figure 5. The windows of R Commander, macOS. From bottom to top: Messages, Output, Script (tab, Markdown) Rcmdr ver. 2.4-4.
Figure 6. The windows of R Commander, Win11. From bottom to top: Messages, Output, Script (tab, R Markdown) Rcmdr ver. 2.5-1.
Figure 7. Screenshot of terminal window (cmd) on win11 computer, checking for installed pandoc on a win10 pc.
Figure 8. Screenshot of GUI preferences settings after changing from default MDI to SDI, win10
Figure 9. Screenshot Rcmdr Tools popup menu, macOS 10.15.6
Figure 10. Screenshot Rcmdr Set app nap dialog box, macOS 10.15.6
2.2 – Why do we use R Software?
Figure 1. “Spreadsheets,” xkcd.com no. 2180
2.3 – A brief history of (bio)statistics
Figure 1. Original map by John Snow showing the clusters of cholera cases in the London epidemic of 1854, drawn and lithographed by Charles Cheffins. Image Public Domain, from Wikipedia
Figure 2. Plot of Snow’s London using R cholera package. Triangles marked with p1-p13 represent public water pumps. Red dots represent cholera cases.
Figure 3. Plot of Snow’s London with walking areas drawn about the 13 water pumps. R cholera package.
2.4 – Experimental Design and rise of statistics in medical research
Figure 1. Left: ASARCO smelter, Ruston, Washington, image from Department of Ecology, State of Washington. Direction of smoke from the stack is north, toward Vashon Island. Right: Heat map of arsenic and lead affected areas. Image from kingcounty.gov. Darker regions correspond to heavier arsenic and lead contamination of soils.
2.5 – Scientific method and where statistics fits
Figure 1. “Frequentists vs Bayesians,” xkcd.com no. 1132
Figure 2. Probability tree diagram with prevalence of type 2 diabetes and sensitivity, specificity of A1C test, data from CDC and Selvin et al 2011. Tree drawn with free diagrams.net app.
2.6 – Statistical reasoning
Figure 1. “Survivorship bias,” https://www.xkcd.com/1827/
Figure 2. “Selection bias,” https://xkcd.com/2618/
3.1 – Data types
Figure 1. Five Conus shells, example of discrete data type. Click image to view full sized image.
Figure 2. Analog thermometer showing office temperature at 23.1 Celsius, example of interval data type. Click image to view full sized image.
Figure 3. Blood glucose reading, 122 mg/dL. Click image to view full sized image.
Figure 4. Analog hygrometer showing office humidity at 65 percent, example of ratio data type. Click image to view full sized image.
Figure 5. Flowers (Hydrangea) are blue or they are not, example of binomial data type. Click image to view full sized image.
Figure 6. Cats are neither dogs nor wolves, example of nominal data type. Click image to view full sized image.
Figure 7. Screenshot Rcmdr Read data from package menu.
3.2 – Measures of Central Tendency
Figure 1. A portion of the R help page about the function mean.
Figure 2. Dot plot of our x variable with locations of the mean (blue) and the trimmed mean (red). The Dotplot(x) function in package RcmdrMisc was used in Rcmdr to make this graphic. Arrows were added by hand. Dotplot() example code presented in Chapter 3.4.
Figure 3. Normal and lognormal distributions with mean (red) and median (blue) noted for comparison.
Figure 4. Female Rhinella marina (formerly Bufo marinus), Chaminade University campus. Body length 23.5 cm. Click image to view full sized image.
3.3 – Measures of dispersion
Figure 1. A histogram which displays a sampling of data with a mean of 10 (arrow marks the spot) and standard deviation (sd) of 50 units.
Figure 2. A histogram which displays a sampling of data with the same mean of 10 (arrow marks the spot) displayed in Fig. 1, but with a smaller standard deviation (sd) of 5 units.
Figure 3. Histogram of ages of subjects in the diabetic retinopathy data set in the survival package.
Figure 4. Scatter plot of the standard deviation (StDev) by the mean. Data sets were simulated.
Figure 5. Plot of the standard deviation by the mean for heights of different breeds of dogs.
3.4 – Estimating parameters
Figure 1. Dot plot of pipet results.
3.5 – Statistics of error
Figure 1. Magnetic dart board with 5 darts. Click image to view full sized image.
4 – How to report statistics
Figure 1. Scatter plot graphs of Anscombe’s quartet (Table 1)
Figure 2. Excel pie chart of Table 2 data set
Figure 3. Bar chart of Table 2 data set
4.1 – Bar (column) charts
Figure 1. Single nucleotide variants for human gene ACTB by DNA and functional element (data collected 19 May 2022 from NCBI SNP database with Advanced search query). A. Pie chart. Note that the slice for “exon – nonsense” is not visible. B. Bar chart – color coded bars to facilitate comparison with pie chart.
Figure 2. A simple bar chart
Figure 3. The luxury ship RMS Titanic, which sank 15 April 1912; more than 1500 souls were lost. Public domain image, Wikipedia. Click image to view full sized image.
Figure 4. A stacked bar chart, survival Titanic
Figure 5. A bar chart with error bars (standard error of the mean).
Figure 6. Another bar chart, with standard errors of the mean.
Figure 7. Bar chart that allows for a comparison among levels of a factor (organs, liver vs. heart).
Figure 8. Same chart as in Figure 6, but on ratios.
Figure 9. Rcmdr menu popup for Plots Means
Figure 10. Plot of means
Figure 11. A bar chart using barplot2.
Figure 12. A bar chart from ggplot2
4.2 – Histograms
Figure 1. Histograms of age distribution of runners who completed the 2013 Jamba Juice 5K race in Honolulu
Figure 2. KDE plot of age distribution of female runners who completed the 2013 Jamba Juice 5K race in Honolulu
Figure 3. Histogram of 752 observations, Sturges’ rule applied, default histogram
Figure 4. Histogram of 752 observations, Scott’s rule applied, ggplot2 histogram
Figure 5. Default histogram with different bin size
Figure 6. Default histogram, bin size set by Sturges’ rule.
Figure 7. Two histograms on same plot with ggplot2.
Figure 8. Same data as Fig 7, but using base hist() and plot() functions.
Figure 9. Examples of comet assay results.
4.3 – Box plot
Figure 1. A box plot. Elements of box plot labelled.
Figure 2A. Box plot, default graph in base package
Figure 2B. Same graph, but with color and made horizontal; boxplot(), default graph in base package
Figure 2C. Same graph, added original points; boxplot(), default graph in base package.
Figure 3. Popup menu in R Commander: Select the response variable and set the Plot by: option.
Figure 4. Select the group variable
Figure 5. Options tab, enter labels for axes and a title.
Figure 5. Resulting box plot from car package implemented in R Commander. Outliers are identified by row id number.
4.4 – Mosaic plots
Figure 1. Mosaic plot made with basic function mosaicplot().
Figure 2. First steps to make mosaic plot in R Commander EBM plug-in.
Figure 3. Next steps to make mosaic plot in R Commander EBM plug-in.
Figure 4. Mosaic plot made from R Commander EBM plug-in.
Figure 5. First steps to make mosaic plot in R Commander KMggplot2 plug-in
Figure 6. Next steps to make mosaic plot in R Commander KMggplot2 plug-in.
Figure 7. Mosaic-like plot made from R Commander KMggplot2 plug-in.
Figure 8. Screenshot of popup menu from Rcmdr with mosaic plugin selected.
Figure 9. After clicking OK (Fig 8), click Yes to restart Rcmdr. The plugin will then be available.
Figure 10. How to access the mosaic plot in R Commander.
Figure 11. Screenshot of popup menu in mosaic plugin in R Commander.
Figure 12. Error message as result of selecting a dataframe for use in mosaic plugin.
Figure 13. Options for the mosaic plot
Figure 14. Our new mosaic plot.
Figure 15. Mosaic plot with changed color scheme.
4.5 – Scatter plots
Figure 1. Scatterplot of mid-parent (vertical axis) and their adult children’s (horizontal axis) height, in inches. Data from Galton’s 1885 paper, “Regression towards mediocrity in hereditary stature.” The red line is the linear regression fitted line, or “trend” line, which is interpreted in this case as the heritability of height.
Figure 2. Same plot as Figure 1, but with default settings for axis scales.
Figure 3. Finishing times in minutes of 1278 runners by age and gender at the 2013 Jamba Juice Banana 5K in Honolulu, Hawaii. Loess smoothing functions by groups of female (red) and male (blue) runners are plotted along with 95% confidence intervals.
Figure 4. First menu popup in R Commander Scatterplot command, Rcmdr ver. 2.2-3.
Figure 5. Second menu popup in R Commander Scatterplot command, Rcmdr ver. 2.2-3.
Figure 6. Default scatterplot, package car, from R Commander, version 2.2-4.
Figure 7. Modified scatterplot, same data from Figure 6
Figure 8. R plotting characters pch = 1 – 25 along with examples of color.
Figure 9. Usage of terms for X Y plots in research articles normalized to number of issues in six journals between 1990 and 2016.
Figure 10. Results from Ngram Viewer for American English, “scatterplot” (blue), “scatter plot” (red), “scatter diagram” (green), “scattergram” (orange), and “XY plot” (purple).
Figure 11. Results from Ngram Viewer for British English. See Figure 10 for key.
Figure 12. Bland-Altman plot of 1 cm unit measure in pixel number by ImageJ from digital images by two independent observers. Purple central region is 95% CI.
Figure 13. Volcano plot, gene expression fold change.
4.6 – Adding a second Y axis
Figure 1. Screenshot from NOAA GOES-East – Sector view: Tropical Atlantic – GeoColor, 4 September 2019. Click image to view full sized image.
Figure 2. Plot of hurricanes from 1900 to present by decade.
Figure 3. Total number of hurricanes by decades, with Temperature Index by decades. Number of hurricanes represented on first (left) axis and Temperature Index represented on second (right) axis.
Figure 4. Total number of hurricanes by decades, with Atmospheric CO2 measured at Mauna Loa by decades. Number of hurricanes represented on first (left) axis and Atmospheric CO2 represented on second (right) axis.
Figure 5. Bubble plot
4.7 – Q-Q plot
Figure 1. A Q-Q plot, the default command in Rcmdr
Figure 2. Screenshot of R Commander menu for Q-Q plot
4.8 – Ternary plots
Figure 1. Blank Graphics window with initial ternary plot.
Figure 2. A few Skittles® candies.
Figure 3. Ternary plot of our Skittle critter data.
4.9 – Heat maps
Figure 1. Heat map, USA population by county and percent ethnicity compared to white, graph from census.gov
Figure 2. Heat map, gene expression in cultured rat lung cells exposed to metals
Figure 3. A simple heat map generated by heatmap() function, all default options.
Figure 4. ggplot() and aes() functions used to generate a heat map. Colors from brewer.pal
4.10 – Graph software
Figure 1. Screenshot of GrapheR GUI menu, box plot options
Figure 2. Box plot made with GrapheR.
Figure 3. Screenshot of KMggplot2 GUI menu, box plot options
Figure 4. Box plot graph made with GrapheR with jitter applied to avoid overplotting of points.
Figure 5. Box plot graph made with GrapheR with beeswarm applied to avoid overplotting of points.
Figure 6. Screenshot of plotly box plot. In the live version, data points are visible when the mouse pointer hovers over them.
Figure 7. Screenshot of box plot example in Veusz GUI.
Figure 8. Add screenshot
5 – Experimental design
Figure 1. Giant African Snail (Lissachatina fulica, formerly Achatina fulica). Image by M. Dohm
5.2 – Experimental units, Sampling units
Figure 1. Three aquariums, three fish. Image modified from https://www.pngrepo.com/svg/153528/aquarium
Figure 2. Three Miracle-Gro AeroGarden planters, each with nine seedlings of an Arabidopsis thaliana strain.
5.3 – Replication, Bias, and Nuisance Variables
Figure 1. Schematics of a set up for a hypothetical 48 well microplate.
Figure 2. Mean 5K running times (minutes) by age and gender (2006 – 2016, Jamba Juice Banana 5K race, Honolulu, HI).
5.5 – Importance of randomization in experimental design
Figure 1. Age of subjects by groups (A = blue, B = red) with and without randomized assignment of subjects to treatment groups
Figure 2. BMI of subjects by groups (A = blue, B = red) with and without randomized assignment of subjects to treatment groups
Figure 3. An example of clustering resulting from a random sampling process (Graph B). In contrast, Graph A was generated so that a point was located within each grid.
Figure 4. Same graphs as Figure 3, but with ellipses around the grouped data (hard to tell, but the centroids are the larger points).
Figure 5. Map of electrical transmission grid for continental United States of America. Image source https://openinframap.org/#3/24.61/-101.16
5.6 – Sampling from Populations
Figure 1. Sixteen mice, eight red and eight blue. Image © 2024 Mia D Graphics
Figure 2. Sixteen mice, randomly assigned to treatment groups C and T; by chance, 75% blue in C and just 25% in T. If color was a confounding factor then our conclusions about the effectiveness of the treatment would be associated with color. Image © 2024 Mia D Graphics
Figure 3. Format of 96-well plate. Red cells = “edge” wells; White cells = “inner” wells; Well reference in grey letter.
Figure 4. Screenshot of Sampling in Data Analysis menu, Microsoft Excel
Figure 5. Screenshot of input required for Sampling in Data Analysis menu, Microsoft Excel
6.1 – Some preliminaries
Figure 1. https://xkcd.com/365/
Figure 2. View of Kamokuna Lava Bench, eruption of Pu`u `O`o, Kilauea, November 1998. Photo by S. Dohm.
Figure 3. Mark Twain. Image from The Miriam and Ira D. Wallach Division of Art, Prints and Photographs: Photography Collection, The New York Public Library. “Mark Twain in Middle Life” The New York Public Library Digital Collections. 1860 – 1920. https://digitalcollections.nypl.org/items/510d47d9-baec-a3d9-e040-e00a18064a99
Figure 3. xkcd comic strip, from https://imgs.xkcd.com/comics/hand_sanitizer.png
6.2 – Ratios and probabilities
Figure 1. Example planting of five tomato seeds, day 5, on agar Petri dish (M. Dohm)
Figure 2. A probability tree to help visualize comparison of deaths (“yes”) by car travel and by airline travel in the United States for the year 2000.
Figure 3. Comparing totals of deaths adjusted by numbers of licensed drivers and by licensed commercial airline pilots in the United States.
Figure 4. Comparing totals of deaths adjusted by numbers of car trips and by numbers of airline trips in the United States.
6.3 – Combinations and permutations
Figure 1. Heads (left) and Tails (right) of a USA quarter.
Figure 2. Playing cards with images commemorating 150th anniversary of Charles Darwin’s Origin of Species. (Design John R. C. White, Master of the Worshipful Company of Makers of Playing Cards 2008 to 2009.)
Figure 3. Bar chart of the combinations of correct guesses out of 10 attempts (graph was presented in Chapter 4.1).
6.5 – Discrete probability distributions
Figure 1. Plot generated with KMggplot2 Rcmdr plugin
Figure 2. Example of binomial-like distribution: reported twins born in Hawaii.
Figure 3. Rcmdr menu to get binomial probability
Figure 4. Plot of hypergeometric distribution twinning Hawaii.
Figure 5. Rcmdr menu to get hypergeometric probability
Figure 6. Example, Poisson-like graph: the number of wind-dispersed seeds within each grid.
Figure 7. ggplot2 Poisson, μ = 1.
Figure 8. Rcmdr menu, Poisson probability
6.6 – Continuous distributions
Figure 1. Sample size = 20, drawn from population with known μ = 0 and σ = 1.
Figure 2. Sample size = 100, also drawn from population with known μ = 0 and σ = 1.
Figure 3. Sample size n = 1000, once again drawn from population with known μ = 0 and σ = 1.
Figure 4. And lastly, sample size n = 1 million also drawn from population with known μ = 0 and σ = 1.
Figure 5. Screenshot Rcmdr menu, sample from a normal distribution
Figure 6. Frequency expected for a few points (X: 0 – 10) drawn from a normal distribution, calculated using the formula and example values
6.7 – Normal distribution and the normal deviate (Z)
Figure 1. Frequency of observations expected to be greater than 7 from a large population with mean µ = 5 and σ = 2
Figure 2. Portion of the table of the normal distribution. Only values equal to or greater than Z = 0 are visible.
Figure 3. Highlight Z = 0.23, frequency is 0.409046.
Figure 4. Plot of standard normal distribution; area less than -1 σ.
Figure 5. Proportion of the population between 5 and 7 (sorry about all of the colors; I kind of went crazy)
6.8 – Moments
Figure 1. Histogram finishing times in minutes for 1307 runners at 2016 Banana 5K
Figure 2. Rcmdr Numerical summaries Statistics tab.
Figure 3. Histogram finishing times in minutes for random sample of 30 drawn from 1307 runners at 2016 Banana 5K
6.9 – Chi-square distribution
Figure 1. Animated GIF of plots of chi-square distribution over range of degrees of freedom.
Figure 2. The test of the chi-square is typically one-tailed. In this case, probability of values greater than the critical value.
Figure 3. Portion of the table of some critical values of chi-square distribution, one tailed (right-tailed or “upper” portion of distribution).
Figure 4. Portion of the chi-square distribution which shows how to find critical value of the chi-square distribution.
Figure 5. Screenshot of input box in Rcmdr for Chi-square probability values.
6.10 – t distribution
Figure 1. Density plot of standard normal distribution.
Figure 2. Density plot of t-distribution for five degrees of freedom.
Figure 3. Animated GIF of density plot t distribution, from df = 5 to 10,000 plus standard normal curve.
6.11 – F distribution
Figure 1. Animated GIF plot of F distribution value for range of degrees of freedom
7 – Probability, Risk Analysis
Figure 1. “Health data,” https://xkcd.com/2620/
7.2 – Epidemiology basics
Figure 1. R Commander popup menu for Normal quantiles.
7.3 – Conditional Probability and Evidence Based Medicine
Figure 1. Now that’s a box full of kittens. Creative Commons License, source: https://www.flickr.com/photos/83014408@N00/160490011
Figure 2. STS-51-L crew: (front row) Michael J. Smith, Dick Scobee, Ronald McNair; (back row) Ellison Onizuka, Christa McAuliffe, Gregory Jarvis, Judith Resnik. Image by NASA – NASA Human Space Flight Gallery, Public Domain.
Figure 3. Space Shuttle Challenger launches from launchpad 39B Kennedy Space Center, FL, at the start of STS-51-L. Hundreds of shorebirds in flight. Image by NASA – NASA Human Space Flight Gallery, Public Domain.
Figure 4. Probability tree for FOBT test; Good test outcomes shown in green: TP stands for true positive and TN stands for true negative. Poor outcomes of a test shown in red: FN stands for false negative and FP stands for false positive.
Figure 5. A summary of “evidence based medical” decisions, perhaps? https://xkcd.com/1619/
Figure 6. To install an Rcmdr plugin, first go to Rcmdr → Tools → Load Rcmdr plug-in(s)…
Figure 7. Select the Rcmdr plugin, then click the “OK” button to proceed.
Figure 8. Select “Yes” to restart R Commander and finish installation of the plug-in.
Figure 9. After restart of R Commander the EBM plug-in is now visible in the menu.
Figure 10. Select “Enter two-way table…”.
Figure 11. Two-way table Rcmdr EBM plug-in.
Figure 12. Draw a probability tree to help with the frequencies.
Figure 13. EBM plugin with data entry
7.4 – Epidemiology: Relative risk and absolute risk, explained
Figure 1. Data entry for 2X2 table at openepi.com.
Figure 2. Results for 2X2 table at openepi.com.
Figure 3. Rcmdr: Tools → Load Rcmdr plugins…
Figure 4. Rcmdr plug-ins available (after first downloading the files from an R mirror site).
Figure 5. R Commander EBM plug-in, enter 2X2 table menus
Figure 6. Illustration of probability tree for the statin problem
Figure 7. EBM plugin with two-way table completed for the statin problem.
7.5 – Odds ratio
Figure 1. Mosaic plot of athletes to non-athletes in college. Males red, females yellow, data from Gray 2002
Figure 2. Venn Diagram of athletes to non-athletes in college. Female athletes (n = 375), male athletes (n = 612), data from Gray 2002.
8 – Inferential statistics
Figure 1. NHST decision flow chart
8.1 – The null and alternative hypotheses
Figure 1. Flow chart of inductive statistical reasoning.
8.2 – The controversy over proper hypothesis testing
Figure 1. Screenshot t-quantiles Rcmdr
Figure 2. Screenshot of portion of t-table with highlighted (red) critical value for 10 degrees of freedom.
Figure 3. xkcd: Frequentists vs. Bayesians, https://xkcd.com/1132/
Figure 4. Conditional error probability values plotted against p-values.
8.3 – Sampling distribution and hypothesis testing
Figure 1. Means of ten replicate samples drawn at random from chi-square distribution, df = 1.
Figure 2. Means of 100 replicate samples drawn at random from chi-square distribution, df = 1. Results from Shapiro-Wilk test: W = 0.97426, p-value = 0.04721
Figure 3. Means of one million replicate samples drawn at random from chi-square distribution, df = 1. The normality test will fail to run because it is limited to a sample size of 5000.
Figure 5. Screenshot Rcmdr menu to get normal probability
8.4 – Tails of a test
Figure 1. Two-tailed distribution.
Figure 2. One-tailed distribution, lower tail (left) and upper tail (right).
8.5 – One sample t-test
Figure 1. Table of a portion of the Critical values of the t distribution. Red selections highlight critical value for t-test at α = 5% and df = 19.
Figure 2. Screenshot Rcmdr single-sample t-test menu
9.1 – Chi-square test: Goodness of fit
Figure 1. A portion of critical values of the chi-square at alpha 5% for degrees of freedom between 1 and 10. A more inclusive table is provided in the Appendix, Table of Chi-square critical values.
Figure 2. R Commander menu for Chi-squared quantiles.
Figure 3. R Commander menu for Chi-squared probabilities.
9.2 – Chi-square contingency tables
Figure 1. Screenshot R Commander menu for 2X2 data entry with counts.
Figure 2. Display of Xiang et al data entered into R Commander menu.
Figure 3. Screenshot Statistics options for contingency table.
9.5 – Fisher exact test
Figure 1. Screenshot Rcmdr menu, Contingency tables.
Figure 2. Screenshot Rcmdr menu, Enter Two-Way Table.
Figure 3. Screenshot Rcmdr two-way table menu, load the data from stacked worksheet.
Figure 4. Screenshot Rcmdr menu Statistics option. Select Chi-square test of independence, Fisher’s exact test, or both.
10.1 – Compare two independent sample means
Figure 1. A two group Randomized Control Trial.
Figure 2. Male Hemidactylus frenatus, central Oahu, M. Dohm.
Figure 3. Male Anolis carolinensis, `Akaka Falls, Big Island of Hawai’i, M. Dohm.
Figure 4. Box plot of lizard body mass.
Figure 5. Rcmdr menu for Independent sample t-test.
Figure 6. Rcmdr Options menu for Independent sample t-test.
Figure 7. Comet examples. A Intact cell, no DNA damage, B Cell with some DNA damage, a slight tail to the right is evident, C Cell with significant DNA damage, a large tail is evident. M. Dohm.
Figure 8. Boxplot of comet tail lengths for cells with and without (control) exposure to copper in the cell medium for 30 minutes.
10.2 – Digging deeper into t-test Plus the Welch test
Figure 1. Screenshot Rcmdr t-test options. Default is “No” for Assume equal variances, i.e., the Welch test.
10.3 – Paired t-test
Figure 1. A two group Randomized Crossover Trial.
Figure 2. Histograms show the distribution of 5K running times of 15 women who ran the race twice.
11.1 – What is Statistical Power?
Figure 1. Population sampling from tail of distribution
Figure 2. Without us knowing, our sample may come from the extremes of two separate populations.
Figure 3. Box plot of race speed (kph) for 15 women 5K in two successive years.
Figure 4. Profile plot, PairedData package.
Figure 5. Box plot of differences. The red dotted line shows the null hypothesis.
Figure 6. R Commander Paired t-test menu, Rcmdr version 2.7.
Figure 7. R Commander Paired t-Test options, select null hypothesis.
Figure 8. R Commander: Stack worksheet. Select the two variables, Race1 and Race2.
Figure 9. R Commander, select independent sample t-Test …
Figure 10. R Commander, independent sample t-test menu.
Figure 11. R Commander, select options for independent sample t-Test (assume equal variance).
11.5 – Power analysis in R
Figure 1. Screenshot of Rcmdr menu bar with (A) and without (B) the EZR plugin.
Figure 2. Screenshot of Rcmdr EZR plugin menu
12.2 – One way ANOVA
Figure 1. Hypothetical results of an experiment, box plots. Left, no difference among groups; Right, large differences among groups.
Figure 2. Screenshot Rcmdr select one-way ANOVA
Figure 3. Screenshot Rcmdr select one-way ANOVA options.
Figure 4. Box plot of lengths of leaves of a 10-day old plant from one of three strains of Arabidopsis thaliana.
12.3 – Fixed effects, random effects, and agreement
Figure 1. Simple waterfall plot of race improvement for Table 3 data.
Figure 2. Conus shells, image by M. Dohm
12.6 – ANOVA posthoc tests
Figure 1. One-way ANOVA menu in R Commander
Figure 2. Select Tukey posthoc tests with the one-way ANOVA
Figure 3. Plot of confidence intervals of Tukey HSD
12.7 – Many tests one model
Figure 1. O’hia, Metrosideros polymorpha. Public domain image from Wikipedia.
Figure 2. The o`hia dataset as viewed in R Commander
Figure 3. Box plots of growth responses of o`hia seedlings collected from three Maui sites, M-1 (elevation 750 ft), M-2 (elevation 1100 ft), and M-3 (elevation 6600 ft). Data adapted from Table 5 of Corn and Hiersey 1973.
Figure 4. R Commander, select to fit a Linear model.
Figure 5. Input linear model formula
Figure 6. To retrieve an ANOVA table, select Models, Hypothesis tests, then ANOVA table…
Figure 7. Options for types of tests.
13.1 – ANOVA Assumptions
Figure 1. Histogram of body mass (g) for 24 mammals (data from Boddy et al 2012).
Figure 2. Histogram of log10-transformed body mass observations from Figure 1.
Figure 3. Plot of brain and body weights (A) and log10-log10 transform (B) for a variety of species (data from Boddy et al 2012). The ratio is called encephalization index.
Figure 4. Q-Q plot, raw data. Compare to Figure 1.
Figure 5. Q-Q plot same data, log10-transformed, compare to Figure 2.
Figure 6. Phylogenetic tree of 24 species used in this report.
13.2 – Why tests of assumption are important
Figure 1. Screenshot of Rcmdr options menu for independent t-test. Red arrow points to default option “No,” which corresponds to Welch’s test.
13.3 – Test assumption of normality
Figure 1. Rattle descriptive graphics on Comet Copper dataset. Dotted line (top image) and red line (bottom image) follow the combined observations regardless of treatment.
Figure 2. Graphs describing different distributions. From top to bottom: Leptokurtosis, platykurtosis, negative skew, positive skew.
Figure 3. Histogram of simulated normal dataset, μ = 125, σ = 10.
Figure 4. Cumulated frequency of simulated normal dataset, μ = 125, σ = 10.
Figure 6. Histogram of simulated normal dataset, μ = 0, σ = 1.
Figure 7. Cumulated frequency of simulated normal dataset, μ = 0, σ = 1.
13.4 – Tests for equal variance
Figure 1. Screenshot R Commander F distribution probabilities
Figure 2. Screenshot data options R Commander F test
Figure 3. Screenshot menu options R Commander F test
Figure 4. Screenshot menu options R Commander Levene’s test
14.1 – Crossed, balanced, fully replicated designs
Figure 1A. One of several possible outcomes of two treatments (factors). A clear interaction: For the first Diet level, Population 1 has the greatest weight change, whereas for the second Diet level, Population 2 has the greatest weight change.
Figure 1B. One of several possible outcomes of two treatments (factors). Clearly, no interaction: Population 1 always has a lower response than Population 2 regardless of Diet.
Figure 2. Plots of the main effects for Diet factor, levels A and B, and Population, levels 1 and 2.
Figure 3. Interaction plot between two factors, Diet and Population.
Figure 4. Linear model menu in Rcmdr.
Figure 5. A plot showing no interaction between factor A and factor B for some ratio scale response variable.
14.2 – Sources of variation
Figure 1. ANOVA table for two-way, balanced, replicated design.
14.3 – Fixed effects, Random effects
Figure 1. Interaction example. At density D1, genotype 2 (red line) has higher growth rate; at density D2, the ranking switches: now, genotype 1 (black line) has higher growth rate.
Figure 2. Interaction example expanded for multiple genotypes over multiple densities.
14.4 – Randomized block design
Figure 1. Screenshot Rcmdr Linear Model menu.
Figure 2. Line graph of data presented in Table 2.
Figure 3. Juvenile garter snake, image from GetArchive, public domain.
14.7 – Rcmdr Multiway ANOVA
Figure 1. Screenshot Rcmdr multi-way ANOVA.
Figure 2. Predictor effect plots, Diet and Population on Response variable.
Figure 3. Screenshot Rcmdr linear model menu.
14.8 – More on the linear model in Rcmdr
Figure 1. Linear model menu in Rcmdr, version 2.7.0
Figure 2. Menu of linear model with repeat measures model, Rcmdr, version 2.7.0.
Figure 3. Rcmdr: Models → Hypothesis tests → ANOVA table… Rcmdr, version 2.7.0
Figure 4. Crossed, balanced design. Linear model menu, Rcmdr, version 1.9.2
Figure 5. Nested design, linear model menu, Rcmdr, version 1.9.2
15.1 – Kruskal-Wallis and ANOVA by ranks
Figure 1. Screenshot Rcmdr menu create new variable.
15.2 – Wilcoxon Rank Sum Test
Figure 1. Female common house gecko, Hemidactylus frenatus, central Oahu, M. Dohm 2018.
Figure 2. Male Anolis carolinensis, ‘Akaka Falls, Hawai`i, M. Dohm 2018.
Figure 3. Screenshot Rcmdr menu 2 sample Wilcoxon test. Options are selected by clicking on “Options” tab (see Fig. 4)
Figure 4. Screenshot of options tab Rcmdr menu 2 sample Wilcoxon test. Keep defaults to run the “Wilcoxon test.”
Figure 5. Screenshot of Rcmdr menu. Note Two-sample Wilcoxon test… not available.
15.3 – Wilcoxon signed rank test
Figure 1. R Commander paired Wilcoxon test menu (aka Wilcoxon signed rank sum test). Rcmdr version 2.7.
Figure 2. R Commander Options, select null hypothesis.
16 – Correlation, Similarity, and Distance
Figure 1. Bar chart with error bars
Figure 2. Box plots
Figure 3. Scatterplot with groups
16.1 – Product moment correlation
Figure 1. Scatterplot of Drosophila wing area by wing length
16.2 – Causation and Partial correlation
Figure 1. Unmeasured confounding variables influence association between independent and dependent variables, the characters or traits we are interested in.
Figure 2. Running times over 100 meters of top athletes since the 1920s.
Figure 3. Scatterplot birth weight by lead exposure.
Figure 4. Screenshot Rcmdr partial correlation menu
Figure 5. Trellis plot, correlations among variables.
Figure 6. Causal paths among variables.
16.3 – Data aggregation and correlation
Figure 1. Scatterplot crime rates of cities by number of Catholic churches
Figure 2. Scatterplot crime rates of cities by number of secular humanist associations.
Figure 3. Illustration of ecological fallacy: positive association at level of groups (boxes, solid blue line), but negative association at level of individuals (black circles, red dashed lines).
Figure 4. Bubble plot of data used to make Figure 1. Plot by LibreOffice Calc.
Figure 5. Bubble plot of data used to make Figure 2. Plot by ggplot2 package in R.
16.4 – Spearman and other correlations
Figure 1. Drosophila wing area (mm²) by wing length (mm).
16.6 – Similarity and Distance
Figure 1. Cartesian plot of two points, the first at x1 = 1 and y1 = 1 and the second at x2 = 4 and y2 = 4.
Figure 2. RAPD gel (simulated) five kinds of beans.
17.1 – Simple linear regression
Figure 1. R Commander menu interface for linear model.
Figure 2. Number of matings by body mass (g) of the male bird.
Figure 3. Same data as in Fig. 2, but with the “best fit” line.
Figure 4. Figure 3 redrawn to extend the line to the Y intercept.
Figure 5. 95% confidence interval about the best fit line.
17.4 – OLS, RMA, and smoothing functions
Figure 1. CO2 in parts per million (ppm) plotted by year from 1958 to 2014
Figure 2. Plot of ppm CO2 by month for the year 2013.
Figure 3. Plot with different smoothing values (0.5 to 10.0).
17.5 – Testing regression coefficients
Figure 1. Scatterplot of hypothetical x,y data for which the researcher may obtain a statistically significant linear fit to a sample of data from a population in which the null hypothesis is the true relationship between x and y.
Figure 2. Screenshot linear regression menu. More than one explanatory (predictor or independent) variable may be selected, but only one response (dependent) variable may be selected.
Figure 3. Pearson Scott Foresman, Public domain, via Wikimedia Commons
Figure 4. Scatterplot of oxygen consumption by tadpoles (blue: Gosner developmental stage I [35 – 38]; red: Gosner developmental stage II [39 – 44]), vs body mass (g).
Figure 5. Boxplot of oxygen consumption by Gosner developmental stages (blue: stage I; red: stage II).
17.6 – ANCOVA – analysis of covariance
Figure 1. Scatterplot of oxygen consumption by R. pipiens tadpoles vs body mass (g) by developmental group (Gosner stages I or II).
Figure 2. Copy of Figure 4, Chapter 17.5; boxplot of oxygen consumption of R. pipiens tadpoles by Gosner developmental stages.
Figure 3. Scatterplot with best-fit regression lines of $\dot{V}O_{2}$ by Body.mass for Gosner Stage I (closed circle, solid line) and Gosner Stage II (open circle, dashed line) R. pipiens tadpoles.
17.8 – Assumptions and model diagnostics for Simple Linear Regression
Figure 1. An ideal plot of residuals
Figure 2. We have a problem. Residual plot shows unequal variance (aka heteroscedasticity).
Figure 3. Problem. Residual plot shows systematic trend.
Figure 4. Problem. Residual plot shows nonlinear trend.
Figure 5. Basic diagnostic plots. A: residual plot; B: Q-Q plot of residuals; C: Scale-location (aka spread-location) plot; D: leverage residual plot.
18 – Multiple Linear Regression
Figure 1. Growth of bacteria over time (optical density at 600 nm UV spectrophotometer), fit by logistic function (dashed line).
18.1 – Multiple Linear Regression
Figure 1. Screenshot of Rcmdr linear model menu with our model elements in place.
Figure 2. Scatter plot of predicted LDL against dose of a statin drug. Regression lines represent the different statin drugs (Statin1, Statin2).
Figure 3. 3D plot of BMI and dose of Statin drugs on change in LDL levels (green Statin2, blue Statin1).
Figure 4. An example of a possible interactive 3D plot; the file embedded in this page is not interactive, just an animation.
Figure 5. R’s default regression diagnostic plots.
18.2 – Nonlinear regression
Figure 1. Ideal plot of residuals against values of X, the predictor variable, for a well-supported linear model fit to the data.
Figure 2. Example of residual plot; pattern suggests nonlinear fit.
Figure 3. Residual plot
Figure 4. Lifespan of 1881 mice from 31 inbred strains (Data from Yuan et al (2012) available at https://phenome.jax.org/projects/Yuan2).
Figure 5. Screenshot Rcmdr GLM menu. For logistic on ratio-scale dependent variable, select gaussian family and identity link function.
18.3 – Logistic regression
Figure 1. Lifespan of 1881 mice from 31 inbred strains (Data from Yuan et al [2012] available at https://phenome.jax.org/projects/Yuan2). Note: I labeled the Y axis “Survival Probability”; “Inverse Survival Probability” would be more accurate.
Figure 2. Access Generalized Linear Model via R Commander
Figure 3. Screenshot Rcmdr GLM menu. For logistic on ratio-scale dependent variable, select gaussian family and identity link function.
18.4 – Generalized Least Squares
Figure 1. Box plot of residuals from GLS model by elevation site predictors (left) and scatterplot of residuals by fitted values from GLS model (right).
18.5 – Selecting the best model
Figure 1. Rcmdr popup menu, Subset model selection…
Figure 2. Mallows’ Cp plot
Figure 3. Diagnostic plots
18.6 – Compare two linear models
Figure 1. Screenshot Rcmdr compare models menu.
19.1 – Jackknife sampling
Figure 1. Histogram of jackknife estimates for slope.
Figure 2. Histogram of jackknife estimates for intercept.
19.2 – Bootstrap sampling
Figure 1. Histogram of bootstrap estimates for slope.
Figure 2. Histogram of bootstrap estimates for intercept.
19.3 – Monte Carlo methods
Figure 1. Histograms of runif results with 100, 1K, 10K, and 100K numbers of values to be generated
Figure 2. Autocorrelation plots of runif results with 100, 1K, 10K, and 100K numbers of values
20.1 – Area under the curve
Figure 1. Area under the curve example.
Figure 2. Example ROC curve
20.5 – Time series
Figure 1. co2 data set from package datasets, comes with Rcmdr installation.
Figure 2. CO2 ppm monthly average data from NOAA, last data October 2020.
Figure 3. Observed (panel, top), trends over time (panel, second from top), seasonal changes (panel, second from bottom), and random error (panel, bottom).
Figure 4. Data in black, predicted values in red (additive) shaded by confidence interval.
20.6 – Dimensional analysis
Figure 1. Scatterplot English sparrow mass (g) by total length (mm) by survival following winter storm
Figure 2. Scatterplot matrix of Bumpus English sparrow traits. Traits were (left-right): Alar extent (mm), length (tip of beak to tip of tail), length of head (mm), length of femur (in.), length of humerus (in.), length of sternum (in.), skull width (in.), length of tibiotarsus (in.), and weight (g)
Figure 3. Bi-plot of clusters, Skittles mini bags
20.9 – Survival analysis
Figure 1. Screenshot of menu call for survival analysis in Rcmdr
Figure 2. Kaplan-Meier plot of heart data. Dashed lines are upper and lower confidence intervals about the survival function.
Figure 3. KM plot
Figure 4. Screenshot of Survival estimator menu in Rcmdr.
20.10 – Growth equations and dose response calculations
Figure 1. Top: Parametric Nonlinear Growth Model; Bottom: Nonparametric Spline Fit
Figure 2. Hypothetical data set, survival of yeast in different salt concentrations.
Figure 3. Logistic curve added to Figure 1 plot.
Figure 4. Four parameter (red) and three parameter (green) logistic models fitted to data.
Figure 5. Plot of reduced data set.
Figure 6. Screenshot Microsoft Excel worksheet containing our data set (col A & B), with formulas added and calculated. Starting values for constants in column G, rows 2 – 4.
Figure 7. Screenshot Microsoft Excel, Solver add-in available.
Figure 8. Screenshot Microsoft Excel, Solver add-in available and ready for use.
Figure 9. Screenshot Microsoft Excel solver menu.
Figure 10. Screenshot solver completed run.
20.11 – Plot a Newick tree
Figure 1. Phylogram plot of 14 taxa
Figure 2. Cladogram view, same 14 taxa.
Figure 3. Plot of tree with labeled nodes.
Figure 4. Re-rooted tree.
Figure 5. Star phylogeny
20.12 – Phylogenetically independent contrasts
Figure 1. Star phylogeny (same image shown Figure 5, 20.11 – Plot a Newick tree).
Figure 2. A cladogram for same species, showing the hierarchical, nested relationships among taxa, what nature actually provides (same image shown Figure 2, 20.11 – Plot a Newick tree).
20.13 – How to get the distances from a distance tree
Figure 1. A gene tree of the product (protein HBA1) with five species.
Figure 2. Scatterplot HBA distance by logMYA divergence time