--- title: "Handling violations of assumptions: R code for Chapter 13 examples" author: "Michael Whitlock and Dolph Schluter" output: html_document: toc: yes toc_depth: 3 --- _Note: This document was converted to R-Markdown from [this page](http://whitlockschluter.zoology.ubc.ca/r-code/rcode13) by M. Drew LaMar. You can download the R-Markdown [here](https://qubeshub.org/collections/post/1355/download/chap13.Rmd)._ Download the R code on this page as a single file [here](http://whitlockschluter.zoology.ubc.ca/wp-content/rcode/chap13.r) ## New methods The variable names are plucked from the examples further below. **Data transformations** > marinelnBiomassRatio <- log(marine$biomassRatio) **Quantile plots** > qqnorm(marine$biomassRatio, datax = TRUE) **Mann-Whitney $U$-test** (R uses the equivalent Wilcoxon rank-sum test) > wilcox.test(timeToMating ~ feedingStatus, data = cannibalism) **Other new methods** * Shapiro-Wilk test of normality. * Sign test. * Permutation test. ## Example 13.1. [Biomass in marine reserves](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter13/chap13e1MarineReserve.csv) *__The normal quantile plot, Shapiro-Wilk test of normality__, and the __log transformation__, investigating the ratio of biomass between marine reserves and non-reserve control areas.* Read and inspect the data. ```{r} marine <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter13/chap13e1MarineReserve.csv")) head(marine) ``` **Histogram** of biomass ratio (Figure 13.1-4). ```{r} hist(marine$biomassRatio, right = FALSE, col = "firebrick", las = 1) ``` **Normal quantile plot** of biomass ratio data. ```{r} qqnorm(marine$biomassRatio, datax = TRUE) ``` Commands for a normal quantile plot with more options (Figure 13.1-4): ```{r} qqnorm(marine$biomassRatio, datax = TRUE, pch = 16, col = "firebrick", las = 1, ylab = "Biomass ratio", xlab = "Normal quantile", main = "") ``` **Shapiro-Wilk test** of normality. ```{r} shapiro.test(marine$biomassRatio) ``` *__Natural log-transformation__ and resulting confidence interval of the a mean of marine biomass ratio.* **Log-transformation**. The function `log()` takes the natural logarithm of all the elements of a vector or variable. The following command puts the results into a new variable in the same data frame, `marine`. ```{r} marine$lnBiomassRatio <- log(marine$biomassRatio) ``` **Histogram of the log-transformed** marine biomass ratio (Figure 13.3-2) . ```{r} hist(marine$lnBiomassRatio, right = FALSE, col = "firebrick", las = 1) ``` **95% confidence interval of the mean** using the log-transformed data. ```{r} t.test(marine$lnBiomassRatio)$conf.int ``` **Back-transform lower and upper limits of confidence interval** (`exp` is the inverse of the log function). ```{r} exp(t.test(marine$lnBiomassRatio)$conf.int) ``` ## Example 13.4. [Sexual conflict and speciation](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter13/chap13e4SexualConflict.csv) *__The sign test__, comparing the numbers of species in 25 pairs of closely related insect taxa.* Read and inspect the data. ```{r} conflict <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter13/chap13e4SexualConflict.csv")) head(conflict) ``` **Histogram** of the difference in numbers of species (Figure 13.4-1). ```{r} hist(conflict$difference, right = FALSE, col = "firebrick", breaks = 50, xlab = "Difference in species number", las = 1) ``` **Count up the frequency of differences that are below, equal to, and above zero**. The first command below creates a new variable called `zero` in the data frame and sets all elements to "equal". This is just to get started. The next two commands overwrite those elements corresponding to taxon pairs with a difference greater than zero, or less than zero, respectively. ```{r} conflict$zero <- "equal" conflict$zero[conflict$difference > 0] <- "above" conflict$zero[conflict$difference < 0] <- "below" conflict$zero <- factor(conflict$zero, levels = c("below", "equal", "above")) table(conflict$zero) ``` **Sign test**. The sign test is just a binomial test. The result includes a confidence interval for the proportion using the Clopper-Pearson method, which isn't covered in the book. ```{r} binom.test(7, n = 25, p = 0.5) ``` ## Example 13.5. [Cricket sexual cannibalism](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter13/chap13e5SagebrushCrickets.csv) *__The Wilcoxon rank-sum test (equivalent to the Mann-Whitney $U$-test)__ comparing times to mating (in hours) of starved and fed female sagebrush crickets. We also apply the __permutation test__ to the same data.* Read and inspect data. ```{r} cannibalism <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter13/chap13e5SagebrushCrickets.csv")) head(cannibalism) ``` **Multiple histograms** (Figure 13.5-1) using the lattice package ```{r} library(lattice) histogram( ~ timeToMating | feedingStatus, data = cannibalism, layout = c(1,2), col = "firebrick", breaks = seq(0, 100, by = 20), type = "count", xlab = "Time to mating (hours)", ylab = "Frequency") ``` Commands for multiple histograms in basic R: ```{r} oldpar = par(no.readonly = TRUE) # make backup of default graph settings par(mfrow = c(2,1), oma = c(4, 6, 2, 6), mar = c(3, 4, 3, 2)) hist(cannibalism$timeToMating[cannibalism$feedingStatus == "starved"], col = "firebrick", las = 1, breaks = seq(0, 100, by = 20), main = "starved", ylab = "Frequency") hist(cannibalism$timeToMating[cannibalism$feedingStatus == "fed"], col = "firebrick", las = 1, breaks = seq(0, 100, by = 20), main = "fed", ylab = "Frequency") mtext("Time to mating (hours)", side = 1, outer = TRUE, padj = 0) par(oldpar) # revert to default graph settings ``` **Wilcoxon rank-sum test** (equivalent to Mann-Whitney $U$-test) ```{r} wilcox.test(timeToMating ~ feedingStatus, data = cannibalism) ``` *__Permutation test__ of the difference between mean time to mating of starved and fed crickets.* Begin by calculating the **observed difference between means** (starved minus fed). The difference is -18.25734 in this data set. ```{r} cricketMeans <- tapply(cannibalism$timeToMating, cannibalism$feedingStatus, mean) cricketMeans diffMeans <- cricketMeans[2] - cricketMeans[1] diffMeans ``` Decide on the **number of permutations**. ```{r} nPerm <- 10000 ``` Create a **loop to permute the data many times** (determined by `nperm`). In the loop, `i` is just a counter that goes from 1 to `nPerm` by 1; each permuted difference is saved in the `permResult`. ```{r} permResult <- vector() # initializes for(i in 1:nPerm){ # step 1: permute the times to mating permSample <- sample(cannibalism$timeToMating, replace = FALSE) # step 2: calculate difference betweeen means permMeans <- tapply(permSample, cannibalism$feedingStatus, mean) permResult[i] <- permMeans[2] - permMeans[1] } ``` **Plot null distribution** based on the permuted differences (Figure 13.8-1). ```{r} hist(permResult, right = FALSE, breaks = 100) ``` **Use the null distribution to calculate an approximate $P$-value**. This is the twice the proportion of the permuted means that fall below the observed difference in means, `diffMeans` (-18.25734 in this example). The following code calculates the *number* of permuted means falling below `diffMeans`. ```{r} sum(as.numeric(permResult <= diffMeans)) ``` These commands obtain the *fraction* of permuted means falling below `diffMeans`. ```{r} sum(as.numeric(permResult <= diffMeans)) / nPerm ``` Finally, multiply by 2 to get the $P$-value for a two-sided test. ```{r} 2 * ( sum(as.numeric(permResult <= diffMeans)) / nPerm ) ```