---
title: "Handling violations of assumptions: R code for Chapter 13 examples"
author: "Michael Whitlock and Dolph Schluter"
output:
html_document:
toc: yes
toc_depth: 3
---
_Note: This document was converted to R-Markdown from [this page](http://whitlockschluter.zoology.ubc.ca/r-code/rcode13) by M. Drew LaMar. You can download the R-Markdown [here](https://qubeshub.org/collections/post/1355/download/chap13.Rmd)._
Download the R code on this page as a single file [here](http://whitlockschluter.zoology.ubc.ca/wp-content/rcode/chap13.r)
## New methods
The variable names are plucked from the examples further below.
**Data transformations**
> marinelnBiomassRatio <- log(marine$biomassRatio)
**Quantile plots**
> qqnorm(marine$biomassRatio, datax = TRUE)
**Mann-Whitney $U$-test** (R uses the equivalent Wilcoxon rank-sum test)
> wilcox.test(timeToMating ~ feedingStatus, data = cannibalism)
**Other new methods**
* Shapiro-Wilk test of normality.
* Sign test.
* Permutation test.
## Example 13.1. [Biomass in marine reserves](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter13/chap13e1MarineReserve.csv)
*__The normal quantile plot, Shapiro-Wilk test of normality__, and the __log transformation__, investigating the ratio of biomass between marine reserves and non-reserve control areas.*
Read and inspect the data.
```{r}
marine <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter13/chap13e1MarineReserve.csv"))
head(marine)
```
**Histogram** of biomass ratio (Figure 13.1-4).
```{r}
hist(marine$biomassRatio, right = FALSE, col = "firebrick", las = 1)
```
**Normal quantile plot** of biomass ratio data.
```{r}
qqnorm(marine$biomassRatio, datax = TRUE)
```
Commands for a normal quantile plot with more options (Figure 13.1-4):
```{r}
qqnorm(marine$biomassRatio, datax = TRUE, pch = 16, col = "firebrick", las = 1, ylab = "Biomass ratio", xlab = "Normal quantile", main = "")
```
**Shapiro-Wilk test** of normality.
```{r}
shapiro.test(marine$biomassRatio)
```
*__Natural log-transformation__ and resulting confidence interval of the a mean of marine biomass ratio.*
**Log-transformation**. The function `log()` takes the natural logarithm of all the elements of a vector or variable. The following command puts the results into a new variable in the same data frame, `marine`.
```{r}
marine$lnBiomassRatio <- log(marine$biomassRatio)
```
**Histogram of the log-transformed** marine biomass ratio (Figure 13.3-2) .
```{r}
hist(marine$lnBiomassRatio, right = FALSE, col = "firebrick", las = 1)
```
**95% confidence interval of the mean** using the log-transformed data.
```{r}
t.test(marine$lnBiomassRatio)$conf.int
```
**Back-transform lower and upper limits of confidence interval** (`exp` is the inverse of the log function).
```{r}
exp(t.test(marine$lnBiomassRatio)$conf.int)
```
## Example 13.4. [Sexual conflict and speciation](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter13/chap13e4SexualConflict.csv)
*__The sign test__, comparing the numbers of species in 25 pairs of closely related insect taxa.*
Read and inspect the data.
```{r}
conflict <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter13/chap13e4SexualConflict.csv"))
head(conflict)
```
**Histogram** of the difference in numbers of species (Figure 13.4-1).
```{r}
hist(conflict$difference, right = FALSE, col = "firebrick", breaks = 50, xlab = "Difference in species number", las = 1)
```
**Count up the frequency of differences that are below, equal to, and above zero**. The first command below creates a new variable called `zero` in the data frame and sets all elements to "equal". This is just to get started. The next two commands overwrite those elements corresponding to taxon pairs with a difference greater than zero, or less than zero, respectively.
```{r}
conflict$zero <- "equal"
conflict$zero[conflict$difference > 0] <- "above"
conflict$zero[conflict$difference < 0] <- "below"
conflict$zero <- factor(conflict$zero, levels = c("below", "equal", "above"))
table(conflict$zero)
```
**Sign test**. The sign test is just a binomial test. The result includes a confidence interval for the proportion using the Clopper-Pearson method, which isn't covered in the book.
```{r}
binom.test(7, n = 25, p = 0.5)
```
## Example 13.5. [Cricket sexual cannibalism](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter13/chap13e5SagebrushCrickets.csv)
*__The Wilcoxon rank-sum test (equivalent to the Mann-Whitney $U$-test)__ comparing times to mating (in hours) of starved and fed female sagebrush crickets. We also apply the __permutation test__ to the same data.*
Read and inspect data.
```{r}
cannibalism <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter13/chap13e5SagebrushCrickets.csv"))
head(cannibalism)
```
**Multiple histograms** (Figure 13.5-1) using the lattice package
```{r}
library(lattice)
histogram( ~ timeToMating | feedingStatus, data = cannibalism, layout = c(1,2), col = "firebrick", breaks = seq(0, 100, by = 20), type = "count", xlab = "Time to mating (hours)", ylab = "Frequency")
```
Commands for multiple histograms in basic R:
```{r}
oldpar = par(no.readonly = TRUE) # make backup of default graph settings
par(mfrow = c(2,1), oma = c(4, 6, 2, 6), mar = c(3, 4, 3, 2))
hist(cannibalism$timeToMating[cannibalism$feedingStatus == "starved"], col = "firebrick", las = 1, breaks = seq(0, 100, by = 20), main = "starved", ylab = "Frequency")
hist(cannibalism$timeToMating[cannibalism$feedingStatus == "fed"], col = "firebrick", las = 1, breaks = seq(0, 100, by = 20), main = "fed", ylab = "Frequency")
mtext("Time to mating (hours)", side = 1, outer = TRUE, padj = 0)
par(oldpar) # revert to default graph settings
```
**Wilcoxon rank-sum test** (equivalent to Mann-Whitney $U$-test)
```{r}
wilcox.test(timeToMating ~ feedingStatus, data = cannibalism)
```
*__Permutation test__ of the difference between mean time to mating of starved and fed crickets.*
Begin by calculating the **observed difference between means** (starved minus fed). The difference is -18.25734 in this data set.
```{r}
cricketMeans <- tapply(cannibalism$timeToMating, cannibalism$feedingStatus, mean)
cricketMeans
diffMeans <- cricketMeans[2] - cricketMeans[1]
diffMeans
```
Decide on the **number of permutations**.
```{r}
nPerm <- 10000
```
Create a **loop to permute the data many times** (determined by `nperm`). In the loop, `i` is just a counter that goes from 1 to `nPerm` by 1; each permuted difference is saved in the `permResult`.
```{r}
permResult <- vector() # initializes
for(i in 1:nPerm){
# step 1: permute the times to mating
permSample <- sample(cannibalism$timeToMating, replace = FALSE)
# step 2: calculate difference betweeen means
permMeans <- tapply(permSample, cannibalism$feedingStatus, mean)
permResult[i] <- permMeans[2] - permMeans[1]
}
```
**Plot null distribution** based on the permuted differences (Figure 13.8-1).
```{r}
hist(permResult, right = FALSE, breaks = 100)
```
**Use the null distribution to calculate an approximate $P$-value**. This is the twice the proportion of the permuted means that fall below the observed difference in means, `diffMeans` (-18.25734 in this example). The following code calculates the *number* of permuted means falling below `diffMeans`.
```{r}
sum(as.numeric(permResult <= diffMeans))
```
These commands obtain the *fraction* of permuted means falling below `diffMeans`.
```{r}
sum(as.numeric(permResult <= diffMeans)) / nPerm
```
Finally, multiply by 2 to get the $P$-value for a two-sided test.
```{r}
2 * ( sum(as.numeric(permResult <= diffMeans)) / nPerm )
```