---
title: "Correlation between numerical variables: R code for Chapter 16 examples"
author: "Michael Whitlock and Dolph Schluter"
output:
html_document:
toc: yes
toc_depth: 3
---
_Note: This document was converted to R-Markdown from [this page](http://whitlockschluter.zoology.ubc.ca/r-code/rcode16) by M. Drew LaMar. You can download the R-Markdown [here](https://qubeshub.org/collections/post/1394/download/chap16.Rmd)._
Download the R code on this page as a single file [here](http://whitlockschluter.zoology.ubc.ca/wp-content/rcode/chap16.r)
## New methods
Hover over a function argument for a short description of its meaning. The variable names are plucked from the examples further below.
**Linear correlation**:
> cor.test(booby\$futureBehavior, booby$nVisitsNestling)
**Spearman rank correlation**:
> cor.test(trick\$years, trick$impressivenessScore, method = "spearman")
## Example 16.1. [Flipping Booby](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16e1FlippingBird.csv)
*__Estimate a linear correlation__ between the number of non-parent adult visits experienced by boobies as chicks and the number of similar behaviors performed by the same birds when adult.*
Read and inspect the data
```{r}
booby <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16e1FlippingBird.csv"))
head(booby)
```
**Scatter plot.**
```{r, fig.width=4, fig.height=4}
plot(futureBehavior ~ nVisitsNestling, data = booby)
```
For a fancier scatter plot using more options (Figure 16.1-4):
```{r, fig.width=4, fig.height=4}
plot(futureBehavior ~ nVisitsNestling, data = booby, pch = 16, col = "firebrick", las = 1, bty = "l", cex = 1.2, xlab = "Events experienced as a nestling", ylab = "Future behavior")
```
**Correlation coefficient.** The `cor.test` function computes a number of useful quantities, which we save in the object `boobyCor`. The quantities can be extracted one at a time or shown all at once.
```{r}
boobyCor <- cor.test(booby$futureBehavior, booby$nVisitsNestling)
boobyCor
```
If only the estimated correlation and standard error are of interest, they can be obtained as follows. The calculation of standard error uses `nrow(booby)` to get the sample size for the correlation, but this will only be true if there are no missing values.
```{r}
r <- boobyCor$estimate
r
SE <- sqrt( (1 - r^2)/(nrow(booby) - 3) )
unname(SE)
```
**Confidence limit for a correlation coefficient.** The 95% confidence interval for the correlation is included in the output of `cor.test`. If all you want is the confidence interval, it can be extracted from the `boobyCor` calculated in an earlier step.
```{r}
boobyCor$conf.int
```
-----
## Example 16.2. [Inbreeding wolves](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16e2InbreedingWolves.csv)
*__Test a linear correlation__ between inbreeding coefficients of litters of mated wolf pairs and the number of pups surviving their first winter.*
Read and inspect data.
```{r}
wolf <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16e2InbreedingWolves.csv"))
head(wolf)
```
**Scatter plot.**
```{r, fig.width=4, fig.height=4}
plot(nPups ~ inbreedCoef, data = wolf)
```
A fancier scatter plot with more options:
```{r, fig.width=4, fig.height=4}
plot(nPups ~ inbreedCoef, data = wolf, pch = 16, col = "firebrick", las = 1, bty = "l", cex = 1.2, xlab = "Inbreeding coefficient", ylab = "Number of pups")
```
**Test of zero correlation.** The results of the test are included in the output of `cor.test`.
```{r}
cor.test(wolf$nPups, wolf$inbreedCoef)
```
-----
## Figure 16.4-1. [Stream invertebrates](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16f4_1StreamInvertebrates.csv)
*__Effect of the range of the data on the correlation coefficient__ between population density of (log base 10 of number of individuals per square meter) and body mass (g) of different species of stream invertebrates.*
Read and inspect the data.
```{r}
streamInvert <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16f4_1StreamInvertebrates.csv"))
head(streamInvert)
```
**Scatter plot.**
```{r, fig.width=4, fig.height=4}
plot(log10Density ~ log10Mass, data = streamInvert)
```
Commands to make a scatter plot of these data with more options:
```{r, fig.width=4, fig.height=4}
plot(log10Density ~ log10Mass, data = streamInvert, pch = 16, col = "firebrick", las = 1, bty = "l", cex = 1.2, xlab = "Log population density", ylab = "Log body mass")
```
**Effect of the range of the data on the correlation coefficient.** Here is the correlation coefficient for the full range of the data. The command uses `cor.test` but we extract just the correlation coefficient for this exercise.
```{r}
cor.test(streamInvert$log10Density, streamInvert$log10Mass)$estimate
```
Here is the correlation coefficient for the subset of the data corresponding to a log10Mass between 0 and 2.
```{r}
streamInvertReduced <- subset(streamInvert, log10Mass > 0 & log10Mass < 2)
cor.test(streamInvertReduced$log10Density, streamInvertReduced$log10Mass)$estimate
```
-----
## Example 16.5. [Indian rope trick](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16e5IndianRopeTrick.csv)
*__Spearman rank correlation__ between impressiveness score of the Indian rope trick and the number of years elapsed bewteen the witnessing of the trick and the telling of it in writing.*
Read and inspect the data.
```{r}
trick <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16e5IndianRopeTrick.csv"))
head(trick)
```
**Scatter plot.**
```{r, fig.width=4, fig.height=4}
plot(impressivenessScore ~ years, data = trick)
```
Commands to make a scatter plot of these data with more options:
```{r, fig.width=4, fig.height=4}
plot(impressivenessScore ~ years, data = trick, pch = 16, col = "firebrick", las = 1, bty = "l", cex = 1.2, xlab = "Years elapsed", ylab = "Impressiveness score")
```
**Test of zero Spearman rank correlation.** In this example, the variable "impressivenessScore" is a number score with lots of tied observations. Because of the ties, R will warn you that the $P$-value in the output is not exact.
```{r}
cor.test(trick$years, trick$impressivenessScore, method = "spearman")
```