--- title: "Correlation between numerical variables: R code for Chapter 16 examples" author: "Michael Whitlock and Dolph Schluter" output: html_document: toc: yes toc_depth: 3 --- _Note: This document was converted to R-Markdown from [this page](http://whitlockschluter.zoology.ubc.ca/r-code/rcode16) by M. Drew LaMar. You can download the R-Markdown [here](https://qubeshub.org/collections/post/1394/download/chap16.Rmd)._ Download the R code on this page as a single file [here](http://whitlockschluter.zoology.ubc.ca/wp-content/rcode/chap16.r) ## New methods Hover over a function argument for a short description of its meaning. The variable names are plucked from the examples further below. **Linear correlation**: > cor.test(booby\$futureBehavior, booby$nVisitsNestling) **Spearman rank correlation**: > cor.test(trick\$years, trick$impressivenessScore, method = "spearman") ## Example 16.1. [Flipping Booby](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16e1FlippingBird.csv) *__Estimate a linear correlation__ between the number of non-parent adult visits experienced by boobies as chicks and the number of similar behaviors performed by the same birds when adult.* Read and inspect the data ```{r} booby <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16e1FlippingBird.csv")) head(booby) ``` **Scatter plot.** ```{r, fig.width=4, fig.height=4} plot(futureBehavior ~ nVisitsNestling, data = booby) ``` For a fancier scatter plot using more options (Figure 16.1-4): ```{r, fig.width=4, fig.height=4} plot(futureBehavior ~ nVisitsNestling, data = booby, pch = 16, col = "firebrick", las = 1, bty = "l", cex = 1.2, xlab = "Events experienced as a nestling", ylab = "Future behavior") ``` **Correlation coefficient.** The `cor.test` function computes a number of useful quantities, which we save in the object `boobyCor`. The quantities can be extracted one at a time or shown all at once. ```{r} boobyCor <- cor.test(booby$futureBehavior, booby$nVisitsNestling) boobyCor ``` If only the estimated correlation and standard error are of interest, they can be obtained as follows. The calculation of standard error uses `nrow(booby)` to get the sample size for the correlation, but this will only be true if there are no missing values. ```{r} r <- boobyCor$estimate r SE <- sqrt( (1 - r^2)/(nrow(booby) - 3) ) unname(SE) ``` **Confidence limit for a correlation coefficient.** The 95% confidence interval for the correlation is included in the output of `cor.test`. If all you want is the confidence interval, it can be extracted from the `boobyCor` calculated in an earlier step. ```{r} boobyCor$conf.int ``` ----- ## Example 16.2. [Inbreeding wolves](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16e2InbreedingWolves.csv) *__Test a linear correlation__ between inbreeding coefficients of litters of mated wolf pairs and the number of pups surviving their first winter.* Read and inspect data. ```{r} wolf <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16e2InbreedingWolves.csv")) head(wolf) ``` **Scatter plot.** ```{r, fig.width=4, fig.height=4} plot(nPups ~ inbreedCoef, data = wolf) ``` A fancier scatter plot with more options: ```{r, fig.width=4, fig.height=4} plot(nPups ~ inbreedCoef, data = wolf, pch = 16, col = "firebrick", las = 1, bty = "l", cex = 1.2, xlab = "Inbreeding coefficient", ylab = "Number of pups") ``` **Test of zero correlation.** The results of the test are included in the output of `cor.test`. ```{r} cor.test(wolf$nPups, wolf$inbreedCoef) ``` ----- ## Figure 16.4-1. [Stream invertebrates](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16f4_1StreamInvertebrates.csv) *__Effect of the range of the data on the correlation coefficient__ between population density of (log base 10 of number of individuals per square meter) and body mass (g) of different species of stream invertebrates.* Read and inspect the data. ```{r} streamInvert <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16f4_1StreamInvertebrates.csv")) head(streamInvert) ``` **Scatter plot.** ```{r, fig.width=4, fig.height=4} plot(log10Density ~ log10Mass, data = streamInvert) ``` Commands to make a scatter plot of these data with more options: ```{r, fig.width=4, fig.height=4} plot(log10Density ~ log10Mass, data = streamInvert, pch = 16, col = "firebrick", las = 1, bty = "l", cex = 1.2, xlab = "Log population density", ylab = "Log body mass") ``` **Effect of the range of the data on the correlation coefficient.** Here is the correlation coefficient for the full range of the data. The command uses `cor.test` but we extract just the correlation coefficient for this exercise. ```{r} cor.test(streamInvert$log10Density, streamInvert$log10Mass)$estimate ``` Here is the correlation coefficient for the subset of the data corresponding to a log10Mass between 0 and 2. ```{r} streamInvertReduced <- subset(streamInvert, log10Mass > 0 & log10Mass < 2) cor.test(streamInvertReduced$log10Density, streamInvertReduced$log10Mass)$estimate ``` ----- ## Example 16.5. [Indian rope trick](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16e5IndianRopeTrick.csv) *__Spearman rank correlation__ between impressiveness score of the Indian rope trick and the number of years elapsed bewteen the witnessing of the trick and the telling of it in writing.* Read and inspect the data. ```{r} trick <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16e5IndianRopeTrick.csv")) head(trick) ``` **Scatter plot.** ```{r, fig.width=4, fig.height=4} plot(impressivenessScore ~ years, data = trick) ``` Commands to make a scatter plot of these data with more options: ```{r, fig.width=4, fig.height=4} plot(impressivenessScore ~ years, data = trick, pch = 16, col = "firebrick", las = 1, bty = "l", cex = 1.2, xlab = "Years elapsed", ylab = "Impressiveness score") ``` **Test of zero Spearman rank correlation.** In this example, the variable "impressivenessScore" is a number score with lots of tied observations. Because of the ties, R will warn you that the $P$-value in the output is not exact. ```{r} cor.test(trick$years, trick$impressivenessScore, method = "spearman") ```