---
title: 'Displaying data: R code for Chapter 2 examples'
author: "Michael Whitlock and Dolph Schluter"
date: "February 1, 2017"
output:
html_document:
toc: yes
toc_depth: 3
word_document: default
---
_Note: This document was converted to R-Markdown from [this page](http://whitlockschluter.zoology.ubc.ca/r-code/rcode02), with modifications, by M. Drew LaMar. You can download the R-Markdown [here](https://qubeshub.org/collections/post/989/download/chap02.Rmd)._
Download the R code on this page as a single file [here](http://whitlockschluter.zoology.ubc.ca/wp-content/rcode/chap02.r)
-----
## New methods
Hover over a function argument for a short description of its meaning. The variable names are plucked from the examples further below.
**Read data** from a comma-separated (.csv) file into a data frame. R will take columns having only numbers and make them numeric variables in the data frame. Columns having characters will be converted to factor variables, rather than to character variables.
> locustData <- read.csv(chap02f1_2locustSerotonin.csv)
**Inspect data in a data frame.** The `head` function shows the first few lines of the data frame to confirm the data was read properly. The `nrow` function indicates the number of cases (rows) in the data frame.
> head(locustData)
nrow(locustData)
**Basic strip chart** relating a numeric variable to a categorical variable:
> stripchart(serotoninLevel ~ treatmentTime, data = locustData, method = "jitter", vertical = TRUE)
**Frequency table or contingency table:**
> table(tigerData$activity)
**Bar graph** for count data:
> barplot(tigerTable, ylab = "Frequency", cex.names = 0.5)
**Histogram** for a numeric variable:
> hist(birdAbundanceData$abundance, right = FALSE)
**Scatter plot** showing association between two numeric variables:
> plot(sonAttractiveness ~ fatherOrnamentation, data = guppyFatherSonData)
**Other new methods:**
* Multiple-histogram plot.
* Line plot.
* Load an R package.
* Set the desired ordering of categories of a factor variable in tables and graphs.
## Plotting cheatsheet
These tables were created by M. Drew LaMar, based on ones at the end of Chapter 2 of Whitlock & Schluter's textbook.
**Frequency distributions of univariate data**
Type of data | Graphical method | R command | Example
------------ | ---------------- | --------- | -------
Categorical | Bar graph | `barplot` | [Figure 2.1-3. Education spending (processed)](#barplot1)
" | " | " | [Example 2.2A. Deaths from tigers (raw)](#barplot2)
Numerical | Histogram | `hist` | [Figure 2.2-5. Salmon body size](#histogram)
**Showing association of bivariate data**
Type of data | Graphical method | R command | Example
--------------- | ---------------------------------- | ------------ | -------
Two numerical | Scatter plot | `plot` | [Example 2.3B. Guppy father and son comparison](#scatterplot)
" | Line plot | `plot` | [Example 2.4A. Measles outbreaks](#lineplot)
" | Map | --- | ---
Two categorical | Grouped bar graph | `barplot` | [Example 2.3A. Bird malaria](#twocat)
" | Mosaic plot | `mosaicplot` | [Example 2.3A. Bird malaria](#twocat)
Mixed | Strip chart | `stripchart` | [Example 2.3C. Human hemoglobin and elevation](#mixed)
" | Box plot | `boxplot` | [Example 2.3C. Human hemoglobin and elevation](#mixed)
" | Multiple histograms | `hist` | [Example 2.3C. Human hemoglobin and elevation](#mixed)
" | Cumulative frequency distributions | --- | ---
## Figure 2.1-2. [Locust serotonin](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f1_2locustSerotonin.csv)
*__Strip chart__ of serotonin levels in the central nervous system of desert locusts that were experimentally crowded for 0 (the control group), 1, and 2 hours.*
**Read the data and store in data frame** (here named locustData). The following command uses `read.csv` to grab the data from a file on the internet (on the current web site).
```{r}
locustData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f1_2locustSerotonin.csv")
```
**Show the first few lines of the data**, to ensure it read correctly. Determine the number of cases in the data.
```{r}
head(locustData)
nrow(locustData)
```
**Draw a stripchart** (the tilde "~" means that the first argument below is a formula, relating one variable to the other).
```{r}
stripchart(serotoninLevel ~ treatmentTime,
data = locustData,
method = "jitter",
vertical = TRUE)
```
A fancier strip chart, closer to that shown in Figure 2.1-2, by including more options.
```{r}
# Stripchart with options
par(bty = "l") # plot x and y axes only, not a complete box
stripchart(serotoninLevel ~ treatmentTime,
data = locustData,
vertical = TRUE,
method = "jitter",
pch = 16,
col = "firebrick",
cex = 1.5,
las = 1,
ylab = "Serotonin (pmoles)",
xlab = "Treatment time (hours)",
ylim = c(0, max(locustData$serotoninLevel)))
# The following command calculates the means in each treatment group (time)
meanSerotonin = tapply(locustData$serotoninLevel,
locustData$treatmentTime,
mean)
# "segments" draws draws lines to indicate the means
segments(x0 = c(1,2,3) - 0.1,
y0 = meanSerotonin,
x1 = c(1,2,3) + 0.1,
y1 = meanSerotonin, lwd = 2)
```
-----
## Figure 2.1-3. [Education spending](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f1_3EducationSpending.csv)
*__A bar graph__ of education spending per student in different years in British Columbia.*
Read the data into a data frame named educationSpending, and inspect.
```{r}
educationSpending <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f1_3EducationSpending.csv")
head(educationSpending)
```
**Draw a bar graph**. The `names` argument generates numbers between 1998 and 2004 in 1-year increments, to be used as labels along the horizontal axis.
```{r}
barplot(educationSpending$spendingPerStudent,
names.arg = educationSpending$year,
ylab = "Education spending ($ per student)")
```
A slightly fancier bar graph like that in Figure 2.1-3, which includes additional options, is shown here.
```{r}
# Bar graph with more options
barplot(educationSpending$spendingPerStudent,
names.arg = educationSpending$year,
las = 1,
col = "firebrick",
ylim = c(0,8000),
ylab = "Education spending ($ per student)")
```
-----
## Example 2.2A. [Deaths from tigers](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e2aDeathsFromTigers.csv)
*__Frequency table__ and __bar graph__ showing activities of 88 people at the time they were killed by tigers near Chitwan National Park, Nepal, from 1979 to 2006.*
Read the data into data frame named `tigerData`
```{r}
tigerData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e2aDeathsFromTigers.csv")
head(tigerData)
```
**Generate a frequency table.** The `sort` function is included to sort the categories by their frequencies.
```{r}
tigerTable <- sort(table(tigerData$activity), decreasing = TRUE)
tigerTable
```
You can arrange the frequency table vertically.
```{r}
data.frame(Frequency = tigerTable)
```
Use the `addmargins` command to include sums in your frequency table.
```{r}
data.frame(Frequency = addmargins(tigerTable))
```
**Draw a bar graph**. The additional arguments `cex.names = 0.5` shrinks the axis labels by 50%, and `las = 2` flips the labels, so that they all fit in the window.
```{r}
barplot(tigerTable,
ylab = "Frequency",
cex.names = 0.5,
las = 2)
```
A slightly fancier bar graph of the data with more options, like that shown in Figure 2.2-1, is shown here.
```{r}
oldpar = par(no.readonly = TRUE) # stores a backup copy of current graph settings in "oldpar"
par(mar = c(8, 4, 4, 2) + 0.1) # creates more room for labels below the x-axis
barplot(tigerTable,
las = 2,
col = "firebrick",
cex.names = 0.8,
ylim = c(0,50),
xlab = "",
ylab = "Frequency (number of people)")
mtext("Activity",
side = 1,
line = 7,
cex = 0.9) # adds the text under the x-axis
par(oldpar) # reverts graph settings back to default
```
-----
## Example 2.2B. [Desert bird abundance](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e2bDesertBirdAbundance.csv)
*__Frequency table__ and __histogram__ illustrating the frequency distribution of bird species abundance at Organ Pipe Cactus National Monument.*
Read and inspect the data.
```{r}
birdAbundanceData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e2bDesertBirdAbundance.csv")
head(birdAbundanceData)
```
**Generate a frequency table** of the numeric bird abundance variable. The option `right = FALSE` ensures that abundance value 300 (for example) is counted in the 300-400 bin rather than in the 200-300 bin.
```{r}
birdAbundanceTable <- table(cut(birdAbundanceData$abundance,
breaks = seq(0,650,by=50),
right = FALSE))
birdAbundanceTable
```
The same table oriented vertically and including the sum.
```{r}
data.frame(Frequency = addmargins(birdAbundanceTable))
```
**Draw a histogram** of bird abundances.
```{r}
hist(birdAbundanceData$abundance, right = FALSE)
```
Commands to draw a histogram with more options, such as Figure 2.2-3, are here.
```{r}
# Histogram with options
hist(birdAbundanceData$abundance,
right = FALSE,
breaks = seq(0,650,by=50),
col = "firebrick",
las = 1,
xlab = "Abundance (No. individuals)",
ylab = "Frequency (No. species)", main = "")
```
-----
## Figure 2.2-5. [Salmon body size](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f2_5SalmonBodySize.csv)
*__Histograms__ of body mass of 228 female sockeye salmon sampled from Pick Creek in Alaska. The same data are plotted for three different interval widths: 0.1 kg, 0.3 kg, and 0.5 kg.*
Read the data into data frame named `salmonSizeData`
```{r}
salmonSizeData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f2_5SalmonBodySize.csv")
head(salmonSizeData)
```
**Histograms with different bin widths** (0.1, 0.3, and 0.5).
```{r}
hist(salmonSizeData$massKg,
right = FALSE,
breaks = seq(1,4,by=0.1),
col = "firebrick")
hist(salmonSizeData$massKg,
right = FALSE,
breaks = seq(1,4,by=0.3),
col = "firebrick")
hist(salmonSizeData$massKg,
right = FALSE,
breaks = seq(1,4,by=0.5),
col = "firebrick")
```
-----
## Example 2.3A. [Bird malaria](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3aBirdMalaria.csv)
*__Contingency table__, __grouped bar plot__, and __mosaic plot__ showing the association between egg removal treatment and incidence of malaria in female great tits.*
Read the data from the data file and inspect.
```{r}
birdMalariaData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3aBirdMalaria.csv")
head(birdMalariaData)
```
Optional step: *Set the desired order of treatment categories in tables and graphs*. The `factor` command allows us to order the levels so that the category "Egg removal" comes before "Control" in tables and graphs (categories are otherwise ordered alphabetically).
```{r}
birdMalariaData$treatment <- factor(birdMalariaData$treatment, levels= c("Egg removal", "Control"))
```
**Create a contingency table**. Use `table` with the names of two categorical variables as arguments, beginning with the response variable.
```{r}
birdMalariaTable <- table(birdMalariaData$response, birdMalariaData$treatment)
birdMalariaTable
```
**Include row and column sums** in the contingency table.
```{r}
addmargins(birdMalariaTable,
margin = c(1,2),
FUN = sum,
quiet = TRUE)
```
**Draw a grouped bar graph** using the contingency table.
```{r}
barplot(as.matrix(birdMalariaTable), beside = TRUE)
```
Commands to produce a grouped bar graph with more options are shown here.
```{r}
# Grouped bar graph with options
barplot(as.matrix(birdMalariaTable),
beside = TRUE,
space = c(0.1, 0.3),
las = 1,
xlab = "Treatment",
ylab = "Relative frequency",
col = c("firebrick", "goldenrod1"),
legend.text = rownames(birdMalariaTable),
args.legend = list(x = 0.3,
y = max(birdMalariaTable),
xjust = 0))
```
**Draw a mosaic plot**. The "`t`" command flips (transposes) the table to ensure that the explanatory variable is along the horizontal axis.
```{r}
mosaicplot(t(birdMalariaTable),
col = c("firebrick", "goldenrod1"),
sub = "Treatment",
ylab = "Relative frequency",
cex.axis = 1.1,
main = "")
```
-----
## Example 2.3B. [Guppy father and son comparison](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3bGuppyFatherSonAttractiveness.csv)
*__Scatter plot__ of the relationship between the ornamentation of male guppies and the average attractiveness of their sons.*
Read the data from the file.
```{r}
guppyFatherSonData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3bGuppyFatherSonAttractiveness.csv")
head(guppyFatherSonData)
```
**Draw a scatter plot** using the formula approach (with the "`~`").
```{r}
plot(sonAttractiveness ~ fatherOrnamentation, data = guppyFatherSonData)
```
See here for commands to draw a fancier scatter plot like that in Figure 2.3-3.
```{r}
# Scatter plot with options
plot(sonAttractiveness ~ fatherOrnamentation,
data = guppyFatherSonData,
las = 1,
pch = 16,
col = "firebrick",
cex = 1.5,
bty = "l",
xlab = "Father's ornamentation",
ylab = "Son's attractiveness")
```
-----
## Example 2.3C. [Human hemoglobin and elevation](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3cHumanHemoglobinElevation.csv)
*__Strip chart__, __box plot__, and __multiple histograms__ showing hemoglobin concentration in men living at high altitude in three different parts of the world and in a sea level population (USA).*
Read the data.
```{r}
hemoglobinData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3cHumanHemoglobinElevation.csv")
head(hemoglobinData)
```
Obtain the sample sizes in each of the four populations
```{r}
table(hemoglobinData$population)
```
**Draw a strip chart** of the hemoglobin data.
```{r}
stripchart(hemoglobin ~ population,
data = hemoglobinData,
method = "jitter",
vertical = TRUE)
```
Commands for a fancier strip chart like that shown in Figure 2.3-4 are here.
```{r}
par(bty = "l")
stripchart(hemoglobin ~ population,
data = hemoglobinData,
vertical = TRUE,
method = "jitter",
jitter = 0.2,
pch = 1,
col = "firebrick",
las = 1,
xlab = "Male population",
ylab = "Hemoglobin concentration (g/dL)")
```
**Draw a box plot**, relating hemoglobin to population.
```{r}
boxplot(hemoglobin ~ population, data = hemoglobinData)
```
A box plot with more options, like that in Figure 2.3-4, is shown here.
```{r}
par(bty = "l")
boxplot(hemoglobin ~ population,
data = hemoglobinData,
col = "goldenrod1",
boxwex = 0.5,
whisklty = 1,
outcol = "black",
outcex = 1,
outlty = "blank",
las = 1,
xlab="Male population",
ylab = "Hemoglobin concentration (g/dL)")
```
Show the association between hemoglobin and population using the **multiple histograms** approach (Figure 2.3-5). The following commands use the `lattice` package, which must first be loaded.
```{r}
library(lattice)
histogram(~ hemoglobin | population,
data = hemoglobinData,
layout = c(1,4),
col = "firebrick",
breaks = seq(10,26,by=1))
```
Here we show commands to draw the multiple histograms in basic R, without using the `lattice` package. This approach is more tedious, but the resulting graphs are often easier to modify.
```{r}
# Multiple histograms using base graphics, plotting them one at a time.
# The "oma" option adjusts the outer margins of the whole figure
# The "mar" option tweaks the margins within each of the 4 plots.
oldpar = par(no.readonly = TRUE) # make backup of default graph settings
par(mfrow = c(4,1),
oma = c(4, 6, 2, 6),
mar = c(2, 5, 4, 2))
hist(hemoglobinData$hemoglobin[hemoglobinData$population == "Andes"],
col = "firebrick",
las = 1,
main = "Andes",
breaks = seq(10,26,by=1),
ylab = "Frequency")
hist(hemoglobinData$hemoglobin[hemoglobinData$population == "Ethiopia"],
col = "firebrick",
las = 1,
main = "Ethiopia",
breaks = seq(10,26,by=1),
ylab = "Frequency")
hist(hemoglobinData$hemoglobin[hemoglobinData$population == "Tibet"],
col = "firebrick",
las = 1,
main = "Tibet",
breaks = seq(10,26,by=1),
ylab = "Frequency")
hist(hemoglobinData$hemoglobin[hemoglobinData$population == "USA"],
col = "firebrick",
las = 1,
main = "USA",
breaks = seq(10,26,by=1),
ylab = "Frequency")
mtext("Hemoglobin concentration (g/dL)",
side = 1,
outer = TRUE,
padj = 1.5)
par(oldpar) # revert to default graph settings
```
-----
## Example 2.4A. [Measles outbreaks](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e4aMeaslesOutbreaks.csv)
*__Line graph__ showing confirmed cases of measles in England and Wales from 1995 to 2011. Annual counts are quarterly.*
Read the data from file.
```{r}
measlesData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e4aMeaslesOutbreaks.csv")
head(measlesData)
```
**Drawing a line graph** uses the same command as a scatter plot but adds the `type = "l"` argument to draw a line graph instead.
```{r}
plot(confirmedCases ~ yearByQuarter,
data = measlesData,
type="l")
```
Commands to draw a fancier line plot (like that in Figure 2.4-1) are shown here.
```{r}
# Line plot with options
plot(confirmedCases ~ yearByQuarter,
data = measlesData,
type = "l",
las = 1,
lwd = 1,
bty = "l",
xlab = "Year",
ylab = "Number of cases",
xaxp = c(1995,2012,2012-1995))
points(confirmedCases ~ yearByQuarter,
data = measlesData,
pch = 16,
col = "firebrick",
cex = 0.7)
```