--- title: 'Displaying data: R code for Chapter 2 examples' author: "Michael Whitlock and Dolph Schluter" date: "February 1, 2017" output: html_document: toc: yes toc_depth: 3 word_document: default --- _Note: This document was converted to R-Markdown from [this page](http://whitlockschluter.zoology.ubc.ca/r-code/rcode02), with modifications, by M. Drew LaMar. You can download the R-Markdown [here](https://qubeshub.org/collections/post/989/download/chap02.Rmd)._ Download the R code on this page as a single file [here](http://whitlockschluter.zoology.ubc.ca/wp-content/rcode/chap02.r) ----- ## New methods Hover over a function argument for a short description of its meaning. The variable names are plucked from the examples further below. **Read data** from a comma-separated (.csv) file into a data frame. R will take columns having only numbers and make them numeric variables in the data frame. Columns having characters will be converted to factor variables, rather than to character variables. > locustData <- read.csv(chap02f1_2locustSerotonin.csv) **Inspect data in a data frame.** The `head` function shows the first few lines of the data frame to confirm the data was read properly. The `nrow` function indicates the number of cases (rows) in the data frame. > head(locustData)
nrow(locustData) **Basic strip chart** relating a numeric variable to a categorical variable: > stripchart(serotoninLevel ~ treatmentTime, data = locustData, method = "jitter", vertical = TRUE) **Frequency table or contingency table:** > table(tigerData$activity) **Bar graph** for count data: > barplot(tigerTable, ylab = "Frequency", cex.names = 0.5) **Histogram** for a numeric variable: > hist(birdAbundanceData$abundance, right = FALSE) **Scatter plot** showing association between two numeric variables: > plot(sonAttractiveness ~ fatherOrnamentation, data = guppyFatherSonData) **Other new methods:** * Multiple-histogram plot. * Line plot. * Load an R package. * Set the desired ordering of categories of a factor variable in tables and graphs. ## Plotting cheatsheet These tables were created by M. Drew LaMar, based on ones at the end of Chapter 2 of Whitlock & Schluter's textbook. **Frequency distributions of univariate data** Type of data | Graphical method | R command | Example ------------ | ---------------- | --------- | ------- Categorical | Bar graph | `barplot` | [Figure 2.1-3. Education spending (processed)](#barplot1) " | " | " | [Example 2.2A. Deaths from tigers (raw)](#barplot2) Numerical | Histogram | `hist` | [Figure 2.2-5. Salmon body size](#histogram) **Showing association of bivariate data** Type of data | Graphical method | R command | Example --------------- | ---------------------------------- | ------------ | ------- Two numerical | Scatter plot | `plot` | [Example 2.3B. Guppy father and son comparison](#scatterplot) " | Line plot | `plot` | [Example 2.4A. Measles outbreaks](#lineplot) " | Map | --- | --- Two categorical | Grouped bar graph | `barplot` | [Example 2.3A. Bird malaria](#twocat) " | Mosaic plot | `mosaicplot` | [Example 2.3A. Bird malaria](#twocat) Mixed | Strip chart | `stripchart` | [Example 2.3C. Human hemoglobin and elevation](#mixed) " | Box plot | `boxplot` | [Example 2.3C. Human hemoglobin and elevation](#mixed) " | Multiple histograms | `hist` | [Example 2.3C. Human hemoglobin and elevation](#mixed) " | Cumulative frequency distributions | --- | --- ## Figure 2.1-2. [Locust serotonin](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f1_2locustSerotonin.csv) *__Strip chart__ of serotonin levels in the central nervous system of desert locusts that were experimentally crowded for 0 (the control group), 1, and 2 hours.* **Read the data and store in data frame** (here named locustData). The following command uses `read.csv` to grab the data from a file on the internet (on the current web site). ```{r} locustData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f1_2locustSerotonin.csv") ``` **Show the first few lines of the data**, to ensure it read correctly. Determine the number of cases in the data. ```{r} head(locustData) nrow(locustData) ``` **Draw a stripchart** (the tilde "~" means that the first argument below is a formula, relating one variable to the other). ```{r} stripchart(serotoninLevel ~ treatmentTime, data = locustData, method = "jitter", vertical = TRUE) ``` A fancier strip chart, closer to that shown in Figure 2.1-2, by including more options. ```{r} # Stripchart with options par(bty = "l") # plot x and y axes only, not a complete box stripchart(serotoninLevel ~ treatmentTime, data = locustData, vertical = TRUE, method = "jitter", pch = 16, col = "firebrick", cex = 1.5, las = 1, ylab = "Serotonin (pmoles)", xlab = "Treatment time (hours)", ylim = c(0, max(locustData$serotoninLevel))) # The following command calculates the means in each treatment group (time) meanSerotonin = tapply(locustData$serotoninLevel, locustData$treatmentTime, mean) # "segments" draws draws lines to indicate the means segments(x0 = c(1,2,3) - 0.1, y0 = meanSerotonin, x1 = c(1,2,3) + 0.1, y1 = meanSerotonin, lwd = 2) ``` ----- ## Figure 2.1-3. [Education spending](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f1_3EducationSpending.csv) *__A bar graph__ of education spending per student in different years in British Columbia.* Read the data into a data frame named educationSpending, and inspect. ```{r} educationSpending <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f1_3EducationSpending.csv") head(educationSpending) ``` **Draw a bar graph**. The `names` argument generates numbers between 1998 and 2004 in 1-year increments, to be used as labels along the horizontal axis. ```{r} barplot(educationSpending$spendingPerStudent, names.arg = educationSpending$year, ylab = "Education spending ($ per student)") ``` A slightly fancier bar graph like that in Figure 2.1-3, which includes additional options, is shown here. ```{r} # Bar graph with more options barplot(educationSpending$spendingPerStudent, names.arg = educationSpending$year, las = 1, col = "firebrick", ylim = c(0,8000), ylab = "Education spending ($ per student)") ``` ----- ## Example 2.2A. [Deaths from tigers](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e2aDeathsFromTigers.csv) *__Frequency table__ and __bar graph__ showing activities of 88 people at the time they were killed by tigers near Chitwan National Park, Nepal, from 1979 to 2006.* Read the data into data frame named `tigerData` ```{r} tigerData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e2aDeathsFromTigers.csv") head(tigerData) ``` **Generate a frequency table.** The `sort` function is included to sort the categories by their frequencies. ```{r} tigerTable <- sort(table(tigerData$activity), decreasing = TRUE) tigerTable ``` You can arrange the frequency table vertically. ```{r} data.frame(Frequency = tigerTable) ``` Use the `addmargins` command to include sums in your frequency table. ```{r} data.frame(Frequency = addmargins(tigerTable)) ``` **Draw a bar graph**. The additional arguments `cex.names = 0.5` shrinks the axis labels by 50%, and `las = 2` flips the labels, so that they all fit in the window. ```{r} barplot(tigerTable, ylab = "Frequency", cex.names = 0.5, las = 2) ``` A slightly fancier bar graph of the data with more options, like that shown in Figure 2.2-1, is shown here. ```{r} oldpar = par(no.readonly = TRUE) # stores a backup copy of current graph settings in "oldpar" par(mar = c(8, 4, 4, 2) + 0.1) # creates more room for labels below the x-axis barplot(tigerTable, las = 2, col = "firebrick", cex.names = 0.8, ylim = c(0,50), xlab = "", ylab = "Frequency (number of people)") mtext("Activity", side = 1, line = 7, cex = 0.9) # adds the text under the x-axis par(oldpar) # reverts graph settings back to default ``` ----- ## Example 2.2B. [Desert bird abundance](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e2bDesertBirdAbundance.csv) *__Frequency table__ and __histogram__ illustrating the frequency distribution of bird species abundance at Organ Pipe Cactus National Monument.* Read and inspect the data. ```{r} birdAbundanceData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e2bDesertBirdAbundance.csv") head(birdAbundanceData) ``` **Generate a frequency table** of the numeric bird abundance variable. The option `right = FALSE` ensures that abundance value 300 (for example) is counted in the 300-400 bin rather than in the 200-300 bin. ```{r} birdAbundanceTable <- table(cut(birdAbundanceData$abundance, breaks = seq(0,650,by=50), right = FALSE)) birdAbundanceTable ``` The same table oriented vertically and including the sum. ```{r} data.frame(Frequency = addmargins(birdAbundanceTable)) ``` **Draw a histogram** of bird abundances. ```{r} hist(birdAbundanceData$abundance, right = FALSE) ``` Commands to draw a histogram with more options, such as Figure 2.2-3, are here. ```{r} # Histogram with options hist(birdAbundanceData$abundance, right = FALSE, breaks = seq(0,650,by=50), col = "firebrick", las = 1, xlab = "Abundance (No. individuals)", ylab = "Frequency (No. species)", main = "") ``` ----- ## Figure 2.2-5. [Salmon body size](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f2_5SalmonBodySize.csv) *__Histograms__ of body mass of 228 female sockeye salmon sampled from Pick Creek in Alaska. The same data are plotted for three different interval widths: 0.1 kg, 0.3 kg, and 0.5 kg.* Read the data into data frame named `salmonSizeData` ```{r} salmonSizeData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f2_5SalmonBodySize.csv") head(salmonSizeData) ``` **Histograms with different bin widths** (0.1, 0.3, and 0.5). ```{r} hist(salmonSizeData$massKg, right = FALSE, breaks = seq(1,4,by=0.1), col = "firebrick") hist(salmonSizeData$massKg, right = FALSE, breaks = seq(1,4,by=0.3), col = "firebrick") hist(salmonSizeData$massKg, right = FALSE, breaks = seq(1,4,by=0.5), col = "firebrick") ``` ----- ## Example 2.3A. [Bird malaria](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3aBirdMalaria.csv) *__Contingency table__, __grouped bar plot__, and __mosaic plot__ showing the association between egg removal treatment and incidence of malaria in female great tits.* Read the data from the data file and inspect. ```{r} birdMalariaData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3aBirdMalaria.csv") head(birdMalariaData) ``` Optional step: *Set the desired order of treatment categories in tables and graphs*. The `factor` command allows us to order the levels so that the category "Egg removal" comes before "Control" in tables and graphs (categories are otherwise ordered alphabetically). ```{r} birdMalariaData$treatment <- factor(birdMalariaData$treatment, levels= c("Egg removal", "Control")) ``` **Create a contingency table**. Use `table` with the names of two categorical variables as arguments, beginning with the response variable. ```{r} birdMalariaTable <- table(birdMalariaData$response, birdMalariaData$treatment) birdMalariaTable ``` **Include row and column sums** in the contingency table. ```{r} addmargins(birdMalariaTable, margin = c(1,2), FUN = sum, quiet = TRUE) ``` **Draw a grouped bar graph** using the contingency table. ```{r} barplot(as.matrix(birdMalariaTable), beside = TRUE) ``` Commands to produce a grouped bar graph with more options are shown here. ```{r} # Grouped bar graph with options barplot(as.matrix(birdMalariaTable), beside = TRUE, space = c(0.1, 0.3), las = 1, xlab = "Treatment", ylab = "Relative frequency", col = c("firebrick", "goldenrod1"), legend.text = rownames(birdMalariaTable), args.legend = list(x = 0.3, y = max(birdMalariaTable), xjust = 0)) ``` **Draw a mosaic plot**. The "`t`" command flips (transposes) the table to ensure that the explanatory variable is along the horizontal axis. ```{r} mosaicplot(t(birdMalariaTable), col = c("firebrick", "goldenrod1"), sub = "Treatment", ylab = "Relative frequency", cex.axis = 1.1, main = "") ``` ----- ## Example 2.3B. [Guppy father and son comparison](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3bGuppyFatherSonAttractiveness.csv) *__Scatter plot__ of the relationship between the ornamentation of male guppies and the average attractiveness of their sons.* Read the data from the file. ```{r} guppyFatherSonData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3bGuppyFatherSonAttractiveness.csv") head(guppyFatherSonData) ``` **Draw a scatter plot** using the formula approach (with the "`~`"). ```{r} plot(sonAttractiveness ~ fatherOrnamentation, data = guppyFatherSonData) ``` See here for commands to draw a fancier scatter plot like that in Figure 2.3-3. ```{r} # Scatter plot with options plot(sonAttractiveness ~ fatherOrnamentation, data = guppyFatherSonData, las = 1, pch = 16, col = "firebrick", cex = 1.5, bty = "l", xlab = "Father's ornamentation", ylab = "Son's attractiveness") ``` ----- ## Example 2.3C. [Human hemoglobin and elevation](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3cHumanHemoglobinElevation.csv) *__Strip chart__, __box plot__, and __multiple histograms__ showing hemoglobin concentration in men living at high altitude in three different parts of the world and in a sea level population (USA).* Read the data. ```{r} hemoglobinData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3cHumanHemoglobinElevation.csv") head(hemoglobinData) ``` Obtain the sample sizes in each of the four populations ```{r} table(hemoglobinData$population) ``` **Draw a strip chart** of the hemoglobin data. ```{r} stripchart(hemoglobin ~ population, data = hemoglobinData, method = "jitter", vertical = TRUE) ``` Commands for a fancier strip chart like that shown in Figure 2.3-4 are here. ```{r} par(bty = "l") stripchart(hemoglobin ~ population, data = hemoglobinData, vertical = TRUE, method = "jitter", jitter = 0.2, pch = 1, col = "firebrick", las = 1, xlab = "Male population", ylab = "Hemoglobin concentration (g/dL)") ``` **Draw a box plot**, relating hemoglobin to population. ```{r} boxplot(hemoglobin ~ population, data = hemoglobinData) ``` A box plot with more options, like that in Figure 2.3-4, is shown here. ```{r} par(bty = "l") boxplot(hemoglobin ~ population, data = hemoglobinData, col = "goldenrod1", boxwex = 0.5, whisklty = 1, outcol = "black", outcex = 1, outlty = "blank", las = 1, xlab="Male population", ylab = "Hemoglobin concentration (g/dL)") ``` Show the association between hemoglobin and population using the **multiple histograms** approach (Figure 2.3-5). The following commands use the `lattice` package, which must first be loaded. ```{r} library(lattice) histogram(~ hemoglobin | population, data = hemoglobinData, layout = c(1,4), col = "firebrick", breaks = seq(10,26,by=1)) ``` Here we show commands to draw the multiple histograms in basic R, without using the `lattice` package. This approach is more tedious, but the resulting graphs are often easier to modify. ```{r} # Multiple histograms using base graphics, plotting them one at a time. # The "oma" option adjusts the outer margins of the whole figure # The "mar" option tweaks the margins within each of the 4 plots. oldpar = par(no.readonly = TRUE) # make backup of default graph settings par(mfrow = c(4,1), oma = c(4, 6, 2, 6), mar = c(2, 5, 4, 2)) hist(hemoglobinData$hemoglobin[hemoglobinData$population == "Andes"], col = "firebrick", las = 1, main = "Andes", breaks = seq(10,26,by=1), ylab = "Frequency") hist(hemoglobinData$hemoglobin[hemoglobinData$population == "Ethiopia"], col = "firebrick", las = 1, main = "Ethiopia", breaks = seq(10,26,by=1), ylab = "Frequency") hist(hemoglobinData$hemoglobin[hemoglobinData$population == "Tibet"], col = "firebrick", las = 1, main = "Tibet", breaks = seq(10,26,by=1), ylab = "Frequency") hist(hemoglobinData$hemoglobin[hemoglobinData$population == "USA"], col = "firebrick", las = 1, main = "USA", breaks = seq(10,26,by=1), ylab = "Frequency") mtext("Hemoglobin concentration (g/dL)", side = 1, outer = TRUE, padj = 1.5) par(oldpar) # revert to default graph settings ``` ----- ## Example 2.4A. [Measles outbreaks](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e4aMeaslesOutbreaks.csv) *__Line graph__ showing confirmed cases of measles in England and Wales from 1995 to 2011. Annual counts are quarterly.* Read the data from file. ```{r} measlesData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e4aMeaslesOutbreaks.csv") head(measlesData) ``` **Drawing a line graph** uses the same command as a scatter plot but adds the `type = "l"` argument to draw a line graph instead. ```{r} plot(confirmedCases ~ yearByQuarter, data = measlesData, type="l") ``` Commands to draw a fancier line plot (like that in Figure 2.4-1) are shown here. ```{r} # Line plot with options plot(confirmedCases ~ yearByQuarter, data = measlesData, type = "l", las = 1, lwd = 1, bty = "l", xlab = "Year", ylab = "Number of cases", xaxp = c(1995,2012,2012-1995)) points(confirmedCases ~ yearByQuarter, data = measlesData, pch = 16, col = "firebrick", cex = 0.7) ```