---
title: 'Analyzing proportions: R code for Chapter 7 examples'
author: "Michael Whitlock and Dolph Schluter"
output:
html_document:
toc: yes
toc_depth: 3
---
_Note: This document was converted to R-Markdown from [this page](http://whitlockschluter.zoology.ubc.ca/r-code/rcode07) by M. Drew LaMar. You can download the R-Markdown [here](https://qubeshub.org/collections/post/1132/download/chap07.Rmd)._
Download the R code on this page as a single file [here](http://whitlockschluter.zoology.ubc.ca/wp-content/rcode/chap07.r) (make sure to install the "binom" package before running).
```{r, echo=FALSE, message=FALSE}
# Install required binom package if not present
if (!require("binom")) { install.packages("binom", dependencies=T, repos="http://cran.rstudio.com/") }
```
## New methods
Hover over a function argument for a short description of its meaning. The variable names are plucked from the examples further below.
**Calculate binomial probabilities:**
> dbinom(6, size = 27, prob = 0.25)
**Binomial test:**
> binom.test(10, n = 25, p = 0.061)
**Install an R package**, the `binom` package to calculate confidence interval for a proportion.
> install.packages("binom", dependencies = TRUE)
**Agresti-Coull 95% confidence interval for the proportion** using the `binom` package.
> binom.confint(30, n = 87, method = "ac")
**Other new methods:**
Sampling distribution of a proportion by repeated sampling from a known population.
## Table 7.1-1 and Figure 7.1-1. Binomial distribution with n = 27 and p = 0.25
*__Table and histogram of binomial probabilities.__ Uses the data from Chapter 6 on the genetics of mirror-image flowers.*
**Calculate a binomial probability**, the probability of obtaining $X$ successes in n trials when trials are independent and probability of success $p$ is the same for every trial. The probability of getting exactly 6 left-handed flowers when $n = 27$ and $p = 0.25$ is
```{r}
dbinom(6, size = 27, prob = 0.25)
```
**Table of probabilities** for all possible values for the number of left-handed flowers out of 27.
```{r}
xsuccesses <- 0:27
probx <- dbinom(xsuccesses, size = 27, prob = 0.25)
probTable <- data.frame(xsuccesses, probx)
probTable
```
**Histogram of binomial probabilities** for the number of left-handed flowers out of 27. This illustrates the full binomial distribution when $n = 27$ and $p = 0.25$.
```{r, fig.width=4, fig.height=4}
barplot(height = probx, names.arg = xsuccesses, space = 0, las = 1, ylab = "Probability", xlab = "Number of left-handed flowers")
```
## Figure 7.1-2. Sampling distribution of a binomial proportion
*__Compare sampling distributions for the proportion__ based on n = 10 and n = 100.*
Take **a large number of random samples** of $n = 10$ from a population having probability of success $p = 0.25.$ Convert to proportions by dividing by the sample size. Do the same for the larger sample size $n = 100$. The following commands use 10,000 random samples.
```{r}
successes10 <- rbinom(10000, size = 10, prob = 0.25)
proportion10 <- successes10 / 10
successes100 <- rbinom(10000, size = 100, prob = 0.25)
proportion100 <- successes100 / 100
```
**Plot and visually compare the sampling distributions** of the proportions based on $n = 10$ and $n = 100$. The `par(mfrow = c(2,1))` command sets up a graph window that will plot both graphs arranges in 2 rows and 1 column.
```{r, fig.width=4, fig.height=6}
par(mfrow = c(2,1))
hist(proportion10, breaks = 10, right = FALSE, xlim = c(0,1), xlab = "Sample proportion")
hist(proportion100, breaks = 20, right = FALSE, xlim = c(0,1), xlab = "Sample proportion")
par(mfrow = c(1,1))
```
Commands for a fancier plot:
```{r, fig.width=4, fig.height=6}
oldpar <- par(no.readonly = TRUE) # make backup of default graph settings
par(mfrow = c(2,1), oma = c(4, 0, 0, 0), mar = c(1, 6, 4, 1)) # adjust margins
saveHist10 <- hist(proportion10, breaks = 10, right = FALSE, plot = FALSE)
saveHist10$counts <- saveHist10$counts/sum(saveHist10$counts)
plot(saveHist10, col = "firebrick", las = 1, cex.lab = 1.2,
ylim = c(0,0.3), xlim = c(0,1), ylab = "Relative frequency",
xlab = "", main = "")
text(x = 1, y = 0.25, labels = "n = 10", adj = 1, cex = 1.1)
saveHist100 <- hist(proportion100, breaks = 40, right = FALSE, plot = FALSE)
saveHist100$counts <- saveHist100$counts/sum(saveHist100$counts)
plot(saveHist100, col = "firebrick", las = 1, cex.lab = 1.2,
ylim = c(0,0.1), xlim = c(0,1), ylab = "Relative frequency",
xlab = "", main = "")
text(x = 1, y = 0.08, labels = "n = 100", adj = 1, cex = 1.1)
mtext("Proportion of successes", side = 1, outer = TRUE, padj = 2)
par(oldpar) # Revert to backup graph settings
```
## Example 7.3. [Sex and the X chromosome](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter07/chap07e2SexAndX.csv)
*The __binomial test__, used to test whether spermatogenesis genes in the mouse genome occur with unusual frequency on the X chromosome.*
**Read and inspect the data.** Each row in the data file represents a different spermatogenesis gene.
```{r}
mouseGenes <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter07/chap07e2SexAndX.csv"))
head(mouseGenes)
```
**Tabulate the number of spermatogenesis genes** on the X-chromosome and the number not on the X-chromosome.
```{r}
table(mouseGenes$onX)
```
**Calculate the binomial probabilities** of all possible outcomes under the null hypothesis (Table 7.2-1). Under the binomial distribution with $n = 25$ and $p = 0.061$, the number of successes can be any integer between 0 and 25.
```{r}
xsuccesses <- 0:25
probx <- dbinom(xsuccesses, size = 25, prob = 0.061)
data.frame(xsuccesses, probx)
```
Use these probabilities to **calculate the $P$-value** corresponding to an observed 10 spermatogenesis genes on the X chromosome. Remember to multiply the probability of 10 or more successes by 2 for the two-tailed test result.
```{r}
2 * sum(probx[xsuccesses >= 10])
```
For a faster result, try **R's built-in binomial test**. The resulting $P$-value is slightly different from our calculation. In the book, we get the two-tailed probability by multiplying the one-tailed probability by 2. As we say on page 188, computer programs may calculate the probability of extreme results at the "other" tail with a different method. The output of `binom.test` includes a confidence interval for the proportion using the Clopper-Pearson method, which is more conservative than the Agresti-Coull method.
```{r}
library(binom) # Load the binom package
binom.test(10, n = 25, p = 0.061)
```
## Example 7.2. [Radiologists' missing sons](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter07/chap07e2SexAndX.csv)
*__Standard error and 95% confidence interval for a proportion__ using the Agresti-Coull method for the confidence interval.*
Read and inspect the data.
```{r}
radiologistKids <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter07/chap07e3RadiologistOffspringSex.csv"))
head(radiologistKids)
```
**Frequency table** of female and male offspring number.
```{r}
table(radiologistKids$offspringSex)
```
**Calculate the estimated proportion** of offspring that are male, and the total number of radiologists.
```{r}
n <- sum(table(radiologistKids$offspringSex))
n
pHat <- 30 / n
pHat
```
**Standard error of the sample proportion.**
```{r}
sqrt( (pHat * (1 - pHat))/n )
```
**Agresti-Coull 95% confidence interval** for the population proportion.
```{r}
pPrime <- (30 + 2)/(n + 4)
pPrime
lower <- pPrime - 1.96 * sqrt( (pPrime * (1 - pPrime))/(n + 4) )
upper <- pPrime + 1.96 * sqrt( (pPrime * (1 - pPrime))/(n + 4) )
c(lower = lower, upper = upper)
```
**Agresti-Coull 95% confidence interval** for the population proportion **using the `binom` package.** To use this package you will need to install it (this needs to be done only once per computer) and load it using the `library` command (this needs to be done once per R session). The confidence interval from the `binom` package will be very slightly different from the one you calculated above because the formula we use takes a slight shortcut.
```{r}
binom.confint(30, n = 87, method = "ac")
```