---
title: "The normal distribution: R code for Chapter 10 examples"
author: "Michael Whitlock and Dolph Schluter"
output:
  html_document:
    toc: yes
    toc_depth: 3
---
_Note: This document was converted to R-Markdown from [this page](http://whitlockschluter.zoology.ubc.ca/r-code/rcode10) by M. Drew LaMar. You can download the R-Markdown [here](https://qubeshub.org/collections/post/1250/download/chap10.Rmd)._
Download the R code on this page as a single file [here](http://whitlockschluter.zoology.ubc.ca/wp-content/rcode/chap10.r).
## New methods
The variable names are plucked from the examples further below.
**Probabilities under the normal curve:**
> pnorm(157.5, mean = 177.6, sd = 9.7)
**Other new methods:**
Normal approximation to a binomial distribution.
## Example 10.4. One small step for man?
**Calculate probabilities under the normal curve.** The command `pnorm(Y)` gives the probability of obtaining a value *__less than__* $Y$ under the normal distribution. The arguments `mean` and `sd` give the mean and standard deviation of the desired normal distribution.
Pr[Height < 157.5]
```{r}
pnorm(157.5, mean = 177.6, sd = 9.7)
```
Pr[Height > 190.54]
```{r}
1 - pnorm(190.54, mean = 177.6, sd = 9.7)
```
Pr[Height < 157.5 or Height > 190.54]
```{r}
pnorm(157.5, mean = 177.6, sd = 9.7) + 1 - pnorm(190.54, mean = 177.6, sd = 9.7)
```
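As a hedged aside, the same probabilities can be written with the `lower.tail` argument of `pnorm`, or by standardizing to a $Z$ value first. The variable names below (`heightMean`, `heightSD`, `pLow`, `pHigh`, `z`) are introduced only for this sketch and do not come from the book.

```{r}
# Mean and standard deviation of the height distribution used above
heightMean <- 177.6
heightSD <- 9.7

# Lower tail: Pr[Height < 157.5]
pLow <- pnorm(157.5, mean = heightMean, sd = heightSD)

# Upper tail: Pr[Height > 190.54], requested directly instead of using 1 - pnorm()
pHigh <- pnorm(190.54, mean = heightMean, sd = heightSD, lower.tail = FALSE)

# Probability of falling in either tail
pLow + pHigh

# Standardizing first gives the same lower-tail probability,
# because pnorm() defaults to mean = 0 and sd = 1
z <- (157.5 - heightMean) / heightSD
pnorm(z)
```

Asking for the upper tail directly avoids the subtraction from 1, which can lose precision when the tail probability is very small.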
## Figure 10.6. [Ages at death during the Spanish flu](http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter10/chap10e6AgesAtDeathSpanishFlu1918.csv)
*__Demonstration of the central limit theorem__, using the distribution of sample mean age at death in samples from a highly non-normal distribution: the frequency distribution of age at death in Switzerland in 1918 during the Spanish flu epidemic.*
Read and inspect the data.
```{r}
flu <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter10/chap10e6AgesAtDeathSpanishFlu1918.csv"))
head(flu)
```
**Histogram showing the frequency distribution** of ages at death in Switzerland in 1918 during the Spanish flu epidemic.
```{r, fig.width=4, fig.height=4}
hist(flu$age, right = FALSE)
```
Commands for a histogram with more options:
```{r, fig.width=4, fig.height=4}
hist(flu$age, right = FALSE, breaks = seq(0,102,2), col = "firebrick", las = 1, xlab = "Age at death (yrs)", ylab = "Frequency", main = "")
```
Demonstrate the **central limit theorem.** Treat the age at death measurements from Switzerland in 1918 as the population. Take a large number of random samples, each of size $n$, from the population of age at death measurements and plot the sample means. Note: your results won't be identical to the one in Figure 10.6-2, because 10,000 random samples is not large enough for extreme accuracy. Change $n$ below to another number and rerun to see the effects of sample size on the shape of the distribution of sample means.
```{r}
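n <- 4                      # size of each random sample
results <- numeric(10000)   # preallocate storage for the sample means
for(i in 1:10000) {
  # draw n ages without replacement and store their mean
  AgeSample <- sample(flu$age, size = n, replace = FALSE)
  results[i] <- mean(AgeSample)
}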
```
**Histogram of the sample means**, with options.
```{r, fig.width=4, fig.height=4}
hist(results, right = FALSE, breaks = 50, col = "firebrick", las = 1, xlab = "Mean age at death (yrs)", ylab = "Frequency", main = "")
```
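To see the effect of sample size without rerunning by hand, here is a minimal sketch that repeats the resampling for several values of $n$. It assumes a simulated, right-skewed stand-in population (`population`, drawn with `rexp`) rather than the flu data, so that it runs without a download. The names `sampleSizes`, `sdOfMeans`, and `predicted` are hypothetical. The sketch compares the standard deviation of the simulated sample means with the standard error $\sigma / \sqrt{n}$.

```{r}
set.seed(1)  # make the simulation reproducible

# Hypothetical skewed population (NOT the flu data); mean is about 30
population <- rexp(50000, rate = 1/30)

# Sample sizes to compare
sampleSizes <- c(2, 4, 16, 64)

# For each n, draw 5000 random samples and record the SD of their means
sdOfMeans <- sapply(sampleSizes, function(n) {
  sd(replicate(5000, mean(sample(population, size = n, replace = FALSE))))
})

# Standard error expected for each n: sigma / sqrt(n)
predicted <- sd(population) / sqrt(sampleSizes)

data.frame(n = sampleSizes, simulated = sdOfMeans, predicted = predicted)
```

Replacing `population` with `flu$age` should apply the same comparison to the ages at death, assuming the data have been read in as above.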
## Example 10.7. The only good bug is a dead bug
*__Normal approximation to the binomial distribution__ applied to the brown recluse spider example.*
The $P$-value from the binomial test is $P = 2 Pr[X \geq 31]$, which is the same as $2 (1 - Pr[X \leq 30])$, since $Pr[X \geq 31] = 1 - Pr[X \leq 30]$. We can use the normal approximation as follows. Remember that $n = 41$ and $p = 0.5$. The extra $1/2$ added to 30 in the code is the continuity correction.
```{r}
spiderProb <- 1 - pnorm( (30 + 1/2 - 41 * 0.5) / sqrt(41 * 0.50 * 0.5))
Pvalue <- 2 * spiderProb
Pvalue
```
Compare with the exact result obtained from the binomial distribution using `dbinom`, which we encountered on the Chapter 7 R page.
```{r}
2 * sum( dbinom(31:41, size = 41, prob = 0.5) )
```
Or use `pbinom`.
```{r}
2 * (1 - pbinom(30, size = 41, prob = 0.5))
```
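The calculations above can be gathered into one sketch that puts the exact binomial $P$-value next to the normal approximation. The variable names (`nSpiders`, `pNull`, `muX`, `sigmaX`, `exactP`, `approxP`) are made up for this illustration. The approximation uses the mean $np$ and standard deviation $\sqrt{np(1-p)}$ of the binomial distribution.

```{r}
nSpiders <- 41   # number of trials
pNull <- 0.5     # null hypothesis proportion

# Mean and standard deviation of the binomial distribution under the null
muX <- nSpiders * pNull
sigmaX <- sqrt(nSpiders * pNull * (1 - pNull))

# Exact two-sided P-value: 2 * Pr[X >= 31] = 2 * Pr[X > 30]
exactP <- 2 * pbinom(30, size = nSpiders, prob = pNull, lower.tail = FALSE)

# Normal approximation with the continuity correction (30 + 1/2)
approxP <- 2 * pnorm(30 + 1/2, mean = muX, sd = sigmaX, lower.tail = FALSE)

c(exact = exactP, approximate = approxP)
```

Passing `mean` and `sd` to `pnorm` is equivalent to standardizing by hand as in the first calculation of this example.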