---
title: "Instructor: QUBES Buffers and Managing Pool-Breeding Amphibians"
author: "Liz"
date: "Spring 2022"
output: html_document
---

### Hello Instructors! 'Knit'ing this file and viewing the html document might be easier. All answers for R related questions are found in this file as well as the instructor lesson document.

#### First we will need to load in all of the necessary packages to get this document to run properly. If you cannot get a library to load correctly, try loading them with the "install.packages" function (ex: install.packages(ggplot2)).

#### Within this chunk we will also be loading in our CSV file, make sure this is located in the correct folder so that R can find it.

```{r}
library(tidyverse)
library(ggplot2)
library(ggfortify)

BuffAmphs <- read.csv("Data-QUBESBuffAmphs.csv")
```

```{r}
#Check to make sure your data loaded correctly
summary(BuffAmphs)
```

#### What are these variable?

wetland- unique ID to the vernal pool assigned by the authors.

year- when the sample was collect.

treatment- Which treatment the vernal pool was assigned (categorical variable) 30m or 100m.

species- pool-breeding amphibian being counted SALA for spotted salamanders or FROG for wood frogs.

TOTAL.adults- all breeding adult amphibians; male and female (response variable).

MEAN.hydro - The mean hydroperiod of each vernal pool (continuous variable).

#### The code below allows us to choose what treatment will be used as the reference either 30m or 100m. This will be important when we look at the GLM output below. We will change this by making 30m our reference. By default, R chooses the reference alphanumerically, which would be 100m treatment for our data set.  

```{r}
BuffAmphs$treatment <- relevel(factor(BuffAmphs$treatment),  ref="30m")
```

#### Now, let's filter the data to include only the species we want to look at, in this case we are looking at salamanders (SALA).

```{r}
BuffAmphs %>% 
 filter(species == "SALA") ->BuffSala

#Check to make sure you did what you think you did
head(BuffSala)
```

```{r}
#Exploration of the salamander data, plot a histogram of TOTAL.adults within our filtered dataset.
BuffSala %>% 
  ggplot(aes(TOTAL.adults))+
  geom_histogram()+
  xlab("Total Salamanders") +
  ylab("Count Frequency")
```

**QUESTION: Is this data normally distributed? Why or why not?**

*No, this data does not have a bell-shaped curve which would indicate that we have a normal distribution. Instead, we see that this data is skewed to the left with smaller counts being dominate. This is not a surprise, as we are working with count data (number of salamanders). Since this data is not normally distributed, we will need to use the Poisson distribution.*

```{r}
#GLM without Poisson distribution. 
wrong <- glm(TOTAL.adults ~ MEAN.hydro*treatment, data=BuffSala)

#GLM data summary
summary(wrong)
```

```{r}
#Model diagnostics of the data, for people who like to have visuals
#want to know how to read these graphs? Go here: https://youtu.be/upJJmfSbBuQ 
autoplot(wrong)
```

#### Let's plot this data, the graph below is similar to the one found in our paper (fig. 5)

```{r}
#GGplot of Salamander- MEAN.hydro, TOTAL.adults, and treatment  
BuffSala %>% 
  ggplot(aes(x = MEAN.hydro, y = TOTAL.adults, colour = treatment)) +
  geom_point() +
  geom_smooth(method= "glm", se=TRUE)+ 
  ggtitle("Total Adult Salamanders - WRONG")+
  xlab("Mean Hydroperiod")+
  ylab("Total Breeding Adults")+ 
  theme_bw()
```

**QUESTION: Based on the graph and its corresponding R output what can we say about the data?**

*• It appears that our lines are not parallel, but to what degree? Let's do some more digging...\
• We will examine our residuals, remember we want our median to be close to 0, with the minimum, 1Q (quadrant), 3Q, and maximum to be roughly even on each side. Here we see:\
Median = 3.185 - not awful but it could surely be better\
1Q/3Q = -27.914 / 21.544 -- close but could be better\
Min / Max = -102.153 / 83.847 - could be much better\
• The standard error (std. error) column on the output and the gray bars on the graph could use a lot of work; we want these numbers to be small, larger numbers represent less certainty.\
• The 30m treatment (reference) group saw an increase at a slower rate than those in the 100m treatment group.\
• The 100m treatment group saw a quicker increase in counts.\
• Our statistical interaction is insignificant. The slopes are not different enough.*

#### Now let's check out the GLM with the Poisson distribution

```{r}
#GLM with the Poisson log transformation 
SALAPoisson <- glm(TOTAL.adults ~ MEAN.hydro*treatment, family = poisson(link = "log"), data=BuffSala)

#GLM data summary
summary(SALAPoisson)
```

```{r}
#graphical output of the data, for people who like to have visuals
#want to know how to read these graphs? Go here: https://youtu.be/upJJmfSbBuQ 
autoplot(SALAPoisson)
```

```{r}
#Lets build a better plot for our data that takes the GLM with Poisson and a log link function into account
BuffSala %>% 
ggplot(aes( x=MEAN.hydro, y=TOTAL.adults, colour = treatment))+
  geom_point()+
  stat_smooth(method= "glm", method.args = list(family="poisson"), se=T)+
  ggtitle("Total Adult Salamanders")+
  xlab("Mean Hydroperiod")+
  ylab("Total Breeding Adults")+
  theme_bw()
```

**QUESTION:Do these results seem like a better fit to the data when compared to the GLM without the Poisson distribution? What does this mean for the salamanders?**

*• We see an improvement on our p-value significance levels from the previous GLM without Poisson. All levels are significantly different.\
But of course, let's do some more digging...\
• Our residuals have also improved as well.\
Here we see:\
Median = -1.230 -- closer to 0!\
1Q/3Q = -3.616 / 2.002 - close!\
Min / Max = -11.489 / 10.425 -- close!\
• The standard error (std. error) column has improved significantly as well. Smaller numbers represent more certainty with our data.\
• For the salamanders this shows us that the 100m treatment has an effect up to a point but eventually the hydroperiod allowed the breeding adult population to increase more rapidly. An interaction has occurred!*

### ON YOUR OWN - Frog Data **CODES NOT PROVIDED TO STUDENTS BELOW THIS POINT**

Encourage your students to run these analyses with different names than what was taught for the salamander data, running these commands multiple times with the same names, and expecting different results may confuse R.

#### Like before, let's filter the data to include only the species we want to look at, now we will be looking at wood frog (FROG).

```{r}
#filter data to only include wood frog (FROG) data. Make sure you assign it to its own unique operator (above that was '-> BuffSala')
BuffAmphs %>% 
 filter(species == "FROG") ->BuffFrog

#Check to make sure you did what you think you did
head(BuffFrog)
```

```{r}
#Exploration of the frog data, plot a histogram of TOTAL.adults (total breeding adults frogs)
#Is this data normally distributed?
BuffFrog %>% 
  ggplot(aes(TOTAL.adults))+
  geom_histogram()+
  xlab("Total Frogs")+
  ylab("Count Frequency")
```

```{r}
#lets look at the GLM model, should you use the formula with or without the the Poisson distribution?
FROGPoisson <- glm(TOTAL.adults ~ MEAN.hydro*treatment, family = poisson(link = "log"), data=BuffFrog)

#summary of output
summary(FROGPoisson)
```

```{r}
#graphical output of the data, for people who like to have visuals
autoplot(FROGPoisson)
```

```{r}
#plot for our data that takes the GLM with Poisson and a log link function into account
BuffFrog %>% 
ggplot(aes( x=MEAN.hydro, y=TOTAL.adults, colour = treatment))+
  geom_point()+
  stat_smooth(method= "glm", method.args = list(family="poisson"), se=T)+
  ggtitle("Total Adult Wood Frogs")+
  xlab("Mean Hydroperiod")+
  ylab("Total Breeding Adults")+
  theme_bw()
```

*• Our residuals look pretty good.\
Here we see:\
Median = -0.215 - not bad\
1Q/3Q = -4.193/ 2.807 - good enough\
Min / Max = -10.311 / 9.154\
• The standard error (std. error) column looks good, and gray bars on the graph look good too. Smaller numbers represent more certainty with our data.\
• Here the 100m buffer is increasing faster than the 30m buffer, the difference in these slopes is significant.*