---
title: "Leaf Decomposition"
output: html_notebook
author: "John McIntosh"
---

#If tidyverse is not already in your library, install it with: install.packages("tidyverse"). Installing tidyverse also installs "ggplot2", which we load below to make some neat plots!

#Let's read in our data from the working folder. Make sure the data file is named "LeafDecompData.csv".
```{r}
library(tidyverse)
library(ggplot2)

LeafDecomp <- read.csv("LeafDecompData.csv")
```
#### If the LeafDecomp data table does not show the first column named exactly "Stream", run the code given below. This can happen on a Windows computer, typically because the CSV file starts with a byte-order mark that R reads as part of the first column name (ï..Stream). The code gives that column the name Stream and removes the mangled name.

```{r}
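# Rename the mangled column only if it is present, so this chunk is safe to run on any computer
if ("ï..Stream" %in% names(LeafDecomp)) {
  names(LeafDecomp)[names(LeafDecomp) == "ï..Stream"] <- "Stream"
}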
```
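
#A more general fix, sketched here under the assumption that the odd column name comes from a UTF-8 byte-order mark (BOM) at the start of the CSV file, is to ask `read.csv()` to strip the BOM while reading. The file and values below are made up for the demonstration and are written to a temporary file, so your real data are not touched.

```{r}
# Build a small throwaway CSV that starts with a UTF-8 BOM (bytes EF BB BF).
# The rows are invented for illustration; they are not values from LeafDecompData.csv.
demo_path <- tempfile(fileext = ".csv")
demo_text <- "Stream,Day,Percent_Remaining\nCC,0,100\nCC,7,90\n"
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw(demo_text)), demo_path)

# fileEncoding = "UTF-8-BOM" asks R to drop the BOM, so the first column keeps its intended name.
demo_bom <- read.csv(demo_path, fileEncoding = "UTF-8-BOM")
names(demo_bom)
```

#If this applies to your file, reading it with `read.csv("LeafDecompData.csv", fileEncoding = "UTF-8-BOM")` should make the renaming chunk above unnecessary.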


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

#Part 1: Building linear regressions per stream to find each stream's leaf litter decomposition rate

#In each of the code blocks below, we fit a linear regression for one stream using all of its data. We also log-transform Percent_Remaining, as the authors did in their study, to help meet the normality assumptions of the regression.
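
#One way to see why the slope of this regression can be read as a decomposition rate, assuming the negative exponential decay model that is commonly used for leaf litter (this model and the symbols $M_0$, $M_t$, and $k$ are assumptions for illustration and are not stated in this exercise):

$$
M_t = M_0 \, e^{-kt}
\quad\Longrightarrow\quad
\ln(M_t) = \ln(M_0) - k\,t
$$

#Here $M_t$ is the percent of litter remaining on day $t$, $M_0$ is the starting percent, and $k$ is the decomposition rate per day. Under this assumption, regressing `log(Percent_Remaining)` on `Day` gives a slope of $-k$, so the absolute value of the slope estimates $k$.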

```{r}
LeafDecomp %>% 
  filter(Stream == "CC") %>%
  lm(log(Percent_Remaining) ~ Day, data = .) %>%
  summary()

LeafDecomp %>% 
  filter(Stream == "VEN") %>%
  lm(log(Percent_Remaining) ~ Day, data = .) %>%
  summary()

LeafDecomp %>% 
  filter(Stream == "HP") %>%
  lm(log(Percent_Remaining) ~ Day, data = .) %>%
  summary()

LeafDecomp %>% 
  filter(Stream == "MONT") %>%
  lm(log(Percent_Remaining) ~ Day, data = .) %>%
  summary()

LeafDecomp %>% 
  filter(Stream == "LL") %>%
  lm(log(Percent_Remaining) ~ Day, data = .) %>%
  summary()

LeafDecomp %>% 
  filter(Stream == "MAT") %>%
  lm(log(Percent_Remaining) ~ Day, data = .) %>%
  summary()
```
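#From each of the 6 regressions above, record the stream name, the slope for Day (as an absolute value), and the standard error of that slope in your 1st spreadsheet, named "Stream_Decomposition_Rates_Table". Name the columns Stream, Decomp_Rate, and SE so that they match the plotting code below, and save the spreadsheet as a .csv file.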
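
#Copying six slopes by hand is easy to get wrong. As an optional check, the sketch below collects the same numbers in one table. The helper name `decomp_rate_table()` and the demo data are hypothetical and invented for this illustration; only the column names Stream, Day, and Percent_Remaining come from the exercise.

```{r}
# Hypothetical helper: fit log(Percent_Remaining) ~ Day separately for each stream
# and return the absolute slope and its standard error, one row per stream.
decomp_rate_table <- function(data) {
  per_stream <- lapply(split(data, as.character(data$Stream)), function(d) {
    fit <- lm(log(Percent_Remaining) ~ Day, data = d)
    coefs <- coef(summary(fit))  # matrix: Estimate, Std. Error, t value, Pr(>|t|)
    data.frame(
      Stream = as.character(d$Stream[1]),
      Decomp_Rate = abs(coefs["Day", "Estimate"]),
      SE = coefs["Day", "Std. Error"]
    )
  })
  out <- do.call(rbind, per_stream)
  rownames(out) <- NULL
  out
}

# Made-up demo data: two fictional streams "A" and "B", not measurements from the study.
demo_decomp <- data.frame(
  Stream = rep(c("A", "B"), each = 4),
  Day = rep(c(0, 10, 20, 30), times = 2),
  Percent_Remaining = c(100, 82, 66, 55, 100, 90, 83, 74)
)
demo_rates <- decomp_rate_table(demo_decomp)
demo_rates
```

#With the real data loaded, `decomp_rate_table(LeafDecomp)` should return the same slopes (as absolute values) and standard errors that the summaries above print, which you can compare against your spreadsheet.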


#Now let's build a bar plot that neatly shows the decomposition rates of each stream.
```{r}
# First, let's load in our 1st table
stream_rates <- read.csv("Stream_Decomposition_Rates_Table.csv")
```

#### The Stream column may also have an edited name here. Run the code below if the first column of the "stream_rates" dataset isn't named "Stream"
```{r}
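# Rename the mangled column only if it is present, so this chunk is safe to run on any computer
if ("ï..Stream" %in% names(stream_rates)) {
  names(stream_rates)[names(stream_rates) == "ï..Stream"] <- "Stream"
}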
```

#Using ggplot2, we will build a plot with correctly labeled axes and a title. We will also show the standard error associated with each stream's leaf litter decomposition rate.
```{r}
ggplot(stream_rates, aes(x=Stream, y=Decomp_Rate, fill=Stream)) +
  geom_col() +
  geom_errorbar(aes(ymin=Decomp_Rate-SE, ymax=Decomp_Rate+SE), width=.2) +
  ggtitle("Leaf litter decomposition rate for each stream") +
  xlab("Stream Identity") +
  ylab("Leaf Litter Decomposition Rate")
```

#Question 1: Looking at the plot and the standard error bars, what can you interpret from this plot?

#Question 2: Based on the reading, which stream seems to have the least urbanization? Which streams seem to be heavily affected by urbanization? (Hint: look at the factor measured in Table 1 that seems to best represent urbanization.)

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

#Part 2: Relating Urbanization to Leaf Litter Decomposition Rate

#For this short exercise, you will need to fill in the missing column of data in the "ImpSA_DecompRate" table attached to this exercise. To do this, copy and paste the decomposition rates from the "Stream_Decomposition_Rates_Table" you built in Part 1 of this exercise.

#Let's load in the data table
```{r}
ImperviousArea <- read.csv("ImpSA_DecompRate_Instructor.csv")
```


#### The Stream column may also have an edited name here. Run the code below if the first column of the ImperviousArea dataset isn't named "Stream"
```{r}
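# Rename the mangled column only if it is present, so this chunk is safe to run on any computer
if ("ï..Stream" %in% names(ImperviousArea)) {
  names(ImperviousArea)[names(ImperviousArea) == "ï..Stream"] <- "Stream"
}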
```


#Now let's build a plot to visualize the data in this table
```{r}
ggplot(ImperviousArea, aes(x=Impervious_Area_Percent, y=Decomp_Rate)) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_point() +
  ggtitle("Change in Decomposition rates due to Urbanization") +
  labs(x = "Impervious Surface Area (Percent)", y = "Decomposition Rate")
```

#Question 3: Look at this graph. What can you infer from this plot? How can the decomposition rate be higher at ~66% impervious surface area than at ~58% impervious surface area?


#Now we will perform a linear regression to quantify the trendline shown in the plot above. This regression relates the decomposition rates that we found in Part 1 to the amount of impervious surface area the scientists measured in the field. Our hypothesis was an inverse relationship, so we expect the decomposition rate to decrease as impervious surface area increases.
```{r}
ImperviousArea %>%
  lm(Decomp_Rate ~ Impervious_Area_Percent, data = .) %>%
  summary()
```
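
#The values needed for the question below can be read off the printed summary, but they can also be pulled out by name. The sketch below shows where each one lives in a `summary(lm(...))` object. It uses made-up numbers (not the values from the ImpSA_DecompRate table), and all of the `demo_` names are hypothetical.

```{r}
# Made-up demo data with the same column names as the ImperviousArea table.
demo_imp <- data.frame(
  Impervious_Area_Percent = c(5, 20, 35, 50, 65, 80),
  Decomp_Rate = c(0.030, 0.027, 0.025, 0.020, 0.021, 0.015)
)
demo_fit <- summary(lm(Decomp_Rate ~ Impervious_Area_Percent, data = demo_imp))

demo_coefs <- coef(demo_fit)  # coefficient table as a matrix
demo_stats <- c(
  slope = demo_coefs["Impervious_Area_Percent", "Estimate"],
  se = demo_coefs["Impervious_Area_Percent", "Std. Error"],
  p_value = demo_coefs["Impervious_Area_Percent", "Pr(>|t|)"],
  r_squared = demo_fit$r.squared
)
demo_stats
```

#With a single predictor, the p-value for the slope is the same as the p-value of the overall regression reported at the bottom of the printed summary.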

#Question 4: Is there a significant relationship between impervious surface area and leaf litter decomposition rate? Use the R-squared and p-value along with the slope and its SE to support your answer. Was our hypothesis correct?