---
title: "Striped Bass YOY Exercise"
author: "Alex McCrickard"
date: "February 27, 2019"
output: word_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
##Let's investigate the Maryland Striped Bass Young of Year (YOY) survey data. For this exercise we will use the YOY survey data from the Upper Chesapeake Bay. The Upper Chesapeake Bay, including the Susquehanna Flats, is the largest spawning ground in the entire bay. Using least squares linear regression, we will test whether there is a linear relationship between YOY Index and year over three time intervals. Imagine that you are a fisheries biologist working for the Maryland Department of Natural Resources (MDNR), trying to infer overall striped bass population conditions from YOY surveys, which tend to shed light on spawning success. As a biologist, it's essential that you can use your data to make informed management decisions for the future of the fishery. Let's jump in and take a look at one major piece of data that influenced the Striped Bass moratorium in Maryland (1985-1989).
###############################################################################################################
#Part 1: Pre-Moratorium
###############################################################################################################
#loading in the Pre-Moratorium YOY Data Set
```{r}
library(readr)
GM_Pre_Moratorium_YOY <- read_csv("GM_Pre_Moratorium_YOY.csv")
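#View() opens a data viewer, so only call it in an interactive session (it can fail when knitting).
if (interactive()) View(GM_Pre_Moratorium_YOY)
#Take a look at the data set and take note of the column headings.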
```
#Let's visualize the data. It's important to check for normality in the data before running a linear regression. We will conduct a Shapiro-Wilk normality test.
```{r}
hist(GM_Pre_Moratorium_YOY$GeomeanIndex, main = NULL, xlab = "Geometric Mean Index for YOY")
shapiro.test(GM_Pre_Moratorium_YOY$GeomeanIndex)
#Remember: if the p-value is less than 0.05, we reject the null hypothesis that the data come from a normal distribution. A p-value above 0.05 means the test found no evidence against normality.
```
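#To see how the Shapiro-Wilk p-value behaves, here is a minimal sketch on made-up values (not the survey data). The names toy_normal and toy_skewed are assumptions introduced only for this illustration; they are evenly spaced quantiles of a normal and a log-normal distribution.

```{r}
# Made-up values for illustration only; these are NOT the YOY survey data.
toy_normal <- qnorm(ppoints(30))        # 30 evenly spaced quantiles of a standard normal
toy_skewed <- exp(qnorm(ppoints(30)))   # the same quantiles exponentiated: strongly right-skewed

p_normal <- shapiro.test(toy_normal)$p.value
p_skewed <- shapiro.test(toy_skewed)$p.value

# A small p-value is evidence against normality; a large one is not proof of normality.
p_normal
p_skewed
```

#The skewed sample should return a p-value well below 0.05, while the bell-shaped sample should not.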
#We can attempt to transform the data to see if normality is improved
```{r}
#Let's log-transform the data to see how it behaves
#Sometimes, a log transformation can improve normality when performing parametric statistics and visualizing linear relationships.
hist(log(GM_Pre_Moratorium_YOY$GeomeanIndex, 10), main = NULL, xlab="Logged Geometric Mean YOY")
```
#Question 1. Do you think the data appear to be more normally distributed? What issues may have arisen with the YOY data after log transforming?
```{r}
#If negative values are present in your data after a log transformation, you can try a log(x + 1) transformation instead.
hist(log(GM_Pre_Moratorium_YOY$GeomeanIndex + 1, 10), main = NULL, xlab="Log + 1 Geometric Mean YOY")
shapiro.test(log(GM_Pre_Moratorium_YOY$GeomeanIndex + 1, 10))
```
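#The effect of adding 1 before taking the log can be checked on a few hand-picked numbers. This is only a sketch: toy_index holds hypothetical values chosen for illustration, not values from the survey.

```{r}
# Hypothetical index values chosen for illustration; not taken from the survey.
toy_index <- c(0, 0.5, 1, 10)

toy_log      <- log(toy_index, 10)      # a zero becomes -Inf and values below 1 become negative
toy_log_plus <- log(toy_index + 1, 10)  # adding 1 first keeps every result at or above zero

data.frame(index = toy_index, log10 = toy_log, log10_plus_1 = toy_log_plus)
```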
#Question 2. Based upon the histogram and Shapiro-Wilk normality test, did the log + 1 transformation improve the data? What can you infer about the data based on the p-value?
```{r}
#Now that our data are conforming to assumptions of normality, we can continue with a parametric approach to analyze the striped bass population decline. Run the code below to perform a linear regression on the log + 1 transformation of the geometric mean index for YOY striped bass (Dependent Variable) as a function of Year (Independent Variable). Review the regression diagnostics by looking at the Residuals vs. Fitted plot and the Normal Q-Q plot.
#Linear Regression Model
fit1 <- lm(log(GeomeanIndex + 1, 10) ~ Year, data = GM_Pre_Moratorium_YOY)
fit1
plot(log(GeomeanIndex + 1, 10) ~ Year, data = GM_Pre_Moratorium_YOY,
xlab = "Year", ylab="Logged Geometric Mean Index")
abline(a=fit1$coefficients[1], b=fit1$coefficients[2], col="red")
# Model Diagnostics
model1 <- lm(log(GM_Pre_Moratorium_YOY$GeomeanIndex + 1, 10) ~ GM_Pre_Moratorium_YOY$Year)
plot(model1)
summary(model1)
#We review these diagnostic plots to assess how well our linear model fits. For this exercise, we are just going to focus on the first two plots. For the Residuals vs. Fitted plot, you ideally want your residuals evenly scattered above and below the horizontal line, with the red fitted line staying roughly flat. Pay close attention to outliers and patterns in the residuals.
#For the Normal Q-Q plot, the residuals are plotted against the theoretical quantiles of a perfectly normal distribution. In a perfect world, your residuals would fall exactly along the reference line.
```
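#The p-value and Adjusted R-Squared can also be pulled out of the model summary directly rather than read off the printout. The sketch below uses a simulated declining series (toy_yoy, with made-up numbers and the same column names as the survey files) purely to show where those values live; it says nothing about the real data.

```{r}
# Simulated declining index for illustration only; NOT the real survey data.
set.seed(1)
toy_yoy <- data.frame(Year = 1970:1984)
toy_yoy$GeomeanIndex <- 10^(1.2 - 0.04 * (toy_yoy$Year - 1970) + rnorm(15, sd = 0.1)) - 1

toy_fit     <- lm(log(GeomeanIndex + 1, 10) ~ Year, data = toy_yoy)
toy_summary <- summary(toy_fit)

toy_slope <- coef(toy_fit)[["Year"]]                        # change in log10(index + 1) per year
toy_p     <- toy_summary$coefficients["Year", "Pr(>|t|)"]   # p-value for the Year slope
toy_adjr2 <- toy_summary$adj.r.squared                      # Adjusted R-Squared

c(slope = toy_slope, p_value = toy_p, adj_r_squared = toy_adjr2)
```

#Replacing toy_fit with fit1 would report the same three quantities for the pre-moratorium model.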
#Question 3. Based upon the linear regression above, is the model statistically significant? Don't forget to use the p-value and Adjusted R-Squared value in your explanation. Based on what you know about YOY as indicators of population stability, do you think the data raise concerns about the future of striped bass? As a fisheries biologist, what conclusions would you draw from this plot and how would these conclusions influence management decisions? (Hint: make sure to include trends with the model summary)
###############################################################################################################
#Part 2 - Moratorium
###############################################################################################################
#Loading in Moratorium Data Set
```{r}
library(readr)
GM_MoratoriumYears_YOY <- read_csv("GM_MoratoriumYears_YOY.csv")
if (interactive()) View(GM_MoratoriumYears_YOY) #the viewer is skipped when knitting
```
```{r}
#Again, it's important to check that your data are approximately normal before fitting a parametric model.
#Let's visualize unlogged
hist(GM_MoratoriumYears_YOY$GeomeanIndex, main = NULL, xlab = "Geometric Mean YOY")
shapiro.test(GM_MoratoriumYears_YOY$GeomeanIndex)
```
```{r}
#Let's visualize logged
hist(log(GM_MoratoriumYears_YOY$GeomeanIndex + 1, 10), main=NULL, xlab="Log + 1 Geometric Mean YOY")
shapiro.test(log(GM_MoratoriumYears_YOY$GeomeanIndex + 1, 10))
```
#Question 4. Did the log transformation improve the data? Utilize the p-value in your answer.
#Running a linear regression for the 5 moratorium years
```{r}
#Linear Regression Model
#Now that our data are conforming to assumptions of normality, we can continue with a parametric approach to analyze the striped bass population over the moratorium years. Run the code below to perform a linear regression on the log + 1 transformation of the geometric mean index for YOY striped bass (Dependent Variable) as a function of Year (Independent Variable). Review the regression diagnostics by looking at the Residuals vs. Fitted plot and the Normal Q-Q plot.
fit2 <- lm(log(GeomeanIndex + 1, 10) ~ Year, data = GM_MoratoriumYears_YOY)
fit2
plot(log(GeomeanIndex + 1, 10) ~ Year, data = GM_MoratoriumYears_YOY,
xlab = "Year", ylab="Logged Geometric Mean Index")
abline(a=fit2$coefficients[1], b=fit2$coefficients[2], col="red")
# Model Diagnostics
model2 <- lm(log(GM_MoratoriumYears_YOY$GeomeanIndex + 1, 10) ~ GM_MoratoriumYears_YOY$Year)
plot(model2)
summary(model2)
```
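#Since only the Residuals vs. Fitted and Normal Q-Q plots are reviewed in this exercise, the diagnostics can be limited to those two. The sketch below uses the which argument of plot() for lm objects on a made-up five-year series (toy_years is hypothetical and is not the moratorium data).

```{r}
# Made-up five-year series for illustration only; NOT the moratorium survey data.
toy_years <- data.frame(Year = 1985:1989,
                        GeomeanIndex = c(2.1, 1.6, 2.4, 3.0, 4.2))

toy_model <- lm(log(GeomeanIndex + 1, 10) ~ Year, data = toy_years)

old_par <- par(mfrow = c(1, 2))   # place the two plots side by side
plot(toy_model, which = 1:2)      # 1 = Residuals vs Fitted, 2 = Normal Q-Q
par(old_par)                      # restore the previous plotting layout
```

#With only five points, both plots are coarse, so patterns in them should be read cautiously.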
#Question 5. Based upon the linear regression above, is the model statistically significant? Don't forget to use the p-value and Adjusted R-Squared value in your explanation. Based on what you now know about striped bass and YOY as indicators of population stability, what is your interpretation of the data? Why was there an increase in striped bass YOY in 1989? As a fisheries biologist, what conclusions would you draw from this plot and how would these conclusions influence management decisions? (Keep in mind, fisheries decisions were based on the data from these 5 moratorium years)...
###############################################################################################################
#Part 3 - Post Moratorium
###############################################################################################################
#Loading in the Post Moratorium data set
```{r}
library(readr)
GM_Post_Moratorium_YOY <- read_csv("GM_Post_Moratorium_YOY.csv")
if (interactive()) View(GM_Post_Moratorium_YOY) #the viewer is skipped when knitting
```
```{r}
#Again, it's important to check that your data are approximately normal before fitting a parametric model.
#Visualize unlogged
hist(GM_Post_Moratorium_YOY$GeomeanIndex, main = NULL, xlab = "Geometric Mean YOY")
shapiro.test(GM_Post_Moratorium_YOY$GeomeanIndex)
```
```{r}
#Visualize logged
hist(log(GM_Post_Moratorium_YOY$GeomeanIndex + 1, 10), main = NULL, xlab="Log + 1 Geometric Mean YOY")
shapiro.test(log(GM_Post_Moratorium_YOY$GeomeanIndex + 1, 10))
#normality improved
```
#Question 6. Did the log transformation improve the data? Utilize the p-values in your answer.
#Now we will run a linear regression for Post-Moratorium years...
```{r}
#Linear Regression Model
#Now that our data are conforming to assumptions of normality, we can continue with a parametric approach to analyze the striped bass population post moratorium to present day. Run the code below to perform a linear regression on the log + 1 transformation of the geometric mean index for YOY striped bass (Dependent Variable) as a function of Year (Independent Variable). Review the regression diagnostics by looking at the Residuals vs. Fitted plot.
fit3 <- lm(log(GeomeanIndex + 1, 10) ~ Year, data = GM_Post_Moratorium_YOY)
fit3
plot(log(GeomeanIndex + 1, 10) ~ Year, data = GM_Post_Moratorium_YOY,
xlab = "Year", ylab="Logged Geometric Mean Index")
abline(a=fit3$coefficients[1], b=fit3$coefficients[2], col="red")
# Model Diagnostics
model3 <- lm(log(GM_Post_Moratorium_YOY$GeomeanIndex + 1, 10) ~ GM_Post_Moratorium_YOY$Year)
plot(model3)
summary(model3)
```
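#Because the models are fit on the log10(index + 1) scale, the slope is not directly a change in the index. One way to read it, sketched here with a hypothetical slope value, is to back-transform it into a yearly percent change in (index + 1). The name example_slope and its value are assumptions for illustration; coef(fit3)[["Year"]] would supply the fitted value.

```{r}
# Hypothetical slope on the log10(index + 1) scale; not a result from the survey data.
example_slope <- 0.01

annual_factor  <- 10^example_slope           # multiplicative change in (index + 1) per year
percent_change <- (annual_factor - 1) * 100  # the same change expressed as a percentage

percent_change   # about 2.3 percent per year in (index + 1)
```

#A negative slope would give a factor below 1 and therefore a negative percent change.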
#Question 7. Based upon the linear regression above, is the model statistically significant? Don't forget to use the p-value and Adjusted R-Squared value in your explanation. What is your interpretation of the data? As a fisheries biologist, what conclusions would you draw from this plot and how would these conclusions influence management decisions? Based on the YOY data, does the population appear to be decreasing, increasing, or stable? Strongly consider interannual variability and trends when assessing this. What factors could impact interannual variability in spawning success?
#Question 8. Based on our question, was our hypothesis supported or not? Looking at all of this information, do you think the moratorium was effective? As a fisheries biologist, would you have done anything differently (shorter vs. longer) with regard to the moratorium? (Hint: think about policy, stakeholders, and the limitations of our data)