---
title: "Biodiversity & Agriculture"
author: "Your Name"
date: "Today's Date"
output: html_document
---

# Objectives
In this lesson, we will use meta-analysis to explore how species richness in agricultural fields relates to local management intensity and landscape complexity. We will quantify these relationships for three taxonomic groups (plants, invertebrates and vertebrates). Using a simplified dataset, we will first run basic linear models separately for local intensity and landscape complexity. We will then discuss why this type of model is insufficient given the nature of the data, and run linear mixed models with "Study" as a random effect instead. From the mixed models, we will calculate marginal means and 95% confidence intervals and plot them for each taxonomic group. As we run these analyses, consider what implications intensity and complexity have for different species and why a multi-scale approach might be needed to conserve overall biodiversity.

# Setup

***Load in all necessary libraries***
```{r setup, include=FALSE}
library(ggplot2)
library(tidyverse)
library(dplyr)
library(lme4)
library(ggfortify)
library(emmeans)

#If you do not already have some of these packages installed, you will need to install them by running the function install.packages() in the console. Make sure to put the name of the package in quotes.
```

***Load in data***
```{r}
conserve <- read.csv("AgConservation.csv")

head(conserve)
summary(conserve)
```
If the headings in the summary are not simply "Study", "TaxGroup", "CropType", etc., and a name instead starts with ï.. (an encoding artifact that usually affects the first column), refer to the troubleshooting chunk below. Otherwise, skip the troubleshooting step.

***Troubleshooting***
```{r}
#If your headings are not looking correct, please run this chunk by removing the # sign before the last line.

#This strips the ï.. prefix from any column name that has it and leaves all other names unchanged.
#names(conserve) <- sub("^ï\\.\\.", "", names(conserve))
```
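
The ï.. prefix usually comes from a byte-order mark (BOM) that some spreadsheet programs write at the start of a UTF-8 CSV file. As an alternative to renaming afterwards, `read.csv()` can be asked to drop the mark while reading. The sketch below assumes that cause and uses a small made-up file (`bom_file`, `bom_demo`) rather than the lesson data.

```{r}
#Write a tiny demo CSV that starts with a UTF-8 byte-order mark (hypothetical data)
bom_file <- tempfile(fileext = ".csv")
con <- file(bom_file, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("Study,TaxGroup\n1,Plants\n2,Vertebrates\n"), con)
close(con)

#Reading with fileEncoding = "UTF-8-BOM" drops the mark from the first heading
bom_demo <- read.csv(bom_file, fileEncoding = "UTF-8-BOM")
names(bom_demo)
```

If this applies to your copy of the data, the same argument could be added to the `read.csv("AgConservation.csv")` call above.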

Notice that some variables (Study, TaxGroup and CropType) have "character" or "integer" data types. We must change these types to "factor" so that the linear and linear mixed models we run later treat them as categories.

***Making variables factors***
```{r}
#Change character and integer data into factor data
conserve$Study <- as.factor(conserve$Study)
conserve$TaxGroup <- as.factor(conserve$TaxGroup)
conserve$CropType <- as.factor(conserve$CropType)

#Keep only the columns used in this lesson, in a fixed order
conserve <- conserve[, c("Study", "TaxGroup", "CropType", "LocalIntense", "LandscapeComp")]

#Check that the data types are now correct
str(conserve)
```

# Basic Linear Models

Now let's run some basic linear models and look at how the overall effects of local intensity and landscape complexity vary with taxonomic group and crop type. Crop type is included as a predictor because it can influence species richness in the area.

***Making linear models***
```{r}
#Local intensity linear model
locallm <- lm(LocalIntense ~ TaxGroup + CropType,  data = conserve)
summary(locallm)

#Landscape complexity linear model
landscapelm <- lm(LandscapeComp ~ TaxGroup + CropType,  data = conserve)
summary(landscapelm)
```

Look at the R-squared values and p-values to judge how well each model fits the data. R-squared is the proportion of variation in the dependent variable that is explained by the independent variables. `summary()` reports it as a value between 0 and 1, which can be read as a percentage when multiplied by 100. The higher the value, the more of the variation the model explains. A p-value is a measure of statistical significance: by convention, p-values under 0.05 are treated as statistically significant.
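
These values can also be pulled out of a model summary directly rather than read off the printed output. This is a minimal sketch on simulated data; `demo`, `demo_fit` and `demo_sum` are made-up names for illustration and are not part of the lesson dataset.

```{r}
#Simulated data for illustration only (not the lesson dataset)
set.seed(1)
demo <- data.frame(group = factor(rep(c("A", "B", "C"), each = 20)))
demo$y <- c(0, 0.5, 1)[as.integer(demo$group)] + rnorm(nrow(demo))

demo_fit <- lm(y ~ group, data = demo)
demo_sum <- summary(demo_fit)

#R-squared and adjusted R-squared, as proportions between 0 and 1
demo_sum$r.squared
demo_sum$adj.r.squared

#Coefficient table; the last column, Pr(>|t|), holds the p-values
coef(demo_sum)
```

The same accessors should work on `summary(locallm)` and `summary(landscapelm)`.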

*Question 1: How well do these two linear models explain the data? Are there differences between how well the model fits for local vs. landscape factors?*

Now let's look at the diagnostic plots for these models.

***Plotting linear models***
```{r}
#Diagnostic plots for the local intensity linear model (reusing the model fitted above)
autoplot(locallm, label.size = 3)

#Diagnostic plots for the landscape complexity linear model (reusing the model fitted above)
autoplot(landscapelm, label.size = 3)
```

Focus on the Residuals vs Fitted and Normal Q-Q Plots. 

Residual values measure how far, vertically, each data point falls from the value the model predicts for it (the fitted value). The Residuals vs Fitted plot has the residual values on the y axis and the fitted values on the x axis. It shows how widely the points deviate from the model's predictions and whether there are any outliers. The blue trend line should be as close to the zero line as possible, as this indicates a better fit from our model and less systematic deviation of our data from its predictions.

Normal Q-Q plots are used to assess, in our case, whether the residuals are normally distributed, as the model assumes. The data points should be as close to the grey reference line as possible. Points that deviate from the line at either end of the axis indicate a less optimal fit from our model.
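
To make these two plots less of a black box, the sketch below rebuilds them by hand with base R on simulated data. The objects `diag_demo`, `diag_fit` and `res` are invented for illustration. This is not how `autoplot()` is implemented, only the same idea in a few lines.

```{r}
#Simulated data for illustration only (not the lesson dataset)
set.seed(2)
diag_demo <- data.frame(group = factor(rep(c("A", "B", "C"), each = 20)))
diag_demo$y <- c(0, 0.5, 1)[as.integer(diag_demo$group)] + rnorm(nrow(diag_demo))
diag_fit <- lm(y ~ group, data = diag_demo)

#A residual is the observed value minus the value the model predicts
res <- diag_demo$y - fitted(diag_fit)
all.equal(unname(res), unname(residuals(diag_fit)))

#Residuals vs fitted: points should scatter evenly around the zero line
plot(fitted(diag_fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

#Normal Q-Q: points should follow the reference line
qqnorm(res)
qqline(res)
```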

*Question 2: Looking at the data visually, how well are these models fitting the data now? Is a linear model the best method for assessing this data? Are we not accounting for something?*


# Linear Mixed Models

Our study is a meta-analysis, which includes data points from a variety of previous studies. A basic linear model cannot account for multiple data points coming from the same study. Such data are non-independent, because data points from one study are likely to be more similar to each other than to data points from another study. Treating them as independent would be pseudoreplication and could produce statistical relationships that may not actually exist. To account for non-independence, we must run a linear mixed model with a random effect of "Study".
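
One way to see this non-independence is to simulate it. The sketch below invents a dataset (`sim`) in which every study has its own baseline shift, fits a mixed model with a random intercept for study, and computes the share of the leftover variance that sits between studies (often called the intraclass correlation). All names and numbers here are assumptions chosen for illustration, not values from the lesson data.

```{r}
library(lme4)

#Simulated data for illustration only: 12 studies with 6 data points each
set.seed(42)
n_study <- 12
n_per <- 6
sim <- data.frame(
  Study = factor(rep(seq_len(n_study), each = n_per)),
  TaxGroup = factor(rep(c("Plants", "Invertebrates", "Vertebrates"),
                        times = n_study * n_per / 3))
)

#Each study gets its own shift, so points within a study resemble each other
study_shift <- rnorm(n_study, sd = 1)
sim$Effect <- 0.5 + study_shift[as.integer(sim$Study)] + rnorm(nrow(sim), sd = 0.5)

#Random intercept for Study absorbs the between-study differences
sim_lmm <- lmer(Effect ~ TaxGroup + (1 | Study), data = sim)

#Share of the random variation that is between studies rather than within them
vc <- as.data.frame(VarCorr(sim_lmm))
icc <- vc$vcov[vc$grp == "Study"] / sum(vc$vcov)
icc
```

A value near 0 would mean study membership hardly matters. A value well above 0 means data points from the same study are correlated, which is the situation a plain `lm()` ignores.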


***Making Mixed Models with Random Effect (Study Variable)***
```{r}
#Linear Mixed Model with Random Effect for Local Intensity
LocalLMM <- lmer(LocalIntense ~ TaxGroup + CropType + (1|Study), data = conserve)

LocalLMM

#Linear Mixed Model with Random Effect for Landscape Complexity
LandscapeLMM <- lmer(LandscapeComp ~ TaxGroup + CropType + (1|Study), data = conserve)

LandscapeLMM
```


These model outputs are useful for comparing random effects and fixed effects (the fixed effects are similar to those in our basic linear models), but it may be more useful for our understanding to look at the marginal means for each taxonomic group. It is also important to look at the confidence intervals to judge whether each relationship is significant. Let's calculate the marginal means of each model by taxonomic group with 95% confidence intervals.


***Calculating marginal mean values and 95% confidence intervals for species richness values by local and landscape factors***
```{r}
#Utilizing emmeans package to calculate marginal means for local intensity
local <- emmeans(LocalLMM, "TaxGroup")
local

#Utilizing emmeans package to calculate marginal means for landscape complexity
landscape <- emmeans(LandscapeLMM, "TaxGroup")
landscape

```


Now let's plot the marginal means with a 95% confidence interval.

***Plotting marginal means with a 95% confidence interval***
```{r}
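#Plot for Local Intensity, with a dashed reference line at zero
plot(local) + theme_bw() + 
  geom_vline(xintercept = 0, linetype = "dashed") +
  labs(x = "Estimated marginal mean for local intensity", y = "Taxonomic Group")

#Plot for Landscape Complexity, with a dashed reference line at zero
plot(landscape) + theme_bw() + 
  geom_vline(xintercept = 0, linetype = "dashed") +
  labs(x = "Estimated marginal mean for landscape complexity", y = "Taxonomic Group")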

```

Now we can visually assess the significance of our model outputs by taxonomic group. A relationship is significant if the purple 95% confidence interval lies entirely on one side of zero on the x axis. An interval that crosses zero indicates a lack of statistical significance. We are assuming the optimal agricultural environment for biodiversity in this case, which is less local intensity and more landscape complexity. Positive marginal means therefore indicate a positive relationship between the predictor (local intensity or landscape complexity) and the response variable (species richness of the taxonomic group).
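
The same check can be done numerically instead of by eye. The sketch below uses simulated data (`ci_demo`) and a made-up column name (`excludes_zero`) to flag which groups have a 95% confidence interval that does not cross zero. The argument `lmer.df = "asymptotic"` is chosen here only so the example does not depend on additional packages for degrees of freedom; it is not necessarily the setting used elsewhere in this lesson.

```{r}
library(lme4)
library(emmeans)

#Simulated data for illustration only (not the lesson dataset)
set.seed(3)
ci_demo <- data.frame(
  Study = factor(rep(1:12, each = 6)),
  TaxGroup = factor(rep(c("Plants", "Invertebrates", "Vertebrates"), times = 24))
)
group_mean <- c(Plants = 1, Invertebrates = 0.5, Vertebrates = 0)
study_shift <- rnorm(12, sd = 0.6)
ci_demo$Effect <- unname(group_mean[as.character(ci_demo$TaxGroup)]) +
  study_shift[as.integer(ci_demo$Study)] + rnorm(nrow(ci_demo), sd = 0.5)

ci_lmm <- lmer(Effect ~ TaxGroup + (1 | Study), data = ci_demo)

#Marginal means per taxonomic group with 95% confidence limits
ci_emm <- emmeans(ci_lmm, "TaxGroup", lmer.df = "asymptotic")
ci_tab <- as.data.frame(confint(ci_emm, level = 0.95))

#Columns 5 and 6 hold the lower and upper confidence limits
ci_tab$excludes_zero <- ci_tab[[5]] > 0 | ci_tab[[6]] < 0
ci_tab
```

Rows where `excludes_zero` is `TRUE` correspond to intervals that sit entirely on one side of zero in the plot.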

*Question 3: Which taxonomic groups have a significant relationship with less local intensity? Which taxonomic groups have a significant relationship with more landscape complexity?*

*Question 4: How might different species respond to local management intensity and landscape complexity differently? (Hint: think about the mobility of plants, invertebrates and vertebrates)*

*Question 5: Why is a multi-scale conservation approach essential?*