---
title: "Instructor Copy"
author: "Dylan Stephens"
date: "2024-04-25"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
# library() stops with an error if tidyverse is not installed; require() would only warn
library(tidyverse)
```

## Students Will Be Able To

*Utilize tools included in the Tidyverse package to understand and manage datasets*

*Use Linear Regressions to help understand the relationships between variables*

*Test the significance of linear relationships with an Analysis of Covariance (ANCOVA) test*

## Assessing knowledge from the reading

### Questions

1. What are some anthropogenic factors that contribute to coastal wetland habitat loss?

    *Climate Change, Land Use Change, Sea Level Rise*

2. What are some of the main factors that contribute to nutrient pollution?

    *Agricultural Fertilizer, Human and Animal Waste, Loss of protective land around waterways*

3. How do you think nutrient pollution and coastal wetland habitat loss might interact?

    *Nutrient pollution can speed up coastal wetland habitat loss by encouraging the growth of invasive or destructive species. Nutrient pollution can also cause dead zones: eutrophication drives uncontrolled algal growth, known as algal blooms, which depletes oxygen in the water.*

4. Are there any specific ways that wetlands might interact differently with nutrient pollution?

    *Wetlands are particularly sensitive to the impacts of nutrient pollution because of their function as natural filters. By performing their normal function of accumulating sediment and nutrients, they take on a disproportionate share of the impacts of anthropogenic inputs to waterways.*

## Wetlands: What they do and why it is important

Wetlands provide a wide variety of ecosystem services, ranging from mitigating erosion and filtering water to sequestering large quantities of carbon.
These services are becoming more important in the face of anthropogenic climate change, in large part because of carbon sequestration, but also because wetlands soften the blow from extreme weather events and retain water during drought conditions.

## Looking at the data

## The Data

```{r}
# Reading in the data
# check.names = FALSE keeps column names such as `Site Name` exactly as they appear in the file
LongIslandData <- read.csv("LessonData.csv", stringsAsFactors = TRUE, check.names = FALSE)
```

# Data handling

```{r}
# dplyr includes the glimpse() function, which gives us a brief overview of our data
glimpse(LongIslandData)
```

The Tidyverse package includes tools to manipulate, visualize, and understand our data, as well as tools to enhance your ability as a coder. One tool we will use is the pipe operator `%>%` (SHORTCUT: CTRL+SHIFT+M), which chains multiple functions together in a concise and readable way. It takes the output of the function on its left and uses it as the input to the function on its right.

Our data does not include the original area of the marshes; however, we have the total area lost, as well as the percentage of the total that this loss represents. We can use the `mutate()` function, which lets us reference existing columns of our dataset when creating new columns.

```{r}
LongIslandData %>% # the pipe passes this dataset to the next function
  mutate(`Total Marsh Area (ha)` = `Total Marsh Area Lost (ha)` / (`Marsh Area Change (%)` / 100) * -1) -> LongIslandData

# The select() function lets us pick out columns by name
LongIslandData %>%
  select(`Site Name`, `Total Marsh Area (ha)`)
```

You will notice that some of the rows show *NA* rather than a number. This means that the data needed to calculate our new column was not provided, so let's get rid of any rows with missing data. To do this we will use the `drop_na()` function from the tidyr package, which removes every row of a data frame that contains a missing value. By assigning the cleaned dataset to the same name, we replace the original with the version without *NA*s.

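
As a minimal sketch of what `drop_na()` does, the chunk below applies it to a small made-up data frame. The object `toy_sites` and its columns are hypothetical and are not part of the lesson data.

```{r}
library(tidyr)

# Hypothetical toy data: site "B" is missing its area
toy_sites <- data.frame(
  site    = c("A", "B", "C"),
  area_ha = c(12.5, NA, 30.1)
)

# drop_na() keeps only the rows that have no missing values
toy_sites_clean <- drop_na(toy_sites)
toy_sites_clean
```

Only the complete rows for sites "A" and "C" remain. Note that a row is dropped if *any* of its columns is missing, which is why cleaning a wide dataset can remove more rows than expected.
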
```{r}
LongIslandData <- drop_na(LongIslandData)

# Now using the same code, do you see how the output has changed?
LongIslandData %>%
  select(`Site Name`, `Total Marsh Area (ha)`)
```

Let's visualize the data so we can better understand some of the individual variables. First, let's look at the distribution of marsh sizes using a histogram. We will be using the ggplot structure from the ggplot2 package, which is part of the Tidyverse. These simple plots help us see the spread of our data and can be an important way to determine the shape of its distribution.

```{r}
LongIslandData %>% # Calling our dataset and feeding it into the next function
  ggplot( # Begin ggplot structure
    aes(x = `Total Marsh Area (ha)`)) + # Defines the plot axis and data displayed
  geom_histogram(bins = 10) + # Defines the type of plot which will be created
  labs(title = "Total Marsh Area for Study Sites",
       y = "# of Sites") # Add axis labels or titles
```

# Introducing Soil δ15N

This variable describes how enriched the soil samples gathered at the study sites are in 15N, a specific isotope (or type) of the element nitrogen. δ15N is used in this study as a stand-in measure of nutrient pollution because human practices like agriculture and waste disposal tend to produce higher δ15N values than the nitrogen otherwise present in the environment.

## Different Types of Salt Marshes

The researchers focused on two types of marsh, the High marsh and the Intertidal marsh, which they distinguished by plant species. The scientific names of Saltmarsh Cordgrass and Saltmeadow Cordgrass have changed and are indicated below.
Defined as follows:

Intertidal Marsh: Lower elevation marsh dominated by Spartina alterniflora (now known as Sporobolus alterniflorus)

```{r, echo=FALSE}
knitr::include_graphics("Alterniflora.jpeg")
```

*Sporobolus alterniflorus on Long Island. Photo by iNaturalist user: lenorakdaniel*

High Marsh: Higher elevation marsh dominated by Spartina patens (now known as Sporobolus pumilus) and Distichlis spicata

```{r, echo=FALSE}
knitr::include_graphics("Pumilis.jpeg")
```

*Sporobolus pumilus on Long Island. Photo by iNaturalist user: teresa_d*

```{r, echo=FALSE}
knitr::include_graphics("Distichlis.jpeg")
```

*Distichlis spicata on Long Island. Photo by iNaturalist user: teresa_d*

## The Linear Regression

In the paper the researchers found some fascinating and unexpected responses from their analysis. Here we will focus on one of the predictor variables, Soil δ15N, and dive deeper into the relationship between a predictor variable and response variables. The researchers found an interesting effect when they compared High Marsh loss and Intertidal Marsh loss: the two marsh types did not just show different rates of loss at different levels of Soil δ15N, but a total inversion of the correlation. Where one marsh type showed a positive correlation, the other showed a negative correlation.

Let's walk through how to plot the total Marsh Area Change against Soil δ15N.

```{r}
LongIslandData %>% # Calling in the dataset
  ggplot(aes(x = `soil d15N`, # Predictor variable
             y = `Marsh Area Change (%)`)) + # Response variable
  geom_point() + # Uses a geometry of points
  labs(title = "Change in Marsh Area as a Factor of Soil δ15N", # title
       x = "Soil δ15N", # x axis label
       y = "Percent Change in Marsh Area") + # y axis label
  # Generates a line with the chosen method, in this case a linear regression,
  # and extends it to the full range of the graph. This can also show
  # confidence intervals, but they have been suppressed using se = FALSE
  geom_smooth(formula = y ~ x, method = "lm", fullrange = TRUE, se = FALSE)
```

Does this explain significant variation in the data? To see if it does, let's look at a summary of the linear model.

```{r}
summary(lm(`Marsh Area Change (%)` ~ `soil d15N`, data = LongIslandData))
```

What is the "Multiple R-squared"? *0.001*

What is the "p-value"? *0.811*

These tell us that the amount of variation explained by this variable is *small*, because the R-squared value is *low*, and that it likely *does not* have a significant impact on the overall spread of the data, because the p-value is much *greater* than 0.05.

Using the structure shown above, plot the Change in High Marsh Area as a Factor of Soil δ15N.

```{r}
LongIslandData %>%
  ggplot(aes(y = `High Marsh Area Change (%)`,
             x = `soil d15N`)) +
  geom_point() +
  labs(title = "Change in High Marsh Area as a Factor of Soil δ15N",
       x = "Soil δ15N",
       y = "Percent Change in Marsh Area") +
  geom_smooth(formula = y ~ x, method = "lm", fullrange = TRUE, se = FALSE)
```

Does this explain significant variation in the data?

```{r}
summary(lm(`High Marsh Area Change (%)` ~ `soil d15N`, data = LongIslandData))
```

Plot the Change in Intertidal Marsh Area as a Factor of Soil δ15N.

```{r}
LongIslandData %>%
  ggplot(aes(y = `Intertidal Marsh Area Change (%)`,
             x = `soil d15N`)) +
  geom_point() +
  labs(title = "Change in Intertidal Marsh Area as a Factor of Soil δ15N",
       x = "Soil δ15N",
       y = "Percent Change in Marsh Area") +
  geom_smooth(formula = y ~ x, method = "lm", fullrange = TRUE, se = FALSE)
```

Does this explain significant variation in the data?

```{r}
summary(lm(`Intertidal Marsh Area Change (%)` ~ `soil d15N`, data = LongIslandData))
```

Using the R-squared and p-values, as well as the slopes of the linear regressions on the graphs, what can you say about the impact of Soil δ15N on the two types of Marsh Loss?

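
The slope, R-squared, and p-value asked about here can also be pulled out of a model summary by name instead of being read off the printout. The chunk below is a small sketch on made-up numbers; `toy_xy`, `toy_lm`, and the other object names are hypothetical and do not come from the lesson data.

```{r}
# Hypothetical toy data with a strong positive trend
toy_xy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)
)

toy_lm      <- lm(y ~ x, data = toy_xy)
toy_summary <- summary(toy_lm)

# "Multiple R-squared" from the printed summary
toy_r_squared <- toy_summary$r.squared

# The coefficient table has one row per term; the "x" row holds the slope
toy_slope_row <- coef(toy_summary)["x", ]
toy_slope     <- unname(toy_slope_row["Estimate"])
toy_p_value   <- unname(toy_slope_row["Pr(>|t|)"])

c(r_squared = toy_r_squared, slope = toy_slope, p_value = toy_p_value)
```

With a single predictor, the p-value in the slope row is the same as the overall p-value printed on the last line of the summary. The sign of the slope tells you whether the relationship is positive or negative.
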
The relationship between High Marsh Loss and Soil δ15N was (Positive/Negative) and *Significant*, with a p-value of *0.001* and a Multiple R-squared value of *0.272*.

The relationship between Intertidal Marsh Loss and Soil δ15N was (Positive/Negative) and *Significant*, with a p-value of *0.033* and a Multiple R-squared value of *0.132*.

## ANCOVA

Using an ANCOVA will allow us to determine whether the difference in impact between the High and Intertidal marshes is statistically significant. To do this we will need to change the dataset a little bit. The new dataset you will be reading in contains a column for the percentage change in marsh area (irrespective of the type), a column for the level of Soil δ15N, and a new column for the type of marsh the percentage refers to. This allows us to perform the ANCOVA using marsh type as a categorical variable, which will help determine whether the effect of Soil δ15N differs significantly between the two marsh types.

```{r}
read.csv("AncovaData.csv", stringsAsFactors = TRUE, check.names = FALSE) %>%
  drop_na() -> AncovaData
```

```{r}
# `soil d15N` * Type fits both main effects and their interaction
lm(`Marsh Area Change (%)` ~ `soil d15N` * Type, data = AncovaData) -> lmH
summary(lmH)
```

What is the "Multiple R-squared"? *0.179*

What is the "p-value"? *0.003*

What does this tell us about the difference in impact of Soil δ15N on the loss of each type of marsh?

The first items to observe are the intercepts. (Intercept) = *-64.344* is the modeled High Marsh Change where Soil δ15N = 0. TypeIntertidal = *126.708* is not an intercept on its own: because High marsh is the reference level, it is the amount added to the High intercept to get the modeled Intertidal Marsh Change where Soil δ15N = 0, which is *-64.344* + *126.708* = *62.364*. This tells us that, based on our model, High Marsh would be *Shrinking/Growing* at a rate of *-64.344*% while Intertidal Marsh would be *Shrinking/Growing* at a rate of *62.364*% without the impact of Soil δ15N. Does this line up with what our graphs looked like?

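
To make the reference-level coding concrete, the chunk below fits the same kind of interaction model to made-up data built from two exact lines, so the fitted coefficients are easy to check. Everything here (`toy_marsh`, `toy_fit`, the columns `x`, `y`, and the chosen lines) is a hypothetical illustration, not the lesson data.

```{r}
# Hypothetical toy data generated from two known lines:
#   High:       y =  1 + 2 * x
#   Intertidal: y = 10 - 3 * x
toy_marsh <- data.frame(
  x    = rep(c(1, 2, 3, 4), times = 2),
  Type = factor(rep(c("High", "Intertidal"), each = 4))
)
toy_marsh$y <- ifelse(toy_marsh$Type == "High",
                      1 + 2 * toy_marsh$x,
                      10 - 3 * toy_marsh$x)

toy_fit  <- lm(y ~ x * Type, data = toy_marsh)
toy_coef <- coef(toy_fit)

# "High" comes first alphabetically, so R treats it as the reference level
# and its line is read directly from the first two coefficients
high_intercept <- unname(toy_coef["(Intercept)"])
high_slope     <- unname(toy_coef["x"])

# The Intertidal line is the reference line plus the two offset terms
intertidal_intercept <- unname(toy_coef["(Intercept)"] + toy_coef["TypeIntertidal"])
intertidal_slope     <- unname(toy_coef["x"] + toy_coef["x:TypeIntertidal"])

c(high_intercept, high_slope, intertidal_intercept, intertidal_slope)
```

The four values recover the two lines the data were built from (up to rounding error), which shows that the `TypeIntertidal` and interaction terms are differences from the High marsh line rather than the Intertidal intercept and slope themselves.
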
Looking at the results from the ANCOVA, we can see that `soil d15N` reports a slope of *9.506* (increase in percent marsh loss) in the High marsh. This tracks with what we saw in the graph comparing High Marsh Change and Soil δ15N. In this case the High marsh is treated as the reference level and Intertidal is compared to it. Similarly, the `` `soil d15N`:TypeIntertidal `` term reports *-19.861*, which is how much the Intertidal slope differs from the High marsh slope. Because the High marsh is the reference level, adding the two together gives the effect of Soil δ15N on Intertidal Marsh Change. In this case the slope is *Positive/Negative*, which again tracks with the graphs we created above, showing that the slopes go in the *Same/Opposite* directions.

Add the two reported slopes together: *9.506* + *-19.861* = *-10.355*

This tells us that while the effects of Soil δ15N on High Marsh and Intertidal Marsh are of *Opposite/Similar* direction, their magnitude is *Similar/Opposite*.

Is there a significant difference between the two types of marsh in how their rate of change relates to Soil δ15N?

*There is no significant variation between the rate of change of High or Intertidal marsh due to Soil δ15N levels*

Break into small groups and discuss reasons why this might be. After your discussion, individually write a brief summary of your group's discussion in the area provided below.

*Instructor points*

*What does it mean that the magnitude is the same but the direction is different?*

*How can you see this impacting the environment?*

*How can we apply this knowledge to what we know about wetlands and their functions?*

Once you have completed this, please select the Knit button at the top of the screen to generate a .html file, which you can then submit to your instructor.