---
title: "Instructor: Seagrass and Coral Disease Qubes Lesson"
author: "Madison Hagen"
output: html_notebook
---

## Welcome to the R portion of the Qubes lesson!

Make sure this Rmd file is saved in the same folder as the CSV file so that the data reads in properly.

First, we will load the libraries we need. If you do not have one of the following packages, install it with the `install.packages()` function (ex: `install.packages("dplyr")`).

```{r}
library(readr)
library(dplyr)
library(ggplot2)
```

We will now load the manipulated coral disease CSV file and check the first few lines to make sure the data were read in correctly. For this QUBES lesson we will only be working with the following variables:

*Status (Seagrass or No Seagrass) - Categorical explanatory variable*

*WS (White Syndrome) - Count response variable*

*SEB (Skeletal Eroding Band) - Count response variable*

```{r}
corals <- read_csv("Corals.csv")
head(corals)
```

Now that the data are loaded into a data frame, the next step is to look at the distribution of the response variable, WS, using the following code:

```{r}
ggplot(corals, aes(x=WS)) + geom_histogram()
```

**Is this data normally distributed?**

As you can see in the histogram, the count data are not normally distributed: the distribution is skewed, whereas normally distributed data would appear as a symmetric bell curve. Therefore, we need to use a Poisson generalized linear model (GLM), which is designed for count data.

```{r}
model1 <- glm(WS~Status, family=poisson(link="log"), data=corals)
summary(model1)
```

The summary reports the model estimates on the log scale along with their p-values. The intercept is the estimate for No Seagrass, and the Seagrass coefficient is the difference between Seagrass and No Seagrass.

**Intercept (No Seagrass) - 1.4469**

**P-value - <2e-16**

**Seagrass coefficient - -1.9859**

**P-value - 8.35e-07**

Finally, we can plot the data by seagrass status to see whether seagrass meadows have an effect on White Syndrome count.
```{r}
# Plot the White Syndrome counts for each seagrass status
ggplot(corals, aes(x=Status, y=WS)) +
  geom_boxplot() +
  geom_point(size=4, color='lightgrey', alpha=0.5) +
  xlab("Seagrass Status") +
  ylab("White Syndrome Count")
```

**Based on the graph above, the intercept value, and the p-value shown in the summary of model1, does Seagrass status have an effect on White Syndrome count?**

*Yes - Seagrass meadow status has an effect on White Syndrome count. This is observed in the graph above and in the summary of the GLM. The estimate for Seagrass (-1.9859) is the difference from No Seagrass on the log scale, and its p-value (8.35e-07) is far below 0.05, so the two groups differ significantly. The intercept for No Seagrass is 1.4469 (p-value <2e-16), and the negative Seagrass estimate means White Syndrome counts are lower where seagrass is present.*

### ON YOUR OWN - Skeletal Eroding Band Data

```{r}
# Plot a histogram of the SEB data to check whether the data are normally distributed
ggplot(corals, aes(x=SEB)) + geom_histogram()
```

```{r}
# Is the data normally distributed? If not, create a GLM with a Poisson
# distribution for SEB and summarize it to find the intercept and p-values.
model2 <- glm(SEB~Status, family=poisson(link="log"), data=corals)

# Use summary() to look at the estimates and p-values.
summary(model2)
```

**Intercept (No Seagrass) - 0.5108**

**P-value - 0.0223**

**Seagrass coefficient - 0.3716**

**P-value - 0.2011**

```{r}
# Plot the data to visualize whether seagrass status affects Skeletal Eroding Band count.
ggplot(corals, aes(x=Status, y=SEB)) +
  geom_boxplot() +
  stat_smooth(method="glm", method.args=list(family="poisson"), se=TRUE) +
  ylab("Skeletal Eroding Band count") +
  xlab("Seagrass Status")
```

**Question - Based on the graph above, the intercept value, and the p-value shown in the summary of model2, does Seagrass status have an effect on Skeletal Eroding Band count?**

- No - Seagrass status does not have a statistically significant effect on Skeletal Eroding Band count. This is observed in the graph above and in the summary of the GLM. The estimate for Seagrass (0.3716) is the difference from No Seagrass on the log scale, and its p-value (0.2011) is greater than 0.05, so the two groups are not significantly different. The intercept for No Seagrass is 0.5108 (p-value 0.0223).
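
Because both models use a log link, the estimates in the summaries are on the log scale: log(expected count) = intercept + Seagrass estimate (added only for the Seagrass group). As an optional check, the sketch below converts the reported estimates back into expected counts with `exp()`. It assumes the estimates are exactly the values quoted in this lesson, and the variable names are illustrative rather than part of the lesson code.

```{r}
# Estimates copied by hand from the summaries of model1 and model2.
# Variable names are illustrative and not part of the original lesson.
ws_intercept  <- 1.4469   # WS, No Seagrass (log scale)
ws_seagrass   <- -1.9859  # WS, difference for Seagrass (log scale)
seb_intercept <- 0.5108   # SEB, No Seagrass (log scale)
seb_seagrass  <- 0.3716   # SEB, difference for Seagrass (log scale)

# With a log link, exp() turns a log-scale estimate into an expected count.
ws_no_seagrass    <- exp(ws_intercept)
ws_with_seagrass  <- exp(ws_intercept + ws_seagrass)
seb_no_seagrass   <- exp(seb_intercept)
seb_with_seagrass <- exp(seb_intercept + seb_seagrass)

round(c(ws_no_seagrass, ws_with_seagrass), 2)    # 4.25 0.58
round(c(seb_no_seagrass, seb_with_seagrass), 2)  # 1.67 2.42
```

If the fitted models are still in memory, `exp(coef(model1))` performs the same back-transformation directly; the second value it returns is the ratio of the expected Seagrass count to the expected No Seagrass count.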