---
title: "Ocean Acidification: Quantitative Skills"
author: "Paige Punzalan"
date: "5/11/2022"
output: html_document
--- 

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

#This R Markdown pairs with the "Quantitative Skills" portion of the QUBES lesson: "Ocean Acidification: Predator-Prey Interactions of Intertidal Snails and Sea Stars."
#Please follow along in the QUBES lesson document as you progress through this portion of the lesson!

First we load in the libraries.

#See the accompanying "Snails_Data_Info_Sheet.docx" for more information on these libraries and if there are any issues with installing any required packages.
```{r}
library(ggiraphExtra)
library(ggeffects)
library(ggplot2) 
```

Now, we load in the intertidal snail cue, proportion time out of water, and pH data. Make sure that the Snails_Lesson_Data.csv file is in the correct directory.
If you have issues loading in the file this way, it may be easier to go directly to your Files, click on the link for "Snails_Lesson_Data.csv," click "Import Dataset..." and click "Import." The data should then open up as a new tab next to this R Markdown tab.  

#Also note: File names may change during the download process. You may have to rename files in order to successfully run the code!

We will read in the data file as "Snail."
View the data to ensure that it has imported in correctly. There should be 32 rows and 4 columns. Make sure that the columns read "Snail," "Cue," "pH," and "Proportion." 
When we view our data, notice that there are no repeated measures, so there are no random effects to call for a mixed model. 
#See the accompanying "Snails_Data_Info_Sheet.docx" for more information on the data contained in these columns. 
```{r}
library(readr)
Snails <- read_csv("Snails_Lesson_Data.csv")
View(Snails)
```

To make the creation of the model syntax easier, we will rename our variables. 
```{r}
proportion <- Snails$Proportion
pH <- Snails$pH
cue <- Snails$Cue
```

Here is the code to produce our histogram of the data. We are able to see that these data do not fit a normal distribution.
```{r}
hist(proportion, main="Proportion of Time Out of Water", 
      xlab="Proportion")
```

Here, we will see if a log or log+1 transformation will improve the normality of out data. Code for the Shapiro-Wilkes Normality tests to test the normality of our data are also provided. 

we can visually see that normality is not improved by either transformation. To be sure, we can conduct Shapiro-Wilkes Normality tests. 
A significant p-value, below 0.05, shows us that we can conclude that, at a 5% significance, our data is non-normal. From this, we can deduce that a GLM may be the best way to go about modelling our data.
```{r}
hist(log(proportion, 10), main = "Log Transformation Proportion of Time Out of Water", xlab="Log Proportion of Time Out of Water")

shapiro.test(log(proportion, 10))

hist(log(proportion + 1, 10), main = "Log + 1 Transformation Proportion of Time Out of Water", xlab="Log + 1 Proportion of Time Out of Water")

shapiro.test(log(proportion + 1, 10))
```

We will attempt to recreate something similar to figure 4c as seen in the focal paper. This figure examines the relationship between pH, acting as ocean acidification, and the proportion of time out of water for the intertidal snails. We can use the glm() function to fit our logistic regression model. The summary() function allows us to observe the output of our model and analyze our results. 

In our syntax, we use "*" to include our interaction term between cue and pH. This interaction term reflects a difference in the effect of pH on snail refuge-seeking between the cue and no-cue treatments. In the focal paper, pH only decreased the tendency of snails to exit the water when low pH was accompanied by predator cue. The authors found this through the interaction term.

Paying attention to the “Coefficients” part of the summary, we can look to the last column, showing our p-values. None of these values indicate significance with an alpha value of 0.05 (below 0.05).  We found that our predictors (cue, pH, as well as the cue and pH interaction term) did not have a statistically significant effect on proportion of time out of water.
```{r}
Snailsglm <- glm(proportion ~ cue * pH, family = binomial(link="logit"), data = Snails)

summary(Snailsglm)
```

Here is the code to visualize and plot the model that we just fit using the ggPredict() function. 

Using ggPredict (), the arguments we utilize are “fit,” “se,” and “digits.” For “fit,” we use the model label from the previous section “snailsglm.” For “se,” we use “TRUE” as we can to show standard error on our plot. For “digits,” we use “2” as we want two decimal places to be shown. We label our plot “snailsplot” to be able to edit axis labels and the plot title. 

From our produced plot, there are similarities between our visualization and the focal paper's figure 4c.We were able to achieve a similar S-curve achieved by the authors from their GLMM model becuase of our use of logistic regression. Despite our contrasting modelling results, we are still able to visually see a reduction in proportion of time out of water for these snails in the presence of predator cue at lower levels of seawater pH.
```{r}
Snailsplot <- ggPredict(Snailsglm,se=TRUE,digits=2) 
plot(Snailsplot) +  
  labs (
    x = "pH", 
    y = "Proportion of Time out of Water", 
    title = "pH Effect on Proportion of Time out of Water")
```