---
title: "603 QUBES Coral Microplastics R Markdown"
author: "Alyssa Hoekstra"
date: "4/27/2021"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

##Install any required packages using install.packages, and load all required libraries. 


```{r}
##In order to run the code and run certain functions, you will first need to have R compatible packages installed and loaded. If you try to load a library and get an error message that the package is not installed, you can install by running the code: install.packages("PACKAGENAME") . Once installed, you can then load the library. For example:

install.packages("coin")
library(coin)
##Repeat as necessary 


##LOAD LIBRARIES

library(readxl)
library(readr)
##The readxl and readr library are used to import the data into R as a dataframe

library(ggpubr)
library(ggplot2)
library(PairedData)
## The ggplot2 and ggpubr libraries are used to create graphs and plots. The paired data library adds additional functionality for plotting paired data that we will use. 

library(dplyr)
library(tidyverse)
##The  tidyverse and dplyr libraries are used to transform and visualize data. We are specifically using some of the piping functions. 

library(rstatix)
library(coin)
## The coin library will be used for the Wilcoxon test and rstatix will be used to pipe data using the coin functions. 

```

## We are going to use the data behind Figures 3 and 5 from the Rotjan et. al (2019) paper

## Let's start by importing the data from the excel spreasheet
```{r}
## Make sure you have the spreadsheet downloaded and that the file name matches below.  
 Rotjan_et_al_data_fig3 <- read_excel("Rotjan et al data.xlsx") #import the data 
summary(Rotjan_et_al_data_fig3)  # take a look at the data


## Next we are going to use the filter function to separate the data into data frames by particle type to make it easier to manipulate the data for some of our analyses
 
Rotjan_et_al_data_fig3 %>%
  filter(Type == "Microplastic") -> MP2 #filter Microplastic data, assign to new dataframe
MP2     #view the data 

Rotjan_et_al_data_fig3 %>% 
  filter(Type == "Brine Shrimp Eggs") -> BSE2 #filter Brine Shrimp Egg data, assign to new dataframe
BSE2    # view the data 
```

## Next let's check normality of the data 

```{r}

## visualize the data using a histogram, then perform a Shapiro test to test normality of the data

hist(MP2$`Particles consumed / mm3`)
shapiro.test(MP2$`Particles consumed / mm3`) 

hist(BSE2$`Particles consumed / mm3`)
shapiro.test(BSE2$`Particles consumed / mm3`) 

##Answer Question 1 on the QUBES worksheet 
```
## We are going to plot the data using a boxplot

```{r}
##We will use ggplot to create a boxplot. A boxplot plots the data in a way that shows five different summaries of the data: the minimum and maximum and represented by points at the top and bottom of the plot, the first and third quartile of the median are represented by the top and bottom of the box, and the median is the middle line.

ggplot(Rotjan_et_al_data_fig3,                        #use ggplot, select the dataframe
       aes(x=`Type`, y=`Particles consumed / mm3`))   #define columns for x and y variables
       + geom_boxplot(fill="orange")  -> Fig3Boxplot  #configure the boxplot 
Fig3Boxplot  


##Answer Question 2 on the QUBES worksheet 
```

## Next we will perform the statistical analyses 
## Because the data is not normally distributed, we are going to proceed with a paired permutation test. A paired permutation test is appropriate in this instance due to the non-normality of the data as well as the small sample size. 

##PAIRED PERMUTATION TEST: We will be using the Wilcoxon test. This test uses permutations and compares the observed median of the data to the calculated permuted calculated medians.  We are going to do a few steps to visualize the data before we run the statistical analysis, as well as run a few calculations to get some additional measures about the data including median, interquartile range, and effect size. 


```{r}

##First we are going to visualize the paired data by plotting the pairs in a paired boxplot. Note the lines connecting two points, signifying a colony pair. 

PairedBoxplot <- ggpaired(Rotjan_et_al_data_fig3, # use the ggpaired fxn, select dataframe
    x = "Type", y = "Particles consumed / mm3",   #select columns for x and y variables
   order = c("Brine Shrimp Eggs", "Microplastic"), #set the order in the plot
   ylab = "Particles consumed / mm3", xlab = "Particle Type") #adjust labels on the plot
PairedBoxplot


##We are going to calculate the Median and interquartile range. The median is the value at the midpoint of the data distribution. The interquartile range or IQR is the range (or difference) between the value at the first quarter and the third quarter. Remember that when we did the boxplot, the first and third quarter are represented by the top and bottom of the box. 

Rotjan_et_al_data_fig3 %>%
  group_by(Type) %>%       #group by type of particle
  get_summary_stats(`Particles consumed / mm3`, type = "median_iqr") #use function to calculate median and IQR by selecting median_iqr as type of test to be run

```




```{r}
## Now we will run the Wilcoxon paired permutation test 

stat.test <- Rotjan_et_al_data_fig3 %>%
rstatix::wilcox_test(`Particles consumed / mm3` ~ Type, # specify the library the wilcox_test                                                           will be run out of as two of our                                                             loaded libraries use the same text.                                                        # Select the variable to be calculated                                                          by group factor.
                     paired = TRUE) %>%  # Paired = TRUE signifies we have paired data                                                 between the columns and the test will treat the                                              data as paired
  add_significance() #add a p value to the output
stat.test


## By running a one-sided test and adding the alternative hypothesis that the Brine Shrimp Eggs (as group 1) have a lesser median than the Microplastic (as group 2) group, we can determine which direction the significant finding from our previous two-tailed test went:
stat.test2 <- Rotjan_et_al_data_fig3%>%
rstatix::wilcox_test(`Particles consumed / mm3` ~ Type, paired = TRUE, alternative = "less") %>%  #notice this time we have 'alternative = less' - without this, the test automatically runs as a two-sided test, but this will allow us to test an alternative hypothesis as specified
  add_significance()
stat.test2


## Run the below code to get the effect size. 

##To calculate the effect size when using wilcoxon test, the difference between the two groups' medians is divided by the standard deviation [which is known as the Z statistic], which is then divided by the square root of the sample size. For the wilcox_effsize, the Z statistic is taken from the wilcox_test function. 

##According to the R help text for the wilcox_effsize, interpretation values for r (effect size) commonly in published literature and on the internet are: 0.10 - < 0.3 (small effect), 0.30 - < 0.5 (moderate effect) and >= 0.5 (large effect).

Rotjan_et_al_data_fig3 %>%
  wilcox_effsize(`Particles consumed / mm3` ~ Type, paired = TRUE) # select the variable and group factor to be tested. Notice again we have 'paired = TRUE', the test will treat the data between the columns as paired 


##Answer Question 3 on the QUBES worksheet

```

##Figure 5 uses the same data from figure 3, but visualizes it differently. 

```{r}

##Because the data are paired, we can use a scatterplot to visualize the ingestion preference of each colony. Each point represents one colony. We are going to plot the Particles Consumed per mm3 for Microplastics on the x-axis and Brine Shrimp Eggs on the y-axis on the same plot for each colony. We are going to add a boundary line through the plot to divide the top half for values that showed a preference for Brine Shrimp Eggs, and the bottom half for values showing a preference for Microplastics to make it easier to visualize. Points closer to the line would be more neutral showing a 1:1 ratio, and points that fall on either side will have shown that preference. 

BSE  = Rotjan_et_al_data_fig3$`Particles consumed / mm3` [Rotjan_et_al_data_fig3$Type =="Brine Shrimp Eggs"]
MP = Rotjan_et_al_data_fig3$`Particles consumed / mm3` [Rotjan_et_al_data_fig3$Type =="Microplastic"]
plot(MP, BSE,
     pch = 16,                # shape of points
     cex = 1.0,               # size of points
     xlim=c(0, 3),          # limits of x axis
     ylim=c(0, 3),          # limits of y axis
     xlab="Microplastics, Particles consumed / mm3",  #label on x axis
     ylab="Brine Shrimp Eggs, Particles consumed / mm3")    #label on y axis

abline(0,1, col="blue", lwd=2)  #adding a line through the plot

##Answer question 4 on the QUBES lesson

```