---
title: "Tag You're It: A Lesson on Study Design and Survival Analysis on an American Shad Tagging Study"
output: html_document
date: "2023-05-07"
author: "Brycen Boettcher"
---


# Setup and Importing Data
```{r}


#First, load the libraries needed for this lesson.  If a package is not installed yet, install it with install.packages(), putting the package name in quotation marks as in the example below. 

#install.packages("Example")


library(survival)

#The survival package carries out survival analysis in R and provides the coxph, Surv, and survfit functions used below  


library(ggplot2)

#This package will be used to create the figures that show the results of the survival analysis at the end of each experiment 
```
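
If you are unsure whether both packages are already installed, a check along these lines can tell you before anything fails. This is only a sketch: the object names `needed` and `missing.pkgs` are made up for illustration, and the install line is left commented out so nothing is downloaded without your say-so.

```{r}
# Packages this lesson relies on
needed <- c("survival", "ggplot2")

# requireNamespace() returns TRUE when a package is installed and FALSE when it is not
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing.pkgs <- needed[!installed]

if (length(missing.pkgs) > 0) {
  message("Not installed yet: ", paste(missing.pkgs, collapse = ", "))
  # install.packages(missing.pkgs)
} else {
  message("All packages for this lesson are installed.")
}
```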

```{r}
#Once all packages have been loaded, we can import the datasets that will be used.  If a file fails to load, check that the file name in read.csv() matches the file on your computer exactly (downloading can add numbers or spaces to the name) and that your working directory is the folder where you saved the files.  


tag.data<-read.csv("American_Tagging_Data.csv")

surv.graph<- read.csv("Survival_Data.csv")

```
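
When `read.csv()` reports that it cannot open a file, the quick check below can show whether the problem is the file name or the working directory. It is a small sketch using only base R; `data.files` and `found` are illustrative names.

```{r}
# File names exactly as they are written in the read.csv() calls above
data.files <- c("American_Tagging_Data.csv", "Survival_Data.csv")

# Folder R is currently reading from
getwd()

# TRUE means the file was found in that folder, FALSE means it was not
found <- file.exists(data.files)
names(found) <- data.files
found

# All .csv files R can see in the working directory, to spot renamed copies
list.files(pattern = "\\.csv$")
```

If a file shows `FALSE`, compare its name against the `list.files()` output, or move the file into the folder printed by `getwd()`.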


```{r}

#Before we explore the analysis, look through the data to understand what each column of the table holds

head(tag.data)

#Dataset for all three experiments in the study that will be run through survival analysis

#Experiment - Which experiment (1, 2, or 3) the fish belonged to

#Tank - Which tank the fish was housed in

#Treatment - Control (1) or Treatment (2)  

#M.F - Male (1) or Female (2)

#Time - Number of days each fish survived during the experiment

#Status - Alive (1) or Dead (2) at the end of the 21-day period

```
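
The Alive (1) or Dead (2) coding matters because of how `Surv()` reads it. The sketch below uses four made-up fish (not data from the study) to show that a status of 2 is treated as a death and a status of 1 as a fish that was still alive when observation ended (a censored observation).

```{r}
library(survival)

# Made-up example: days survived and status for four fish
toy.time   <- c(5, 21, 13, 21)
toy.status <- c(2, 1, 2, 1)   # 1 = alive at the end, 2 = dead

toy.surv <- Surv(toy.time, toy.status)

# Printed times followed by "+" are fish that were still alive (censored)
toy.surv

# Internally the status is stored as 0 = censored and 1 = event (death)
unclass(toy.surv)[, "status"]
```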


```{r}

head(surv.graph)
#This dataset is used to create the survival graphs after each analysis is run.  

#Day - Day on which a fish was recorded dead in the experiment

#Surv - Proportion of the original population still surviving (0 to 1)

#Lci and Uci - Lower and upper limits of the confidence interval


```


```{r}
# Subset the overall data table into one table per experiment.  Because the experimental design changed from one experiment to the next, the data from all 3 experiments should not be tested together.


exp1.dat<-subset(tag.data,experiment==1)

exp2.dat<-subset(tag.data,experiment==2)

exp3.dat<-subset(tag.data,experiment==3)
```
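
To see what `subset()` does before trusting it with the real data, here is a small made-up table with six fish. The column names mirror the ones used above, but the values are invented for illustration.

```{r}
# Made-up table: six fish spread across three experiments
toy.tags <- data.frame(
  experiment = c(1, 1, 2, 2, 2, 3),
  time       = c(4, 21, 9, 21, 21, 15)
)

# Keep only the rows where experiment equals 2
toy.exp2 <- subset(toy.tags, experiment == 2)
toy.exp2

# Number of rows kept; here three fish belong to experiment 2
nrow(toy.exp2)

# table() counts the rows for every experiment in one step
table(toy.tags$experiment)
```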


```{r}
# Before moving on, check that each experiment was subset correctly with the nrow() function, which returns the number of rows in a dataset.  Run the function below, then refer back to the paper to see whether the result matches the sample size of experiment 1.  

nrow(exp1.dat)
```


```{r}
#Once you have checked that experiment 1 matches, check experiments 2 and 3 on your own in the space below


```

# Experiment 1 - Two free-floating tags 
```{r}
#Create a Cox proportional hazards model to test the fixed effects (Treatment and M.F) on survival

#Students should reference the lesson for a breakdown of the model and its elements.


exp1.cph <- coxph(Surv(time, status) ~ Treatment + M.F, data = exp1.dat)


#Retrieving the summary statistics of the model from above. 

summary(exp1.cph)


###REFER BACK TO THE LESSON PAPER FOR A BREAKDOWN OF THESE SUMMARY STATISTICS BEFORE ANSWERING QUESTION 5 BELOW


```
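
For extra practice reading this kind of output, the same model can be run on `lung`, an example dataset that ships with the survival package. It is not part of the shad study, but its coding happens to mirror ours: status is 1 (censored) or 2 (dead) and sex is 1 (male) or 2 (female). The sketch below pulls out the two columns Question 5 asks about; the object names are illustrative.

```{r}
library(survival)

# Same model structure as above, fitted to the built-in lung dataset
lung.cph <- coxph(Surv(time, status) ~ sex + age, data = lung)
lung.sum <- summary(lung.cph)

# exp(coef) is the hazard ratio: below 1 means a lower risk of death per
# one-unit increase in the variable, above 1 means a higher risk
lung.sum$coefficients[, c("exp(coef)", "Pr(>|z|)")]

# 95% confidence limits for each hazard ratio
lung.sum$conf.int[, c("lower .95", "upper .95")]

# The hazard ratio is simply the exponentiated model coefficient
hr <- exp(coef(lung.cph))
hr
```

A hazard ratio whose confidence interval includes 1 goes hand in hand with a p-value above 0.05, which is a useful cross-check when reading the table.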


# Question 5

What conclusions can be drawn from the summary statistics of experiment 1?  Pay close attention to the p values and exp(coef) values.


```{r}

# Once again, we will subset data from an overall table, this time the survival rate table, to create the reference graphs for each experiment

exp.one<-subset(surv.graph,exp==1)

exp.two<-subset(surv.graph,exp==2)

exp.three<-subset(surv.graph,exp==3)

```


```{r}


#Now that you have created the model and seen the statistical output, we want to graph the proportion of American shad surviving over time in the treatment and control groups of each experiment. To do that, we will use the "survfit" function to create the points and confidence intervals for the graphs.  This is the process that the authors went through to create the Survival_Data dataset.  

#Note: survfit() does not fit random effects.  A term such as (1 | M.F) is evaluated as a logical OR that is TRUE for every fish, so it adds no grouping and is left out of the formula here.

exp1.survival <- survfit(Surv(time, status) ~ Treatment, data=exp1.dat)

summary(exp1.survival)


#time - Day on which a death occurred

#n.risk - Number of fish still at risk of an event just before that day

#n.event - Number of events (deaths) that happened on that day 

#survival - Proportion of the starting fish that remain alive


```
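
The survival column comes from the Kaplan-Meier estimate: at each day with a death, the running survival is multiplied by (1 - n.event / n.risk). The sketch below checks that arithmetic on six made-up fish (not study data), so you can see where each number in a `survfit` summary comes from.

```{r}
library(survival)

# Made-up example: 1 = alive (censored), 2 = dead
toy <- data.frame(
  time   = c(3, 5, 5, 8, 21, 21),
  status = c(2, 2, 1, 2, 1, 1)
)

# "~ 1" fits a single curve for all fish together
toy.km  <- survfit(Surv(time, status) ~ 1, data = toy)
toy.sum <- summary(toy.km)

# Survival reported by survfit on each day with a death
toy.sum$surv

# The same values by hand: running product of (1 - deaths / fish at risk)
by.hand <- cumprod(1 - toy.sum$n.event / toy.sum$n.risk)
by.hand
```

On day 3, one of six fish dies, leaving 5/6. On day 5, one of the five remaining fish dies, giving 5/6 x 4/5 = 2/3. The fish censored on day 5 then leaves the at-risk group, so on day 8 one of three dies, giving 2/3 x 2/3 = 4/9.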


```{r}

##Now we will use the ggplot function to graph each of the data points above and their attached lower and upper confidence intervals.   

SurvCurveExp1<-ggplot(exp.one, aes(day,surv,color=study_group))+
  geom_ribbon(aes(ymin=lci, ymax=uci,fill = study_group), alpha=0.125, linetype='blank')+
  geom_line(linewidth=0.5,linetype = "dashed")+   #linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_point(size=2)+   
  theme_bw()+  
  ylim(0,1)+ 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())

SurvCurveExp1

#Students can return to the lesson document for a breakdown of this figure if needed.

```
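
A table shaped like Survival_Data can be assembled from a `survfit` summary. The sketch below shows one way to do it with the built-in `lung` dataset from the survival package; whether the authors built their table exactly this way is an assumption, and `lung.plot` is an illustrative name.

```{r}
library(survival)

# One survival curve per group (here, per sex in the lung dataset)
lung.km     <- survfit(Surv(time, status) ~ sex, data = lung)
lung.km.sum <- summary(lung.km)

# Collect the pieces of the summary under the column names used in the plots
lung.plot <- data.frame(
  day         = lung.km.sum$time,
  surv        = lung.km.sum$surv,
  lci         = lung.km.sum$lower,
  uci         = lung.km.sum$upper,
  study_group = as.character(lung.km.sum$strata)
)

head(lung.plot)
```

Because the column names match, `lung.plot` could be dropped into the ggplot code above in place of `exp.one` to draw the same style of figure.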


# Experiment 2 - Attached tags
```{r}
#Create a model to measure the impact of the fixed effects on fish from experiment 2
exp2.cph <- coxph(Surv(time,status) ~ Treatment + M.F, data = exp2.dat)

#Retrieving summary statistics from model above
summary(exp2.cph)

```
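
When comparing the output of two models, it can help to pull the key numbers into one small table per model. The helper below, `hr.table()`, is a hypothetical convenience function written for this lesson, not part of the survival package. It is demonstrated on two subsets of the built-in `lung` dataset rather than on the shad data.

```{r}
library(survival)

# Hypothetical helper: hazard ratios, 95% limits, and p-values from a coxph fit
hr.table <- function(fit) {
  s <- summary(fit)
  data.frame(
    term         = rownames(s$coefficients),
    hazard.ratio = s$conf.int[, "exp(coef)"],
    lower.95     = s$conf.int[, "lower .95"],
    upper.95     = s$conf.int[, "upper .95"],
    p.value      = s$coefficients[, "Pr(>|z|)"],
    row.names    = NULL
  )
}

# Two example groups from the lung data, split by the ph.ecog score
fit.a <- coxph(Surv(time, status) ~ sex + age, data = subset(lung, ph.ecog == 0))
fit.b <- coxph(Surv(time, status) ~ sex + age, data = subset(lung, ph.ecog == 1))

hr.a <- hr.table(fit.a)
hr.b <- hr.table(fit.b)

hr.a
hr.b
```

The same call, for example `hr.table(exp1.cph)`, should work on the models in this lesson once they have been fitted.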


```{r}
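#Creating the event points and confidence intervals for plotting

#Note: survfit() does not fit random effects.  A term such as (1 | M.F) is evaluated as a logical OR that is TRUE for every fish, so it adds no grouping and is left out of the formula here.

exp2.survival <- survfit(Surv(time, status) ~ Treatment, data=exp2.dat)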


#Summary statistics for the function above
summary(exp2.survival)
```


```{r}
#Plotting the surviving number of fish from experiment 2

SurvCurvExp2<-ggplot(exp.two, aes(day,surv,color=study_group))+
  geom_ribbon(aes(ymin=lci, ymax=uci,fill = study_group), alpha=0.125, linetype='blank')+
  geom_line(linewidth=0.5,linetype = "dashed")+ #linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_point(size=2)+
  theme_bw()+
  ylim(0,1)+
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
SurvCurvExp2

```


# Question 6

How do the statistical outputs of the first two experiments compare?  What about the survivorship graphs themselves?


# Experiment 3 - Attached tags with no dart tag
```{r}

##ON YOUR OWN CREATE THE MODEL FOR EXPERIMENT 3 BELOW USING EXAMPLES ABOVE

#Create model to measure the fixed effects impact on fish from experiment 3


#Retrieving summary statistics from the model above


```


```{r}
##ON YOUR OWN RUN THE SURVFIT FUNCTION TO OBTAIN THE DATA POINTS AND CONFIDENCE INTERVALS FOR EXPERIMENT 3 BELOW

#Place function below


#Run summary statistics on the function


```


```{r}
#Plotting the surviving number of fish from experiment 3

SurvCurvExp3<-ggplot(exp.three, aes(day,surv,color=study_group))+
  geom_ribbon(aes(ymin=lci, ymax=uci,fill = study_group), alpha=0.125, linetype='blank')+
  geom_line(linewidth=0.5,linetype = "dashed")+ #linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_point(size=2)+
  theme_bw()+
  ylim(0,1)+
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())

SurvCurvExp3
```


# Question 7
Were the same variables found to be significant in experiment 3 as they were in the first two experiments?


# Question 8
How does the survivorship graph for experiment 3 vary from the graphs of the first two experiments?  


# Question 9
The summary statistics show that each experiment gives different results for which effects are significant to the survival of fish.  Given those results, what would you say are the main takeaways of this study for fisheries biologists?  Could you pick one tagging method, or would further experimentation need to be conducted?