---
title: "10Lesson_Teacher_Version"
output: html_document
date: "2023-05-08"
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Setup and Importing Data
```{r}
#First, load the libraries needed for this lesson. If a package is not installed yet, install it with install.packages(), putting the package name in quotes.

#install.packages("Example")


library(survival)

#This package is used to carry out survival analysis within R and carries the coxph, Surv, and survfit functions that will be used below  


library(ggplot2)

#This package will be used to create the figures that show the results of the survival analysis at the end of each experiment 
```


```{r}
#Once the packages are loaded, import the datasets. If a file fails to load, check that the file name in read.csv matches the name of the file exactly (numbers or spaces may have been added when it was downloaded) and that your working directory is the folder where the files are saved.


tag.data<-read.csv("American_Tagging_Data.csv")

surv.graph<- read.csv("Survival_Data.csv")

```
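
If `read.csv` reports that a file cannot be found, the working directory usually does not match the folder holding the data. A minimal check, assuming the two CSV files keep the names used above, might look like this:

```{r}
# Show the folder R is currently reading from
getwd()

# List the CSV files R can see in that folder
list.files(pattern = "\\.csv$")

# TRUE means the file is present under exactly this name
data.files <- c("American_Tagging_Data.csv", "Survival_Data.csv")
found <- file.exists(data.files)
names(found) <- data.files
found
```

If either value is FALSE, move the file into that folder or change the folder with `setwd()` before rerunning the import.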


```{r}

#Before we explore the analysis, it is important to look through the data to gather an understanding of what is being presented in the tables

head(tag.data)

#Dataset for all three experiments in the study that will be run through survival analysis

#Tank - Which tank fish was housed in

#Treatment - Control (1) or Treatment (2)  

#M.F - Male (1) or Female (2)

#Time - Number of days each fish survived during the experiment

#Status - Alive (1) or Dead (2) at the end of the 21-day period

```
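
The 1/2 coding of the status column matters because `Surv` has to decide which value counts as a death. The sketch below uses three made-up fish (not taken from the study data) to show how that coding is read. The `toy.*` names are hypothetical.

```{r}
library(survival)

# Three hypothetical fish: died on day 5, alive at day 21, died on day 10
toy.time   <- c(5, 21, 10)
toy.status <- c(2, 1, 2)      # 1 = alive, 2 = dead, as described for tag.data

# With 1/2 coding, Surv treats 2 as the event (death)
toy.surv <- Surv(toy.time, toy.status)
toy.surv                      # a "+" marks a fish still alive (censored)

# Stating the event explicitly gives the same result
toy.surv.explicit <- Surv(toy.time, toy.status == 2)
toy.surv.explicit
```

The explicit form is a little safer: if a subset happened to contain no deaths, every status value would be 1, and the values alone would not show that 1 means alive.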


```{r}

head(surv.graph)
#This dataset is used for creating the survival graphs after each analysis is run.  

#Day - Day in which a fish was recorded dead in experiment

#Surv - % of fish surviving from original population

#Lci and Uci - Lower and upper confidence limits

```


```{r}
# Select the subset of data for each experiment from the overall data table. The experimental design changed between experiments, so the data from all 3 experiments should not be analyzed together.


exp1.dat<-subset(tag.data,experiment==1)

exp2.dat<-subset(tag.data,experiment==2)

exp3.dat<-subset(tag.data,experiment==3)

```


```{r}

# Before moving on, check that each experiment was subset correctly with the nrow function, which returns the number of rows in a dataset. Run the function below, then refer back to the paper to confirm that the number matches the sample size of experiment 1.

nrow(exp1.dat)
```


```{r}
#Once you have checked that experiment 1 matches, check experiments 2 and 3 on your own in the space below

nrow(exp2.dat)

nrow(exp3.dat)

```
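
The three `nrow` checks can also be done in one step with `table`, which counts the rows for each value of a column. A small sketch on a made-up data frame (the `toy.data` values are invented for illustration and are not the study's sample sizes):

```{r}
# Hypothetical data: 2 fish in experiment 1, 1 in experiment 2, 3 in experiment 3
toy.data <- data.frame(experiment = c(1, 1, 2, 3, 3, 3))

# Count the rows belonging to each experiment
exp.counts <- table(toy.data$experiment)
exp.counts

# Each count should match nrow() of the matching subset
nrow(subset(toy.data, experiment == 3)) == as.integer(exp.counts["3"])
```

On the lesson data, the equivalent call would be `table(tag.data$experiment)`.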


# Experiment 1 - Two free floating tags
```{r}
#Creating model to plot fixed effects on survival

#Students should reference lesson for breakdown of model and its elements.


exp1.cph <- coxph(Surv(time, status) ~ Treatment + M.F, data = exp1.dat)


#Retrieving the summary statistics of the model from above. 

summary(exp1.cph)


###REFERENCE BACK TO THE LESSON PAPER FOR A BREAKDOWN OF THESE SUMMARY STATISTICS BEFORE ANSWERING QUESTION 5 BELOW

```


# Question 5

What conclusions can be drawn from the summary statistics of experiment 1?  Pay close attention to the p values and exp(coef) values.


The summary output shows that treatment has a hazard ratio of over 39 with a very small p value, so the group a fish was placed in strongly influenced whether it survived the experiment. In other words, at any point in the experiment a treatment fish had an estimated risk of death more than 39 times that of a control fish. Treatment therefore had a much greater effect on death than whether the fish were male or female.
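
The values this answer relies on can be pulled out of a fitted model directly rather than read off the printed summary. The sketch below uses the `lung` dataset that ships with the survival package as a stand-in, since it also codes status as 1/2 and sex as 1/2. It is not the shad data, and the `demo.*` names are hypothetical.

```{r}
library(survival)

# Stand-in model on the built-in lung data (status: 1 = censored, 2 = dead)
demo.cph <- coxph(Surv(time, status) ~ sex + age, data = lung)

# Hazard ratios: the exp(coef) column of the summary
demo.hr <- exp(coef(demo.cph))
demo.hr

# 95% confidence intervals for the hazard ratios
demo.ci <- exp(confint(demo.cph))
demo.ci

# p values from the coefficient table of the summary
demo.p <- summary(demo.cph)$coefficients[, "Pr(>|z|)"]
demo.p
```

A hazard ratio above 1 means a higher risk of death for the higher-coded group, and a confidence interval that excludes 1 agrees with a small p value.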


```{r}

# Once again, we will subset data from an overall table, this time the survival rate table, to create the reference graphs for each experiment

exp.one<-subset(surv.graph,exp==1)

exp.two<-subset(surv.graph,exp==2)

exp.three<-subset(surv.graph,exp==3)
```


```{r}

#Now that you have created the model and seen the statistical output, we want to look at the survival curves for the American shad in the treatment and control groups of each experiment. To do that, we will use the "survfit" function to calculate the points and confidence intervals for the graphs. This is the process the authors went through to create the Survival_Data dataset.  

#survfit only needs the grouping variable here. A (1 | M.F) term is random-effect syntax from mixed-model packages and does not adjust for sex in survfit, so it is left out.

exp1.survival <- survfit(Surv(time, status) ~ Treatment, data = exp1.dat)

summary(exp1.survival)


#time - Day on which a death occurred

#n.risk - Number of fish still at risk (alive) just before that day

#n.event - Number of events (deaths) that happened on that day 

#survival - Proportion of fish remaining from the starting number

```
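
The `survival` column can be reproduced by hand, which shows what `survfit` is doing: at each death time, the proportion surviving so far is multiplied by (1 - deaths / fish at risk). The sketch below checks this on the built-in `lung` data (a stand-in, not the shad data) and then collects the results into a table laid out like Survival_Data. The column names mirror those used in the plotting code. How the authors actually built their file is an assumption here.

```{r}
library(survival)

# Single survival curve for the whole stand-in dataset
demo.fit <- survfit(Surv(time, status) ~ 1, data = lung)
demo.sum <- summary(demo.fit)

# Kaplan-Meier estimate by hand: running product of the survival fractions
by.hand <- cumprod(1 - demo.sum$n.event / demo.sum$n.risk)
all.equal(by.hand, demo.sum$surv)

# Collect the plotted quantities into one data frame
demo.graph <- data.frame(day  = demo.sum$time,
                         surv = demo.sum$surv,
                         lci  = demo.sum$lower,
                         uci  = demo.sum$upper)
head(demo.graph)
```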


```{r}
##Now we will use the ggplot function to graph each of the data points above and their attached lower and upper confidence intervals.   

SurvCurveExp1<-ggplot(exp.one, aes(day,surv,color=study_group))+
  geom_ribbon(aes(ymin=lci, ymax=uci,fill = study_group), alpha=0.125, linetype='blank')+
  geom_line(size=0.5,linetype = "dashed")+   
  geom_point(size=2)+   
  theme_bw()+  
  ylim(0,1)+ 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())

SurvCurveExp1

#Students can return to the lesson document for a breakdown of this figure if needed.
```


# Experiment 2 - Attached tags
```{r}

#Create model to measure fixed effects impact on fish from experiment 2
exp2.cph <- coxph(Surv(time,status) ~ Treatment + M.F, data = exp2.dat)

#Retrieving summary statistics from model above
summary(exp2.cph)
```


```{r}
#Creating the event points and confidence intervals for plotting

exp2.survival <- survfit(Surv(time, status) ~ Treatment, data = exp2.dat)


#Summary statistics for the function above
summary(exp2.survival)
```


```{r}
#Plotting the surviving number of fish from experiment 2

SurvCurvExp2<-ggplot(exp.two, aes(day,surv,color=study_group))+
  geom_ribbon(aes(ymin=lci, ymax=uci,fill = study_group), alpha=0.125, linetype='blank')+
  geom_line(size=0.5,linetype = "dashed")+ 
  geom_point(size=2)+
  theme_bw()+
  ylim(0,1)+
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())

SurvCurvExp2
```


# Question 6

How do the statistical outputs of the first two experiments compare?  What about the survivorship graphs themselves?

The hazard ratio for the treatment effect dropped sharply in experiment 2, to 2.7 compared with 39 in experiment 1. The p values show that neither treatment nor Male/Female is statistically significant in experiment 2. In the survival graphs, around half of the treatment fish survived this experiment, compared with none in the first experiment. Another big difference is the loss of more control fish: only 2 died in experiment 1, compared with 5 in experiment 2.
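
Comparing experiments is easier when the hazard ratios sit side by side. One way to do that, sketched here on the built-in `lung` data split by sex as a stand-in for the experiment subsets (the `demo.*` names are hypothetical), is to fit the same model to each subset and collect exp(coef):

```{r}
library(survival)

# Split the stand-in data into subsets, as subset() did for each experiment
demo.groups <- split(lung, lung$sex)

# Fit the same Cox model to every subset and keep the hazard ratio for age
demo.hrs <- sapply(demo.groups, function(d) {
  fit <- coxph(Surv(time, status) ~ age, data = d)
  exp(coef(fit))[["age"]]
})
demo.hrs
```

With the lesson objects, the same idea would apply to `exp1.dat`, `exp2.dat`, and `exp3.dat` with the `Treatment + M.F` formula.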


# Experiment 3 - Attached tags with no dart tag
```{r}

##ON YOUR OWN CREATE THE MODEL FOR EXPERIMENT 3 BELOW USING EXAMPLES ABOVE

#Create model to measure the fixed effects impact on fish from experiment 3

exp3.cph <- coxph(Surv(time,status) ~ Treatment + M.F, data = exp3.dat)


#Retrieving summary statistics from the model above

summary(exp3.cph)
```


```{r}

##ON YOUR OWN RUN THE SURVFIT FUNCTION TO ACHIEVE DATA POINTS AND CONFIDENCE INTERVALS FOR EXPERIMENT 3 BELOW

#Place function below

exp3.survival <- survfit(Surv(time, status) ~ Treatment, data = exp3.dat)

#Run summary statistics on the function

summary(exp3.survival)
```


```{r}

SurvCurvExp3<-ggplot(exp.three, aes(day,surv,color=study_group))+
  geom_ribbon(aes(ymin=lci, ymax=uci,fill = study_group), alpha=0.125, linetype='blank')+
  geom_line(size=0.5,linetype = "dashed")+ 
  geom_point(size=2)+
  theme_bw()+
  ylim(0,1)+
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())

SurvCurvExp3
```


# Question 7
Were the same variables found to be significant in experiment 3 as in the first two experiments?

No. This time Male/Female was statistically significant and treatment was not. The hazard ratio for Male/Female is almost the same as the hazard ratio for treatment in experiment 2, but here the p value tells us that the effect is statistically significant.


# Question 8
How does the survivorship graph for experiment 3 differ from the graphs of the first two experiments?  

The survival graph for experiment 3 shows almost equal mortality in the treatment and control groups, which differs from the first two experiments. The authors discuss possible reasons for this increase, but this is a good point for a class discussion of what students took away from the graphs.


# Question 9
The summary statistics show that each experiment gives different results on which effects are significant to the survival of fish. With those results, what would you say are the main takeaways from this study for fisheries biologists? Could you pick one method, or would further experimentation be needed?

The main takeaway is that the separate tagging method appears to be a poor choice for this kind of study, and that scientists should instead focus on the combined tagging methods from experiments 2 and 3. Although combined tagging appears to give fish better survival, factors varied among the experiments in this study, and, as the authors state throughout the paper, more testing is needed before settling on one method.