---
title: 'MODELING CHANGES IN PROBABILITY OF PRESENCE OF MYOTIS LUCIFUGUS DUE TO WNS:
  AN INTRODUCTION TO LOGISTIC REGRESSION IN R'
author: "Mary Roseblock"
date: "2023-04-25"
output:
  html_document: default
  pdf_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```


```{r}
#First, we need to read in the packages we will be using in this lesson. If you don't already have these packages, they can be installed by using the install.packages("") function. 
library(dplyr)#Dplyr will be need to filter the data in the CSV.
library(ggplot2)#The ggplot2 package will be needed to create the final graph of our model.
library(arm)#Because our response variable is bounded between 0 and 1, the standard diagnostic plots won't work for us. We can use the binnedplot() function in the arm package to visually assess the fit of the model.  
```

```{r}
#Now we can read in the data.

bats<-read.csv("(5)bat_count_data.csv")

#This dataset is smaller than the original and only contains information for Myotis lucifugus (little brown bat), Myotis septentrionalis (northern long-eared bat), and Eptesicus fuscus (big brown bat). Additionally, the "detected" column has been added. Every site that had a count >0 was assigned a 1. Every site that had a count =0 was assigned a 0.
```

```{r}
# Let's check to see if the data pulled in correctly.
head(bats)
#There should be 6 columns: Species, Year, WNS, Site, Count, and detected. The WNS column indicates whether the year is pre, during, or post invasion. The count column shows how many bats were captured in mist nests at each site. The detected column shows if bats were present or absent at the sites. 

```

#Question 9 (NOTE questions 1-8 are in the PDF): What's your favorite site name? Hint* You can view all of the site names by clicking the spreadsheet icon next to lbb in the environment window. I am partial to Bat Bug Bank.


```{r}
#Now we need to filter out each species and create data frames for them. 
lbb<-filter(bats,Species=='Myotis lucifugus') 
nleb<-filter(bats, Species=='Myotis septentrionalis')
bbb<-filter(bats, Species=='Eptesicus fuscus')
```

```{r}
# We will walk-through fitting the model, running diagnostics, and interpreting the output summary for little brown bats. Students will go through the same steps on their own for northern long-eared bats, and big bag brown bats. 

# Let's confirm the data are bounded between 0 and 1 and see what the distribution looks like. 
ggplot(lbb, aes(detected)) +       
  geom_histogram()+
  ggtitle("Probability of Presence Histogram")

```


```{r}
#Let's fit the model. Logistic regression uses the glm function in base R, but we will specify the family and the link function so R knows to use the logistic model. 
modlbb <- glm(detected ~ Year, family = binomial(link = "logit"), data = lbb)
summary(modlbb)
```


```{r}
#We will use ggplot2 to graph the model we fit above. 
ggplot(lbb, aes(x=Year, y=detected)) + 
  geom_point() + 
   geom_jitter(width = 1, height = 0.1)+ #This allows us to see how the data are clustered. Notice there are fewer detections (values of 1) in 2019.
  geom_smooth(method="glm", method.args=list(family="binomial"), se = TRUE) +
  ylab("Probability of Presence") +
  xlab("Year") +
  ggtitle("The Probability of Presence of Little Brown Bats for a Given Year") 
```

```{r}
#Now that we have our model summary and our plot, let's see if the model is a good fit. The deviance residuals in the model summary appear to be roughly symmetric, and the model appears to fit relatively well. 

#Next we will create a binned residuals plots to visually assess if the model is a good fit.
x<-predict(modlbb) #This extracts the predicted values from the model.

y<-resid(modlbb) #This extracts the residuals from the model.

Figlbb<-binnedplot(x,y)#This creates the graph showing the average residuals as a function of the expected values. 

#The majority of the points fall within the grey lines, but it does seem to fit the values in the middle poorly. We will move forward with interpreting the summary output with this in mind. 
```


```{r}
#Now we can start interpreting the coefficients. The coefficients are in log-odds format. Lets convert them to odds to assess how much the odds of capturing a little brown bat are changing each year.

oddsratio<-exp(-0.28354)#This gives us the odds ratio from the coefficient. 

odds<-1-oddsratio#This gives us the odds. 

percent_change<-odds*100#This gives us the percent change in the odds. We can use the sign of the coefficient to interpret the direction of the change. 

percent_change
#Note that the decrease in odds is not the same as a decrease in probability. 
```

#Questions 10: By what percent did the odds decrease for 1 unit change in x? ~25%

```{r}
#We previously found the odds decreased by 25% for ever 1 unit increase in x. Let's find the change in probability of presence from 2006 to 2019 and see if tracks with the focal paper. 
odds2006<-exp(569.82594-0.2835*2006)#Plug in the y-intercept and subtract the year coefficient multiplied by the year. 
prob2006<-odds2006/(odds2006+1)
prob2006
```

```{r}
#Lets repeat the same steps for 2019.
odds2019<-exp(569.82594-0.2835*2019)
odds2019
prob2019<-odds2019/(odds2019+1)
```
#Our model predicts the probability of presence for little brown bats decreased from 0.755 to 0.077 from 2006 to 2019 period. The focal paper's model predicted the probability of presence went from 0.771 in 2006 to 0.064 in 2019.That's pretty close for our simplified model! 

#On your own: Find the probability of presence in 2003 and 2019 for northern long-eared bats using the nleb data frame. 0.878 in 2003 and 0.0717 in 2019.  

#Question 11: Do you get the same values as the focal paper for nelb (section 3.1 paragraph two)? The values are fairly close to the focal paper values of 0.868 in 2003 to 0.262 in 2019.

#Do the same for big brown bats. 
#Question 12: Did the probability increase or decrease for big brown bats? Does this match the focal paper (section 3.1 paragraph 3)? The probability of presence increased from 2003 to 2019. The probabilities are slightly different from the focal paper values, but they are still fairly close. 


```{r}
modnleb <- glm(detected ~ Year, family = binomial(link = "logit"), data = nleb)
summary(modnleb)
```
```{r}
oddsnleb2003<-exp(378.04889-0.18769*2003)
probnleb2003<-odds2003/(odds2003+1)
probnleb2003
```
```{r}
oddsnleb2019<-exp(378.04889-0.18769*2019)
probnleb2019<-odds2019/(odds2019+1)
probnleb2019
```

```{r}
modbbb <- glm(detected ~ Year, family = binomial(link = "logit"), data = bbb)
summary(modbbb)
```
```{r}
oddsbbb2003<-exp(-158.34749+0.07905*2003)
probbbb2003<-oddsbbb2003/(odds2003+1)
probbbb2003
```

```{r}
oddsbbb2019<-exp(-158.34749+0.07905*2019)
probbbb2019<-oddsbbb2019/(oddsbbb2019+1)
probbbb2019
```