---
title: "The Importance of Street Trees to Urban Avifauna"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

#For this exercise, we will be exploring street tree and bird survey data from Los Angeles County, CA. The sites that were surveyed for this data were broken down into three socioeconomic groupings; low-income, medium-income, and high-income. Using an edited version of this data, we will learn how to run a generalized linear model with negative binomial distribution to assess relationships between socioeconomic status, tree density, and bird density. Then, we will use ggplot2 to visualize these relationships. Analyzing survey data of street trees and avian abundance is important for biodiversity conservation in urban spaces. As we run these models, think about how our results may influence the decisions of city planners.


#Let's begin the lesson by loading in the necessary libraries

```{r}
library(tidyverse)
library(ggplot2)
library(MASS)

#If this code chunk did not work, then you may need to install the packages by running the function install.packages() in the console.
```

#Next, we can download our data. Make sure that the name of your file matches the name in the code chunk below. 
```{r}
data <- read.csv("Tree_Bird_Data.csv")
view(data) 

#Look at the data and take note of the column titles.

#If this code chunk did not work, then you may need to install the packages by running the function install.packages() in the console.
```


#Let's explore the data to get a better idea of what we are working with.

#As mentioned above, we will be working with a type of generalized linear model to assess the relationship between tree and bird densities. Typically, we use these models when our count data do not fit the assumption of normality. We can check the normality of our response variable (total bird density) by running the chunk below.
```{r}
#Check for normality of response variable using a histogram and a Shapiro-Wilkes normality test.

hist(data$Total_Bird_Density, main = NULL, xlab = "Total Bird Density")   
shapiro.test(data$Total_Bird_Density)

#Note: if the p value is less than 0.05 then there is a low probability that the data is distributed normally. 

```

#We see that we are working with count data that does not follow a normal distribution. This is where a generalized linear model with Poisson distribution can be useful.
```{r}
tree.bird.glm <- glm(Total_Bird_Density ~ Tree_Density, family = poisson(link = "log"), data = data)
summary(tree.bird.glm)

#At this point, it is important to consider all aspects of the output before continuing with this model. 
#Of particular interest is the Residual deviance vs the degrees of freedom values. These values are important to assess the fit of the model. For a well-fitting model, the residual of deviance should be close to the degrees of freedom.
#If the residual deviance is far greater than degrees of freedom then overdispersion exists, and we need to consider a different model.
```
#Question 1: What are the values for residual deviance and the associated degrees of freedom? Is overdispersion present in this data? Why might this be an issue? 


#Another way to check for overdispersion is to compare the mean and variance of our response variable: Total Bird Density.
```{r}
mean(data$Total_Bird_Density);var(data$Total_Bird_Density) 

#Remember if the mean equals the variance, then there is no overdispersion.
```

#We can also check the variance ratio (aka coefficient of dispersion). This ratio is used to measure how dispersed or clustered set of events or objects are, in a given interval of time or space.
```{r}
var(data$Total_Bird_Density)/mean(data$Total_Bird_Density)

#If there is no overdispersion present then variance ratio should be equal to 1. If there is over-dispersion present, then the variance ratio will be greater than 1.
```
#Now that we have established that overdispersion is present in our data, we can run a negative binomial regression using the glm.nb function from the MASS library. 
```{r}
#Since log is the default link in this function, it does not need to be expressed. 

#tree.bird.nb <- glm.nb(Total_Bird_Density ~ Tree_Density, link = "log", data = data) 

#Therefore, the line of code above would give the same output as the line below. Try it out if you'd like.

tree.bird.nb <- glm.nb(Total_Bird_Density ~ Tree_Density, data = data)

summary(tree.bird.nb)  

#Theta is the dispersion parameter. This is a shape parameter. The larger value of theta indicates more spread (dispersion) in the data than a smaller value.
```
#Question 2: Compare the negative binomial model to the Poisson. How is the negative binomial model accounting for the overdispersion present in the data? (Hint: consider the coefficients) Is this model a better fit? Why? 


**Building a graph** 

#We will use expand_grid function to generate a set of new x-values. Remember to label the single column name as in the original dataset. In this case, our predictor variable is called Tree_Density. We will start by using the min and max functions to set the limits of the new x-values.
```{r}
#These functions set the limits of our new x values

min.tree.density <- min(data$Tree_Density)       
max.tree.density <- max(data$Tree_Density)
```

```{r}
#This function generates the new x values. Notice that we use the Tree_Density variable to name the column (same as the original data).

new.x <- expand.grid(Tree_Density =
seq(min.tree.density, max.tree.density, length= 36)) #36 is the length of our data
```

#We can now use the new x's with the predict function. We use predict() with three arguements: our model, the value for newdata, and a request for standard errors.
```{r}

# Generate fits and standard errors at new.x values.
new.y = predict(tree.bird.nb, newdata=new.x, se.fit = TRUE) 
new.y = data.frame(new.y)    #converts output to a data frame

# Check it!
head(new.y)
```

#Now we need to combine the new x and y values together by putting them in the same data frame.
```{r}
addThese <- data.frame(new.x, new.y) 

# check it!
head(addThese)
```

#We need to use the mutate() function to simultaneously calculate the confidence intervals (CIs) and include them in our data frame. Since we want predictions of the actual density, we also need to apply the inverse of the logarithm to any y-axis variables.
```{r}
# exponentiate the density and CI's to get back the 'response' scale (we are essentially unlogging them)
# Note: Using the mutate function allows us to name the fit output. 
#For our purposes, we are assuming a 95% CIs which uses the value 1.96 when calculating standard error.

addThese <- mutate(addThese,
                   Total_Bird_Density = exp(fit),
lwr = exp(fit - 1.96 * se.fit),                                                                             
upr = exp(fit + 1.96 * se.fit))

 # check it!
head(addThese)

```
#Finally, we can plot the relationship between street tree density and total bird density with our new values using ggplot2.
```{r}
ggplot(data, aes(x= Tree_Density, y= Total_Bird_Density )) + 
    geom_point(aes(color = Socioeconomic_Status ), size=4) +
    geom_smooth(data = addThese, aes(ymin = lwr, ymax = upr), stat = 'identity')+    #adds the fits and CIs created above  
    labs(color = "Socioeconomic Group") +
    scale_color_manual( breaks=c('low', 'medium', 'high'), values=c("#9de3ff", "#3b94ff", "#2310ff")) +
    theme_classic() +
  xlab("Street Tree Density (trees/km)") +
  ylab("Total Bird Density (birds/km)")

#The closer the points are to the line the stronger the relationship between the two variables.
```
#Question 3: What does the figure above indicate about the relationship between street trees and bird density? Does there appear to be a pattern associated with socioeconomic groups?



#Now that we have assessed the relationship between street tree and bird density. It will be useful to observe how street tree density and bird density appear to be influenced by socioeconomic grouping specifically.

#Using ggplot2 we can build boxplots to visualize the distribution of data between the three socioeconomic groupings.
```{r}
#First, lets examine how the socioeconomic groupings appear to effect street tree density.


  ggplot(data, aes(x=Socioeconomic_Status, y= Tree_Density, fill= Socioeconomic_Status)) + 
   scale_x_discrete(limits = c("low", "medium", "high")) +         #This line of code is to change the order of the x-axis units
  scale_fill_manual(values=c("#231123", "#558C8C", "#E8DB7D")) +   #Specifies colors of boxplots based on x-axis units
    geom_boxplot() +
    xlab("Socioeconomic Group") +
    theme_classic()+ 
    theme(legend.position="none") +                                #Gets rid of an unnecessary legend
    xlab("Socioeconomic Group") +
    ylab("Street Tree Density (trees/km)")+
    ggtitle("Distribution of Trees Among Socioeconomic Grouping")



#Boxplots are used to look at the distribution of data. To compare the boxplots, look at the median line. If the median line of a box plot lies outside of the box of a comparison box plot, then there is likely to be a difference between the two groups. 
```

```{r}
#Now, lets examine how the socioeconomic groupings appear to effect total bird density. 
#Try creating a boxplot of this relationship using the above code as a reference. 




#To create fun color palettes, copy and paste the link into the address bar of your web browser: https://coolors.co/ef798a-f7a9a8-7d82b8-613f75-e5c3d1 

#This should take you to Coolors Color Palette Generator website.
```

#Question 4: Interpret the boxplots depicting street tree density and total bird density. Do you think that there is support for the luxury effect hypothesis in this study? 



#Question 5: How could the results from this model be used to guide management of urban forests and conservation of biodiversity? Consider the output from the model and the observations from the plots in your answer.



#Question 6: What are other ways that we can support conservation of biodiversity in (or around) urban spaces?