---
title: "The Importance of Green Spaces and Native Plants to Urban Avifauna: A Lesson on how urban residential yards can support birds during their annual cycle - INSTRUCTOR VERSION"
author: "Lesson By: Nina Brundle"
date: 'Created on: 2023-05-06'
output: html_document
---



**QUESTIONS FROM THE READING**

First take the time to read the lesson PDF and the focal paper, by doing so will prepare you to answer these questions and work your way through the R analyses.



LESSON BACKGROUND INFORMATION QUESTIONS:

1.  How does urbanization impact biodiversity?

**Urbanization causes habitat fragmentation, which can lead to declines in species diversity.  Landcover change and habitat degradation are other impacts which can lead to opportunities for invasive species become established and potentially out compete native species.**

2.  Based on what you learned about island biogeography, envision residential yards as "islands". How would you expect the yards in this study to      reflect this theory?

**If the yard sites in this study follow the theory of island biogeography then it would be expected that the yards closer to urban green spaces would have higher bird richness. The size of the yard should predict species richness, meaning that larger yards are expected to have higher species richness.**

3.  How do native trees and shrubs support birds during their annual cycle?

**Native trees and shrubs provide nesting materials, foraging habitat, and shelter to birds.  Leaf litter also provides habitat for insects which is a food source for many birds.**

METHODS FILL IN THE BLANK 

a. In this observational study a paired sampling method was used to compare __32_ yards 

b. Yards were considered native if they contained greater than __79_% native vegetation

c. Bird surveys were conducted from 6am – 11:30 am during the months of __October__ 2020 – __March__ 2021, and lasted __20__ minutes



**Now Lets Begin the R portion of the lesson!**

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

#Packages required for this lesson (Note: Packages only need to be installed once) 
#to install, simply remove the "#" and run the code to install (be sure to remove the line of code or add the # back so that it doesn't install every time you run the code)

#install.packages("tidyverse")
library(tidyverse)
#install.packages("stats")
library(stats)

```

First we will import and look at the data

```{r, Bring in the Data}
##Read in the data using the read.csv() function.  
Data <- read.csv("RegressionData.csv")

##Look at data.  the head() function will allow you to see the first several rows of the dataframe.
head(Data)
```

Next we will subset the data we will need to conduct the analyses we are trying to recreate from the focal paper.

When you look at the Data you will notice there are several columns of data we will not need for this lesson. In the code chunk below you will learn how to subset, rename and group the data. To explore how Los Angeles residential yards may or may not follow the theory of island biogeography, we are only interested in the following columns: Type of yard (column name = Type), Size of yard in meters (Size..m.), bird richness (Cumulative.Richness), and distance the yards are from urban green spaces in meters (Distance.urban..m.) such as parks, community gardens etc. We will be using these four columns of data because in this lesson we are interested in the relationship between birds in yards and both the size of the yard and distance to green spaces and how they may predict richness. To learn more about the other columns of data please see the metadata file from the authors included in the lesson folder.

**WORK WITH THE DATA**

```{r, Work with the data}
##now we will use a pipe operator ( %>% ) to subset the data we will want to use in our analyses. R pipes are a way to run multiple operations together in a concise way, which can then be made into a new dataframe without affecting the original dataframe.

Data %>% 
  dplyr::select(Type, Size..m., Cumulative.Richness, Distance.urban..m.) %>% ##this selects only the columns of data of interest 
  rename(YardSize.m = Size..m., 
         Bird.Richness = Cumulative.Richness, 
         DistGreenSpace.m = Distance.urban..m.) %>% ##This is how you rename column names
         group_by(Type) -> Yards


#Since objective 2 assessed unpaired native yards in these analyses, we will rename them so all unpaired native yards are called Native
Yards[Yards == "native_unpaired"] <- "Native"


##the str() function provides you with the internal structure of the dataframe and will give you information about the contents such as if its a character, number, integer, etc.  We can also easily check to see if our columns were correctly renamed.
str(Yards)


##Now lets look at the data we will be using in the following steps with the plot function. By plotting your data you can visualize how the data is behaving and it could provide insight that could lead to predictions.

plot(Yards$YardSize.m, Yards$Bird.Richness) #These variables are used in Figure 2D which we will recreate later in the lesson. We can see here that avian richness appears to increase as the size of the yard increases.


```

**TEST FOR NORMALITY IN THE DATA**

First we need to check the normality of the response variable, and if the data does not follow the normal distribution we will need to fit the data to a Generalized Linear Model.

```{r Test for Data Normality}
#Check for normality of response variable using a Shapiro-Wilk Normality Test - #if the p-value is greater than 0.05 we can conclude that the data are normally distributed.

# The shapiro.test() is part of the stats package which was installed above
shapiro.test(Yards$Bird.Richness)

### We can also visualize this by plotting histograms of the distribution of the response variable

hist(Yards$Bird.Richness) #a normal distribution will be shaped like a bell curve, and a non-normal distribution will be skewed.

```

**QUESTION:**
 
4. What does the Shapiro-Wilk Normality Test and visualizing a histogram of the response variable tell you about the distribution of this data?
 
 
 **After running a Sharpio Wilks Normality Test, we can see that the p-value is 0.00767 and because this is less than 0.05 we can conclude that the data is not normally distributed.  This non-normal distribution is confirmed after visualizing a histogram of the response variable, whIch is skewed, suggesting this data is indeed not normally distributed, which would tell use we will been to fit a GLM to analyze this data.**
 
 
 
 
 

**FITTING A GENERALIZED LINEAR MODEL:**


Now we will fit a Generalized Linear Model to our formula: Bird Richness(response variable) as a function of Yard Size (predictor variable). The default family is the Gaussian family and assumes that the response variable follows a normal distribution, and it is commonly used for continuous response variables.  While this isn't the appropriate GLM to fit to our data, we can use the comparison between the models to assess why a GLM with a Poisson distribution may more appropriate to analyze non-normal count data.
  


```{r Fiting a Generalized Linear Model}
## we will used the glm() function where the response variable is a function (~) of the dependent variable
## Response Variable = y and Predictor Variable = x
##  Model.Name <- glm(Response Variable ~ Predictor Variable, data = dataframe)

model.glm <-glm(Bird.Richness ~ YardSize.m, data = Yards)

#The summary function will provide the GLM summary statistical output
summary(model.glm)

#the AIC (Akaike Information Criterion) is a measure of the goodness of fit of a statistical model. *take note of this value, you will need it later*

```
 
 

**GENERALIZED LINEAR MODEL FOR COUNT DATA** 
 
Now since we know that the data doesn't follow a normal distribution and the response variable consists of count data we will fit a Generalized Linear Model with a Poisson Distribution.
 
```{r Fiting a Generalized Linear Model with Poisson Distribution} 

#Now we will fit a GLM where the response variable (y) is a function of the predictor variable (x) AND the family is Poisson.
#The formula will be set up below in this format: Model.Name <- glm(y ~ x, data = The.Data, family = "poisson")

model.poisson <- glm(Bird.Richness ~ YardSize.m, data = Yards, family = "poisson")


## When we run model the two summaries with the summary() function, we can assess the level of fitness between the model.glm and the model.poisson. The model with the lower AIC values indicates a better fitting model. Its important to note that you can only compare AIC values that are fit to the same data using the same likelihood function.  For example, you cant compare linear regression and logistic regression models.



summary(model.glm) #Bird.Richness ~ YardSize.m with a gaussian distribution 
summary(model.poisson) #Bird.Richness ~ YardSize.m with a Poisson distribution
```
**QUESTION:**

5. Based on the AIC values which Model is most fit? What else does the best fit model summary suggest about the Species-Area Relationship?

**We can see that the AIC of the model.glm (GLM with a default Gaussian distribution) is 183.7, and after fitting a GLM with a Poission distribution the AIC value is 177.  This improvement in AIC makes sense because a Poission distribution is ideal fit to count data that doesn’t follow a normal distribution.  We can also see that the p-value improves from 0.004  to  0.0009 in the Poisson model. The p-values in both models are significant, but the Poisson GLM is better equipped to handle this type of data and it makes sense that the significance in the relationship between bird richness and yard size was greater in this model.**





**LETS PLOT THE MODELS WE CREATED ABOVE**

Now that we know the best fit model, lets plot them both to see if visually there are any differences.


```{r Plotting Generalized Linear Models}
## The ggplot2 package (which is part of the tidyverse package we installed above) is used to create data visualizations that can be highly customized (learn more about this package here https://ggplot2.tidyverse.org/)




## to plot the data in our model.glm we will use the code below.

ggplot(Yards, aes(YardSize.m, Bird.Richness))+ # By adding "+" after each line of code will add additional elements to your plot
  
  theme_classic()+  #for more information about pre-made themes visit https://ggplot2.tidyverse.org/reference/ggtheme.html
  
  geom_point(aes(colour = Type), size =4 )+  # you can symbolize the data points by a variable, for example the authors chose to symbolize the type of yard, and choose a size.
  
  stat_smooth(method="glm", method.args = "gaussian", size = 1, color = "black")+ #this function allows you to add lines from a fitted regression to your plot. model.glm the default family is gaussian. 
  
  scale_colour_manual(breaks = c("Native", "Conventional"), values=c("Native" = "green", "Conventional" = "purple"))+ #this allows you to create your own color palette for the data points which we symbolized via yard Type (See this site for some color options http://sape.inf.usi.ch/quick-reference/ggplot2/colour .
  
  labs(x="Yard Size (m)", y = "Bird Richness") + #This will allow you to customize your axis labels
  
  theme(legend.position = "right") #a legend can be added to plots, and you can change the position of its placement. For example other placements options could be, top, bottom or left.





## Now lets re-create figure 2D from the focal paper.
ggplot(Yards, aes(YardSize.m, Bird.Richness))+
  
  theme_classic()+  
  
  geom_point(aes(colour = Type), size = 6)+  #the authors symbolized the points by yard type
  
  stat_smooth(method="glm", method.args = "poisson", size = 1, color = "black")+ # Note that for model.poisson we used a Poisson distribution and specifying this will plot a log transformed regression line.
  
  scale_colour_manual(breaks = c("Native", "Conventional"), values=c("Native" = "darkblue", "Conventional" = "orange"))+ 
  
  labs(x="Yard Size (m)", y = "Bird Richness") + 
  
  theme(legend.position = "right") 

```




**QUESTION:**

6. Other than color and point size, what is different between the two plots you created? (please explain your reasoning) *hint: consider why the      regression lines are different.

**The model.glm plots a linear regression line (Figure A), while the model.poisson (Figure B) plots the regression line using a Poisson method that log transforms the data displaying a curved line. This visualization was chosen because it more accurately represents the data that was fit to the model.poisson GLM.**




**NOW LETS EXPLORE HOW THE ISLAND BIOGEOGRAPHY DISTANCE EFFECTS MAY PREDICT BIRD RICHNESS IN LOS ANGELES YARDS**

To assess the question, "Are there more birds in yards closer to urban greenspaces?" we will fit a GLM and plot the data 
*Note, the authors were interested in abundance data to ask this ask this questions and we will be working with the bird richness data instead*

```{r Create your own Generalized Linear Model and Plot it}

##With what you learned above, fit a GLM for a Poisson regression that helps us answer the question: 
# IS BIRD RICHNESS GREATER IN YARDS CLOSER TO URBAN GREEN SPACES?  Hint: the Yards dataframe already has the variables selected for this model, but try running the code head(Yards) if you're you need a hint on the response and predictor variables


## FIT A GLM WITH POISSON DISTRIBUTION HERE

model.Distance <- glm(Bird.Richness ~ DistGreenSpace.m, data = Yards, family = "poisson")


## Use the Summary function to interpret the results of the model so you can answer questions below

summary(model.Distance)


## PLOT HERE (Refer back to the code above to plot the relationship between richness and distance the yard is from urban green spaces.)


ggplot(Yards, aes(DistGreenSpace.m, Bird.Richness))+
  
  theme_classic()+  
  
  geom_point(aes(colour = Type), size = 6)+ 
  
  stat_smooth(method="glm", method.args = "poisson", size = 1, color = "black")+ 
  
  scale_colour_manual(breaks = c("Native", "Conventional"), values=c("Native" = "darkblue", "Conventional" = "orange"))+ 
  
  labs(x="Distance of Yards From Urban Green Spaces (m)", y = "Bird Richness") + 
  
  theme(legend.position = "right") 


```

**QUESTIONS:**

7.  What does the model summary and figure you created on your own tell you about the relationship between bird richness and distance to urban         green spaces?

**The model summary suggests that there is a negative relationship with avian richness and distance to urban green spaces suggesting that there is higher richness in yards that are closer to urban green spaces and this relationship has a p-value that is 0.005 indicating its highly statistically significant.**


8. Overall, based on the findings of the Focal Paper, what you learned in the PDF lesson and this R analysis, does it appear that residential yards    act as islands in a sea of urbanization? 

**Based on the two analyses we conducted in this lesson; we can conclude that the yards in this study indeed follow the theory of island biogeography.  We can claim this because the Species-Area relationship suggests that the larger the islands (in our case yard) it’s likely that there is increased species diversity. The yards in this study also follow the distance effect which indicates that islands closer to the mainland (in our case urban green spaces like parks) will have increased diversity. It was found in this study that larger yards and yards closer to urban green spaces had higher bird abundance. **


9. How can native landscaped yards support urban avifauna and their communities, and do you think native plant landscaping is a useful conservation strategy, why or why not?

**Native landscaped yards are generally more complex in their structure and composition which provides an array of foraging habitat and variation of food items (berries, seeds and insects) and shelter.  These plants are adapted to their environments and require less water and maintenance while providing services such as excess rain runoff, which is why they are ideal choices in landscaping.**

NOW THAT YOU HAVE COMPLETED THIS LESSON, USE THE KNIT-TO-HTML OPTION SO THAT YOU CAN EASILY SUBMIT YOUR ASSIGNMENT TO YOUR INSTRUCTOR