--- title: "The Importance of Green Spaces and Native Plants to Urban Avifauna: A Lesson on how urban residential yards can support birds during their annual cycle" author: "Lesson By: Nina Brundle" date: 'Created on: 2023-05-06' output: html_document --- **QUESTIONS FROM THE READING** First take the time to read the lesson PDF and the focal paper, by doing so will prepare you to answer these questions and work your way through the R analyses. LESSON BACKGROUND INFORMATION QUESTIONS: 1. How does urbanization impact biodiversity? 2. Based on what you learned about island biogeography, envision residential yards as "islands". How would you expect the yards in this study to reflect this theory? 3. How do native trees and shrubs support birds during their annual cycle? METHODS FILL IN THE BLANK a. In this observational study a paired sampling method was used to compare ____ yards b. Yards were considered native if they contained greater than ___% native vegetation c. Bird surveys were conducted from 6am – 11:30 am during the months of _______ 2020 – _______ 2021, and lasted ___ minutes **Now Lets Begin the R portion of the lesson!** ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) #Packages required for this lesson (Note: Packages only need to be installed once) #to install, simply remove the "#" and run the code to install (be sure to remove the line of code or add the # back so that it doesn't install every time you run the code) #install.packages("tidyverse") library(tidyverse) #install.packages("stats") library(stats) ``` First we will import and look at the data ```{r, Bring in the Data} ##Read in the data using the read.csv() function. Data <- read.csv("RegressionData.csv") ##Look at data. the head() function will allow you to see the first several rows of the dataframe. head(Data) ``` Next we will subset the data we will need to conduct the analyses we are trying to recreate from the focal paper. When you look at the Data you will notice there are several columns of data we will not need for this lesson. In the code chunk below you will learn how to subset, rename and group the data. To explore how Los Angeles residential yards may or may not follow the theory of island biogeography, we are only interested in the following columns: Type of yard (column name = Type), Size of yard in meters (Size..m.), bird richness (Cumulative.Richness), and distance the yards are from urban green spaces in meters (Distance.urban..m.) such as parks, community gardens etc. We will be using these four columns of data because in this lesson we are interested in the relationship between birds in yards and both the size of the yard and distance to green spaces and how they may predict richness. To learn more about the other columns of data please see the metadata file from the authors included in the lesson folder. **WORK WITH THE DATA** ```{r, Work with the data} ##now we will use a pipe operator ( %>% ) to subset the data we will want to use in our analyses. R pipes are a way to run multiple operations together in a concise way, which can then be made into a new dataframe without affecting the original dataframe. Data %>% dplyr::select(Type, Size..m., Cumulative.Richness, Distance.urban..m.) %>% ##this selects only the columns of data of interest rename(YardSize.m = Size..m., Bird.Richness = Cumulative.Richness, DistGreenSpace.m = Distance.urban..m.) %>% ##This is how you rename column names group_by(Type) -> Yards #Since objective 2 assessed unpaired native yards in these analyses, we will rename them so all unpaired native yards are called Native Yards[Yards == "native_unpaired"] <- "Native" ##the str() function provides you with the internal structure of the dataframe and will give you information about the contents such as if its a character, number, integer, etc. We can also easily check to see if our columns were correctly renamed. str(Yards) ##Now lets look at the data we will be using in the following steps with the plot function. By plotting your data you can visualize how the data is behaving and it could provide insight that could lead to predictions. plot(Yards$YardSize.m, Yards$Bird.Richness) #These variables are used in Figure 2D which we will recreate later in the lesson. We can see here that avian richness appears to increase as the size of the yard increases. ``` **TEST FOR NORMALITY IN THE DATA** First we need to check the normality of the response variable, and if the data does not follow the normal distribution we will need to fit the data to a Generalized Linear Model. ```{r Test for Data Normality} #Check for normality of response variable using a Shapiro-Wilk Normality Test - #if the p-value is greater than 0.05 we can conclude that the data are normally distributed. # The shapiro.test() is part of the stats package which was installed above shapiro.test(Yards$Bird.Richness) ### We can also visualize this by plotting histograms of the distribution of the response variable hist(Yards$Bird.Richness) #a normal distribution will be shaped like a bell curve, and a non-normal distribution will be skewed. ``` **QUESTION:** 4. What does the Shapiro-Wilk Normality Test and visualizing a histogram of the response variable tell you about the distribution of this data? **FITTING A GENERALIZED LINEAR MODEL** Now we will fit a Generalized Linear Model to our formula: Bird Richness(response variable) as a function of Yard Size (predictor variable). The default family is the Gaussian family and assumes that the response variable follows a normal distribution, and it is commonly used for continuous response variables. While this isn't the appropriate GLM to fit to our data, we can use the comparison between the models to assess why a GLM with a Poisson distribution may more appropriate to analyze non-normal count data. ```{r Fiting a Generalized Linear Model} ## we will used the glm() function where the response variable is a function (~) of the dependent variable ## Response Variable = y and Predictor Variable = x ## Model.Name <- glm(Response Variable ~ Predictor Variable, data = dataframe) model.glm <-glm(Bird.Richness ~ YardSize.m, data = Yards) #The summary function will provide the GLM summary statistical output summary(model.glm) #the AIC (Akaike Information Criterion) is a measure of the goodness of fit of a statistical model. *take note of this value, you will need it later* ``` **GENERALIZED LINEAR MODEL FOR COUNT DATA** Now since we know that the data doesn't follow a normal distribution and the response variable consists of count data we will fit a Generalized Linear Model with a Poisson Distribution. ```{r Fiting a Generalized Linear Model with Poisson Distribution} #Now we will fit a GLM where the response variable (y) is a function of the predictor variable (x) AND the family is Poisson. #The formula will be set up below in this format: Model.Name <- glm(y ~ x, data = The.Data, family = "poisson") model.poisson <- glm(Bird.Richness ~ YardSize.m, data = Yards, family = "poisson") ## When we run model the two summaries with the summary() function, we can assess the level of fitness between the model.glm and the model.poisson. The model with the lower AIC values indicates a better fitting model. Its important to note that you can only compare AIC values that are fit to the same data using the same likelihood function. For example, you cant compare linear regression and logistic regression models. summary(model.glm) #Bird.Richness ~ YardSize.m with a gaussian distribution summary(model.poisson) #Bird.Richness ~ YardSize.m with a Poisson distribution ``` **QUESTION:** 5. Based on the AIC values which Model is most fit? What else does the best fit model summary suggest about the Species-Area Relationship? **LETS PLOT THE MODELS WE CREATED ABOVE** Now that we know the best fit model, lets plot them both to see if visually there are any differences. ```{r Plotting Generalized Linear Models} ## The ggplot2 package (which is part of the tidyverse package we installed above) is used to create data visualizations that can be highly customized (learn more about this package here https://ggplot2.tidyverse.org/) ## to plot the data in our model.glm we will use the code below. ggplot(Yards, aes(YardSize.m, Bird.Richness))+ # By adding "+" after each line of code will add additional elements to your plot theme_classic()+ #for more information about pre-made themes visit https://ggplot2.tidyverse.org/reference/ggtheme.html geom_point(aes(colour = Type), size =4 )+ # you can symbolize the data points by a variable, for example the authors chose to symbolize the type of yard, and choose a size. stat_smooth(method="glm", method.args = "gaussian", size = 1, color = "black")+ #this function allows you to add lines from a fitted regression to your plot. model.glm the default family is gaussian. scale_colour_manual(breaks = c("Native", "Conventional"), values=c("Native" = "green", "Conventional" = "purple"))+ #this allows you to create your own color palette for the data points which we symbolized via yard Type (See this site for some color options http://sape.inf.usi.ch/quick-reference/ggplot2/colour . labs(x="Yard Size (m)", y = "Bird Richness") + #This will allow you to customize your axis labels theme(legend.position = "right") #a legend can be added to plots, and you can change the position of its placement. For example other placements options could be, top, bottom or left. ## Now lets re-create figure 2D from the focal paper. ggplot(Yards, aes(YardSize.m, Bird.Richness))+ theme_classic()+ geom_point(aes(colour = Type), size = 6)+ #the authors symbolized the points by yard type stat_smooth(method="glm", method.args = "poisson", size = 1, color = "black")+ # Note that for model.poisson we used a Poisson distribution and specifying this will plot a log transformed regression line. scale_colour_manual(breaks = c("Native", "Conventional"), values=c("Native" = "darkblue", "Conventional" = "orange"))+ labs(x="Yard Size (m)", y = "Bird Richness") + theme(legend.position = "right") ``` **QUESTION:** 6. Other than color and point size, what is different between the two plots you created? (please explain your reasoning) *hint: consider why the regression lines are different. **NOW LETS EXPLORE HOW THE ISLAND BIOGEOGRAPHY DISTANCE EFFECTS MAY PREDICT BIRD RICHNESS IN LOS ANGELES YARDS** To assess the question, "Are there more birds in yards closer to urban greenspaces?" we will fit a GLM and plot the data *Note, the authors were interested in abundance data to ask this ask this questions and we will be working with the bird richness data instead* ```{r Create your own Generalized Linear Model and Plot it} ##With what you learned above, fit a GLM for a Poisson regression that helps us answer the question: # IS BIRD RICHNESS GREATER IN YARDS CLOSER TO URBAN GREEN SPACES? Hint: the Yards dataframe already has the variables selected for this model, but try running the code head(Yards) if you're you need a hint on the response and predictor variables ## FIT A GLM WITH POISSON DISTRIBUTION HERE ## Use the Summary function to interpret the results of the model so you can answer questions below ## PLOT HERE (Refer back to the code above to plot the relationship between richness and distance the yard is from urban green spaces.) ``` **QUESTIONS:** 7. What does the model summary and figure you created on your own tell you about the relationship between bird richness and distance to urban green spaces? 8. Overall, based on the findings of the Focal Paper, what you learned in the PDF lesson and this R analysis, does it appear that residential yards act as islands in a sea of urbanization? 9. How can native landscaped yards support urban avifauna and their communities, and do you think native plant landscaping is a useful conservation strategy, why or why not?