---
title: "Establishing A Blue Economy - R Analysis"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

---

### For this lesson, we will explore resource availability and enabling conditions data from 168 territories across the globe. Each territory assessed in this data set was also assigned a developmental status of high, medium, or low. There will be a strong emphasis on graphics and on demonstrating publication-ready data visualization. Using an edited version of this data set, we will learn how to run a simple linear regression to assess the relationship between resource availability and enabling conditions. Then, we will use the `ggplot2`, `ggpubr`, and `ggrepel` libraries to visualize this relationship in order to determine whether there is an effect on the capacity for establishing a blue economy. Analyzing this relationship matters because identifying prime territories in which to establish blue economies is a crucial step toward more socially equitable, environmentally sustainable, and economically viable ocean industries.

# Section I: Set Up

---------------------------------------

##### First, we will load in all appropriate libraries. Please note, if you receive an error message stating that a package needed to load one of these libraries is not installed, type `install.packages("ggplot2")` (substituting the name of the library you need) into the `Console` panel and follow the prompts for installation.

```{r}
# Loading in libraries
library(readxl)    # This library is used to read in our source data, housed in an Excel spreadsheet.
library( ggplot2 ) # This library is for creating advanced plots.
library( ggpubr )  # This library adds publication-oriented helpers to ggplot2, such as stat_regline_equation().
library( ggrepel ) # This library allows us to label our data points in graphics.
```

##### Second, we will read in our data set and create a data frame.

```{r}
# Examining our data
fig_dat <- read_excel("QUBES_Lesson_Ocean_Data.xlsx")
head( fig_dat )

# Please note that the file name of the Excel spreadsheet we are reading in may
# change during the download process. If you have an issue loading in the .xlsx
# file, check that the file name in the command matches the downloaded file name
# in your directory. Also check that your files are housed in the correct directory.
# Students can refer to the Data Info Sheet provided in this QUBES lesson to
# better understand the data.
```

##### Let's take a closer look at this data. Note the column names and the types of data housed in each column. For a complete run-through of the data, please refer to the Data Info Sheet.

##### Next, we will create a histogram to observe the behavior of each of our variables. First, we will make a histogram of Resource Availability, our independent variable.

```{r}
# Histogram of Resource Availability
ggplot( fig_dat, aes( x = Resource.Availability ) ) +
  geom_histogram( aes( y = after_stat( density ) ), color = "black", fill = "green", bins = 20 ) +
  geom_density( color = "red", lwd = 1.5 ) +
  xlab( "Resource Availability" )

# You can play around with color here! Try out your favorite color scheme by
# typing simple color names into the code chunk above.
```

##### A histogram is a graphical representation that condenses a data series into an easily interpreted visual by grouping many data points into logical ranges, or bins. In the histogram above, we added a density curve (the red trendline) to more easily understand how our data is behaving.

##### We also made some cosmetic edits using the `ggplot2` library. First, we assigned a color to outline our bins, "black", then we chose a color to fill in the bars, in this case "green". Finally, we assigned the color "red" to the trendline, edited the trendline weight (or thickness, "lwd"), and renamed the x-axis for a cleaner finish.
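
##### To see what "grouping data points into bins" means in practice, here is a minimal sketch in base R. The scores, the breaks, and the names `scores`, `bin_width`, `breaks`, `bins`, `bin_counts`, and `bin_density` are all invented for illustration and are not part of the lesson data.

```{r}
# Hypothetical scores, invented for illustration (not taken from the lesson data)
scores <- c( 42, 47, 51, 55, 58, 61, 63, 66, 70, 74, 79, 85 )

# Split the range 40-90 into five bins of width 10
bin_width <- 10
breaks <- seq( 40, 90, by = bin_width )
bins <- cut( scores, breaks = breaks, right = FALSE )

# Count how many scores fall into each bin; these counts are the bar heights
bin_counts <- table( bins )
bin_counts

# Dividing by (number of scores * bin width) converts counts to densities,
# the scale used when a density curve is drawn on top of a histogram
bin_density <- as.numeric( bin_counts ) / ( length( scores ) * bin_width )
sum( bin_density * bin_width )   # the bar areas sum to 1
```

##### Changing `bin_width` changes how coarse or fine the grouping is, which is the same trade-off the `bins = 20` argument controls in `geom_histogram()`.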
##### On Your Own: We will now make a histogram for Enabling Conditions, our response variable. Using our histogram for Resource Availability as a model, how would you produce a histogram for Enabling Conditions? Give it a try below!

```{r}
# Histogram of Enabling Conditions

```

##### Please note, as we are examining variables from an observational study rather than from an experiment, it is not entirely necessary to create histograms to check for normality. Still, being able to create and then accurately interpret a histogram is a valuable foundational skill in statistics.

# Section II: Graphics

##### The next step is to create a plot that displays Resource Availability vs. Enabling Conditions. The figure we create will be a take on Figure 4a in the focal paper (titled "Fig.4|Resource availability and enabling conditions scores for coastal territories" on page 400). Let's use the `ggplot2` library again to visualize our data!

```{r}
# Creating the Basic Plot
ggplot( fig_dat, aes( x = Resource.Availability, y = Enabling.Conditions )) +
  geom_point() -> Fig1
Fig1
```

##### We have just created a very basic scatter plot. Each dot represents one territory. While this graph allows us to see how our data points behave across our two axes, it is not very informative. Let's add some more elements to increase the amount of information this graph conveys. As we continue through this section, we will bring down our code from the previous chunk and make additions.

```{r}
# Addition of Color Gradient and Data Point Shapes
ggplot( fig_dat, aes( x = Resource.Availability, y = Enabling.Conditions, color = Blue.Economy )) +
  geom_point( aes( shape = Development_Category ), size = 4, alpha = 0.7, show.legend = T ) -> Fig1
Fig1
```

##### Take a look at how our plot changed! By adding `color = Blue.Economy`, the color of each data point now corresponds to its Blue Economy Capacity score, and `ggplot2` applied its default color palette to display the gradient of these scores.
However, the gradient does not currently make much sense: why would data points with lower Blue Economy Capacity scores be darker? Shouldn't it be the other way around? We will edit this in the next code chunk! Before moving on, though, note that we also changed the shapes of the data points with the `shape = Development_Category` argument - now we can see which data points are categorized as High, Medium, or Low, or have an NA value for their developmental status. Next, we edited the data point size (`size = 4`) to ensure the plot would not look too crowded. The `alpha = 0.7` argument made each data point slightly transparent; since we have lots of values that overlap, it's important to bring the opacity down from 100% to ensure we are seeing the full scope of our data. Finally, `show.legend = T` told `ggplot2` to display the legend on our figure. Let's keep going!

```{r}
# Adjusting the Color Gradient
ggplot( fig_dat, aes( x = Resource.Availability, y = Enabling.Conditions, color = Blue.Economy )) +
  geom_point( aes( shape = Development_Category ), size = 4, alpha = 0.7, show.legend = T ) +
  scale_color_gradient( name = "Blue Economy Capacity", low = "yellow", high = "royalblue") -> Fig1
Fig1
```

##### Great! By adding in `scale_color_gradient`, we were able to give the color gradient a more defined range, making the variation within our data easier to see. We also reformatted the title of the color gradient legend. Remember, just as we did with the histograms above, you can apply whichever color scheme you prefer!
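
##### If you are curious how a score becomes a color, the sketch below mimics the idea in base R: rescale each score to the 0-1 range, then interpolate between the low and high colors. The `capacity` values and the names `rescaled`, `ramp`, `rgb_values`, and `hex_colors` are invented for illustration. This is only an approximation of the concept; `ggplot2` handles the mapping internally and may blend the intermediate colors differently.

```{r}
# Hypothetical Blue Economy Capacity scores, invented for illustration
capacity <- c( 0, 25, 50, 75, 100 )

# Rescale the scores to the 0-1 range
rescaled <- ( capacity - min( capacity ) ) / ( max( capacity ) - min( capacity ) )

# colorRamp() returns a function that interpolates between the two end colors
ramp <- colorRamp( c( "yellow", "royalblue" ) )
rgb_values <- ramp( rescaled )   # one row of red, green, blue values per score

# Convert each row of red, green, blue values to a hex color string
hex_colors <- rgb( rgb_values[ , 1 ], rgb_values[ , 2 ], rgb_values[ , 3 ],
                   maxColorValue = 255 )
data.frame( capacity, hex_colors )
```

##### The lowest score receives the `low` color and the highest score receives the `high` color, with everything else falling in between.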
```{r}
# Legend and Color Scale Limits
ggplot( fig_dat, aes( x = Resource.Availability, y = Enabling.Conditions, color = Blue.Economy )) +
  geom_point( aes( shape = Development_Category ), size = 4, alpha = 0.7, show.legend = T ) +
  scale_color_gradient( name = "Blue Economy Capacity", low = "yellow", high = "royalblue",
                        guide = guide_colorbar( ticks = FALSE, barwidth = 10, barheight = 0.5,
                                                direction = "horizontal" ),
                        limits = c( 0,101 ) ) -> Fig1
Fig1
```

##### We have now started making cosmetic changes to the figure. Up until this chunk, we have been editing our figure to increase the amount of information conveyed by the graph (adding shapes to data points, applying a color scheme, etc.). Now, our edits will focus on creating a more visually appealing and professional-looking graph. So what did this chunk accomplish? First, we removed the tick marks from our color gradient legend, changed the width and height of the bar, and modified its orientation. You'll also notice that the area of our plot changed, because the horizontal color bar takes up more room in the legend. Note that the `limits` argument belongs to `scale_color_gradient`: it sets the range of scores covered by the color gradient (0 to 101 here) and does not change the extent of the axes.

```{r}
# Proper Axes Names
ggplot( fig_dat, aes( x = Resource.Availability, y = Enabling.Conditions, color = Blue.Economy )) +
  geom_point( aes( shape = Development_Category ), size = 4, alpha = 0.7, show.legend = T ) +
  scale_color_gradient( name = "Blue Economy Capacity", low = "yellow", high = "royalblue",
                        guide = guide_colorbar( ticks = FALSE, barwidth = 10, barheight = 0.5,
                                                direction = "horizontal" ),
                        limits = c( 0,101 ) ) +
  labs( y = "Enabling conditions\n(equity, sustainability, viability)",
        x = "Resource Availability\n(ocean sectors)" ) -> Fig1
Fig1
```

##### Our x-axis and y-axis are now properly labeled. The `\n` escape sequence starts a second line in your axis title.

##### Now let's label our data points so we know where each territory falls on the figure.

```{r}
# Adding Territory Labels
ggplot( fig_dat, aes( x = Resource.Availability, y = Enabling.Conditions, color = Blue.Economy )) +
  geom_point( aes( shape = Development_Category ), size = 4, alpha = 0.7, show.legend = T ) +
  scale_color_gradient( name = "Blue Economy Capacity", low = "yellow", high = "royalblue",
                        guide = guide_colorbar( ticks = FALSE, barwidth = 10, barheight = 0.5,
                                                direction = "horizontal" ),
                        limits = c( 0,101 ) ) +
  labs( y = "Enabling conditions\n(equity, sustainability, viability)",
        x = "Resource Availability\n(ocean sectors)" ) +
  geom_label_repel( aes( label= Territory ), data= fig_dat ) -> Fig1
Fig1

# The `ggrepel` library will only display a subset of the territory labels; given
# that we have such a large number of data points, labeling every single point
# would overcrowd our image, so labels that would overlap too much are dropped.
# Notice that the points being labeled seem to be on the outskirts of the figure.
```

##### Theme adjustments are next.

```{r}
# Theme: Element Positions and Axes Properties
ggplot( fig_dat, aes( x = Resource.Availability, y = Enabling.Conditions, color = Blue.Economy )) +
  geom_point( aes( shape = Development_Category ), size = 4, alpha = 0.7, show.legend = T ) +
  scale_color_gradient( name = "Blue Economy Capacity", low = "yellow", high = "royalblue",
                        guide = guide_colorbar( ticks = FALSE, barwidth = 10, barheight = 0.5,
                                                direction = "horizontal" ),
                        limits = c( 0,101 ) ) +
  labs( y = "Enabling conditions\n(equity, sustainability, viability)",
        x = "Resource Availability\n(ocean sectors)" ) +
  geom_label_repel( aes( label= Territory ), data= fig_dat ) +
  theme( legend.position = "top",
         legend.text = element_text( size = 8 ),
         legend.title = element_text( size = 10, vjust = 1.1 ),
         axis.text.x = element_text( size = 8 ),
         axis.text.y = element_text( size = 8 ),
         axis.title.x = element_text( vjust = -1, size = 12 ),
         axis.title.y = element_text( vjust = 3, size = 12 ),
         axis.line = element_line( colour = "black" ) ) -> Fig1
Fig1
```

##### We are now editing the theme of our
figure. Let's break down these new additions line by line.

- `legend.position = "top"`: the legends are now located above the figure
- `legend.text = element_text( size = 8 )`: the size of the legend elements (the numeric values along the bottom of the Blue Economy Capacity gradient scale and the "High", "Medium", "Low", and "NA" keys on the Development Category legend)
- `legend.title = element_text( size = 10, vjust = 1.1 )`: the size and vertical justification of the two legend titles
- `axis.text.x = element_text( size = 8 )`: the size of the numeric values along the x-axis (e.g. the 40, 60, 80, 100)
- `axis.text.y = element_text( size = 8 )`: the size of the numeric values along the y-axis (e.g. the 40, 50, 60, 70, 80)
- `axis.title.x = element_text( vjust = -1, size = 12 )`: the vertical justification and size of the x-axis title
- `axis.title.y = element_text( vjust = 3, size = 12 )`: the vertical justification and size of the y-axis title
- `axis.line = element_line( colour = "black" )`: the x-axis and y-axis lines are now drawn in black

```{r}
# Theme: Background Removal
ggplot( fig_dat, aes( x = Resource.Availability, y = Enabling.Conditions, color = Blue.Economy )) +
  geom_point( aes( shape = Development_Category ), size = 4, alpha = 0.7, show.legend = T ) +
  scale_color_gradient( name = "Blue Economy Capacity", low = "yellow", high = "royalblue",
                        guide = guide_colorbar( ticks = FALSE, barwidth = 10, barheight = 0.5,
                                                direction = "horizontal" ),
                        limits = c( 0,101 ) ) +
  labs( y = "Enabling conditions\n(equity, sustainability, viability)",
        x = "Resource Availability\n(ocean sectors)" ) +
  geom_label_repel( aes( label= Territory ), data= fig_dat ) +
  theme( legend.position = "top",
         legend.text = element_text( size = 8 ),
         legend.title = element_text( size = 10, vjust = 1.1 ),
         axis.text.x = element_text( size = 8 ),
         axis.text.y = element_text( size = 8 ),
         axis.title.x = element_text( vjust = -1, size = 12 ),
         axis.title.y = element_text( vjust = 3, size = 12 ),
         axis.line = element_line( colour = "black" ),
         panel.grid.major = element_blank(),
         panel.grid.minor = element_blank(),
         panel.border = element_blank(),
         panel.background = element_blank() ) -> Fig1
Fig1
```

##### Each of the four lines we just added removed part of the grid background that was displayed on all earlier versions of `Fig1`.

###### Let's walk through what we just created. First, we used ggplot to build our basic graphic, told R which variable was our response and which was our independent variable, and colored each data point according to the Blue Economy Capacity gradient. Next, we assigned each data point a specific shape depending on its developmental status classification. We then formatted how our legend would be displayed, titled our axes, and added final specifications for how we want the graphic itself to be displayed (borders, clear background, etc.). Finally, we assigned this large chunk of code to a variable, `Fig1`; this will help us as we move through the lesson. Instead of copying this large chunk of code each time we want to display this figure, we can simply call the variable to produce the same graphic.

##### That code chunk contained our final aesthetic changes. Now that we have finished our aesthetic edits, let's get to the statistical analysis.

# Section III: Statistics - Linear Regression

##### We have now arrived at the most important part of this lesson. We will perform a linear regression on Enabling Conditions as a function of Resource Availability.

#### Hypothesis: If a territory has high resource availability and high enabling conditions, then, regardless of development status, that territory has a greater capacity for establishing a viable blue economy.

```{r}
# Beginning Linear Regression
Fig1 + stat_smooth( method = "lm", col = "red" ) -> Fig2
Fig2

# Note that we are using `Fig1`. As we discussed above, it is much easier to use
# the short variable name than to bring down our large chunk of original code.
```

##### Finally, let's add the linear regression equation and the R-squared value to `Fig2`.

```{r}
# Displaying Regression Equation
Fig2 +
  stat_regline_equation( label.y = 110, aes( label = after_stat( eq.label ) ) ) +
  stat_regline_equation( label.y = 105, aes( label = after_stat( rr.label ) ) )
```

#### Does this result support our hypothesis?
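
##### The plot shows the fitted line, but the numbers behind it come from a linear model. The sketch below fits one with base R's `lm()` on simulated data; `sim_dat`, `resource`, `enabling`, `fit`, and `fit_summary` are invented for illustration and are not the lesson data. Keep in mind that the R-squared value describes how much of the variation the line explains, while the p-value for the slope is what speaks to statistical significance.

```{r}
# Simulated data, invented for illustration (not the lesson data)
set.seed( 1 )
sim_dat <- data.frame( resource = runif( 50, min = 30, max = 100 ) )
sim_dat$enabling <- 40 + 0.3 * sim_dat$resource + rnorm( 50, sd = 8 )

# Fit the simple linear regression: enabling conditions as a function of resources.
# With the lesson data, the analogous call would be
# lm( Enabling.Conditions ~ Resource.Availability, data = fig_dat )
fit <- lm( enabling ~ resource, data = sim_dat )
fit_summary <- summary( fit )

coef( fit )                                          # intercept and slope of the fitted line
fit_summary$r.squared                                # proportion of variance explained
fit_summary$coefficients[ "resource", "Pr(>|t|)" ]   # p-value for the slope
```

##### For a simple linear regression like this one, the R-squared value equals the squared correlation between the two variables, which you can confirm with `cor( sim_dat$resource, sim_dat$enabling )^2`.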
##### Knit your code to see the output produced in the `Viewer` pane.

# Section IV: R Analysis Questions

### Part I - Statistical Questions

#### Q1. How would you describe the histogram for Resource Availability?

#### Q2. How would you describe the histogram for Enabling Conditions?

#### Q3. Looking at the trendline added in `Fig2`, how would you describe the relationship between Enabling Conditions and Resource Availability? Does it appear to be statistically significant? What does the gray shading around the line represent?

#### Q4. What, if anything, stands out to you about this trendline? Given our two variables, Resource Availability and Enabling Conditions, does our final figure (`Fig2`) surprise you?

#### Q5. Take a look at the R-squared value of this plot. Given this value, does our model appear to be statistically significant? Explain your answer.

#### Q6. Does the final `Fig2` output support our original hypothesis? Original hypothesis: "If a territory has high resource availability and high enabling conditions, then, regardless of development status, that territory has a greater capacity for establishing a viable blue economy."

#### Q7. Based on the final figure output, which countries appear to be the most capable of supporting a blue economy? What do these countries have in common (geographically, socially, financially, etc.)?

#### Q8. Given the statistical significance, or lack thereof, of our model, what other factors not accounted for in our model could be affecting a territory's ability to house a viable blue economy? Provide 3 examples.

### Part II - Ecological Questions

#### Q9. What criteria underpin `Resource Availability`? `Enabling Conditions`?

#### Q10. How can policymakers utilize this data to work towards establishing a blue economy in their home territory?

#### Q11. Why is establishing a blue economy so important?