---
title: 'Migration Case Study: Patterns in sicklefin redhorse migration'
output:
  word_document:
    toc: true
  html_document:
    toc: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```

### **Introduction**

Sicklefin redhorse are a species of sucker in the genus *Moxostoma* and are a candidate for the endangered species list. Various agencies and organizations have partnered to protect, restore, and monitor sicklefin redhorse populations throughout their range. As part of this work, biologists from the Georgia Department of Natural Resources, the Tennessee Valley Authority, the U.S. Department of Natural Resources, and other groups collected sicklefin redhorse from Brasstown Creek in north Georgia. Captured individuals were tagged with passive integrated transponder (PIT) tags during their spawning migration and released back into the stream. A PIT tag detection antenna was deployed at a fixed site on the stream bottom to record any tagged sicklefin redhorse that passed over it. These data allowed biologists to determine the timing of the migration and how long each fish remained at the spawning site. Biologists also recorded the sex and total length of each individual.

The data set used in this lesson contains detection data from 2017 and 2018 and can be loaded into R with the code below. Users will need to set the working directory to the location of the data set on their computer.

```{r}
# Set the working directory to the folder that contains the data file.
# The path below is the original author's; replace it with your own.
setwd("C:/Users/davisjg/Desktop/FMN Work")
Migration <- read.csv("SFR Movement Data.csv")
```

The data set contains the following columns:

**Individual** – A unique number assigned to each individual fish.

**First Date** – The first date that an individual was detected by the antenna, representing when the individual migrated into the spawning grounds.
**Last Date** – The last date that an individual was detected by the antenna, representing when the individual migrated back downstream.

**Year** – The year of the detection, either 2017 or 2018.

**Residence Time** – The number of days between the first and last dates of detection, representing the time spent in the spawning grounds.

**Total Length** – The length (mm) of the sicklefin redhorse measured from the tip of the snout to the tip of the caudal fin (tail).

**Sex** – The biological sex of the individual (male or female), based upon the expression of spawning tubercles on the fins.

### **Hypothesis: There is no difference in residence time between years.**

To compare the residence time of sicklefin redhorse in the spawning grounds between 2017 and 2018, an appropriate test for comparing two groups is needed. First, check whether residence time follows a normal distribution, which is an assumption of the two-sample t-test. This can be done by constructing a histogram and a Q-Q plot of residence time and by using a Shapiro-Wilk test to determine whether the null hypothesis that the data come from a normal distribution can be rejected.

```{r}
hist(Migration$Residence.Time)
```

*Question:* Does the histogram show a normal distribution of the data? Describe the shape of the distribution.

*Answer:* The histogram shows that residence time is skewed right.

```{r}
library(car)
qqPlot(Migration$Residence.Time)
```

*Question:* If the points on a Q-Q plot fall near the reference line and within the shaded confidence band, there is evidence that the data fit a normal distribution. Does the Q-Q plot indicate a normal distribution for residence time?

*Answer:* The Q-Q plot also shows that the data do not come from a normal distribution.

```{r}
shapiro.test(Migration$Residence.Time)
```

*Question:* The null hypothesis of the Shapiro-Wilk test is as follows:

H~0~: The data fit a normal distribution.
Based upon the results of the test, are the data normally distributed?

*Answer:* The Shapiro-Wilk test results in a P-value less than 0.05. Therefore, there is evidence to reject the null hypothesis that the data follow a normal distribution. Thus, residence time is not normally distributed.

Because the data do not fit a normal distribution, a non-parametric test is appropriate. The Mann-Whitney U test (also called the Wilcoxon rank-sum test, which is what `wilcox.test()` performs in R) is a non-parametric alternative to the two-sample t-test. It tests whether values in one group tend to be larger than values in the other group; when the two distributions have similar shapes, this amounts to a comparison of their medians. The following statistical hypotheses are tested:

H~0~: The median residence time does not differ between years.

H~A~: The median residence time differs between years.

```{r}
# Treat year as a grouping factor rather than a number
Migration$Year <- factor(Migration$Year)
wilcox.test(Residence.Time ~ Year, data = Migration)
```

*Question:* Based upon the results of the Mann-Whitney U test, does the residence time of sicklefin redhorse differ between 2017 and 2018?

*Answer:* Based upon the results, the null hypothesis is rejected. Thus, there is evidence that residence time differs between years.

The outcome of this analysis can be illustrated with a variety of plots, including a box plot, violin plot, strip chart, and histogram. Use the code below to generate each of these plots.

```{r}
boxplot(Residence.Time ~ Year, data = Migration,
        main = "Comparison of Residence Time Between Years",
        xlab = "Year", ylab = "Residence Time (Days)")
```

The box plot shows the median as a horizontal line inside each box. The edges of each box mark approximately the 25th and 75th percentiles. The whiskers extend to the most extreme values lying within 1.5 times the box length of the box edges, and any points beyond the whiskers are drawn individually as outliers.
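To see exactly which numbers a base R box plot draws, `boxplot.stats()` returns them without plotting. The sketch below is a minimal illustration on simulated, right-skewed residence times (hypothetical values generated with `rexp()`, not the Brasstown Creek data). It shows that the box edges are Tukey's hinges from `fivenum()`, which can differ slightly from the 25th and 75th percentiles reported by `quantile()`.

```r
set.seed(42)
# Hypothetical right-skewed residence times (days); NOT the study data
res_time <- round(rexp(40, rate = 1 / 12))

# The five numbers a base R box plot draws:
# lower whisker, lower hinge, median, upper hinge, upper whisker
bp <- boxplot.stats(res_time)
bp$stats

# The hinges and median come from Tukey's five-number summary
fivenum(res_time)

# Sample quartiles from quantile() (type 7 by default) may differ slightly
quantile(res_time, probs = c(0.25, 0.5, 0.75))

# Values more than 1.5 box lengths beyond the hinges are drawn as outlier points
bp$out
```

Applied to the lesson's data, `boxplot.stats(Migration$Residence.Time[Migration$Year == "2017"])$stats` would return the values drawn for the 2017 box.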
```{r}
library(ggplot2)
ggplot(Migration, aes(x = Year, y = Residence.Time, fill = Year)) +
  geom_violin() +
  # Add the group mean as a black point (`fun` replaces the deprecated `fun.y`)
  stat_summary(fun = mean, geom = "point", color = "black") +
  scale_fill_manual(values = c("#FFB531", "#BC211A")) +
  xlab("Year") + ylab("Residence Time (Days)") +
  theme_classic() +
  theme(legend.position = "none", aspect.ratio = 1)
```

A violin plot depicts the distribution of each group with a mirrored density curve. The width of each violin at a given residence time reflects how many data points lie near that value, and the black point marks the group mean.

```{r}
# mean_cl_normal() wraps Hmisc::smean.cl.normal(), so the Hmisc package must be installed
ggplot(Migration, aes(x = Year, y = Residence.Time)) +
  geom_point(color = "firebrick", size = 3, shape = 1) +
  # Mean and 95% confidence interval, nudged to the right of the raw points
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = 0.1,
               position = position_nudge(x = 0.15)) +
  stat_summary(fun = mean, geom = "point", color = "firebrick", size = 3,
               position = position_nudge(x = 0.15)) +
  labs(x = "Year", y = "Residence Time (Days)") +
  theme_classic()
```

A strip chart is valuable because it shows the location of every data point. This strip chart also displays, beside each group, the group mean and its 95% confidence interval.

```{r}
library(dplyr)
p <- Migration %>%
  ggplot(aes(x = Residence.Time, fill = Year)) +
  # position = "identity" overlays the two histograms instead of stacking them
  geom_histogram(color = "#e9ecef", alpha = 0.5, position = "identity") +
  scale_fill_manual(values = c("#69b3a2", "#404080")) +
  labs(x = "Residence Time (Days)", y = "Count") +
  theme_classic()
p
```

This histogram overlays the distributions of the two groups, which allows their shapes and spreads to be compared directly.

*Question:* Based upon the results of the analysis and the graphs, summarize the results of the analysis.

*Answer:* The null hypothesis that residence time does not differ between years is rejected. Residence time in 2018 was significantly greater (P\<0.001) than in 2017. The plots also show that mean residence time was higher in 2018.
The violin plot and histogram show that residence time varied much less in 2017, when it was typically less than \~20 days. Residence time varied much more in 2018.

### **Hypothesis: There is no difference in residence time between sexes.**

Similar to the previous analysis, the hypothesis that residence time differs between males and females can be tested. Because residence time is not normally distributed, the Mann-Whitney U test is used again.

```{r}
Migration$Sex <- factor(Migration$Sex)
wilcox.test(Residence.Time ~ Sex, data = Migration)
```

*Question:* What are the null and alternative hypotheses?

*Answer:* H~0~: The median residence time does not differ between sexes.

H~A~: The median residence time differs between sexes.

*Question:* Based upon the results of the analysis, what conclusion can be drawn?

*Answer:* Based upon the results, the null hypothesis that there is no difference in residence time between sexes is rejected. Thus, there is evidence that residence time differs significantly (P\<0.001) between males and females.

Again, the outcome of this analysis can be illustrated by a box plot, violin plot, strip chart, and histogram.
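Before constructing these graphs, it can help to see what the statistic `W` reported by `wilcox.test()` actually measures. The sketch below uses two small simulated groups (hypothetical values drawn with `rlnorm()`, not the lesson's data). It shows that `W` equals the number of pairs in which a value from the first group exceeds a value from the second, and that `conf.int = TRUE` adds the Hodges-Lehmann estimate of the shift between groups, which for the exact test is the median of all pairwise differences. This is an illustration of how the test works, not a required step in the lesson.

```r
set.seed(7)
# Hypothetical continuous residence times (days) for two groups; NOT the study data
group_a <- rlnorm(12, meanlog = 2.0, sdlog = 0.5)
group_b <- rlnorm(15, meanlog = 2.8, sdlog = 0.6)

# Rank-sum test with a confidence interval for the location shift.
# With fewer than 50 values per group and no ties, R uses the exact test.
rs_test <- wilcox.test(group_a, group_b, conf.int = TRUE)
rs_test

# W counts the pairs (a, b) in which the value from group_a is larger
pairs_a_larger <- sum(outer(group_a, group_b, ">"))
pairs_a_larger

# The reported estimate is the median of all pairwise differences a - b
hl_estimate <- median(outer(group_a, group_b, "-"))
hl_estimate
```

With the formula interface used in the lesson, the first factor level plays the role of `group_a`: 2017 for the year comparison and, assuming `Sex` is coded as "female" and "male", females for the sex comparison. Adding `conf.int = TRUE` to those calls would report an estimated shift in days alongside the P-value.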
Construct the graphs using the following R code:

```{r}
boxplot(Residence.Time ~ Sex, data = Migration,
        main = "Comparison of Residence Time Between Sexes",
        xlab = "Sex", ylab = "Residence Time (Days)")

library(ggplot2)
ggplot(Migration, aes(x = Sex, y = Residence.Time, fill = Sex)) +
  geom_violin() +
  stat_summary(fun = mean, geom = "point", color = "black") +
  scale_fill_manual(values = c("pink", "blue")) +
  xlab("Sex") + ylab("Residence Time (Days)") +
  theme_classic() +
  theme(legend.position = "none", aspect.ratio = 1)

# mean_cl_normal() requires the Hmisc package to be installed
ggplot(Migration, aes(x = Sex, y = Residence.Time)) +
  geom_point(color = "firebrick", size = 3, shape = 1) +
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = 0.1,
               position = position_nudge(x = 0.10)) +
  stat_summary(fun = mean, geom = "point", color = "firebrick", size = 3,
               position = position_nudge(x = 0.10)) +
  labs(x = "Sex", y = "Residence Time (Days)") +
  theme_classic()

library(dplyr)
p <- Migration %>%
  ggplot(aes(x = Residence.Time, fill = Sex)) +
  geom_histogram(color = "#e9ecef", alpha = 0.5, position = "identity") +
  scale_fill_manual(values = c("pink", "#404080")) +
  labs(x = "Residence Time (Days)", y = "Count") +
  theme_classic()
p
```

*Question:* Based upon the results of the analysis and the graphs, summarize the results of the analysis.

*Answer:* The null hypothesis that residence time does not differ between sexes is rejected. Residence time for males was significantly greater (P\<0.001) than for females. The violin plot and histogram show that residence time varied much less for females, typically remaining below \~20 days. Residence time varied much more for males, with some males remaining in the spawning areas for more than 60 days.

### **Hypothesis: There is no relationship between residence time and fish size.**

For this analysis, there are two continuous numerical variables. The response variable, residence time, is not normally distributed.
The normality of the explanatory variable, total length, is checked with the following code:

```{r}
hist(Migration$TotalLength)
library(car)
qqPlot(Migration$TotalLength)
shapiro.test(Migration$TotalLength)
```

These checks indicate that total length is normally distributed.

Before conducting the analysis, view the relationship between residence time and total length with a scatter plot.

```{r}
ggplot(Migration, aes(x = TotalLength, y = Residence.Time)) +
  geom_smooth(method = "lm", se = TRUE, col = "black") +
  geom_point(size = 3, col = "firebrick") +
  labs(x = "Total Length (mm)", y = "Residence Time (Days)") +
  theme_classic()
```

*Question:* Based upon the scatter plot, does there appear to be a relationship between residence time and total length of sicklefin redhorse?

*Answer:* Based upon the scatter plot, there appears to be no relationship between residence time and fish size.

A linear regression analysis can now be conducted. The null hypothesis is that there is no significant relationship between residence time and total length (that is, the slope of the regression line is zero).

```{r}
# Supplying data = avoids attach(), which can silently mask variables
MigrationRegression <- lm(Residence.Time ~ TotalLength, data = Migration)
summary(MigrationRegression)
```

*Question:* Based upon the results of the analysis and the scatter plot, what conclusion can be drawn?

*Answer:* Based upon the results of the analysis, there is no significant relationship (P=0.647; R^2^=0.001). The total length of the fish does not appear to determine residence time in the spawning area.

*Question:* Summarize all of the findings from the activity thus far.

*Answer:* Residence time in the spawning area differs between years, presumably reflecting the environmental conditions present in each year, and between sexes. Residence time is not related to total length.

### **Hypothesis: There is no relationship between residence time and fish size when accounting for differences between males and females.**

Let's conduct a more thorough investigation of the data.
Because residence time differed between males and females, sex may be a confounding variable in the analysis of residence time and total length: the relationship between residence time and total length may differ between males and females. To investigate this, the data set is separated into males and females using the following code:

```{r}
# "male" and "female" must match how Sex is coded in the CSV file
MaleMigration <- subset(Migration, Sex == "male")
FemaleMigration <- subset(Migration, Sex == "female")
```

The relationship between residence time and total length will now be analyzed separately for males and females. A regression analysis and a scatter plot for males are produced with the following R code:

```{r}
MaleRegression <- lm(Residence.Time ~ TotalLength, data = MaleMigration)
summary(MaleRegression)

ggplot(MaleMigration, aes(x = TotalLength, y = Residence.Time)) +
  geom_smooth(method = "lm", se = TRUE, col = "black") +
  geom_point(size = 3, col = "firebrick") +
  labs(x = "Total Length (mm)", y = "Residence Time (Days)") +
  theme_classic()
```

*Question:* What conclusion can be drawn from the results of the analysis?

*Answer:* Based upon the results, there is a significant relationship (P=0.0005) between residence time and total length for males. However, there is considerable scatter of the points around the regression line.

The relationship between residence time and total length for females is investigated with a regression analysis, and a scatter plot of the relationship is created with the following R code:

```{r}
FemaleRegression <- lm(Residence.Time ~ TotalLength, data = FemaleMigration)
summary(FemaleRegression)

ggplot(FemaleMigration, aes(x = TotalLength, y = Residence.Time)) +
  geom_smooth(method = "lm", se = TRUE, col = "black") +
  geom_point(size = 3, col = "firebrick") +
  labs(x = "Total Length (mm)", y = "Residence Time (Days)") +
  theme_classic()
```

*Question:* What conclusion can be drawn from the results of the analysis?
*Answer:* Based upon the results, there is no significant relationship (P=0.731) between residence time and total length for females.

The most effective way to visualize the differences between males and females in the relationship between residence time and total length is to plot both relationships on the same graph. This is done with the following R code:

```{r}
cols <- c("pink", "blue")
ggplot(Migration, aes(x = TotalLength, y = Residence.Time, color = Sex)) +
  geom_point() +
  # One regression line per sex; se = FALSE omits the confidence ribbons
  geom_smooth(method = "lm", se = FALSE) +
  scale_color_manual(values = cols) +
  labs(x = "Total Length (mm)", y = "Residence Time (Days)") +
  theme_classic()
```

*Question:* Summarize the relationship between residence time and total length when accounting for the effect of sex on the relationship. Speculate as to why this relationship exists.

*Answer:* The graph shows a clear difference in the relationship between males and females. Residence time increases as total length increases for males but does not increase with total length for females.

*Discussion Question:* Speculate as to why residence time differs between sexes.

*Answer:* The result could be explained by many factors. Students may reference energy budgets. For example, females invest more energy in egg production and reproduction and may drift back downstream immediately after spawning. In addition, females may have only one reproductive bout, releasing all of their eggs over a short time period and completing the spawning event. Males, however, may spawn with multiple females and therefore remain in the spawning grounds longer to do so. Perhaps males also display nest-guarding behavior and stay behind to protect the nests. Answers do not have to be biologically accurate; the goal is for students to brainstorm answers based upon their knowledge of ecological concepts.

### Discussion Questions

The following questions can be discussed in class in small groups or assigned as an after-class activity.
*Discussion Question:* Speculate as to why residence time differs between years.

*Answer:* Several factors may affect residence time. Drawing on the previous lesson, students may discuss water temperature or floods. Encourage students to review the discharge histogram from Lesson 2 when answering this question.

*Discussion Question:* Speculate as to why residence time is related to total length for males but not for females.

*Answer:* Males may compete for access to prime spawning grounds and to females. Larger males may have a competitive advantage and may produce more sperm. Larger males are also older males, and based upon previous experience, they may arrive earlier in the spawning season; students can investigate this idea in the data set. Sicklefin redhorse males have been observed competing with other males and displaying agonistic behavior before spawning with females. Larger males may win these competitions more frequently and stay in the spawning grounds to win more spawning bouts. Because they may produce more sperm, they may also remain in the spawning grounds to spawn with the few remaining females.

*Discussion Question:* Evaluate the findings from this lesson in terms of nutrient cycles and the findings of Hudson et al. (2023) from Lesson 2.

*Answer:* Hudson et al. (2023) found that nitrogen import was highest during peak redhorse abundance in the spawning grounds. They also found that nitrogen contributions were greatest from eggs, so most nitrogen originates from females. Therefore, the timing and duration of female residence at spawning sites may influence nitrogen subsidies to the stream, affecting local primary production. Residence time also differs between years. In years when residence time is shorter, especially for females, nitrogen subsidies may be smaller. Climate change and human activities in the watershed may also alter the timing and duration of spawning runs, which could reduce the amount of nitrogen that is biologically available in headwater streams in the spring. More research is needed on this.