Today’s investigation. Temperature is an important factor influencing the fitness of marine invertebrates in the intertidal zone. This is especially true for reproductively active male fiddler crabs that live risking their life to mate. After all, the more time a crab spends sheltering in a burrow to avoid high temperatures, the less time it has for reproductive activities. These observations have motivated many studies on evolutionary strategies balancing ectotherms’ body sizes and reproductive performance with abiotic factors. Today we will use a two-way ANOVA to test how body size influences behavioral choices made by fiddler crabs in response to environmental variation in temperature and water availability using data generated by CSULB Marine Ecology lab.
Introduction
In this lab, we will carry out a two-way analysis of variance to determine how does body size interacts with humidity to influence behavioral choices made by fiddler crabs. Sand fiddler crabs are often exposed to high body temperatures and risk of desiccation while engaging in combat with other males to defend their territory or displaying to attract females (Figure 1). Because successful males are usually larger in body size, scientists have explored how large individuals exposed to high temperatures and the risk of desiccation can risk their survival and still win the contest of mating.
Today, we will test whether body size interacts with sand humidity to influence the body temperature selected by individual male crabs. Recall crabs are ectotherms and thus their bodies acquire environmental temperatures. For this, we will use data from CSULB Marine Ecology lab’s experiments led by Dr. Bengt Allen. In his experiments, Dr. Allen used a Plexiglas raceway mounted on a solid aluminum bar with a thermal gradient. The raceway was covered with a thin layer of sand and each experimental male crab was allowed to move freely on it before taking its body temperature. Dr. Allen used both small and large crabs in his experiments and discovered that larger males are better at tolerating harsh environmental conditions. So, let’s explore the two-way analysis of variance and how it is a useful statistical tool for describing differences and interactions across group means.
Upon completion of this lab, you should be able to:
References:
Worked example
In Chapter 9, we used a one-way ANOVA to compare the mean of more than two groups. However, when comparing the mean differences between groups of two or more categorical variables (factors) we use an extension of the one-way ANOVA, the two-way ANOVA. Basically, the two-way ANOVA is a multivariate analysis that examines whether there are interactions between the two explanatory variables on the response variable. As expected, the two-way ANOVA follows the same assumptions as the one-way ANOVA.
The two-way analysis of variance is a linear model (as the one-way ANOVA) used in factorial designs. Factorial designs are experiments in which all combinations of the values of two - or more - explanatory variables are tested. In this case, the explanatory variables are called factors. In Chapter 9, we estimated a one-way ANOVA step-by-step. A two-way ANOVA becomes more complicated given the potential large number of combinations among factors that may result, together with the number of replicates. So, for our purpose, let’s just define the two-way ANOVA in its linear form.
When testing the effects of two categorical variables, the linear model can be defined as:
\[ \begin{aligned} Y=\mu+A+B+A*B, \end{aligned} \]
where \(\mu\) is the mean or intercept, and \(A\) and \(B\) are the two categorical explanatory variables representing the main effects. \(A*B\) is the interaction term. Here, the degrees of freedom for a main effect are defined as # of levels-1, where levels are the different groups within each factor. The degrees of freedom for the interaction term are defined as the product of the main effect degrees of freedom values.
Say we are interested in testing whether the difference in mean territory size of reptiles species differs between diet types (carnivore vs herbivore) and climatic seasons (summer vs winter). Here, our model can be defined as:
\[
\begin{aligned}
TERRITORY=\mu+DIET+SEASON+DIET*SEASON.
\end{aligned}
\]
Here, we are interested in testing the following hypotheses:
DIET:
\(H_0\): There is no difference between diet types in mean territory size.
\(H_A\): There is a difference between diet types in mean territory size.
SEASON:
\(H_0\): There is no difference between climatic seasons in mean territory size.
\(H_A\): There is a difference between climatic seasons in mean territory size.
DIET*SEASON:
\(H_0\): The effect of diet type on mean territory size does not depend on climatic season.
\(H_A\): The effect of diet type on mean territory size depends on climatic season.
For our example, a significant DIET effect would indicate that territory size differs between diet types, when averaged over climatic seasons. A significant SEASON effect would indicate that territory size differs between climatic seasons, when averaged over diet types. The interaction term would be significant if the effect of diet type on mean territory size depends on climatic season.
Let’s explore visually three examples of these potential effects on territory size:
Here, either a single or a combination of these potential effects may be significant. Note that when an interaction between factors is significant, we expect non-parallel lines describing the effects on the response variable.
Say we estimated a two-way ANOVA with these data and got the table below:
Factor | Sum Sq. | df | Mean Sq. | F | p-value |
---|---|---|---|---|---|
DIET | 5.010 | 1 | 5.010 | 100.20 | 0.01 |
SEASON | 2.180 | 1 | 2.180 | 21.30 | 0.01 |
DIET*SEASON | 0.134 | 1 | 0.134 | 1.26 | 0.03 |
From Table 1 we can conclude that the two factors, including the interaction term, affect significantly the territory size of reptiles. Thus, the most likely scenario is a combination of all three scenarios 1-3 above. We conclude that there is a difference between diet types in mean territory size, there is a difference between climatic seasons in mean territory size, and that the effect of diet type on mean territory size depends on climatic season (they interact with each other).
Materials and Methods
Today’s activity Size-dependent temperature constraints in fiddler crabs is organized into one main exercise to describe whether body size interacts with hydration state of the substrate to influence the thermal preferences of fiddler crabs. This exercise will also help us familiarized and interpret R’s model output and model assumption diagnostics.
Research question 1: Does body size interact with hydration state to influence the thermal preferences of fiddler crabs?
1. Import the data Let’s import the “crab” dataset to RStudio and explore it.
Questions:
2. Check model assumptions of equal variances
As in Chapter 9, we can use the Barlett’s test to test for equal variances. For this, we will use the function bartlett.test(). The first argument of the function is the response variable, followed by the explanatory variable (factor).
# equal variances test between tb and size
bartlett.test(tb~size, data=crab)
# equal variances test between tb and sand
bartlett.test(tb~sand, data=crab)
3. Carry out a two-way analysis of variance
To estimate an analysis of variance in R, we use the function aov(). The first argument of the function is the response variable and the second argument is the explanatory variable. Lastly, we need to indicate the dataframe where such variables are located.
Let’s fit an ANOVA to our data and check the model output.
# ANOVA
crab_aov <- aov(tb ~ sand + size + sand*size, data=crab)
# ANOVA table
summary(crab_aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## sand 1 1474.6 1474.6 264.431 <2e-16 ***
## size 2 7.5 3.7 0.669 0.5151
## sand:size 2 26.7 13.4 2.397 0.0972 .
## Residuals 84 468.4 5.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Stop, Think: Interpret the model output from R (ANOVA table). Stop and review the Worked example. Think about how you can obtain each value in the model output.
Info-Box! Let’s interpret R’s aov() summary output.
# ANOVA table
summary(crab_aov)
Following the equations for a one-way ANOVA in Chapter 9, we can estimate the ANOVA table:
Degrees of freedom; df:
\(Df_{sand}=2-1=1\)
\(Df_{size}=3-1=2\)
\(Df_{sand*size}=(2-1)(3-1)=2\)
Sum of squares; Sum Sq:
\(SS_{sand}=1474.6\)
\(SS_{size}=7.5\)
\(SS_{sand*size}=26.7\)
Mean sum of squares; Mean Sq:
\(MS_{sand}=\frac{1381.1}{1}=1474.6\)
\(MS_{size}=\frac{7.5}{2}=3.7\)
\(MS_{sand*size}=\frac{26.7}{2}=13.4\)
F-ratio; F value:
\(F_{sand}=\frac{1474.6}{5.6}=263.3\)
\(F_{size}=\frac{3.7}{5.6}=0.7\)
\(F_{sand*size}=\frac{13.4}{5.6}=2.4\)
**Slight differences do to rounding may be present.
Challenge. Plot your data. Add all components for a good visualization.
4. Plot the interaction Let’s explore how to visualize the significant interaction between sand and body size by following the visualizations in Figure 2.
# loading the packages
library(ggplot2)
# plotting the interaction
p1 <- ggplot(crab,aes(sand,tb,group=size,color=size)) +
geom_point(size=2,alpha=.5,position = position_dodge(width=.15)) +
geom_smooth(method = "lm", se = FALSE) +
ylab("Body temperature (°C)") +
xlab("Sand treatment") +
theme_classic(14)
p1
5. Carry out a multiple comparison using the Tukey test
Following Chapter 9, we can carry out a multiple comparison among groups using the Tukey test. Recall that the Tukey test allows us to find group means that are significantly different from each other, comparing all possible pairs of means with a t-test like approach.
For this, we can use the function TukeyHSD() where the first argument is the fitted model.
# Tukey test
t <- TukeyHSD(crab_aov)
t
Questions:
6. Check model assumptions of normality
Challenge. Test the model assumptions of normality. Recall from Chapter 9 that in an ANOVA, the model residuals (the difference between each observation and the mean value) should be normally distributed. Hint: Review your Chapter 9 R script on model checking plots (residuals vs fitted values, a Q-Q plot, a scale-location plot, and the constant-leverage plot).
Question:
Discussion questions