Ch10_html.utf8

Today’s investigation. Temperature is an important factor influencing the fitness of marine invertebrates in the intertidal zone. This is especially true for reproductively active male fiddler crabs that live risking their life to mate. After all, the more time a crab spends sheltering in a burrow to avoid high temperatures, the less time it has for reproductive activities. These observations have motivated many studies on evolutionary strategies balancing ectotherms’ body sizes and reproductive performance with abiotic factors. Today we will use a two-way ANOVA to test how body size influences behavioral choices made by fiddler crabs in response to environmental variation in temperature and water availability using data generated by CSULB Marine Ecology lab.

Introduction

In this lab, we will carry out a two-way analysis of variance to determine how does body size interacts with humidity to influence behavioral choices made by fiddler crabs. Sand fiddler crabs are often exposed to high body temperatures and risk of desiccation while engaging in combat with other males to defend their territory or displaying to attract females (Figure 1). Because successful males are usually larger in body size, scientists have explored how large individuals exposed to high temperatures and the risk of desiccation can risk their survival and still win the contest of mating.

Today, we will test whether body size interacts with sand humidity to influence the body temperature selected by individual male crabs. Recall crabs are ectotherms and thus their bodies acquire environmental temperatures. For this, we will use data from CSULB Marine Ecology lab’s experiments led by Dr. Bengt Allen. In his experiments, Dr. Allen used a Plexiglas raceway mounted on a solid aluminum bar with a thermal gradient. The raceway was covered with a thin layer of sand and each experimental male crab was allowed to move freely on it before taking its body temperature. Dr. Allen used both small and large crabs in his experiments and discovered that larger males are better at tolerating harsh environmental conditions. So, let’s explore the two-way analysis of variance and how it is a useful statistical tool for describing differences and interactions across group means.

Figure 1. Uca pugilator combat (left), and Uca crab aggregation in mudflat (right). Images: Bengt Allen.

Upon completion of this lab, you should be able to:

Analyze factorial designs with a two way ANOVA;
Identify and quantify interactions between two factors;
Interpret the R model output (ANOVA table).

References:

Allen et al. 2012

Worked example

In Chapter 9, we used a one-way ANOVA to compare the mean of more than two groups. However, when comparing the mean differences between groups of two or more categorical variables (factors) we use an extension of the one-way ANOVA, the two-way ANOVA. Basically, the two-way ANOVA is a multivariate analysis that examines whether there are interactions between the two explanatory variables on the response variable. As expected, the two-way ANOVA follows the same assumptions as the one-way ANOVA.

The two-way analysis of variance is a linear model (as the one-way ANOVA) used in factorial designs. Factorial designs are experiments in which all combinations of the values of two - or more - explanatory variables are tested. In this case, the explanatory variables are called factors. In Chapter 9, we estimated a one-way ANOVA step-by-step. A two-way ANOVA becomes more complicated given the potential large number of combinations among factors that may result, together with the number of replicates. So, for our purpose, let’s just define the two-way ANOVA in its linear form.

When testing the effects of two categorical variables, the linear model can be defined as:

\[ \begin{aligned} Y=\mu+A+B+A*B, \end{aligned} \]

where \(\mu\) is the mean or intercept, and \(A\) and \(B\) are the two categorical explanatory variables representing the main effects. \(A*B\) is the interaction term. Here, the degrees of freedom for a main effect are defined as # of levels-1, where levels are the different groups within each factor. The degrees of freedom for the interaction term are defined as the product of the main effect degrees of freedom values.

Say we are interested in testing whether the difference in mean territory size of reptiles species differs between diet types (carnivore vs herbivore) and climatic seasons (summer vs winter). Here, our model can be defined as:

\[ \begin{aligned} TERRITORY=\mu+DIET+SEASON+DIET*SEASON. \end{aligned} \]

Here, we are interested in testing the following hypotheses:

DIET:

\(H_0\): There is no difference between diet types in mean territory size.
\(H_A\): There is a difference between diet types in mean territory size.

SEASON:

\(H_0\): There is no difference between climatic seasons in mean territory size.
\(H_A\): There is a difference between climatic seasons in mean territory size.

DIET*SEASON:

\(H_0\): The effect of diet type on mean territory size does not depend on climatic season.
\(H_A\): The effect of diet type on mean territory size depends on climatic season.

For our example, a significant DIET effect would indicate that territory size differs between diet types, when averaged over climatic seasons. A significant SEASON effect would indicate that territory size differs between climatic seasons, when averaged over diet types. The interaction term would be significant if the effect of diet type on mean territory size depends on climatic season.

Let’s explore visually three examples of these potential effects on territory size:

Figure 2. Interaction plots for our hypothetical example.

Figure 2. Interaction plots for our hypothetical example.

Here, either a single or a combination of these potential effects may be significant. Note that when an interaction between factors is significant, we expect non-parallel lines describing the effects on the response variable.

Say we estimated a two-way ANOVA with these data and got the table below:

Table 1. Two-way ANOVA for mean territory size of reptiles species

Factor	Sum Sq.	df	Mean Sq.	F	p-value
DIET	5.010	1	5.010	100.20	0.01
SEASON	2.180	1	2.180	21.30	0.01
DIET*SEASON	0.134	1	0.134	1.26	0.03

From Table 1 we can conclude that the two factors, including the interaction term, affect significantly the territory size of reptiles. Thus, the most likely scenario is a combination of all three scenarios 1-3 above. We conclude that there is a difference between diet types in mean territory size, there is a difference between climatic seasons in mean territory size, and that the effect of diet type on mean territory size depends on climatic season (they interact with each other).

Materials and Methods

R and RStudio
package ggplot2 and rstatix
crab.csv

Today’s activity Size-dependent temperature constraints in fiddler crabs is organized into one main exercise to describe whether body size interacts with hydration state of the substrate to influence the thermal preferences of fiddler crabs. This exercise will also help us familiarized and interpret R’s model output and model assumption diagnostics.

Size-dependent temperature constraints in fiddler crabs

Research question 1: Does body size interact with hydration state to influence the thermal preferences of fiddler crabs?

1. Import the data Let’s import the “crab” dataset to RStudio and explore it.

Questions:

How many variables and observations does “crab” have?
Considering the research question, what are the variables of interest?
How many groups/factors will you compare?

2. Check model assumptions of equal variances

As in Chapter 9, we can use the Barlett’s test to test for equal variances. For this, we will use the function bartlett.test(). The first argument of the function is the response variable, followed by the explanatory variable (factor).

# equal variances test between tb and size
bartlett.test(tb~size, data=crab)

# equal variances test between tb and sand
bartlett.test(tb~sand, data=crab)

3. Carry out a two-way analysis of variance

To estimate an analysis of variance in R, we use the function aov(). The first argument of the function is the response variable and the second argument is the explanatory variable. Lastly, we need to indicate the dataframe where such variables are located.

Let’s fit an ANOVA to our data and check the model output.

# ANOVA
crab_aov <- aov(tb ~ sand + size + sand*size, data=crab)

# ANOVA table
summary(crab_aov)

##             Df Sum Sq Mean Sq F value Pr(>F)    
## sand         1 1474.6  1474.6 264.431 <2e-16 ***
## size         2    7.5     3.7   0.669 0.5151    
## sand:size    2   26.7    13.4   2.397 0.0972 .  
## Residuals   84  468.4     5.6                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Stop, Think: Interpret the model output from R (ANOVA table). Stop and review the Worked example. Think about how you can obtain each value in the model output.

Info-Box! Let’s interpret R’s aov() summary output.

# ANOVA table
summary(crab_aov)

Following the equations for a one-way ANOVA in Chapter 9, we can estimate the ANOVA table:

Degrees of freedom; df:

\(Df_{sand}=2-1=1\)

\(Df_{size}=3-1=2\)

\(Df_{sand*size}=(2-1)(3-1)=2\)

Sum of squares; Sum Sq:

\(SS_{sand}=1474.6\)

\(SS_{size}=7.5\)

\(SS_{sand*size}=26.7\)

Mean sum of squares; Mean Sq:

\(MS_{sand}=\frac{1381.1}{1}=1474.6\)

\(MS_{size}=\frac{7.5}{2}=3.7\)

\(MS_{sand*size}=\frac{26.7}{2}=13.4\)

F-ratio; F value:

\(F_{sand}=\frac{1474.6}{5.6}=263.3\)

\(F_{size}=\frac{3.7}{5.6}=0.7\)

\(F_{sand*size}=\frac{13.4}{5.6}=2.4\)

**Slight differences do to rounding may be present.

Challenge. Plot your data. Add all components for a good visualization.

4. Plot the interaction Let’s explore how to visualize the significant interaction between sand and body size by following the visualizations in Figure 2.

# loading the packages
library(ggplot2)

# plotting the interaction
p1 <- ggplot(crab,aes(sand,tb,group=size,color=size)) +
  geom_point(size=2,alpha=.5,position = position_dodge(width=.15)) +
  geom_smooth(method = "lm", se = FALSE) +
  ylab("Body temperature (°C)") +
  xlab("Sand treatment") +
  theme_classic(14)
p1

Figure 3. Size-dependent body temperature across sand hydration state treatments.

5. Carry out a multiple comparison using the Tukey test

Following Chapter 9, we can carry out a multiple comparison among groups using the Tukey test. Recall that the Tukey test allows us to find group means that are significantly different from each other, comparing all possible pairs of means with a t-test like approach.

For this, we can use the function TukeyHSD() where the first argument is the fitted model.

# Tukey test
t <- TukeyHSD(crab_aov)
t

Questions:

Are all group means different from each other?
What is the relationship between body temperature, body size and sand treatment?

6. Check model assumptions of normality

Challenge. Test the model assumptions of normality. Recall from Chapter 9 that in an ANOVA, the model residuals (the difference between each observation and the mean value) should be normally distributed. Hint: Review your Chapter 9 R script on model checking plots (residuals vs fitted values, a Q-Q plot, a scale-location plot, and the constant-leverage plot).

Question:

What are the assumptions of the ANOVA?
Does your analysis follow the ANOVA assumptions of normality?

Discussion questions

Differentiate between a one-way and a two-way ANOVA.
In the linear formula, why do we multiply the two factors?
Why does the model output present three different p-values?

Great Work!

Two-way analysis of variance

Chapter 10