Introduction: Linear mixed-effects models (LMMs), also known as hierarchical models, are another extension of simple linear models used when there is clustering (i.e., nested data structures) or non-independence (i.e., repeated measurements) among observations. These models are called “mixed-effects” because they incorporate both fixed and random effects. Fixed effects are variables with a constant effect on the response variable, while random effects are variables whose values or levels are assumed to be drawn randomly from a larger population of levels. Given the natural clustering in biological data (e.g., genetic groups, geographic locations), as well as the longitudinal monitoring of the same individuals over time, LMMs represent another essential tool in modern population biology.

In this module, you will expand your skills on linear models (Modules M.4) to LMMs while testing associations between Cayo Santiago rhesus macaque social cognition and age.



Upon completion of this module, you will be able to:


References:


Extra training:


Associated literature:


Expand for R Notation and functions index

Notation:


Functions:




Testing associations between social cognition and age


Rhesus macaques represent an excellent comparative system to study the evolutionary origins of cognitive development in primates. However, comparative developmental research on cognition is often challenging to implement due to low sample sizes or no access to populations of individuals that vary with age. Cayo Santiago rhesus macaques provide the rare opportunity to address questions on the evolution of cognition, given that experimental tasks can be performed in many individuals in the field. Cayo Santiago monkeys are habituated to human observers, and thus researchers can generate relatively large amounts of data of socioemotional cognitive traits of individuals with known age (Fig 1).
**Fig 1**. Socioemotional attention experimental tasks design. Monkeys are presented with four photos for 10 seconds each; a neutral female face (trial 1), followed by a threat face of the same female (trial 2), and a neutral male face (trial 3), followed by a threat face of the same male (trial 4).

Fig 1. Socioemotional attention experimental tasks design. Monkeys are presented with four photos for 10 seconds each; a neutral female face (trial 1), followed by a threat face of the same female (trial 2), and a neutral male face (trial 3), followed by a threat face of the same male (trial 4).


In this module, you will use a subset of Cayo Santiago rhesus macaque adult demographic and cognitive data from Rosati et al. 2018. (cayo_demo_cog, Table 1) to explore ontogenic changes in social cognition using linear mixed-effects models. This is authentic demographic and cognition data. The demographic data is collected through daily visual censuses performed by the staff of the Caribbean Primate Research Center (CPRC) of the University of Puerto Rico-Medical Sciences Campus. The cognition health data is based on cognitive tasks collected in the field by a team of researchers from the University of Michigan, University of Pennsylvania, and Yale University. Here, social cognition is indexed as monkeys’ social attention towards different visual stimuli (Fig 1; Rosati et al. 2018).

Before coding, explore the data in Table 1.


Table 1: Cayo Santiago rhesus macaque demographic and cognitive data

Metadata of cayo_demo_cog: Demographic and cognition data of Cayo Santiago rhesus macaques.



1. Exploring data structure and performing quality check

Before any analysis, you should check the data and understand its attributes. Refer to Module M.1. Practice your coding first before clicking on the answer!


Guiding questions:

  1. What is the class of each variable?

Click for Answer!

Variable classes include ‘numeric’ and ‘character’.


2. Exploring associations between social cognition and age using empirical data


Challenge!

  1. Before fitting a model, explore the potential association between social cognition and age using the empirical data. Make sure to appropriately identify the response and explanatory variables, as well as the best data visualization type using ggplot2.


Guiding questions:

  1. Can you describe the relationship between looking time and age?

Click for Answer!

According to the pattern in the concentration of data points, looking time decreases as age increases.


  1. What other available variables do you predict to have a role in explaining the variation in looking time?

Click for Answer!

All other available variables may contribute to the observed variability in looking time. It is possible that males have a different response than females to social stimuli (variable Sex). As individuals habituate to experimental tasks, trial number could also have an influence on the amount of time a monkey spends looking at a photo (variable Trial). Finally, maternal effects could play a role on looking time (variables Mom and Matriline)


3. Fitting linear mixed-effects models

Biological data often presents nested structures. This occurs when observations belong to a discrete set of groups, and you suspect or have evidence that group membership has an important effect on your response variable. As group membership may explain a portion of the observed variation in the response variable, it is very important to account for it! This can ultimately improve model fit. Similarly, non-independence among individual observations due to repeated measurements must be addressed.


Guiding questions:

  1. Does cayo_demo_cog present nested data?

Click for Answer!

Yes! For example, each individual monkey belongs to a discrete number of Mom categories (IDs). Thus, Subject is nested or clustered within units of Mom.


Recall that linear models are defined by a fixed intercept and fixed slopes corresponding to the effect of each explanatory variable. Similarly, random effects can be added as random intercepts and or random slopes for nesting variables.

Below, you will fit a LMMs to the rhesus macaque data to test whether social cognition is a function of age. To address potential maternal effects, you will add a random intercept for Mom ID. For this, you will use R package lmerTest and its function lmer() where the first argument is the response variable, followed by the explanatory variable (separated by “~”). For lmer(), you also need to specify the structure of the random effects; (slope | intercept). To see the model output, you will use summary().

# installing lmerTest
install.packages("lmerTest",repos="http://cran.us.r-project.org")

# loading lmerTest
library(lmerTest)

# linear mixed-effects model including Mom as a random intercept
lmm1 <- lmer(Looking_Time ~ age + (1|mom), data = cayo_demo_cog) # (1|Mom) to only include a random intercept for Mom

# model output
summary(lmm1)

Model output interpretation. The model output summary now has an extra component; random effects. This section indicates the variance and standard deviation of the random intercept fit to the model (i.e., Mom ID). This variance tells you how much variability in looking time is explained by individual mom differences. In other words, this tells you how much mothers differ from each other in terms of their offspring looking time. The residual variance is the variability within each mom ID, and thus remains unexplained by the model.


Guiding questions:

  1. Can you explain the rest of the model summary output?

Click for Answer!

Formula: the model formula defined.

Scaled residuals: minimum, 1st quartile, median, 3rd quartile, and maximum value of the scaled residuals.

Fixed effects: model parameters (y-intercept and slope) and statistics for each parameter (SE, t statistic, p-value). According to the model, the mean baseline looking time when there are no age effects is 5.557s. As age increases by 1 year, looking time decreases by 0.237s.

Correlation of Fixed Effects: correlation matrix for explanatory variables.


4. Plotting predictions from linear mixed-effects models

You can use R package sjPlot to plot predictions from LMMs. However, many extra tools are available. Below you will learn how to plot mean model predictions, as well as random intercepts, using R package ggeffects. This package also allows you to incorporate aesthetics using functions from ggplot2.

# installing ggeffects
install.packages("ggeffects",repos="http://cran.us.r-project.org")

# loading ggeffects and ggplot2
library(ggeffects)
library(ggplot2)

# plotting mean model prediction
p1 <- ggpredict(lmm1, terms = c("age"), type = "fixed") # extracting prediction

plot(p1) + 
  ylab("Looking time (s)") + # using ggplot functions
  xlab("Age (years)")

# plotting model prediction across groups (random intercepts for Mom)
p2 <- ggpredict(lmm1, terms = c("age", "mom"), type = "random") 

plot(p2, ci=FALSE) +
  ylab("Looking time (s)") +
  xlab("Age (years)")


Guiding questions:

  1. What mothers appear to be different from others?

Click for Answer!

Out of the total of 9 mothers, Mom 8 appears to have a higher y-intercept than those associated to the other 8 mothers.


Challenge!

  1. Explore the data and think about other important fixed effects and nesting variable that you can use as a random intercept. Fit the model and compare it with lmm1.


Discussion questions: Compare lmm1 to your new model; which model would you choose and why?


FIN



Acknowledgements: The creation of this module was funded by the National Science Foundation DBI BRC-BIO award 2217812. Cayo Santiago is supported by the Office of Research Infrastructure Programs (ORIP) of the National Institutes of Health, grant 2 P40 OD012217, and the University of Puerto Rico (UPR), Medical Sciences Campus. Additional support was provided by the National Science Foundation Graduate Research Fellowship Program to Alexis A. Diaz, award 2141410. We acknowledge the use of BioRender.com to create Fig 1.


Authors: Raisa Hernández-Pacheco, Alexandra L. Bland, Alexis A. Diaz, Alexandra G. Rosati, California State University, Long Beach, University of Michigan