
Biostatistics using R: A Laboratory Manual

Author(s): Raisa Hernández-Pacheco, Alexis A Diaz

California State University-Long Beach



Description

The main goals of this Biostatistics laboratory manual were to (1) provide direct biological content to each lab session by using authentic research data, and (2) introduce the R programming language as the data management and statistical tool in a 200-level Biostatistics course for biology majors. To this end, we invited faculty from our Biological Sciences department at California State University-Long Beach to share data with us. As a result, all chapters (except Chapter 1) feature a CSULB biology research lab and use its authentic research data to implement a particular statistical test. By introducing the study system and research questions being addressed by familiar people, we hoped to engage students and bring a sense of connection to campus and to other students and faculty members. The manual was also developed as an open educational resource, free to students and fully available to any educator outside campus for implementation.

This Biostatistics laboratory manual was implemented virtually during Fall 2020 but was developed for face-to-face instruction, so it can be implemented both ways. We recommend implementing one chapter per lab session with a follow-up take-home exercise in which students apply the statistical and programming knowledge they have gained. Ideally, students would use the same programming tools in the take-home exercise that they used in the chapter. Such take-home exercises can use authentic or simulated data. Given the technical challenges that may arise when using computers for programming, we recommend lab sessions of no more than 25 students.
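For educators who opt for simulated data in a take-home exercise, the following is a minimal sketch of how such a dataset might be generated and analyzed in base R. It is not taken from the manual: the variable names ("site", "mass_g"), group sizes, means, and standard deviations are all hypothetical, and the two-sample t-test is used only as one example of a test covered in the chapters.

```r
# Hypothetical take-home dataset: body mass (g) for two simulated groups.
# All names and parameter values are illustrative assumptions,
# not values from the manual.
set.seed(42)  # fix the seed so every student works with the same data

n_per_group <- 20
sim_data <- data.frame(
  site   = rep(c("A", "B"), each = n_per_group),
  mass_g = c(rnorm(n_per_group, mean = 50, sd = 5),
             rnorm(n_per_group, mean = 54, sd = 5))
)

# Mean of each group, then a two-sample t-test comparing the groups
# (t.test uses Welch's unequal-variance version by default)
group_means <- tapply(sim_data$mass_g, sim_data$site, mean)
test_result <- t.test(mass_g ~ site, data = sim_data)

print(group_means)
print(test_result)
```

Fixing the seed keeps the exercise reproducible across students; removing it, or assigning each student a different seed, would give every student a unique dataset drawn from the same underlying populations.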

The structure of our semester also included a practical mid-term and a final independent research project in which students needed to generate a research question, analyze it with appropriate statistical and programming tools, and present it orally to the rest of the class in a short presentation.

 

Table of Contents

Chapter 1. Introduction to R and RStudio

Chapter 2. Data sampling, accuracy, and precision. Featured: CNSM Vertebrate Collections

Chapter 3. Visualizing data. Featured: Marine Ecology Lab

Chapter 4. Probability distributions. Featured: Quantitative Ecology Lab

Chapter 5. Hypothesis testing. Featured: Wetlands Ecology Lab

Chapter 6. Population proportions and the binomial distribution. Featured: Avian Ecology Lab

Chapter 7. The normal distribution. Featured: Shark Lab 

Chapter 8. Comparing two means: t-test. Featured: Mammal Lab 

Chapter 9. One-way ANOVA. Featured: Molecular and Ecotoxicology Lab

Chapter 10. Two-way ANOVA. Featured: Marine Ecology Lab

Chapter 11. Correlation and regression analyses. Featured: Microbial Genomics Lab

Notes

Structure and nomenclature used

With the exception of Chapter 1, all chapters are structured into an Introduction, a pre-lab Worked example, and an in-class activity. The Introduction provides information about the study system and the general research question that will be addressed in the in-class activity. The pre-lab Worked example is designed as a practical guide to the statistical test being employed in the in-class activity and is meant to be studied by students before the lab session. The example is designed so that the student carries out the exercises step by step with a hand calculator. The in-class activity is an R-based exercise that uses authentic research data. These activities are designed to be carried out in one lab period (approximately 165 minutes) and are structured into a combination of demonstrations where code is provided, challenges where code is not provided, and discussion questions. Every chapter builds on the previous ones in both statistical knowledge and programming skills.

Throughout the manual, R functions and commands are presented in bold, and R packages are presented in italics. The names of “datasets” and “variables” are presented in quotes.

 

R and RStudio version

This manual was created using R version 3.6.1 and RStudio version 1.3.1093.

R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

 

Acknowledgments

We thank the researchers at the Department of Biological Sciences of California State University-Long Beach who shared their data with us and thus made these Biostatistics laboratory lessons employing authentic data possible. Special thanks to Bengt Allen (Marine Ecology Lab), Renaud Berlemont (Microbial Genomics Lab), Erika Holland (Molecular and Ecotoxicology Lab), Chris Lowe (Shark Lab), Ari Martínez (Avian Ecology Lab), Ted Stankowich (Mammal Lab), and Christine Whitcraft (Wetlands Ecology Lab). We also thank Logan Luevano for his comments and suggestions, Ashley Carter for sharing resources, and CSULB Biostatistics students for their evaluations. The creation of this manual was partly funded by the Higher Education Emergency Relief Fund (HEERF) of the CARES Act.

 

Future adaptations

We welcome adaptations to our work! Editable files (.Rmd) are available upon request.

 

Cite this work

Researchers should cite this work as follows: