Biostatistics using R: A Laboratory Manual
Author(s): Raisa Hernández-Pacheco, Alexis A Diaz
California State University-Long Beach
- Chapter 1: Introduction to R and RStudio (v1.0)
- Chapter 2: Data sampling, accuracy, and precision (v1.0)
- Chapter 3: Visualizing data (v1.0)
- Chapter 4: Probability distributions (v1.0)
- Chapter 5: Hypothesis testing (v1.0)
- Chapter 6: Population proportions and the binomial distribution (v1.0)
- Chapter 7: The normal distribution (v1.0)
- Chapter 8: Comparing two means: the t-test (v1.0)
- Chapter 9: One-way analysis of variance (v1.0)
- Chapter 10: Two-way analysis of variance (v1.0)
- Chapter 11: Correlation and regression analyses (v1.0)
- Materials.zip (ZIP | 90 MB)
Description
The main goals of this Biostatistics laboratory manual were to (1) give each lab session direct biological content by using authentic research data, and (2) introduce the R programming language as the data management and statistical tool in a 200-level Biostatistics course for biology majors. To do this, we invited faculty from our Biological Sciences department at California State University-Long Beach to share data with us. As a result, all chapters (except Chapter 1) feature a CSULB biology research lab and use its authentic research data to implement a particular statistical test. By having familiar people introduce the study system and the research questions being addressed, we hoped to engage students and foster a sense of connection to campus and to other students and faculty members. The manual was also developed as an open educational resource: it is free to students and fully available to any educator outside campus for implementation.
This Biostatistics laboratory manual was implemented virtually during Fall 2020 but was developed for face-to-face instruction, so it can be used either way. We recommend implementing one chapter per lab session, with a follow-up take-home exercise in which students apply the statistical and programming knowledge they have gained. Ideally, students would use the same programming tools in the take-home exercise that they used in the chapter. Such take-home exercises can use authentic or simulated data. Given the technical challenges that may arise when using computers for programming, we recommend lab sessions of no more than 25 students.
Our semester also included a practical mid-term and a final independent research project in which students generated a research question, addressed it with appropriate statistical and programming tools, and presented it orally to the rest of the class in a short presentation.
Table of Contents
Chapter 1. Introduction to R and RStudio
Chapter 2. Data sampling, accuracy, and precision. Featured: CNSM Vertebrate Collections
Chapter 3. Visualizing data. Featured: Marine Ecology Lab
Chapter 4. Probability distributions. Featured: Quantitative Ecology Lab
Chapter 5. Hypothesis testing. Featured: Wetlands Ecology Lab
Chapter 6. Population proportions and the binomial distribution. Featured: Avian Ecology Lab
Chapter 7. The normal distribution. Featured: Shark Lab
Chapter 8. Comparing two means: the t-test. Featured: Mammal Lab
Chapter 9. One-way ANOVA. Featured: Molecular and Ecotoxicology Lab
Chapter 10. Two-way ANOVA. Featured: Marine Ecology Lab
Chapter 11. Correlation and regression analyses. Featured: Microbial Genomics Lab
Notes
Structure and nomenclature used
With the exception of Chapter 1, all chapters are structured into an Introduction, a pre-lab Worked example, and an in-class activity. The Introduction provides information about the study system and the general research question that will be addressed in the in-class activity. The pre-lab Worked example is a practical guide to the statistical test used in the in-class activity and is meant to be studied by students before the lab session. The example is designed so that the student carries out the exercises step by step with a hand calculator. The in-class activity is an R-based exercise that uses authentic research data. These activities are designed to be completed in one lab period (approximately 165 minutes) and combine demonstrations where code is provided, challenges where code is not provided, and discussion questions. Each chapter builds on the previous ones in both statistical knowledge and programming skills.
Throughout the manual, R functions and commands are presented in bold, and R packages are presented in italics. Names of “datasets” and “variables” are presented in quotes.
R and RStudio version
This manual was created using R version 3.6.1 and RStudio version 1.3.1093.
R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
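Readers who want to compare their own setup against the version reported above could run something like the following in the R console. This is only a minimal sketch: the variable names `manual_r_version`, `installed`, and `is_at_least_manual` are illustrative and do not come from the manual, and the check covers R only, not RStudio.

```r
# Version the manual reports using (taken from the text above).
manual_r_version <- "3.6.1"

# getRversion() returns the running R version as a comparable version object.
installed <- getRversion()
is_at_least_manual <- installed >= manual_r_version

message("Installed R: ", as.character(installed))
if (!is_at_least_manual) {
  message("This is older than R ", manual_r_version,
          ", the version used to write the manual.")
}

# citation() returns the recommended reference for the running R installation.
print(citation())
```

A newer R version will not necessarily behave identically to 3.6.1, so this check only indicates whether the installation is at least as recent as the one the authors report.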
Acknowledgments
We thank the researchers at the Department of Biological Sciences of California State University-Long Beach who shared their data with us and thus made these Biostatistics laboratory lessons employing authentic data possible. Special thanks to Bengt Allen (Marine Ecology Lab), Renaud Berlemont (Microbial Genomics Lab), Erika Holland (Molecular and Ecotoxicology Lab), Chris Lowe (Shark Lab), Ari Martínez (Avian Ecology Lab), Ted Stankowich (Mammal Lab), and Christine Whitcraft (Wetlands Ecology Lab). We also thank Logan Luevano for his comments and suggestions, Ashley Carter for sharing resources, and CSULB Biostatistics students for their evaluations. The creation of this manual was partly funded by the Higher Education Emergency Relief Fund (HEERF) of the CARES Act.
Future adaptations
We welcome adaptations to our work! Editable files (.Rmd) are available upon request.
Cite this work
Researchers should cite this work as follows:
- Hernández-Pacheco, R., Diaz, A. A. (2021). Biostatistics using R: A Laboratory Manual. QUBES Educational Resources. doi:10.25334/EWZM-NS95