STEM Writing Project
Student Report Archive
Biology Lab Reports Data Archive
The Challenge
All systematic writing studies start with a corpus of writing samples that represent the genre. The number and quality of the text samples within that corpus determine what questions can be asked, and often affect the reliability of any findings. This is particularly true for computational text analysis.
When we launched SWP, no well-structured corpus of student-authored texts existed. We could only find a few isolated examples of students' scientific writing. Most of the texts had been written to meet different requirements, and some had been edited by instructors to illustrate particular writing errors.
Our Approach
We assembled an archive of more than 4,000 de-identified biology student lab reports, with metadata, that can be used as:
- Examples for training instructors;
- Research data for linguistic analyses, etc.;
- Data for testing other automated systems.
All reports were collected as part of an NSF-sponsored research study conducted at Wake Forest University under the supervision of the Institutional Review Board (IRB Protocol #00022693, approved January 2017). Only reports from students who provided written informed consent (signatures on file) are available from the public archive.
Reports have been de-identified, and the names of both authors and grading instructors have been replaced with randomized IDs. All report files have been converted to plain text (.txt) format. Metadata for each report are provided in .csv format, with a corresponding codebook.
Samples of the Bash scripts we used to remove identifying names and extraneous information are posted in the repository as well.
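The name-replacement step described above can be illustrated with a short sketch. This is not the project's own code (the posted scripts are written in Bash); it is a minimal Python illustration that assumes the names to remove are already known as a list. The function names `build_id_map` and `deidentify` and the ID format (`Author_` plus six hex characters) are hypothetical.

```python
import re
import secrets


def build_id_map(names, prefix):
    """Give each distinct name a random ID such as 'Author_3f9a1c' (hypothetical format)."""
    id_map = {}
    used = set()
    for name in sorted(set(names)):
        new_id = f"{prefix}_{secrets.token_hex(3)}"
        while new_id in used:  # re-draw on the rare collision
            new_id = f"{prefix}_{secrets.token_hex(3)}"
        used.add(new_id)
        id_map[name] = new_id
    return id_map


def deidentify(text, id_map):
    """Replace whole-word, case-insensitive occurrences of each name with its ID."""
    if not id_map:
        return text
    # Longest names first, so "Jane Doe" is tried before "Jane".
    names = sorted(id_map, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    lookup = {name.lower(): new_id for name, new_id in id_map.items()}
    # A single pass over the text, so inserted IDs are never re-scanned.
    return pattern.sub(lambda m: lookup[m.group(1).lower()], text)
```

In practice the same ID map would need to be applied to both the report text and the metadata file so that the randomized IDs stay consistent across the two.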
Resources | Links |
Full report archive (including metadata & codebook) | Link to GitHub repository (opening 7/1/22) |
Allowed & Prohibited Uses
Our reports archive is published under the terms of a Creative Commons CC BY-NC-SA 4.0 license. Reports can be used for research and assessment purposes. No commercial use in any form is permitted.
We realize that this archive is a tempting source of reports for students in BIO100 courses. To discourage this, our use license EXPLICITLY requires users to attribute the original source when using these documents. By extension, students are not permitted to submit or use these reports (in whole or in part) in any way that suggests they are the original author of the text.
- Failure to identify one of our reports as another student's work is a violation of the Terms of Use.
- Using any report without proper attribution meets the generally accepted definition of plagiarism and/or academic misconduct.
The reports in our archive contain small marker errors that we can easily identify. Instructors who suspect their students may have used one or more of our reports without attribution should contact us immediately. We can assist you by scanning student reports for our embedded marker errors, searching the archive for potential sources, or helping with text analysis to determine similarities.
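As an illustration of what a text-similarity check can look like, the sketch below ranks archive reports by word n-gram overlap (Jaccard similarity) with a submitted report. It is a generic technique chosen for illustration, not the marker-error scan or the specific analysis used by the project, and the names `shingles`, `jaccard`, and `rank_sources` are hypothetical.

```python
import re


def shingles(text, n=5):
    """Return the set of overlapping word n-grams in the text (lowercased)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items divided by all items."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_sources(submission, archive, n=5, top=3):
    """Rank archive reports (a dict of report ID -> text) by overlap with a submission."""
    sub = shingles(submission, n)
    scores = [
        (report_id, jaccard(sub, shingles(text, n)))
        for report_id, text in archive.items()
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top]
```

A high score only flags a pair of documents for human review. Lab reports on the same experiment naturally share some phrasing, so a score by itself is not evidence of misconduct.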
Looking Ahead
To our knowledge, the lab reports archive is the largest structured collection of undergraduate student reports for biology ever assembled. There are innumerable potential questions to ask, and we invite collaborators to contact us with ideas for future studies.
Also, check our list of To Do items for the assessments sub-project. Let us know if you want to take on one or more.
Where to Learn More
- Benoit, K., Watanabe, K., Wang, H., Nulty, P., Obeng, A., Müller, S., & Matsuo, A. (2018). quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30), 774. https://doi.org/10.21105/joss.00774
- Fellows, N. J. (1994). A window into thinking: Using student writing to understand conceptual change in science learning. Journal of Research in Science Teaching, 31(9), 985–1001. https://doi.org/10.1002/tea.3660310911
- Gottschalk, K. K. (2003). The ecology of response to student essays. ADE Bulletin, 134–135, 49–56.
- Henderson, C., & Dancy, M. H. (2007). Barriers to the use of research-based instructional strategies: The influence of both individual and situational characteristics. Physical Review Special Topics - Physics Education Research, 3(2), 020102. https://doi.org/10.1103/PhysRevSTPER.3.020102
- Reiff, M. J., & Bawarshi, A. (2011). Tracing Discursive Resources: How Students Use Prior Genre Knowledge to Negotiate New Writing Contexts in First-Year Composition. Written Communication, 28(3), 312–337.
- Reynolds, J. A., Thaiss, C., Katkin, W., & Thompson, R. J. J. (2012). Writing-to-learn in undergraduate science education: A community-based, conceptually driven approach. CBE Life Sciences Education, 11(1), 17–25. https://doi.org/10.1187/cbe.11-08-0064
- Reynolds, J., Smith, R., Moskovitz, C., & Sayle, A. (2009). BioTAP: A Systematic Approach to Teaching Scientific Writing and Evaluating Undergraduate Theses. BioScience, 59(10), 896–903.
- White, B., Frederiksen, J., & Collins, A. (2009). The interplay of scientific inquiry and metacognition. In D. J. Hacker, J. Dunlosky, & A. C. Graesser (Eds.), Handbook of Metacognition in Education (pp. 175–205). Routledge.