STEM Writing Project
Word Use Analysis
The Challenge
Tracking students' development as scientific writers by “close reading” individual reports is impractical in large BIO101 classes. How then can students’ writing skills be evaluated longitudinally in these large courses?
Our Approach
We proposed using machine-scorable text features as proxy metrics for students’ development as writers. For this study, we assembled a suite of candidate metrics, then asked:
- What does "good student scientific writing" look like? Which text features are informative?
- What do these features tell us about changes in students' writing patterns?
- Can proxy metrics provide useful insights about cohort-level changes over a curriculum sequence?
We divided our archive of >4400 student lab reports into 4 writer experience levels (see the sketch after this list):
- Novice students enrolled in their first college biology course.
- Early-career students with 0.5-1.5 years of general college writing experience and 1 semester of biology writing experience.
- Mid-career students with 2+ years of general writing experience, but only 1 semester of biology writing experience.
- Advanced students with 2+ years of general writing experience and 2+ prior semesters of biology writing experience.
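For concreteness, the rubric above can be written as a small decision rule. The sketch below is in base R (the project's analysis language); the `classify_writer` function, its handling of edge cases, and the example calls are illustrative assumptions, not the study's actual assignment code.

```r
# Illustrative encoding of the 4 writer experience levels; thresholds
# follow the bullets above, and edge-case handling is an assumption.
classify_writer <- function(years_general, semesters_bio) {
  if (semesters_bio == 0) {
    "Novice"
  } else if (years_general < 2 && semesters_bio == 1) {
    "Early-career"
  } else if (years_general >= 2 && semesters_bio == 1) {
    "Mid-career"
  } else if (years_general >= 2 && semesters_bio >= 2) {
    "Advanced"
  } else {
    NA_character_  # combinations the rubric does not cover
  }
}

classify_writer(1, 1)    # "Early-career"
classify_writer(2.5, 3)  # "Advanced"
```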
The text features we evaluated as proxy metrics fell into 3 categories (a scoring sketch follows the list):
- Lexical range: # unique words, type/token ratios, word repetition rates
- Word choices: working vocabulary, fractional type/token ratios
- Readability: wordiness, word difficulty, sentence length and complexity
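To make the categories concrete, here is a minimal base-R sketch of how a few such features can be scored from raw text. The toy report, the naive tokenizer, and the vowel-group syllable counter are simplifying assumptions; the project's actual R scripts are listed under Available Resources.

```r
# Minimal feature-scoring sketch (naive tokenization and syllable
# counting; stand-ins for the project's actual scripts).
report    <- "The enzyme catalyzed the reaction. The reaction rate increased."
sentences <- unlist(strsplit(report, "[.!?]+\\s*"))
tokens    <- tolower(unlist(strsplit(gsub("[[:punct:]]", "", report), "\\s+")))

n_tokens <- length(tokens)          # total words (N)
n_types  <- length(unique(tokens))  # unique words (V): lexical range
ttr      <- n_types / n_tokens      # simple type/token ratio

# Readability: approximate syllables as vowel groups, then apply the
# Flesch (1948) reading-ease formula cited under Metrics below.
syllables <- sum(sapply(gregexpr("[aeiouy]+", tokens),
                        function(m) sum(m > 0)))
flesch <- 206.835 - 1.015 * (n_tokens / length(sentences)) -
  84.6 * (syllables / n_tokens)
```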
We used proportional odds ordinal logistic regression (POLR) to test whether the proxy metrics could predict assigned grades.
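A minimal sketch of this test, assuming `MASS::polr` as the POLR implementation; the `reports` data frame and its columns (`grade`, `ttr`, `yule_k`, `flesch`) are hypothetical stand-ins for our grade and metric variables.

```r
# Fit a proportional odds model: do proxy metrics predict the grade?
# `reports` and its columns are hypothetical stand-ins.
library(MASS)

reports$grade <- factor(reports$grade, levels = c("C", "B", "A"),
                        ordered = TRUE)  # grades as an ordinal outcome
fit <- polr(grade ~ ttr + yule_k + flesch, data = reports, Hess = TRUE)
summary(fit)

# Average predictive error: fraction of reports whose predicted grade
# misses the grade actually assigned
mean(predict(fit, type = "class") != reports$grade)
```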
Lessons Learned
1. Several machine-scored metrics tracked students’ growing experience as writers, though not all did.
- Overall lexical richness (simple type/token ratio, Herdan’s C, Dugast’s U) did not change with experience.
- Word repetition (Yule’s K, Simpson’s D, Herdan’s Vm) declined significantly (11.4-20.6%, p<0.001). (These measures are sketched in code after this list.)
2. Lexical range & use of formal terms increased as students gained writing experience.
- Total # unique words used rose 25.1% (p<0.001).
- Use of academic & specialized terms grew faster (24.2-38.1%) than general terms (12.1-17.8%), reflecting a move toward more “formal” word choices.
3. Overall, 14/32 readability indices showed a relative association (phiC) > 0.2 over the 3-course series (p<0.001).
- Not all readability indices correlated equally well with writing experience.
- Indices that emphasize wordiness and the frequency of long or polysyllabic words were more likely to correlate positively with greater writing experience.
4. Proxy metrics were poor predictors of individual student grades. Fit for single- and multi-factor POLR models was low; even the best-fitting model had a 59% average predictive error (Nagelkerke pseudo-R² = 0.187).
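The lexical measures named in lessons 1 and 2 can all be derived from a word-frequency table. Below is a base-R sketch following the formulas compiled in Tweedie & Baayen (1998), reusing the `tokens` vector from the earlier sketch.

```r
# Lexical richness and repetition measures from a frequency table
# (formulas as compiled in Tweedie & Baayen 1998).
freqs <- table(tokens)  # how often each word type occurs
N <- sum(freqs)         # tokens
V <- length(freqs)      # types

herdan_c  <- log(V) / log(N)                           # Herdan's C
dugast_u  <- log(N)^2 / (log(N) - log(V))              # Dugast's U
yule_k    <- 1e4 * (sum(freqs^2) - N) / N^2            # Yule's K
simpson_d <- sum(freqs * (freqs - 1)) / (N * (N - 1))  # Simpson's D
herdan_vm <- sqrt(sum((freqs / N)^2) - 1 / V)          # Herdan's Vm
```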
In summary
We found that selected proxy metrics can surface changes in students’ writing longitudinally across a curricular sequence, and at the cohort level rather than only for individual students. These proxy metrics are valuable because they are less subject to interpretation and harder for students to “game.” They can also help us triangulate on the intrinsic features of students’ writing that we most want to develop over time.
Available Resources
| Resources | Links |
| --- | --- |
| Summary poster - 2022 IUSE Summit in Washington, DC | PDF file |
| R Shiny web form for collecting well-structured student reports | Link to QUBES |
| Archive of 4400 student reports and metadata | Link to QUBES |
| Structured vocabularies & R scripts for analyses | Coming soon |
Where to Learn More
Theory
- Carpenter, J. H. (2001). It’s about the Science: Students Writing and Thinking about Data in a Scientific Writing Course. Language & Learning Across the Disciplines, 5(2).
- McCannon, B. C. (2018). Readability and Research Impact. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3341573
- Oppenheimer, D. M. (2006). Consequences of erudite vernacular utilized irrespective of necessity: Problems with using long words needlessly. Applied Cognitive Psychology, 20(2), 139–156. https://doi.org/10.1002/acp.1178
- Page, E. B., & Paulus, D. H. (1968). The Analysis of Essays by Computer. Final Report.
- Plavén-Sigray, P., Matheson, G. J., Schiffler, B. C., & Thompson, W. H. (2017). The readability of scientific texts is decreasing over time. ELife, 6, e27725. https://doi.org/10.7554/eLife.27725
- Quitadamo, I. J., & Kurtz, M. J. (2007). Learning to improve: Using writing to increase critical thinking performance in general education biology. CBE Life Sciences Education, 6(2), 140–154. https://doi.org/10.1187/cbe.06-11-0203
- Tweedie, F. J., & Baayen, R. H. (1998). How Variable May a Constant Be? Measures of Lexical Richness in Perspective. Computers and the Humanities, 32(5), 323–352.
- Underwood, J. S., & Tregidgo, A. P. (2006). Improving student writing through effective feedback: Best practices and recommendations. Journal of Teaching Writing, 22, 73–97.
Metrics
- Bormuth, J. R. (1969). Development of Readability Analyses. Department of Health, Education, & Welfare.
- Browne, C., Culligan, B., & Phillips, J. (2013a). The New Academic Word List. http://www.newgeneralservicelist.org
- Browne, C., Culligan, B., & Phillips, J. (2013b). The New General Service List. http://www.newgeneralservicelist.org
- Coleman, M., & Liau, T. L. (1975). A computer readability formula designed for machine scoring. Journal of Applied Psychology, 60(2), 283–284.
- Davies, M. (2016). The Corpus of Contemporary American English (COCA): 520 million words, 1990-present. http://corpus.byu.edu/coca/
- Farr, J. N., Jenkins, J. J., & Paterson, D. G. (1951). Simplification of Flesch Reading Ease Formula. Journal of Applied Psychology, 35(5), 333–337. https://doi.org/10.1037/h0062427
- Flesch, R. (1948). A new readability yardstick. The Journal of Applied Psychology, 32(3), 221–233. https://doi.org/10.1037/h0057532
- Gunning, R. (1968). The Technique of Clear Writing, Revised Edition. McGraw-Hill. https://books.google.com/books?id=ofI0AAAAMAAJ
- Herdan, G. (1960). Type Token Mathematics. A Textbook of Mathematical Linguistics. (Vol. 4). Mouton & Co.
- Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of New Readability Formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy Enlisted Personnel. Defense Technical Information Center. https://books.google.com/books?id=7Z7ENwAACAAJ
- McLaughlin, G. H. (1969). SMOG grading: A new readability formula. Journal of Reading, 12(8), 639–646.
- O’Hayre, J. (1966). Gobbledygook Has Gotta Go (p. 113). Bureau of Land Management. http://training.fws.gov/history/HistoricDocuments.html
- Powers, R. D., Sumner, W. A., & Kearl, B. E. (1958). A recalculation of four adult readability formulas. Journal of Educational Psychology, 49(2), 99–105. https://doi.org/10.1037/h0043254
- Simpson, E. H. (1949). Measurement of Diversity. Nature, 163(4148), 688. https://doi.org/10.1038/163688a0
- Smith, E. A., & Kincaid, J. P. (1970). Derivation and Validation of the Automated Readability Index for Use with Technical Materials. Human Factors, 12(5), 457–564. https://doi.org/10.1177/001872087001200505
- Yule, G. U. (1968). The Statistical Study of Literary Vocabulary. Cambridge University Press. https://books.google.com/books?id=-R09AAAAIAAJ