The Challenge

Tracking students’ development as scientific writers by “close reading” individual reports is impractical in large BIO101 classes. How, then, can students’ writing skills be evaluated longitudinally in these large courses?
 

Our Approach

We proposed using machine-scorable text features as proxy metrics for students’ development as writers. For this study we assembled a suite of candidate metrics, then asked:

  • What does “good student scientific writing” look like? Which text features are informative?
  • What do these features tell us about changes in students’ writing patterns?
  • Can proxy metrics provide useful insights about cohort-level changes over a curriculum sequence?

We divided our archive of >4400 student lab reports into 4 writer experience levels (a binning sketch in R follows the list):

  • Novice students enrolled in their first college biology course.
  • Early-career students with 0.5-1.5 years of general college writing experience, and 1 semester of biology writing experience.
  • Mid-career students with 2+ years of general writing experience, but only 1 semester of biology writing experience.
  • Advanced students with 2+ years of general writing experience and 2+ prior semesters of biology writing experience.
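
For illustration, this binning can be written in R as below. This is only a sketch: the metadata column names (general_writing_yrs, bio_writing_sems) are hypothetical stand-ins, not fields from our archived dataset.

  # Sketch only: assign each report a writer experience level from two
  # hypothetical metadata columns: general_writing_yrs (years of college
  # writing experience) and bio_writing_sems (prior semesters of biology
  # writing experience).
  library(dplyr)

  reports <- reports %>%
    mutate(experience = case_when(
      bio_writing_sems == 0                              ~ "Novice",
      bio_writing_sems == 1 & general_writing_yrs <= 1.5 ~ "Early-career",
      bio_writing_sems == 1 & general_writing_yrs >= 2   ~ "Mid-career",
      bio_writing_sems >= 2 & general_writing_yrs >= 2   ~ "Advanced",
      TRUE                                               ~ NA_character_
    ))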

Text features evaluated as proxy metrics fell into 3 categories (a scoring sketch follows the list):

  • Lexical range: # unique words, type/token ratios, word repetition rates
  • Word choices: working vocabulary, fractional type/token ratios
  • Readability: wordiness, word difficulty, sentence length and complexity
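
As an illustration of how such features can be machine-scored, the sketch below uses the quanteda.textstats R package. This is not our actual analysis pipeline (our R scripts are listed under Available Resources); texts stands in for a character vector of report texts.

  # Sketch: machine-score candidate proxy metrics for a set of reports.
  library(quanteda)
  library(quanteda.textstats)

  toks <- tokens(texts, remove_punct = TRUE)   # 'texts': character vector

  # Lexical range & repetition: type/token ratio, Herdan's C, Dugast's U,
  # Yule's K, Simpson's D, Herdan's Vm
  lexdiv <- textstat_lexdiv(toks, measure = c("TTR", "C", "U", "K", "D", "Vm"))

  # Readability: a few of the many indices implemented in the package
  readab <- textstat_readability(texts,
      measure = c("Flesch", "Flesch.Kincaid", "FOG", "SMOG", "ARI"))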

We tested whether proxy metrics could predict assigned grades using proportional odds ordinal logistic regression (POLR), as sketched below.
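
A minimal sketch of this test in R, assuming a data frame dat with an ordinal grade column and two illustrative (hypothetical) predictor columns:

  # Sketch: fit a proportional odds model of grade on proxy metrics.
  library(MASS)

  dat$grade <- factor(dat$grade, levels = c("D", "C", "B", "A"),
                      ordered = TRUE)                  # ordered outcome
  fit <- polr(grade ~ yules_k + unique_words, data = dat, Hess = TRUE)
  summary(fit)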

 

Lessons Learned

1. Some machine-scored metrics correlated well with students’ growing experience as writers, while others did not.

  • Overall lexical richness (simple type-token ratio, Herdan’s C, Dugast’s U) did not change with experience.
  • Word repetition (Yule’s K, Simpson’s D, Herdan’s Vm) declined significantly (11.4-20.6%, p<0.001); Yule’s K is sketched below.
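
For reference, Yule’s K can be computed directly from a report’s token frequencies. A base-R sketch (not our production code):

  # Sketch: Yule's K for a character vector of word tokens.
  # K = 10^4 * (sum over m of m^2 * V(m) - N) / N^2, where V(m) is the
  # number of word types occurring exactly m times and N is token count.
  yules_k <- function(tokens) {
    N  <- length(tokens)
    vm <- table(table(tokens))       # frequency spectrum V(m)
    m  <- as.numeric(names(vm))
    1e4 * (sum(m^2 * vm) - N) / N^2
  }

  yules_k(c("the", "cell", "divides", "and", "the", "cell", "grows"))  # ~816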

2. Lexical range & use of formal terms increased as students gained writing experience.

  • Total # unique words used rose 25.1% (p<0.001).
  • Use of academic & specialized terms grew faster (24.2-38.1%) than use of general terms (12.1-17.8%), reflecting a move toward more “formal” word choices (see the sketch below).
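
One plausible way to score such word choices in R is to check each report’s word types against the NAWL and NGSL word lists (cited under Metrics below); the file names here are hypothetical stand-ins:

  # Sketch: share of a report's word types found on the academic (NAWL)
  # vs. general (NGSL) word lists; one headword per line is assumed.
  nawl <- tolower(readLines("nawl.txt"))   # hypothetical file name
  ngsl <- tolower(readLines("ngsl.txt"))   # hypothetical file name

  term_rates <- function(tokens) {
    types <- unique(tolower(tokens))
    c(academic = mean(types %in% nawl),
      general  = mean(types %in% ngsl))
  }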

3. Overall, 14/32 readability indices showed a relative association (phiC) > 0.2 over the 3-course series (p<0.001).

  • Not all readability indices correlated equally well with writing experience.
  • Indices emphasizing wordiness and the frequency of long or polysyllabic words were more likely to correlate positively with greater writing experience (phiC is sketched below).
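
Treating phiC as Cramér’s V, the association can be computed in base R as sketched here; the example call uses hypothetical column names:

  # Sketch: Cramér's V (phiC) between writer experience level and a
  # binned readability index.
  cramers_v <- function(x, y) {
    tab  <- table(x, y)
    chi2 <- suppressWarnings(chisq.test(tab)$statistic)
    k    <- min(nrow(tab), ncol(tab))
    as.numeric(sqrt(chi2 / (sum(tab) * (k - 1))))
  }

  # e.g.: cramers_v(dat$experience, cut(dat$fog_index, breaks = 4))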

4. Proxy metrics were poor predictors of individual student grades. Fit for single- & multi-factor POLR models was low, with 59% average predictive error for the best-fitting model (Nagelkerke pseudo-R2 = 0.187). A sketch of these fit statistics follows.
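
These statistics can be recovered from the POLR model sketched earlier; Nagelkerke’s pseudo-R2 compares the fitted model to an intercept-only null, and predictive error is the misclassification rate:

  # Sketch: fit statistics for the POLR model 'fit' from the earlier sketch.
  null_fit <- MASS::polr(grade ~ 1, data = dat)      # intercept-only null
  n   <- nrow(dat)                                   # assumes no rows dropped
  ll1 <- as.numeric(logLik(fit))
  ll0 <- as.numeric(logLik(null_fit))

  cox_snell  <- 1 - exp((2 / n) * (ll0 - ll1))
  nagelkerke <- cox_snell / (1 - exp((2 / n) * ll0)) # Nagelkerke pseudo-R2

  # Average predictive error: fraction of reports whose grade the model
  # misclassifies.
  pred_error <- mean(predict(fit, type = "class") != dat$grade)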

 

In summary

We found that selected proxy metrics can surface changes in students’ writing longitudinally across a curricular sequence, and for a cohort rather than just individual students. These proxy metrics are valuable because they are less subject to interpretation and harder for students to “game.” We also found that proxy features can help us triangulate on the intrinsic, valued features of students’ writing that we want to develop over time.

 

Available Resources

  • Summary poster - 2022 IUSE Summit in Washington, DC (PDF file)
  • R Shiny web form for collecting well-structured student reports (link to QUBES)
  • Archive of 4400 student reports and metadata (link to QUBES)
  • Structured vocabularies & R scripts for analyses (coming soon)

 


Where to Learn More

Theory

  1. Carpenter, J. H. C. H. (2001). It’s about the Science: Students Writing and Thinking about Data in a Scientific Writing Course. Language & Learning Across the Disciplines, 5(2).
  2. McCannon, B. C. (2018). Readability and Research Impact. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3341573
  3. Oppenheimer, D. M. (2006). Consequences of erudite vernacular utilized irrespective of necessity: Problems with using long words needlessly. Applied Cognitive Psychology, 20(2), 139–156. https://doi.org/10.1002/acp.1178
  4. Page, E. B., & Paulus, D. H. (1968). The Analysis of Essays by Computer. Final Report.
  5. Plavén-Sigray, P., Matheson, G. J., Schiffler, B. C., & Thompson, W. H. (2017). The readability of scientific texts is decreasing over time. ELife, 6, e27725. https://doi.org/10.7554/eLife.27725
  6. Quitadamo, I. J., & Kurtz, M. J. (2007). Learning to improve: Using writing to increase critical thinking performance in general education biology. CBE Life Sciences Education, 6(2), 140–154. https://doi.org/10.1187/cbe.06-11-0203
  7. Tweedie, F. J., & Baayen, R. H. (1998). How Variable May a Constant Be? Measures of Lexical Richness in Perspective. Computers and the Humanities, 32(5), 323–352.
  8. Underwood, J. S., & Tregidgo, A. P. (2006). Improving student writing through effective feedback: Best practices and recommendations. Journal of Teaching Writing, 22, 73–97.


Metrics

  1. Bormuth, J. R. (1969). Development of Readability Analyses. Department of Health, Education, & Welfare.
  2. Browne, C., Culligan, B., & Phillips, J. (2013a). The New Academic Word List. http://www.newgeneralservicelist.org
  3. Browne, C., Culligan, B., & Phillips, J. (2013b). The New General Service List. http://www.newgeneralservicelist.org
  4. Coleman, M., & Liau, T. L. (1975). A computer readability formula designed for machine scoring. Journal of Applied Psychology, 60, 283–284.
  5. Davies, M. (2016). The Corpus of Contemporary American English (COCA): 520 million words, 1990-present. http://corpus.byu.edu/coca/
  6. Farr, J. N., Jenkins, J. J., & Paterson, D. G. (1951). Simplification of Flesch Reading Ease Formula. Journal of Applied Psychology, 35(5), 333–337. https://doi.org/10.1037/h0062427
  7. Flesch, R. (1948). A new readability yardstick. The Journal of Applied Psychology, 32(3), 221–233. https://doi.org/10.1037/h0057532
  8. Gunning, R. (1968). The Technique of Clear Writing, Revised Edition. McGraw-Hill. https://books.google.com/books?id=ofI0AAAAMAAJ
  9. Herdan, G. (1960). Type Token Mathematics. A Textbook of Mathematical Linguistics. (Vol. 4). Mouton & Co.
  10. Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of New Readability Formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy Enlisted Personnel. Defense Technical Information Center. https://books.google.com/books?id=7Z7ENwAACAAJ
  11. McLaughlin, G. H. (1969). SMOG grading: A new readability formula. Journal of Reading, 12(8), 639–646.
  12. O’Hayre, J. (1966). Gobbledygook Has Gotta Go (p. 113). Bureau of Land Management. http://training.fws.gov/history/HistoricDocuments.html
  13. Powers, R. D., Sumner, W. A., & Kearl, B. E. (1958). A recalculation of four adult readability formulas. Journal of Educational Psychology, 49(2), 99–105. https://doi.org/10.1037/h0043254
  14. Simpson, E. H. (1949). Measurement of Diversity. Nature, 163(4148), 688. https://doi.org/10.1038/163688a0
  15. Smith, E. A., & Kincaid, J. P. (1970). Derivation and Validation of the Automated Readability Index for Use with Technical Materials. Human Factors, 12(5), 457–564. https://doi.org/10.1177/001872087001200505
  16. Yule, G. U. (1968). The Statistical Study of Literary Vocabulary. Cambridge University Press. https://books.google.com/books?id=-R09AAAAIAAJ

 

