The Challenge

Tracking students' development as scientific writers by “close reading” individual reports is impractical in large BIO101 classes. How then can students’ writing skills be evaluated longitudinally in these large courses?

Our Approach

We proposed using machine-scorable text features as proxy metrics for students’ development as writers. For this study we assembled a suite of candidate metrics, then asked:

  • What does "good student scientific writing" look like? Which text features are informative?
  • What do these features tell us about changes in students' writing patterns?
  • Can proxy metrics provide useful insights about cohort-level changes over a curriculum sequence?

We divided our archive of >4400 student lab reports into 4 writer experience levels:

  • Novice students enrolled in their first college biology course.
  • Early-career students with 0.5-1.5 years of general college writing experience and 1 semester of biology writing experience.
  • Mid-career students with 2+ years of general writing experience, but only 1 semester of biology writing experience.
  • Advanced students with 2+ years of general writing experience and 2+ prior semesters of biology writing experience.

Text features evaluated as proxy metrics fell into 3 categories:

  • Lexical range: # unique words, type/token ratios, word repetition rates
  • Word choices: working vocabulary, fractional type/token ratios
  • Readability: wordiness, word difficulty, sentence length and complexity
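
As a rough illustration of the lexical range category, the Python sketch below counts unique words and computes a simple type/token ratio and Herdan's C (log types / log tokens) for one report. It is not the project's R code; the tokenizer (lowercased runs of letters and apostrophes) and the function names are assumptions made for this example.

```python
import math
import re


def tokenize(text):
    """Split text into lowercase word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_range(text):
    """Return basic lexical range measures for a single text."""
    tokens = tokenize(text)
    n_tokens = len(tokens)
    if n_tokens < 2:
        raise ValueError("need at least two tokens to compute these measures")
    n_types = len(set(tokens))  # number of unique words
    return {
        "tokens": n_tokens,
        "types": n_types,
        # Simple type/token ratio: unique words divided by total words.
        "ttr": n_types / n_tokens,
        # Herdan's C: log(types) / log(tokens).
        "herdan_c": math.log(n_types) / math.log(n_tokens),
    }
```

Calling lexical_range("The cat sat on the mat") counts 6 tokens and 5 types. Because a simple type/token ratio tends to fall as texts get longer, values are best compared across reports of similar length.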

We tested whether proxy metrics could predict assigned grades using proportional odds ordinal logistic regression (POLR).
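
For readers unfamiliar with POLR, the sketch below shows the core of a proportional odds model under one common parameterization, P(grade <= j) = logistic(cutpoint_j - x·beta). The coefficients, cutpoints, and function names are hypothetical; in practice they would be estimated from the data, and this is not the fitting code used in the study.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(x, coefs, cutpoints):
    """Probability of each ordered grade category for one report.

    x         : predictor values (e.g. proxy metrics) for the report
    coefs     : one slope per predictor, shared across all categories
                (this sharing is the "proportional odds" assumption)
    cutpoints : increasing thresholds separating adjacent categories
    """
    eta = sum(b * xi for b, xi in zip(coefs, x))
    # Cumulative probabilities P(Y <= j); the last category closes at 1.
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for cum in cumulative:
        probs.append(cum - previous)
        previous = cum
    return probs
```

With k cutpoints the function returns k + 1 probabilities that sum to 1. A larger linear predictor shifts probability toward the higher categories.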


Lessons Learned

1. Several machine-scored metrics correlated well with students’ growing experience as writers.

  • Overall lexical richness (simple type-token ratio, Herdan’s C, Dugast’s U) did not change with experience.
  • Word repetition (Yule’s K, Simpson’s D, Herdan’s Vm) declined significantly (11.4-20.6%, p<0.001).
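
To make the repetition measures concrete, the sketch below computes Yule's K and Simpson's D from a list of word tokens using their standard frequency-spectrum definitions. This is an illustrative Python version under the assumption that the report has already been tokenized; the function name is hypothetical, and the project's own analyses were scripted in R.

```python
from collections import Counter


def repetition_metrics(tokens):
    """Return Yule's K and Simpson's D for a list of word tokens."""
    n = len(tokens)
    if n < 2:
        raise ValueError("need at least two tokens")
    word_counts = Counter(tokens)
    # Frequency spectrum: maps i -> number of distinct words used exactly i times.
    spectrum = Counter(word_counts.values())
    # Yule's K = 10^4 * (sum_i i^2 * V_i - N) / N^2
    sum_i2 = sum(i * i * v for i, v in spectrum.items())
    yule_k = 1e4 * (sum_i2 - n) / (n * n)
    # Simpson's D = sum_i V_i * (i / N) * ((i - 1) / (N - 1))
    simpson_d = sum(v * (i / n) * ((i - 1) / (n - 1)) for i, v in spectrum.items())
    return {"yule_k": yule_k, "simpson_d": simpson_d}
```

Both measures are 0 when no word is repeated and grow as a text reuses the same words more often, so a decline corresponds to less repetition.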

2. Lexical range & use of formal terms increased as students gained writing experience.

  • Total # unique words used rose 25.1% (p<0.001).
  • Use of academic & specialized terms grew faster (24.2-38.1%) than general terms (12.1-17.8%), reflecting a move to more “formal” word choices.

3. Overall, 14/32 readability indices showed a relative association (phiC) > 0.2 over the 3-course series (p<0.001).

  • Not all readability indices correlated equally well with writing experience.
  • Indices that emphasize wordy items and the frequency of long or polysyllabic words were more likely to correlate positively with writing experience.
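
The 32 readability indices are not listed here, so the sketch below uses LIX purely as a familiar example of an index driven by sentence length and long words: LIX = words / sentences + 100 * (words longer than 6 letters) / words. Whether LIX was among the indices tested is an assumption, as are the simple regex tokenizer, the sentence splitter, and the function name.

```python
import re


def lix(text):
    """Compute the LIX readability index for a text."""
    # Sentences: non-empty chunks separated by ., ! or ? (a rough heuristic).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: runs of ASCII letters (a simplifying assumption).
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    long_words = [w for w in words if len(w) > 6]
    mean_sentence_length = len(words) / len(sentences)
    percent_long_words = 100.0 * len(long_words) / len(words)
    return mean_sentence_length + percent_long_words
```

A text that uses longer sentences or a larger share of long words scores higher, which is the kind of pattern the second bullet above describes.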

4. Proxy metrics were poor predictors of individual student grades. Fit for single- & multi-factor POLR models was low: the best-fit model had a 59% average predictive error (above; Nagelkerke pseudo-R2 = 0.187).
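
For context on the fit statistic, Nagelkerke's pseudo-R2 rescales the Cox-Snell value so that its maximum is 1. The sketch below computes it from the log-likelihoods of an intercept-only model and a fitted model. The function name and inputs are hypothetical; the study's own log-likelihoods are not given here.

```python
import math


def nagelkerke_r2(ll_null, ll_model, n_obs):
    """Nagelkerke pseudo-R2 from two log-likelihoods.

    ll_null  : log-likelihood of the intercept-only model
    ll_model : log-likelihood of the fitted model
    n_obs    : number of observations used in fitting
    """
    # Cox-Snell R2 = 1 - (L_null / L_model)^(2 / n)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n_obs)
    # Largest value Cox-Snell R2 can reach for this null model.
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n_obs)
    return cox_snell / max_cox_snell
```

The result is 0 when the fitted model is no better than the intercept-only model and approaches 1 as the fitted log-likelihood approaches 0.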


In Summary

We found that selected proxy metrics can surface longitudinal changes in students’ writing across a curricular sequence, and can do so for a whole cohort rather than just for individual students. These proxy metrics are valuable because they are less subject to interpretation and harder for students to “game.” We also found that proxy features can help us triangulate on the intrinsic features of interest or value within students' writing that we want to develop over time.


Available Resources

Resource                                                          Link
Summary poster - 2022 IUSE Summit in Washington, DC               PDF file
R Shiny web form for collecting well-structured student reports   Link to QUBES
Archive of 4400 student reports and metadata                      Link to QUBES
Structured vocabularies & R scripts for analyses                  Coming soon


