Teaching Tools and Strategies

Using Constructed Response Questions on High-Stakes Assessments in Large Classes With Limited Support

Author(s): Norris Armstrong*1, Sandii Constable1

University of Georgia

Editor: Melanie Melendrez-Vallard

Published online:

Courses: Genetics

Keywords: assessment, test, exam, large class, constructed response, short answer, scoring

Abstract

Large lecture courses often rely heavily, and sometimes exclusively, on Multiple Choice (MC) exams to assess student learning. Constructed Response (CR) questions, in which students generate their own answers, can provide more insight into student learning than MC questions alone but are seldom used in large courses because they are more challenging and labor intensive to score. We describe a strategy for using CR questions on assessments even in very large lecture courses with the support of undergraduate assistants.

Primary Image: A partially graded exam with short answer questions.

Citation

Armstrong N, Constable S. 2024. Using Constructed Response Questions on High-Stakes Assessments in Large Classes With Limited Support. CourseSource 11. https://doi.org/10.24918/cs.2024.19

Introduction

Assessment has been defined as the process by which the quality of an individual’s work on a specific task is used to infer more generally what that person knows or can do. Assessment can be used by individuals to evaluate their own knowledge and skills (self-assessment), by instructors to give feedback during the learning process (formative assessment), or to evaluate an individual’s knowledge and ability after a stage of learning has been completed (summative assessment). Regardless of how it is performed, assessment is a key component of learning because it identifies an individual’s strengths and areas for improvement and can help instructors revise their teaching methods to better meet a learner’s needs (1).

To correctly infer what an individual knows and can do, it is essential that assessments align with the knowledge and skills being examined and that they accurately measure a person’s abilities. Several authors have argued that multiple forms of assessment are needed to capture different components of learning (1–6), and a variety of self- and formative assessment formats have been developed to do this (7–13). However, summative assessment in large courses often relies heavily on a single format, multiple-choice exams (14–18).

Instructors frequently use multiple-choice (MC) questions for summative assessments because this format is considered to be an objective, reliable, and efficient way to examine learning (5, 6). However, relying primarily on MC questions to assess learning has drawbacks. MC questions do not reveal the thought processes students use when selecting their answers and can miss misconceptions if these ideas are not among the available answer options (3, 19–23). MC questions can also overestimate knowledge if a student can correctly guess answers from the limited options available or by using criteria that have little to do with the concepts being assessed (24–26). As an example, Figure 1 shows an MC question that asks students to apply their understanding of chromosome and allele behavior to predict the genotypes of the daughter cells produced by cell division. Cell division is a fundamental process in cell biology and genetics, and understanding chromosome/allele behavior is critical for comprehending how genetic traits are transferred from one generation to the next in a wide variety of contexts. We have found that students can often answer this question correctly even when they are unable to explain how cell division takes place or how this process could generate the result they selected (see Supporting File S1 for more detail).

In contrast to MC questions, in which answer(s) are selected from a limited number of options, constructed-response (CR) questions require students to generate their own answers. Self-generated answers can give greater insight into student thinking and can better reveal potential misconceptions. This format also makes it less likely that a student can guess a correct answer (27, 28). Figure 2 shows a revised version of the question shown in Figure 1 that asks students to generate their own answer illustrating how chromosome and allele behavior can lead to the production of daughter cells with different genotypes. Responses to similar questions in our classes have revealed that many students have an incomplete or incorrect understanding of one or more aspects of cell division, including how chromosomes line up during meiosis, the difference between sister chromatids and homologous chromosomes, the importance of DNA replication in cell division, and how chromosomes and alleles will appear after replication has taken place (see Supporting File S1 for more detail).

Another advantage of using CR questions on assessments is that this approach can encourage students to adopt study habits that promote more in-depth learning. MC assessments often measure low-level cognitive skills such as simple recall (17, 29, 30) and, even when they do measure higher-order skills, MC-format exams can prompt students to adopt strategies that promote low-level learning. In contrast, CR questions are believed to encourage more active, higher-level learning approaches (24, 30). Indeed, we have observed that students often try to answer MC practice problems by eliminating some answer options and then selecting from the remaining choices using criteria that are unrelated to the concepts being assessed. However, when asked to generate their own answers, students were more likely to think through and apply relevant concepts in order to develop a response. As a result, including CR and other question types on exams may promote more effective learning than using MC-based assessments alone.

Despite the potential benefits of using CR questions, large classes generally do not use this format because the responses generated by students can be more labor intensive and difficult to score. Below, we describe how we have been able to use and score CR questions on summative assessments in very large enrollment courses with careful planning and the help of undergraduate assistants.

Question Design

We have found that successfully using CR questions on assessments in large courses depends heavily on the quality of the questions being used and on the length of the answers these questions prompt. Questions that are unclear or complicated tend to generate unclear responses that require more time, effort, and interpretation to score. Similarly, questions that prompt lengthy responses tend to be more difficult to score (31) and become even more challenging when students write longer answers than needed.

Instructors can help ensure that the exam questions they write are straightforward and clear with a couple of simple steps. One is to prepare a clear, detailed scoring rubric that includes examples of possible answers before the assessment is administered (31–33). Questions for which it is difficult to write brief, clear answers on the scoring rubric are almost certain to generate messy student responses. Similarly, questions that have multiple acceptable answers should generally be avoided as these can require more time and effort to interpret and score. A second useful step to ensure question clarity is to have another instructor or a trusted graduate or undergraduate student read over the assessment. A fresh set of eyes can identify questions that might seem clear to the instructor but could be confusing to students or that may have alternative answers that the instructor did not anticipate.

Another way to discourage excessively long answers is to provide instructions at the start of an assessment, or in the questions themselves, letting students know how long their answers should be and/or where they should be placed. For example, instructions could indicate that an answer should not extend beyond a given length or should fit within a space provided with each question. Long and messy answers can also be avoided by assessing different aspects of complex concepts separately. Assessing individual parts of a complex concept generates answers that are easier to score and also makes it easier to identify where students have difficulty (31–33). Breaking questions that assess complex concepts into multiple parts also allows instructors to award partial credit when students understand part, but not all, of a concept. Having students draw or annotate simple figures or diagrams rather than using words may also generate answers that are easier to score (34). Most CR questions on our assessments can be answered using a few words, 1–2 sentences, or simple figures (see Supporting File S2).
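As an illustration of what a detailed, pre-written rubric for a multi-part CR question might look like, the short Python sketch below encodes expected answers, common incorrect answers, and per-part point values as structured data. The question content, point values, and error annotations are hypothetical and are not drawn from our actual rubrics (see Supporting File S3 for a real example); the sketch simply shows how splitting a concept across parts makes partial credit explicit.

# Hypothetical rubric for a two-part CR question about mitosis in a
# heterozygous (Aa) cell. All content here is illustrative only.
RUBRIC = {
    "Q1a (genotypes of the two daughter cells)": {
        "points": 2,
        "expected": "Aa and Aa",
        "common_errors": [
            "AA and aa (sister chromatids treated as carrying different alleles)",
            "A and a (mitosis confused with meiosis)",
        ],
    },
    "Q1b (one-sentence justification)": {
        "points": 1,
        "expected": "sister chromatids carry identical alleles and separate during anaphase",
        "common_errors": [],
    },
}

# Because the concept is split across parts, a student who answers (a)
# correctly but cannot justify it still earns 2 of the 3 available points.
print(sum(part["points"] for part in RUBRIC.values()))  # prints 3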

It could be argued that CR questions that require very brief answers are no better than MC questions. However, we have found that answers consisting of as little as a single word, number, or simple figure can be very effective at assessing students’ understanding of concepts and can reveal different types of errors and misconceptions they may hold. For example, the question shown in Figure 3 asks students to apply their understanding of DNA/RNA structure and replication to predict the results of a chemical reaction central to both cellular processes and biotechnology. Though this question requires only a brief answer, students have submitted almost a dozen different responses to this type of question, enabling us to determine whether they understand how primers and template DNA function, how DNA polymerase acts, and the importance of DNA directionality during replication. CR questions have also enabled us to ask a broader range of questions than is possible using MC questions alone and have given us deeper insight into student learning and understanding.

Using Undergraduate Assistants to Help Evaluate CR Question Responses

Scoring even brief responses to CR questions in large courses is a labor-intensive process that instructors would find challenging to do by themselves. One potential solution to this problem is to enlist the help of undergraduate students in the evaluation process. Using undergraduates to support classroom activities and learning is well established. Undergraduate students are often asked to work together on class activities, can work as classroom assistants to help instructors support the learning of other students, and schools often offer ad-hoc tutoring programs that are staffed largely by undergraduates (35–37). Methods have been developed that ask students to evaluate work completed by their peers (38) and are popular enough that commercial software has been created to facilitate this technique (39, 40). In some schools, undergraduates can even serve as teaching assistants who independently teach, manage, and assess students in laboratory and recitation sections (41–45). We have found that Undergraduate Assistants (UAs) can do an excellent job helping to evaluate responses to CR questions on assessments in large courses.

If undergraduates help with assessment, recruiting qualified candidates is important to ensure that assessment questions are scored reliably and accurately. There is no single trait that is critical for a student to be a good UA, but we have found that students who have previously taken and done well in the same course they will support, or who have completed more advanced courses in the same discipline, have a good understanding of the concepts being taught and are familiar with the types of CR questions typically asked. Students with prior work experience or who have held positions of responsibility, such as leadership roles in campus groups, tend to be hardworking and dependable, follow instructions well, and are usually good at working independently.

The number of UAs needed will depend on the size of the course they will assist, the number of assessments that need to be evaluated, and the number and type of CR questions on each assessment. We recruit enough UAs to ensure that scoring can be completed within a week of administering an assessment, and our goal is to require less than 10 hours of time from each UA for each assessment. We also recruit at least one more UA than the minimum needed so that we can accommodate the UAs’ own schedules and still score the assessments in a timely manner. For example, our courses enroll about 600 students a semester and have four to five midterm exams, each with 10–15 CR questions that can be answered with a single word/number, 1–2 sentences, or a simple figure. To score these assessments, we need a minimum of three UAs and typically recruit four or five. Courses with different enrollments and assessments may need a different number of UAs.
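As a rough guide to scaling these numbers to other courses, the Python sketch below estimates the minimum number of UAs from course size, the number of CR questions per exam, and an assumed average scoring time per brief response. The 15-second default is our assumption for illustration, not a measured value.

import math

def uas_needed(students, cr_questions_per_exam,
               seconds_per_response=15, max_hours_per_ua=10):
    """Minimum number of UAs so that no assistant spends more than
    max_hours_per_ua scoring a single exam. The default time per brief
    response (a word, 1-2 sentences, or a simple figure) is an assumed
    value for illustration."""
    total_hours = students * cr_questions_per_exam * seconds_per_response / 3600
    return math.ceil(total_hours / max_hours_per_ua)

# A course like the one described above: 600 students and 12 CR questions
# per exam -> about 30 scoring hours -> a minimum of 3 UAs.
print(uas_needed(600, 12))  # prints 3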

Scoring Assessments

Scoring CR questions, even with UA support, can be time consuming but is manageable if well organized. Below are some of the steps that we have found to be most important.

  • Students are asked to enter only their initials and student ID on their assessments. This reduces the likelihood that individual assessments can be identified during scoring.

  • The UAs score the assessments in or near the instructor’s office so that the instructor is readily available to answer any questions and can address problems as soon as they arise. The assessments should also be securely stored in the instructor’s office, reducing concerns about privacy and academic integrity.

  • The same person scores all responses to a given question. This speeds up and improves the consistency of scoring. Exceptions can be made for questions that are straightforward to score.

  • The instructor creates a detailed rubric that includes correct answers, common incorrect answers, and point values that the UAs will use to score the assessments (8, 46) (see Supporting File S3).

  • All UAs use the same notation system for scoring (e.g., awarding points for correct answers rather than deducting points for incorrect ones). This facilitates tallying point totals after scoring has been completed and makes it easier for students to interpret their results when the assessments are returned.

  • The UAs are discouraged from leaving written comments as these can greatly slow down the scoring process. Instead, the instructor can provide students with the scoring rubric and/or an assessment key explaining how the assessment was scored and addressing common errors.

  • The UAs are asked to score answers that are either clearly correct or incorrect first and to set other answers aside. The UAs can then sort ambiguous answers into groups based on the type(s) of errors they display and discuss with the instructor how each answer group should be scored, increasing the speed and consistency of scoring.

  • Establish a system for identifying which questions and assessments have been scored and which have not. This is especially important if the UAs work at different times or in very large courses where it may not be possible to fully score a question in one sitting. Labeling partially scored assessments with sticky notes has worked well for us.

  • We print adhesive labels with each student’s name and student ID using the Microsoft Word Mail Merge function and affix these to the assessments with the matching ID after scoring has been completed. We then enter students’ scores for each assessment question into a spreadsheet. Scoring individual questions allows us to better evaluate different parts of the assessment and to identify concepts that students find more challenging (a minimal sketch of this per-question analysis follows this list).
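The sketch below illustrates one way the per-question scores entered into that spreadsheet could be summarized to flag questions, and hence concepts, that students found most challenging. It assumes the spreadsheet has been exported to a CSV file with one column of student IDs and one column per question; the file and column names are hypothetical examples.

import csv
from statistics import mean

def question_averages(path, id_column="student_id"):
    """Return the average score for every column except the student ID column."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    questions = [column for column in rows[0] if column != id_column]
    return {question: mean(float(row[question]) for row in rows)
            for question in questions}

# Questions with the lowest average scores point to concepts that may need
# more attention in class.
averages = question_averages("exam1_scores.csv")
for question, average in sorted(averages.items(), key=lambda item: item[1]):
    print(f"{question}: {average:.2f}")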

Returning Assessments

We recommend scanning and storing all assessments electronically, after which the original assessment (or a scanned copy) can be returned to the students. The instructor’s copy serves as a record of the original submission in case students request that their assessment be rescored and helps encourage academic integrity (47).

Closing Remarks

We have used the process described above to include CR questions on assessments even in very large courses without the need for graduate teaching assistants. The UAs did an excellent job helping us to evaluate student responses and provided a unique perspective on how students interpret CR questions. Discussing the questions and student responses with the UAs improved how we prepare and score our assessments and enabled us to use a wider variety of question types. Using CR questions has also helped us better determine where students have difficulty and identify the types of errors they tend to make. We have been able to use this information to change how we teach different topics to better help students understand and master the concepts being examined.

A concern about CR questions is that scoring them is more subjective and potentially less accurate than scoring MC questions. This has led to some debate as to whether CR questions should be used for summative assessments (48), especially when scoring is performed by individuals who are not content experts and are more likely to make mistakes (42, 49). However, research has shown that student work can be evaluated accurately by non-experts, including other students, if the assessments are well designed, have defined scoring rubrics, and if the evaluators are provided with appropriate guidance and support (28, 33, 50–54). We have found that UAs can score CR questions efficiently and effectively when provided with guidance and a detailed rubric. When problems do occasionally arise, we have been able to address them easily by discussing the questions and student responses with the UAs during the scoring process. By providing students with a detailed scoring rubric that they can compare against their own answers and allowing them to request that the instructor rescore a question, any errors that are made can be identified and corrected. Regrade requests that we have received on our assessments indicate that errors occur infrequently and at similar rates for questions scored by UAs and by the instructor.

Additional concerns about having undergraduates evaluate the work of other students include privacy and the risk of potential bias during scoring (50, 54–57). FERPA regulations allow peer evaluation of student work (57), but some students may not be comfortable if peers who are not part of the course are able to view their assignments. Bias during assignment scoring can take many forms, can affect student scores both positively and negatively, and even the perception of possible bias during scoring can damage students’ trust that their work is being assessed fairly and accurately. Both privacy and bias concerns can be minimized through some of the steps described above. Having students only provide their initials and student identification numbers on assessments reduces the likelihood that assessments could be associated with specific individuals until after scoring has been completed. We further reduce the possibility of bias during scoring by having each UA score only part of each assessment, by providing students with a key and/or scoring rubric when we return their assessments, and by allowing students to request that a question be reevaluated by the instructor if they believe that their response has been assessed incorrectly.

Many of the steps we describe above to facilitate the scoring of CR questions can now be managed electronically. For example, the software Gradescope (58) enables instructors to create and adjust a scoring rubric while evaluating student responses, sort students’ responses to individual questions into related groups, assign scores to grouped responses en masse, securely return scored assessments to students electronically, and manage student regrade requests. The software can even presort very short answers (single words or numbers) into groups automatically. As a result, this software can enable instructors to use CR questions on assessments in moderately sized courses even without the aid of Undergraduate Assistants, or in very large courses with undergraduate support.

It is important to note that, even with the help of software and Undergraduate Assistants, the use of CR questions in large courses still requires a good deal of effort. Receiving and incorporating feedback on early assessment drafts, preparing scoring rubrics, organizing and scoring CR questions, and responding to student regrade requests require a considerable amount of time. However, careful planning, along with the help of UAs and/or software, can significantly reduce the instructor’s workload, and the increased insight into learning provided by students’ responses makes the extra work involved in using CR questions on assessments well worth the effort.

Supporting Materials

  • S1. Using CR Questions – CR Question Benefit to Instruction

  • S2. Using CR Questions – CR Question Answer Limits

  • S3. Using CR Questions – Example Scoring Rubric

Acknowledgments

We would like to acknowledge the UGA Biology Division and the UGA Office for Instruction who provided support for our efforts to develop new assessment formats for our large courses and the undergraduate assistants who have helped us implement and refine this approach over several semesters.

References

  1. Joughin G. 2009. Introduction: Refocusing assessment, p 1–11. In Joughin G (ed), Assessment, learning and judgement in higher education. Springer Netherlands, Dordrecht, Netherlands. doi:10.1007/978-1-4020-8905-3_1.
  2. Laverty JT, Underwood SM, Matz RL, Posey LA, Carmel JH, Caballero MD, Fata-Hartley CL, Ebert-May D, Jardeleza SE, Cooper MM. 2016. Characterizing college science assessments: The Three-Dimensional Learning Assessment Protocol. PLOS ONE 11:e0162333. doi:10.1371/journal.pone.0162333.
  3. Hubbard JK, Potts MA, Couch BA. 2017. How question types reveal student thinking: An experimental comparison of multiple-true-false and free-response formats. CBE Life Sci Educ 16:ar26. doi:10.1187/cbe.16-12-0339.
  4. Fu AC, Raizen SA, Shavelson RJ. 2009. The nation’s report card: A vision of large-scale science assessment. Science 326:1637–1638. doi:10.1126/science.1177780.
  5. Dysthe O. 2007. The challenges of assessment in a new learning culture, p 15–28. In Havnes A, McDowell L (ed), Balancing dilemmas in assessment and learning in contemporary education. Taylor and Francis Group, New York, NY.
  6. Douglas M, Wilson J, Ennis E. 2012. Multiple-choice question tests: A convenient, flexible, and effective learning tool? A case study. Innov Educ Teach Int 49:111–121. doi:10.1080/14703297.2012.677596.
  7. Morris R, Perry T, Wardle L. 2021. Formative assessment and feedback for learning in higher education: A systematic review. Rev Educ 9:e3292. doi:10.1002/rev3.3292.
  8. Bissell AN, Lemons PP. 2006. A new method for assessing critical thinking in the classroom. BioScience 56:66–72. doi:10.1641/0006-3568(2006)056[0066:ANMFAC]2.0.CO;2.
  9. Kulkarni C, Wei KP, Le H, Chia D, Papadopoulos K, Cheng J, Koller D, Klemmer SR. 2015. Peer and self assessment in massive online classes, p 131–168. In Plattner H, Meinel C, Leifer L (ed), Design thinking research: Building innovators. Springer International Publishing, Cham, Switzerland. doi:10.1007/978-3-319-06823-7_9.
  10. Knight JK, Wood WB. 2005. Teaching more by lecturing less. Cell Biol Educ 4:298–310. doi:10.1187/05-06-0082.
  11. Stanton JD, Sebesta AJ, Dunlosky J. 2021. Fostering metacognition to support student learning and performance. CBE Life Sci Educ 20:fe3. doi:10.1187/cbe.20-12-0289.
  12. McDaniel MA, Bugg JM, Liu Y, Brick J. 2015. When does the test-study-test sequence optimize learning and retention? J Exp Psychol Appl 21:370–382. doi:10.1037/xap0000063.
  13. Tanner KD. 2012. Promoting student metacognition. CBE Life Sci Educ 11:113–120. doi:10.1187/cbe.12-03-0033.
  14. Butler AC. 2018. Multiple-choice testing in education: Are the best practices for assessment also good for learning? J Appl Res Mem Cogn 7:323–331. doi:10.1016/j.jarmac.2018.07.002.
  15. Carnegie JA. 2017. Does correct answer distribution influence student choices when writing multiple choice examinations? Can J Scholarsh Teach Learn 8. doi:10.5206/cjsotl-rcacea.2017.1.11.
  16. Mullen K, Schultz M. 2012. Short answer versus multiple choice examination questions for first year chemistry. Int J Innov Sci Math Educ 20.
  17. Momsen JL, Long TM, Wyse SA, Ebert-May D. 2010. Just the facts? Introductory undergraduate biology courses focus on low-level cognitive skills. CBE Life Sci Educ 9:435–440. doi:10.1187/cbe.10-01-0001.
  18. DeAngelo L, Hurtado S, Pryor JH, Kelly KR, Santos JL, Korn WS. 2009. The American college teacher: National norms for the 2007–2008 HERI faculty survey. Higher Education Research Institute at University of California, Los Angeles, Los Angeles, CA.
  19. Couch BA, Hubbard JK, Brassil CE. 2018. Multiple–true–false questions reveal the limits of the multiple–choice format for detecting students with incomplete understandings. BioScience 68:455–463. doi:10.1093/biosci/biy037.
  20. Bacon DR. 2003. Assessing learning outcomes: A comparison of multiple-choice and short-answer questions in a marketing context. J Mark Educ 25:31–36. doi:10.1177/0273475302250570.
  21. Dufresne RJ, Leonard WJ, Gerace WJ. 2002. Making sense of students’ answers to multiple-choice questions. Phys Teach 40:174–180. doi:10.1119/1.1466554.
  22. Nehm RH, Schonfeld IS. 2008. Measuring knowledge of natural selection: A comparison of the CINS, an open-response instrument, and an oral interview. J Res Sci Teach 45:1131–1160. doi:10.1002/tea.20251.
  23. Smith JI, Tanner K. 2010. The problem of revealing how students think: Concept inventories and beyond. CBE Life Sci Educ 9:1–5. doi:10.1187/cbe.09-12-0094.
  24. Kuechler WL, Simkin MG. 2010. Why is performance on multiple-choice tests and constructed-response tests not more closely related? Theory and an empirical test. Decis Sci J Innov Educ 8:55–73. doi:10.1111/j.1540-4609.2009.00243.x.
  25. Ibbett NL, Wheldon BJ. 2016. The incidence of clueing in multiple choice testbank questions in accounting: some evidence from Australia. E-J Bus Educ Scholarsh Teach 10:20–35.
  26. McKenna P. 2019. Multiple choice questions: Answering correctly and knowing the answer. Interact Technol Smart Educ 16:59–73. doi:10.1108/ITSE-09-2018-0071.
  27. Halim AS, Finkenstaedt-Quinn SA, Olsen LJ, Gere AR, Shultz GV. 2018. Identifying and remediating student misconceptions in introductory biology via writing-to-learn assignments and peer review. CBE Life Sci Educ 17:ar28. doi:10.1187/cbe.17-10-0212.
  28. Olvet DM, Bird JB, Fulton TB, Kruidering M, Papp KK, Qua K, Willey JM, Brenner JM. 2022. A multi-institutional study of the feasibility and reliability of the implementation of constructed response exam questions. Teach Learn Med 35:609–622. doi:10.1080/10401334.2022.2111571.
  29. Falchikov N, Thompson K. 2008. Assessment: What drives innovation? J Univ Teach Learn Pract 5:55–67. doi:10.53761/1.5.1.5.
  30. Stanger-Hall KF. 2012. Multiple-choice exams: An obstacle for higher-level thinking in introductory science classes. CBE Life Sci Educ 11:294–306. doi:10.1187/cbe.11-11-0100.
  31. Black B, Suto I, Bramley T. 2011. The interrelations of features of questions, mark schemes and examinee responses and their impact upon marker agreement. Assess Educ Princ Policy Pract 18:295–318. doi:10.1080/0969594X.2011.555328.
  32. Hogan TP, Murphy G. 2007. Recommendations for preparing and scoring constructed-response items: What the experts say. Appl Meas Educ 20:427–441. doi:10.1080/08957340701580736.
  33. Suto I, Nádas R, Bell J. 2011. Who should mark what? A study of factors affecting marking accuracy in a biology examination. Res Pap Educ 26:21–51. doi:10.1080/02671520902721837.
  34. Quillin K, Thomas S. 2015. Drawing-to-learn: A framework for using drawings to promote model-based reasoning in biology. CBE Life Sci Educ 14:es2. doi:10.1187/cbe.14-08-0128.
  35. Topping KJ. 1996. The effectiveness of peer tutoring in further and higher education: A typology and review of the literature. High Educ 32:321–345. doi:10.1007/BF00138870.
  36. Arco-Tirado JL, Fernández-Martín FD, Hervás-Torres M. 2020. Evidence-based peer-tutoring program to improve students’ performance at the university. Stud High Educ 45:2190–2202. doi:10.1080/03075079.2019.1597038.
  37. Learning Assistant Alliance. n.d. Home page. Retrieved from https://www.learningassistantalliance.org (accessed 27 April 2023).
  38. Lim SCJ. 2018. Content knowledge mastery and peer assessment quality: A preliminary case study in an undergraduate engineering course, p 185–189. In Proceedings of the 2018 IEEE 10th International Conference on Engineering Education (ICEED), Kuala Lumpur, Malaysia. doi:10.1109/ICEED.2018.8626907.
  39. PeerAssessment.com. n.d. Home page. Retrieved from https://peerassessment.com (accessed 24 April 2023).
  40. Peerceptiv. n.d. Home page. Retrieved from https://peerceptiv.com (accessed 23 April 2023).
  41. Schalk KA, McGinnis JR, Harring JR, Hendrickson A, Smith AC. 2009. The undergraduate teaching assistant experience offers opportunities similar to the undergraduate research experience. J Microbiol Biol Educ 10:32–42. doi:10.1128/jmbe.v10.97.
  42. Hogan TP, Norcross JC. 2012. Preparing for the future: Undergraduates as teaching assistants, p 197–206. In Buskist W, Benassi VA (ed), Effective college and university teaching: Strategies and tactics for the new professoriate. SAGE Publications Inc., Thousand Oaks, CA.
  43. Weidert JM, Wendorf AR, Gurung RAR, Filz T. 2012. A survey of graduate and undergraduate teaching assistants. Coll Teach 60:95–103. doi:10.1080/87567555.2011.637250.
  44. Philipp SB, Tretter TR, Rich CV. 2016. Development of undergraduate teaching assistants as effective instructors in STEM courses. J Coll Sci Teach 45:74–82.
  45. Wheeler LB, Maeng JL, Chiu JL, Bell RL. 2017. Do teaching assistants matter? Investigating relationships between teaching assistants and student outcomes in undergraduate science laboratory classes. J Res Sci Teach 54:463–492. doi:10.1002/tea.21373.
  46. Jonsson A, Svingby G. 2007. The use of scoring rubrics: Reliability, validity and educational consequences. Educ Res Rev 2:130–144. doi:10.1016/j.edurev.2007.05.002.
  47. Caughran JA, Morrison RW. 2015. Returning written assignments electronically: Adapting off-the-shelf technology to preserve privacy and exam integrity. J Chem Educ 92:1254–1255. doi:10.1021/ed500577x.
  48. Schuwirth LWT, Van Der Vleuten CPM. 2004. Different written assessment methods: What can be said about their strengths and weaknesses? Med Educ 38:974–979. doi:10.1111/j.1365-2929.2004.01916.x.
  49. Wald N, Harland T. 2020. Rethinking the teaching roles and assessment responsibilities of student teaching assistants. J Furth High Educ 44:43–53. doi:10.1080/0309877X.2018.1499883.
  50. Tisi J, Whitehouse G, Maughan S, Burdett N. 2013. A review of literature on marking reliability research (report for Ofqual). National Foundation for Educational Research Slough, United Kingdom.
  51. Ahmed A, Pollitt A. 2011. Improving marking quality through a taxonomy of mark schemes. Assess Educ Princ Policy Pract 18:259–278. doi:10.1080/0969594X.2010.546775.
  52. Sütő WMI, Nádas R. 2008. What determines GCSE marking accuracy? An exploration of expertise among maths and physics markers. Res Pap Educ 23:477–497. doi:10.1080/02671520701755499.
  53. Van Hattum-Janssen N, Pacheco JA, Vasconcelos RM. 2004. The accuracy of student grading in first-year engineering courses. Eur J Eng Educ 29:291–298. doi:10.1080/0304379032000157259.
  54. Bernstein DJ. 1979. Reliability and fairness of grading in a mastery program. Teach Psychol 6:104–107. doi:10.1207/s15328023top0602_13.
  55. Fleming ND. 1999. Biases in marking students’ written work: Quality? p 83–92. In Brown S, Glasner A (ed), Assessment matters in higher education: Choosing and using diverse approaches. Society for Research into Higher Education / Open University Press, Buckingham, United Kingdom.
  56. Li L. 2017. The role of anonymity in peer assessment. Assess Eval High Educ 42:645–656. doi:10.1080/02602938.2016.1174766.
  57. Ramirez CA. 2009. FERPA clear and simple: The college professional’s guide to compliance. Jossey-Bass, San Francisco, CA.
  58. Gradescope. n.d. Home page. Turnitin Inc. Retrieved from https://www.gradescope.com (accessed 29 July 2023).

About the Authors

*Correspondence to: Norris Armstrong, 1000 Cedar St., Division of Biology, University of Georgia, Athens, Georgia, 30602; narmstro@uga.edu

Competing Interests

None of the authors have a financial, personal, or professional conflict of interest related to this work.
