
Resources for Assessing Educational Interventions in Biology at the Collegiate Level

Author(s): Youngeun Choi, Genevieve C. Saphier, William J. Anderson

Harvard University





Most scientific research is judged based on the quality of controlled experiments and carefully analyzed results. In addition, proper levels of regulation in terms of biosafety and animal usage are a routine part of the scientific research process for laboratories. For many biologists, educational research is much more of a black box. While faculty have many great ideas on how to improve education, they struggle with the best way to evaluate whether their ideas lead to greater student outcomes. Here we provide a review of compliance issues related to educational research, as well as describe ways in which pedagogical innovations in biology can be assessed. We also describe some of the challenges related to educational research and how these could be addressed.

Licensed under CC Attribution-NonCommercial 4.0 International

Version 1.0 - published on 26 Aug 2021 doi:10.24918/cs.2016.1


Compliance and Assessment Are Critical for Successful Innovation in STEM Education

A series of national reports and commentaries calling for science, technology, engineering, and mathematics (STEM) education reform has gradually brought about changes in classroom environment and teaching methods (1-5). Terms such as "active learning," "flipped classroom," and "clicker questions" are no longer obscure to many science educators (6). Prestigious science journals have published science education research papers, and scientific communities have organized workshops where instructors can learn and discuss effective teaching strategies (1,7-9).

Equipped with growing support and resources, science educators are encouraged more than ever to implement and assess novel teaching methods. However, many faculty members in higher education are apprehensive about performing any systematic evaluation of their pedagogical strategies (10) beyond conventional student course evaluations. For instance, it is not hard to find education research presentations at conferences that lack formal assessment. In many cases, researchers rely on affective assessments of how students perceive the intervention's effect on their own outcomes, without a more objective quantification such as normalized learning gain. Sometimes the assessment tools are inconsistent with, or even inappropriate for, the question asked. Furthermore, many of these studies are presented without obtaining IRB determination or approval prior to engaging in the research.

To foster and disseminate research-based pedagogical strategies in STEM education, it is necessary to provide instructors, who were trained as scientists and thus are generally unaccustomed to the setting of educational research, with the tools necessary to assess their classroom innovations. To aid science educators in entering the possibly uncharted territory of educational research, we address in this essay two issues that instructors should consider while planning for educational research: regulatory compliance and assessment tools for pedagogical interventions.

What Regulatory Compliance Issues Arise from Education Research?


In general, the compliance issues faced by faculty looking to conduct educational research are regulated by the Family Educational Rights and Privacy Act (FERPA) and by the Department of Health and Human Services regulations for the Protection of Human Subjects (45CFR46), colloquially referred to as "the Common Rule." Understanding of and compliance with these rules and regulations are important for the responsible conduct of education research in the higher education setting.

FERPA is a federal law intended to protect the privacy of parents and students. It regulates the disclosure of personally identifiable information from the student record (e.g., grades, GPA, etc.). Certain educational research projects may wish to use some of this information (e.g., to correlate learning gains in a class with students' performance in other biology classes or their science GPA). The use in research of information from student records is permitted by FERPA under the following circumstances:

  1. The student has explicitly consented to the use of this information in the research, OR
  2. The information has been de-identified, OR
  3. The information will be used to either
    • Develop, validate, or administer predictive tests, OR
    • Improve instruction,
    AND the researcher enters into a data use agreement that includes specific language regarding data confidentiality (34CFR99.31(a)(6)) (11).
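
Because the three circumstances above combine OR and AND conditions, it can help to write the logic out explicitly. The following is a minimal sketch of that logic only, not a compliance tool; the class and field names (`RecordUse`, `has_data_use_agreement`, and so on) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RecordUse:
    """Hypothetical description of a planned use of student-record data."""
    student_consented: bool = False
    deidentified: bool = False
    for_predictive_tests: bool = False
    for_improving_instruction: bool = False
    has_data_use_agreement: bool = False


def ferpa_permits_use(use: RecordUse) -> bool:
    """Encode the three circumstances listed above.

    Circumstance 3 needs BOTH a qualifying purpose AND a data use agreement.
    """
    qualifying_purpose = use.for_predictive_tests or use.for_improving_instruction
    return (
        use.student_consented
        or use.deidentified
        or (qualifying_purpose and use.has_data_use_agreement)
    )


# A study aimed at improving instruction, with and without an agreement in place.
without_agreement = RecordUse(for_improving_instruction=True)
with_agreement = RecordUse(for_improving_instruction=True, has_data_use_agreement=True)
print(ferpa_permits_use(without_agreement), ferpa_permits_use(with_agreement))
```

The point of the sketch is that a qualifying purpose alone does not satisfy the third circumstance; the data use agreement is also required.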

The aforementioned data use agreements vary by institution, as they typically include elements required by FERPA as well as institution-specific requirements. The FERPA requirements for this use are set out in 34CFR99.31(a)(6). Consultation with the registrar's office should help faculty navigate institution-specific policies and procedures regarding compliance with FERPA.


Educational research in the context of the Common Rule, which requires Institutional Review Board (IRB) review and approval of most human subjects research, is more nuanced and therefore often quite difficult to figure out. The Common Rule generally only applies to either federally funded research or research at institutions that accept federal funding, the latter of which applies to many academic institutions. In general, to determine if a particular project is subject to the Common Rule, the Office for Human Research Protections (OHRP) provides the guidance that one should begin by asking, "Is it research?" (12). In the context of educational research, this can be particularly difficult to decipher; as Hammack (1997) points out, "Good teaching practice has always required close observation and experimentation" (13). While initially writing in the context of medical education, Roberts et al. (2001) address something relevant to all teacher-led education research: "Taken together medical education research and medical education practice both involve being methodical, innovative, self-observing, forward-looking, and open to peer review" (14).

So when does good teaching practice cross over into educational research? In the regulation, "research" is defined as, "a systematic investigation designed to develop or contribute to generalizable knowledge." Discussions of this definition generally focus on the words systematic, designed, and generalizable knowledge (15). Much of good teaching and even just academic record keeping may include the systematic collection of data. But if the design of the project, commonly understood as its "intent" (15), is aimed not at generalizable knowledge but at program evaluation or quality improvement, then the project is not research (OHRP; (16)). However, if part of the intent of the project is quality improvement and part is research, or if the intent of the project changes along the way from quality improvement to research, then human subjects protection applies and review is necessary (17). It is critical to note, then, that the distinction is made based not on what the teacher or researcher will do, but on what the teacher or researcher intends (18).

Researchers should check with their local IRB to determine their exact requirements. Many educational research projects will fit into the regulatory category of "exempt." Though exempt research does not technically have to undergo IRB review (15), OHRP recommends that investigators not make this determination on their own (OHRP; (12)), and many institutions have policies requiring IRB determination of exemption. The Common Rule lists six categories of research that are exempt from IRB review. The most germane rule states that if research is conducted in a typical education setting and involves only normal educational practices, then it is exempt from IRB review. Projects that have been determined to be exempt do not generally require annual renewal by the IRB. However, researchers are often required to submit any modifications to the study's design or protocol to the IRB for review so that the IRB can determine that the modification does not alter the exempt status.

It is relevant to note that the US Department of Health and Human Services issued a Notice of Proposed Rulemaking (NPRM) in September 2015 that proposes substantial changes to the Common Rule. The proposal includes changes to how determinations of exempt research can be made and creating a new category of research called "excluded" that would not require any IRB review. As of now, it is unclear who would determine whether or not a particular project qualifies for exclusion. The comment period on these proposed changes ended in January 2016, and it is not clear what the final rules will be, when they will be published, and/or when they will take effect.

Methods to Assess the Effectiveness of the Pedagogical Intervention

In addition to addressing compliance issues, educational researchers should select appropriate and effective methods to evaluate their pedagogical intervention. Depending on the goal of the pedagogical intervention, educational research may address one or more of the following questions:

  • How much did the intervention contribute to student learning?
  • How much did a particular skill/competency improve due to the intervention?
  • How much did student attitudes change after the intervention?

Below, we first briefly discuss some unique attributes of the educational research that should be considered in selecting the pedagogical tool. We then categorize assessment tools that can be used to determine the impact of the pedagogical strategy on learning, development of critical thinking skills, or student attitudes.


Trained as scientists, many biology educators may feel uncomfortable with quasi-experimental design, which is widely used in educational research. Two conditions that set apart education research from scientific research are the lack of identical study subjects (i.e., no two students are the same) and an ethical concern over students in a control group who cannot benefit from the educational innovation. To address the first issue, researchers have used analysis of covariance (ANCOVA) to control for strong indicators of student performance, such as grade point averages (19). Although diversity of student background needs to be controlled, it can inform researchers whether a group of students with a specific background benefits from the tool more than other groups. A recent study addressing the effect of increased course structure revealed the intervention disproportionately influenced particular subpopulations of students (20). This finding suggests that organizing student data by their backgrounds may uncover otherwise hidden or overlooked effects of the pedagogical technique.
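
To make the idea of controlling for a covariate concrete, the sketch below fits the linear model behind a simple ANCOVA, score = b0 + b1 * treated + b2 * GPA, by ordinary least squares and reports b1, the GPA-adjusted difference between groups. It is an illustration under stated assumptions rather than the method used in the cited studies: the function names and the data are hypothetical, the model assumes the same GPA slope in both groups, and no significance test is computed. In practice a statistics package would handle the fitting and testing.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def adjusted_treatment_effect(scores, treated, gpas):
    """Return b1 from score = b0 + b1*treated + b2*gpa (least squares)."""
    rows = [[1.0, float(t), float(g)] for t, g in zip(treated, gpas)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(3)]
    _intercept, effect, _gpa_slope = solve_linear_system(xtx, xty)
    return effect


# Hypothetical data: the treated group happens to have higher GPAs on average.
gpas = [2.0, 3.0, 4.0, 2.5, 3.5, 3.8]
treated = [0, 0, 0, 1, 1, 1]
scores = [60.0, 70.0, 80.0, 70.0, 80.0, 83.0]

control_mean = sum(s for s, t in zip(scores, treated) if t == 0) / 3
treated_mean = sum(s for s, t in zip(scores, treated) if t == 1) / 3
print("raw difference:", treated_mean - control_mean)
print("GPA-adjusted difference:", adjusted_treatment_effect(scores, treated, gpas))
```

In this made-up data set the raw difference between group means is roughly 7.7 points, while the GPA-adjusted difference is close to 5, because part of the raw difference reflects the higher GPAs in the treated group rather than the intervention.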

The equity issue is complicated by the necessity of having a control group in the research. To circumvent this concern, researchers can perform their studies over the course of several years, during which the intervention is introduced, and use test scores from the year(s) before the pedagogical change as a control (19,8,10,21). In this case, the instructor should ensure that the difficulty of the assessment tools stays the same throughout the years of the study. Alternatively, instructors can minimize the degree of inequity by forming a control group and an experimental group temporarily (e.g., one week) during the term, implementing the pedagogical intervention, and then assessing and comparing student performance (22,23). If different student groups are taught by different instructors, researchers should try to make the teaching quality and instructor background consistent between groups. A semester-long, randomized trial of a teaching technique is also feasible with student understanding of the study and their consent to participation, and it enables the instructor to rigorously examine the causal effect of the intervention (24,25,9).


Content Specific Assessment

If the intervention was implemented to enhance content knowledge and understanding, the instructor can measure how much students learned by administering the same test before and after the intervention (i.e., pre- and post-test). The difference between the pre-test score and the post-test score can be an intuitive measurement of the learning gain, but this simple calculation has some caveats (26,27). For instance, students with high pre-test scores have little room for improvement; thus, bigger differences between pre- and post-test scores are correlated with lower pre-test scores (28). Therefore, the education community has used the normalized learning gain instead (29):

Normalized Learning Gain = (Post Score - Pre Score) / (100 - Pre Score)

This in turn eliminates any bias caused by different pre-treatment scores (27). Moreover, researchers have developed other analysis methods [e.g., Rasch modeling (27,30)] to extract additional information from the assessment data.
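
As a worked illustration of the formula above, the short sketch below computes the normalized learning gain for individual scores. The function name, the `max_score` parameter, and the example scores are hypothetical; returning `None` for a perfect pre-test score is one possible convention, since the formula is undefined in that case.

```python
def normalized_gain(pre, post, max_score=100.0):
    """Normalized learning gain: (post - pre) / (max_score - pre).

    Returns None when pre == max_score, where the formula is undefined.
    """
    if not (0 <= pre <= max_score and 0 <= post <= max_score):
        raise ValueError("scores must lie between 0 and max_score")
    if pre == max_score:
        return None
    return (post - pre) / (max_score - pre)


# Two hypothetical students with different starting points.
print(normalized_gain(40, 70))  # raw gain of 30 points -> 0.5
print(normalized_gain(80, 90))  # raw gain of 10 points -> 0.5
```

Both students closed half of the gap between their pre-test score and a perfect score, so their normalized gains are equal even though their raw gains differ by a factor of three.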

If student answers must be evaluated qualitatively (e.g., essay questions), experts in the field who are not involved in the course can be appropriate graders (31). In this case, it is important that these graders be provided with a comprehensive rubric so that discrepancies between graders are minimized.

Concept Inventory

A concept inventory is a multiple-choice test validated to assess student understanding of core concepts in a specific subject (32). Inspired by the Force Concept Inventory, which was a crucial tool to evaluate different teaching methods in physics education (29,33-35), biology faculty and educators have collaborated to develop various biological concept inventories (reviewed in (32); Table 1). These inventories are publicly available, relieving researchers of the burden of devising their own questions. Another advantage of using a concept inventory is that its widespread use can benefit the education community beyond the individual study by enabling comparisons between different pedagogical techniques.

Table 1. Useful assessment tools for educational research.

Self-Reported Assessment

Students can self-report their learning gains using an instrument such as the Student Assessment of Learning Gains survey (Table 1). Although previous studies found a correlation between self-reported gains and student academic performance (36-38), the accuracy of self-reported learning gain is still debatable (39,40). Thus, we suggest that educators use one of the aforementioned formal assessments to measure the efficacy of the teaching strategy.


Improving student analytical skills and scientific literacy is one of the core objectives of undergraduate science education reform (1,41-47). As a response, educators have created various teaching methods to help students develop critical thinking skills (31,48,49) as well as diagnostic tests to measure student progress in scientific thinking (Table 1). Among them is the Test of Scientific Literacy Skills (TOSLS), which contains 28 multiple-choice questions pertaining to two major scientific literacy skills: scientific method validation and data interpretation (50). Furthermore, assessment tools, such as the Rubric for Experimental Design (51) and the Experimental Design Ability Test (52), are available for an instructor who wishes to evaluate student knowledge in experimental design.


Student behavior is one of the critical factors linked to student learning and achievement (53-56). To determine any changes in student behavior and perception after the introduction of a teaching technique, questionnaires with multiple-choice and/or five- or seven-point Likert-scale questions have been developed (Table 1).

Disseminating the Results with the Community

A variety of venues exist to share one's results with the greater community. For example, the American Society for Cell Biology, the Society for Developmental Biology, and many others welcome educational posters at their annual meetings. Additionally, many annual meetings also have educational symposia. The Society for the Advancement of Biology Education Research runs an annual meeting devoted to educational research and assessment. There are a variety of online resources (including CourseSource) that publish peer-reviewed teaching materials. Journals like CBE - Life Sciences Education also provide a great vehicle to describe studies.

One barrier to sharing results from educational research is a worry that this public dissemination exposes the effectiveness of an instructor's classroom skills. Junior faculty in particular may be very self-conscious of their classroom performance, especially if teaching is a considerable part of the tenure process. Additionally, most biology instructors do not have enough time, support, or incentives from their schools to document their learning innovations and share them with other intramural or extramural instructors. This challenge underscores the importance of a departmental culture where teaching is highly valued. Educational research, especially involving studies that evaluate teaching effectiveness, should be included as part of the tenure review and teaching awards, as other colleagues have argued (57). This inclusion will allow educational research studies to be viewed as they should be: a positive sign that faculty members care about their role as educators.


We thank Doug Melton and Willy Lensch for advice and support of the work. We also are grateful to those readers who have provided constructive feedback.
