The test developer’s dilemma: Evaluating the balance of feasibility and empiric performance of test development techniques for repeated written assessments

Eric Shappell, Mary Jo Wagner, John Bailitz, Therese Mead, James Ahn, Andrew Eyre, Nicholas Maldonado, Bradley Wallace, Yoon Soo Park

Research output: Contribution to journal › Article › peer-review

Abstract

Purpose: Written assessments face challenges when administered repeatedly, including resource-intensive item development and the potential for performance to improve through item recall rather than understanding. This study examines the efficacy of three item development techniques in addressing these challenges.

Methods: Learners at five training programs completed two 60-item repeated assessments. Items from the first test were randomized to one of three treatments for the second assessment: (1) verbatim repetition, (2) isomorphic changes, or (3) total revisions. Primary outcomes were the stability of item psychometrics across test versions and evidence of item recall influencing performance, as measured by the rate of items answered correctly and then incorrectly (correct-to-incorrect rate), which suggests guessing.

Results: Forty-six learners completed both tests. Item psychometrics were comparable across test versions. Correct-to-incorrect rates differed significantly between groups, with the highest guessing rate (lowest recall effect) in the Total Revision group (0.15) and the lowest guessing rate (highest recall effect) in the Verbatim group (0.05), p = 0.01.

Conclusions: Isomorphic and total revisions demonstrated superior performance in mitigating the effect of recall on repeated assessments. Given the high costs of total item revisions, there is promise in exploring isomorphic items as an efficient and effective approach to repeated written assessments.

Practice points

  • Item psychometrics were comparable across repeated assessments developed with three different techniques: verbatim repetition, isomorphic changes, and total revisions.
  • Isomorphic changes and total item revisions demonstrate superior performance in mitigating the effect of recall on repeated assessments as compared to a verbatim repetition of test items.
  • Given the high costs of total item revisions, there is promise in exploring isomorphic items as an efficient and effective approach to repeated written assessments.
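
The correct-to-incorrect rate described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the denominator, so this sketch assumes the rate is the share of responses answered correctly on the first test that were answered incorrectly on the second, computed per treatment group. The function name, the tuple layout, and the sample responses are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def correct_to_incorrect_rates(responses):
    """Return the correct-to-incorrect rate per treatment group.

    `responses` is an iterable of (treatment, first_correct, second_correct)
    tuples, one per learner-item pair. ASSUMPTION: the denominator is the
    number of responses that were correct on the first test; the abstract
    does not specify this.
    """
    eligible = defaultdict(int)  # correct on the first test
    flipped = defaultdict(int)   # correct on the first test, incorrect on the second
    for treatment, first_correct, second_correct in responses:
        if first_correct:
            eligible[treatment] += 1
            if not second_correct:
                flipped[treatment] += 1
    # Groups with no first-test-correct responses are omitted (no valid rate).
    return {t: flipped[t] / eligible[t] for t in eligible}


# Hypothetical responses, for illustration only (not study data).
sample = [
    ("verbatim", True, True),
    ("verbatim", True, True),
    ("verbatim", True, False),
    ("verbatim", False, True),
    ("total_revision", True, False),
    ("total_revision", True, True),
]
rates = correct_to_incorrect_rates(sample)
```

Under the abstract's interpretation, a higher rate in a group points to more guessing and therefore a weaker recall effect for that item treatment.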

Original language: English
Journal: Medical Teacher
State: Accepted/In press - 2022

Keywords

  • Test development
  • isomorphic variables
  • repeated testing
  • spaced education
