TY - JOUR
T1 - Can you crowdsource expertise? Comparing expert and crowd-based scoring keys for three situational judgment tests
AU - Brown, Matt I.
AU - Grossenbacher, Michael A.
AU - Martin-Raugh, Michelle P.
AU - Kochert, Jonathan
AU - Prewett, Matthew S.
N1 - Funding Information:
We would like to thank Harrison Kell and Samuel Rikoon for their helpful feedback on a previous version of this manuscript. Disclaimer: The research described herein was sponsored by the U.S. Army Research Institute for the Behavioral and Social Sciences, Department of the Army (Cooperative Agreement No. W911NF-18-2-0018). The views expressed in this presentation are those of the author and do not reflect the official policy or position of the Department of the Army, DOD, or the U.S. Government. Portions of this study were presented as a poster at the 34th Meeting of the Society for Industrial and Organizational Psychology, National Harbor, MD, and at the 31st Meeting of the Society for Industrial and Organizational Psychology, Anaheim, CA.
Publisher Copyright:
© 2021 John Wiley & Sons Ltd.
PY - 2021/12
Y1 - 2021/12
N2 - It is common practice to rely on a convenience sample of subject matter experts (SMEs) when developing scoring keys for situational judgment tests (SJTs). However, the defining characteristics of what constitutes a SME are often ambiguous and inconsistent. Sampling SMEs can also impose considerable costs. Other research fields have adopted crowdsourcing methods to replace or reproduce judgments thought to require subject matter expertise. Therefore, we conducted the current study to compare crowdsourced scoring keys to SME-based scoring keys for three SJTs designed for three different job domains: Medicine, Communication, and Military. Our results indicate that scoring keys derived from crowdsourced samples are likely to converge with keys based on SME judgment, regardless of test content (r = .88 to .94 between keys). We observed the weakest agreement among individual MTurk and SME ratings for the Medical SJT (classification consistency = 61%) relative to the Military and Communication SJTs (80% and 85%). Although general mental ability and conscientiousness were each related to greater expert similarity among MTurk raters, the average crowd rating outperformed nearly all individual MTurk raters. Using randomly-drawn bootstrapped samples of MTurk ratings in each of the three samples, we found that as few as 30–40 raters may provide adequate estimates of SME judgments of most SJT items. These findings suggest the potential usefulness of crowdsourcing as an alternative or supplement to SME-generated scoring keys.
AB - It is common practice to rely on a convenience sample of subject matter experts (SMEs) when developing scoring keys for situational judgment tests (SJTs). However, the defining characteristics of what constitutes a SME are often ambiguous and inconsistent. Sampling SMEs can also impose considerable costs. Other research fields have adopted crowdsourcing methods to replace or reproduce judgments thought to require subject matter expertise. Therefore, we conducted the current study to compare crowdsourced scoring keys to SME-based scoring keys for three SJTs designed for three different job domains: Medicine, Communication, and Military. Our results indicate that scoring keys derived from crowdsourced samples are likely to converge with keys based on SME judgment, regardless of test content (r = .88 to .94 between keys). We observed the weakest agreement among individual MTurk and SME ratings for the Medical SJT (classification consistency = 61%) relative to the Military and Communication SJTs (80% and 85%). Although general mental ability and conscientiousness were each related to greater expert similarity among MTurk raters, the average crowd rating outperformed nearly all individual MTurk raters. Using randomly-drawn bootstrapped samples of MTurk ratings in each of the three samples, we found that as few as 30–40 raters may provide adequate estimates of SME judgments of most SJT items. These findings suggest the potential usefulness of crowdsourcing as an alternative or supplement to SME-generated scoring keys.
KW - consensus-based measurement
KW - crowdsourcing
KW - implicit trait policies
KW - situational judgment tests
KW - subject matter expertise
UR - http://www.scopus.com/inward/record.url?scp=85116764315&partnerID=8YFLogxK
U2 - 10.1111/ijsa.12353
DO - 10.1111/ijsa.12353
M3 - Article
AN - SCOPUS:85116764315
SN - 0965-075X
VL - 29
SP - 467
EP - 482
JO - International Journal of Selection and Assessment
JF - International Journal of Selection and Assessment
IS - 3-4
ER -