TY - JOUR
T1 - Does scale length matter? A comparison of nine- versus five-point rating scales for the mini-CEX
AU - Cook, David A.
AU - Beckman, Thomas J.
N1 - Funding Information:
Acknowledgments: Thanks to K. G. Thomas and D. M. Dupras for assistance in study planning and execution, to F. Enders for assistance in statistical planning, and to E. S. Holmboe for use of scripted cases. Funding was provided by the Mayo Education Innovation Program. A paper based on this study was presented at the 2008 meeting of the American Educational Research Association in New York.
PY - 2009/11
Y1 - 2009/11
N2 - Educators must often decide how many points to use in a rating scale. No studies have compared interrater reliability for different-length scales, and few have evaluated accuracy. This study sought to evaluate the interrater reliability and accuracy of mini-clinical evaluation exercise (mini-CEX) scores, comparing the traditional mini-CEX nine-point scale to a five-point scale. Methods: The authors conducted a validity study in an academic internal medicine residency program. Fifty-two program faculty participated. Participants rated videotaped resident-patient encounters using the mini-CEX with both a nine-point scale and a five-point scale. Some cases were scripted to reflect a specific level of competence (unsatisfactory, satisfactory, superior). Outcome measures included mini-CEX scores, accuracy (scores compared to scripted competence level), interrater reliability, and domain intercorrelation. Results: Interviewing, exam, counseling, and overall ratings varied significantly across levels of competence (P < .0001). Nine-point scale scores accurately classified competence more often (391/720 [54%] for overall ratings) than five-point scores (316/723 [44%], P < .0001). Interrater reliability was similar for scores from the nine- and five-point scales (0.43 and 0.40, respectively, for overall ratings). With the exception of correlation between exam and counseling scores using the five-point scale (r = 0.38, P = .13), score correlations among all domain combinations were high (r = 0.46-0.89) and statistically significant (P ≤ .015) for both scales. Conclusions: Mini-CEX scores demonstrated modest interrater reliability and accuracy. Although interrater reliability is similar for nine- and five-point scales, nine-point scales appear to provide more accurate scores. This has implications for many educational assessments.
KW - Accuracy
KW - Assessment
KW - Clinical competence
KW - Educational measurement
KW - Interrater reliability
KW - Medical education
KW - Psychometrics
KW - Reproducibility of results
UR - http://www.scopus.com/inward/record.url?scp=70449122923&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70449122923&partnerID=8YFLogxK
U2 - 10.1007/s10459-008-9147-x
DO - 10.1007/s10459-008-9147-x
M3 - Article
C2 - 19034679
AN - SCOPUS:70449122923
SN - 1382-4996
VL - 14
SP - 655
EP - 664
JO - Advances in Health Sciences Education
JF - Advances in Health Sciences Education
IS - 5
ER -