Assessment of multiple-choice questions in medicine. Validity evidence of an instrument.

Evaluación de reactivos de opción múltiple en medicina. Evidencia de validez de un instrumento.
Jesús Rivera Jiménez (a), Fernando Flores Hernández (b), Amilcar Alpuche Hernández (b), Adrián Martínez González (b)

(a) Departamento de Bioquímica, Facultad de Medicina, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
(b) Secretaría de Educación Médica, Facultad de Medicina, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico

Received 13 January 2016, Accepted 27 April 2016

Keywords

Test validity, Multiple-choice items, Assessment


Introduction: Properly written test items are in themselves a source of validity evidence for an examination. Although there is general consensus on item-writing guidelines, several studies report a high incidence of violations of these standards. This paper proposes an instrument for assessing the quality of multiple-choice item writing and describes the process of gathering validity evidence for it.

Methods: Validity evidence was gathered for an instrument designed to assess the features of multiple-choice items, following the sources proposed by the Standards for Educational and Psychological Testing, particularly those related to content, response process, and internal structure. The Kappa index (following Fleiss' model) and the point-biserial correlation coefficient were used to measure concordance on the criteria assessed by the instrument. An exploratory factor analysis was performed to identify the dimensions of the instrument, and Cronbach's alpha was calculated as a measure of internal consistency.
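As an illustration of two of the statistics named above, the following is a minimal sketch of Fleiss' kappa and Cronbach's alpha using their standard textbook formulas. It is not the authors' analysis code; the function names, the data layout, and the toy values in the usage example are assumptions made here for illustration only.

```python
from statistics import variance


def fleiss_kappa(counts):
    """Fleiss' kappa for agreement among multiple judges.

    counts[i][j] = number of judges who placed item i in category j
    (hypothetical layout). Every item must be rated by the same number
    of judges. Undefined (division by zero) if all ratings fall in a
    single category.
    """
    n_items = len(counts)
    n_judges = sum(counts[0])
    if any(sum(row) != n_judges for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_categories = len(counts[0])
    total = n_items * n_judges

    # Share of all ratings that fall in each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Observed agreement for each item, then averaged over items.
    p_i = [
        (sum(c * c for c in row) - n_judges) / (n_judges * (n_judges - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


def cronbach_alpha(scores):
    """Cronbach's alpha for internal consistency.

    scores[i][k] = score of observation i on criterion k
    (hypothetical layout). Uses sample variances.
    """
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Toy data, invented for illustration: 4 items, 3 judges, 2 categories.
    toy_counts = [[3, 0], [0, 3], [2, 1], [3, 0]]
    print(round(fleiss_kappa(toy_counts), 3))
```

In this sketch, each criterion of the instrument would get its own `counts` table, giving one kappa value per criterion.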

Results: Concordance among multiple judges was greater than 0.8 (almost perfect agreement) for 12 of the 21 criteria, but only 0.19 for the Bloom's taxonomy level criterion. Factor analysis identified 4 dimensions, with a Kaiser-Meyer-Olkin (KMO) value of 0.666 (p<.01), an explained variance of 49.979%, and a Cronbach's alpha of 0.627.
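The label "almost perfect agreement" for values above 0.8 matches the widely used Landis and Koch benchmark scale for kappa. The abstract does not name the scale, so the thresholds below are an assumption, and the function name is hypothetical. A minimal sketch of that mapping:

```python
def agreement_label(kappa):
    """Map a kappa value to a Landis and Koch (1977) agreement label.

    Assumes the Landis and Koch cut-offs; the abstract does not state
    which scale was used.
    """
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"
```

On this scale, the 0.19 reported for the Bloom's taxonomy level would fall in the "slight" band.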

Conclusion: This instrument can be used to assess multiple-choice items, since it has validity evidence related to content, response process, and internal structure, as well as psychometric values appropriate for an assessment instrument.