Virtues and limitations of item response theory for educational assessment in the medical sciences
Abstract
Classical test theory (CTT) and item response theory (IRT) constitute the two main paradigms in psychometrics. Although the foundations of IRT were already introduced in the middle of the twentieth century and despite the numerous publications since which show the theoretical superiority of IRT over CTT, the classical approach is still, by far, the most commonly used for educational measurement, not the least in the field of medical education. In this article, I revise the fundamentals and basic concepts of both psychometric approaches and highlight the advantages that IRT models may offer in the context of ducational assessment in the health sciences. However, based on an evaluation of the assumptions underlying the most commonly used IRT models, it is argued that these assumptions are significantly discrepant with the complex reality often encountered in educational measurement. As a result, it is concluded that, in order to take proper advantage of the IRT framework, often more complex models, eyond the traditionally known, must be considered, including multidimensional models and/or models that take into account local dependencies among test items.