Artículo original
eISSN 2007-5057
Investigación educ. médica Vol. 7, n.o 27, julio-septiembre 2018
http://doi.org/10.1016/j.riem.2017.06.003
Validación de un cuestionario en castellano basado en el
modelo de Stanford para evaluar docentes clínicos
Marcela Bitrana*, Manuel Torres-Sahlia y Oslando Padillab
a Centro de Educación Médica, Escuela de Medicina, Pontificia
Universidad Católica de Chile, Santiago, Chile.
b Departamento de Salud Pública, Escuela de Medicina,
Pontificia Universidad Católica de Chile, Santiago, Chile.
Recibido el 31 de enero de 2017; aceptado el 13 de junio de 2017
Resumen
Introducción: Aunque existen instrumentos en español para evaluar el desempeño docente durante el ciclo básico o la especialización médica, faltan instrumentos para evaluar la docencia en los años iniciales del entrenamiento clínico, en que el profesor cumple un rol fundamental facilitando el aprendizaje experiencial. MEDUC30 es un cuestionario en español desarrollado por la Escuela de Medicina de la Pontificia Universidad Católica para este efecto. Se construyó con base en el modelo educacional del Programa de Desarrollo Docente de la Universidad de Stanford (SFDP) y ha sido usado desde 2004 en la Pontificia Universidad Católica y validado previamente con métodos exploratorios.
Objetivo: Proveer evidencia de validez y confiabilidad de MEDUC30 que avale su utilidad en contextos hispanohablantes, usando métodos analíticos confirmatorios.
Método: Este es un estudio de carácter analítico, longitudinal y retrospectivo. Se analizaron 24,681 cuestionarios que evaluaban a 579 docentes clínicos. Éstos fueron completados por estudiantes de medicina de tercer a séptimo año, entre 2004 y 2015. Los datos se analizaron mediante análisis factorial exploratorio (AFE) y confirmatorio (AFC), y se evaluó la invarianza de medición con AFC multigrupo.
Resultados: Se compararon cuatro modelos, de los cuales un modelo bifactor fue el que mejor dio cuenta de los datos. Este modelo está compuesto de un factor general y seis específicos: [i] Enseñanza centrada en el paciente, [ii] Comunicación de objetivos, [iii] Evaluación y retroalimentación, [iv] Promoción de la comprensión, la retención y el aprendizaje autodirigido, [v] Control de la sesión, y [vi] Clima de aprendizaje. La confiabilidad general fue excelente (α de Cronbach = .98, ω de McDonald = .98) y la de los seis factores, muy buena (α de Cronbach = .88-.95, ω de McDonald = .78-.94). La invarianza de medición se sostuvo para sexo del docente, fecha, semestre, curso, campo clínico y duración de la rotación. Todas estas variables mostraron ser fuentes de heterogeneidad poblacional.
Conclusiones: MEDUC30 es un instrumento en español válido y confiable para proveer retroalimentación a los docentes clínicos tanto de su efectividad docente general como de seis dominios educacionales específicos. Además, puede proporcionar información útil para jefes de programas y autoridades para mejorar la calidad de la docencia clínica.
Palabras Clave: Docente clínico; evaluación; calidad; cuestionario; validación
Validation of a Spanish questionnaire
implementing the Stanford Educational
Framework for Evaluation of Clinical Teachers
Abstract
Introduction: Although there are instruments in Spanish to evaluate teacher performance during the initial basic science training years or during medical specialization, there are few instruments for the clinical training years, in which the main role of the teacher is to facilitate experiential learning. The MEDUC30 questionnaire is a Spanish-language instrument developed by the Pontificia Universidad Católica School of Medicine. It was built using the Stanford Faculty Development Program (SFDP) educational framework for the evaluation of clinical teachers' effectiveness by students. MEDUC30 has been used since 2004 at the Pontificia Universidad Católica de Chile and was previously studied with exploratory methods.
Objective: To provide evidence of validity and reliability supporting the usefulness of MEDUC30 in Spanish-speaking contexts, using confirmatory analytical methods.
Method: This is an analytical, longitudinal and retrospective study, in which 24,681 MEDUC30 questionnaires evaluating 579 clinical teachers were analysed. They were completed by medical students from the 3rd to the 7th year of study, from 2004 through 2015. The questionnaire's structure was studied by exploratory (EFA) and confirmatory factor analysis (CFA). Measurement invariance was evaluated with multi-group CFA.
Results: Four different models were compared; a bifactor model was the best alternative to explain the data's structure. It was composed of one general and six domain-specific factors: [i] Patient-Based Teaching, [ii] Communication of Goals, [iii] Evaluation and Feedback, [iv] Promotion of Understanding, Retention, and Self-directed Learning, [v] Control of the Session, and [vi] Learning Climate. The overall reliability of MEDUC30 scores was excellent (Cronbach's α = .98, McDonald's ω = .98) and that of the six specific factors was very good (Cronbach's α = .88-.95, McDonald's ω = .78-.94). Measurement invariance extended over teacher gender, date, semester, year of study, clinical teaching setting, and length of clinical rotation; all of these variables were sources of population heterogeneity.
Conclusions: MEDUC30 is a valid and reliable Spanish instrument to evaluate clinical teachers. It can be used to provide formative feedback to clinical teachers and to provide accurate information to department heads and program directors for resource allocation and promotion purposes.
Keywords: clinical teacher; evaluation; effectiveness; instrument; validity
INTRODUCTION
In 2000, the School of Medicine of the Pontificia Universidad Católica de Chile created a Faculty Development Center to provide training opportunities for its more than 700 clinical teachers and to foster a culture of continuous enhancement of teaching quality1,2. At that time there were few questionnaires in Spanish to evaluate the performance of medical teachers in clinical settings. The Center developed and validated a questionnaire in Spanish named MEDUC30, based on a systematic review of the specialised literature and using as a template the seven-domain educational model of Stanford University for evaluating clinical teaching effectiveness3. The questionnaire was refined and validated by a Delphi panel4.
MEDUC30 is a 30-item questionnaire with a 4-point frequency Likert-type scale4. Each of its items maps to one of the following domains: [i] Learning Climate, [ii] Evaluation, [iii] Feedback, [iv] Communication of Goals, [v] Control of the Session, [vi] Promotion of Understanding and Retention, [vii] Promotion of Self-directed Learning, and [viii] Patient-Based Teaching. The eighth domain was incorporated to ensure that the activity evaluated referred to actual clinical teaching rather than minilectures given in clinical settings.
In the initial validation study4, MEDUC30 displayed good reliability and a four-factor structure: Patient-Based Teaching and Learning Climate emerged as separate empirical factors, while the remaining five SFDP educational domains grouped into two larger factors: Evaluating skills (comprising Evaluation plus Feedback) and Teaching skills (comprising Communication of Goals, Control of the Session, Promotion of Comprehension and Retention, and Promotion of Self-directed Learning)4.
MEDUC30 is, to our knowledge, one of the few validated instruments in Spanish to evaluate clinical teachers' effectiveness during the initial years of clinical experiential learning. It complements the questionnaires developed by the Faculty of Medicine of the UNAM (Mexico) to evaluate teachers' performance during basic medical science teaching5-7 and medical specialty training8, and those developed by the Pontificia Universidad Católica de Chile for the evaluation of clinical teachers in different medical specialties9,10. MEDUC30 has been used at the Pontificia Universidad Católica de Chile medical school since 2004. However, no confirmatory factor analysis (CFA) or measurement invariance studies have been conducted so far.
The purpose of this study is to provide updated evidence on the reproducibility, validity and usefulness of the MEDUC30 questionnaire to evaluate clinical teaching in Spanish-speaking contexts. To this end, we validated MEDUC30 using a larger and more recent database, employing confirmatory factor analysis (CFA) to ascertain the model's goodness of fit and multi-group CFA to study measurement invariance.
METHOD
This is an analytical, longitudinal, retrospective study aiming to examine the psychometric properties of the data produced with MEDUC30 in its routine use for the evaluation of clinical teachers' performance at the Pontificia Universidad Católica de Chile School of Medicine. We analysed a total of 24,681 evaluation forms regarding 579 clinical tutors (63% men) collected from 2004 through 2015.
MEDUC30 is a 30-item instrument that describes observable teacher behaviours4. It uses a four-level scale: 1. 'almost never', 2. 'sometimes', 3. 'often', and 4. 'almost always'4. Twenty-nine items map to eight dimensions as follows: [i] Patient-Based Teaching, items 1-5; [ii] Communication of Goals, items 6-8; [iii] Evaluation, items 9-12; [iv] Promoting Understanding and Retention, items 13-16; [v] Promoting Self-directed Learning, items 17-19; [vi] Control of the Session, items 20-22; [vii] Feedback, items 23-25; and [viii] Learning Climate, items 26-29. The last item corresponds to a global rating.
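For reference, this theoretical item-to-domain assignment can be written out as a simple R structure (an illustrative sketch; the object name is ours):

```r
# Theoretical item-to-domain map of MEDUC30 (illustrative; item numbers as listed above).
meduc30_domains <- list(
  patient_based_teaching      = 1:5,
  communication_of_goals      = 6:8,
  evaluation                  = 9:12,
  understanding_and_retention = 13:16,
  self_directed_learning      = 17:19,
  control_of_the_session      = 20:22,
  feedback                    = 23:25,
  learning_climate            = 26:29
)
# Item 30 is the global rating and is not assigned to any domain.
```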
The evaluation process was as follows: at the end of each clinical rotation, medical students from the 3rd to the 7th (last) year of study were asked to evaluate their clinical tutors using MEDUC30 paper forms. Students filled in the forms anonymously (in the absence of the evaluated teacher), once for each rotation they had during the year. This was an ongoing process; at the end of each academic year, tutors accumulated between 5 and 30 evaluations. Individual reports were sent to the teachers to provide them with specific feedback regarding the eight domains of effective teaching. Copies of these reports were made available yearly to school authorities as a source of information for academic promotion.
Items as ordinal measures of continuous latent constructs and missing data handling
For analytical purposes, we treated the MEDUC30 items’ scores as ordinal rather than continuous11. To deal with missing data, we applied multiple imputation12 using Multivariate Imputation by Chained Equations (MICE) with the proportional odds logistic regression (POLR) model. We used these imputed data to conduct both Exploratory (EFA) and Confirmatory Factor Analyses (CFA).
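As an illustration, this imputation step could be carried out in R roughly as follows (a minimal sketch under assumed object names such as 'meduc'; not the authors' original script):

```r
library(mice)

# 'meduc' is assumed to be a data frame containing only the 30 MEDUC30 items, coded 1-4.
meduc[] <- lapply(meduc, ordered, levels = 1:4)   # treat items as ordered categorical

# Multivariate Imputation by Chained Equations with proportional odds logistic regression
imp <- mice(meduc, m = 5, method = "polr", seed = 1234)
meduc_complete <- complete(imp, 1)                # one completed data set for the factor analyses
```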
Factor Analysis
We conducted EFA to study the dimensionality and internal structure of the data, CFA to evaluate the model’s goodness-of-fit, and multi-group CFA to study measurement invariance.
The data was randomly divided into three samples: sample 1 (n = 4,122) for EFA, sample 2 (n = 4,109) for CFA global fit assessment, and sample 3 (n = 16,450) for CFA measurement invariance evaluation.
For the EFA we used the unweighted least squares (ULS) estimator, while for the CFA we added robust standard errors and a mean- and variance-adjusted test statistic with a second-order approach13 (ULSMVS in the lavaan R package).
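In R, these estimation choices can be sketched as follows (illustrative code; 'sample1', 'sample2' and 'model_syntax' are assumed names for the EFA sample, the CFA sample and the lavaan model string):

```r
library(psych)
library(lavaan)

# EFA (sample 1): unweighted least squares over polychoric correlations
efa_fit <- fa(sample1, nfactors = 7, fm = "uls", cor = "poly")

# CFA (sample 2): ULS with robust standard errors and a mean- and
# variance-adjusted (second-order) test statistic, i.e. ULSMVS in lavaan
cfa_fit <- cfa(model_syntax, data = sample2,
               ordered = paste0("item", 1:29),
               estimator = "ULSMVS")
```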
Four fit indices were calculated to evaluate and compare descriptive goodness of fit: two comparative fit indices, the Comparative Fit Index (CFI) and the Tucker-Lewis Index (TLI); one parsimony-correction index, the Root-Mean-Square Error of Approximation (RMSEA); and one absolute fit index, the Weighted Root-Mean-Square Residual (WRMR).
The following cutoff values were derived from simulation studies14-17: good fit when CFI ≥ .96, TLI ≥ .95 and RMSEA ≤ .05; acceptable fit when CFI and TLI ≥ .90 and RMSEA < .08; mediocre fit when .08 ≤ RMSEA ≤ .10 with CFI and TLI ≥ .90. When at least two of the three criteria fell within one fit level and the remaining criterion fell within an adjacent level (better or worse), the model was classified in the former level16,18. Finally, if CFI or TLI < .90, or RMSEA > .10, the model was rejected. WRMR (smaller is better) was used to corroborate model comparisons19.
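For illustration only, these cutoffs could be encoded in a small helper function (our own convenience sketch; it omits the two-of-three adjacency rule described above):

```r
# Classify model fit according to the cited cutoff values (simplified: the
# "two of three criteria in one level" rule is not implemented here).
classify_fit <- function(cfi, tli, rmsea) {
  if (cfi < .90 || tli < .90 || rmsea > .10) return("rejected")
  if (cfi >= .96 && tli >= .95 && rmsea <= .05) return("good")
  if (rmsea < .08) return("acceptable")
  "mediocre"
}

classify_fit(cfi = .936, tli = .994, rmsea = .041)  # indices of the bifactor model (Table 1)
```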
As reliability measures, we calculated Cronbach's α and McDonald's ω and ωt20 as indices of internal consistency of the respective constructs. Additionally, for the bifactor model constructs we reported McDonald's ωh21 with the respective ωs coefficients as indices of factor saturation.
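These reliability indices can be obtained in R with the psych package, for example (a sketch; 'meduc_items' is an assumed data frame holding items 1-29):

```r
library(psych)

alpha(meduc_items)                          # Cronbach's alpha for the overall scale
om <- omega(meduc_items, nfactors = 6,      # bifactor (Schmid-Leiman) solution
            poly = TRUE, plot = FALSE)
om$omega_h                                  # omega hierarchical (general-factor saturation)
om$omega.tot                                # total omega
```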
We studied measurement invariance over tutor gender, date, semester, year of study, clinical teaching setting, and length of clinical rotation using χ²-based likelihood-ratio tests (LRT) with the Satorra22-adjusted test statistic. For every grouping variable, we drew a random subsample of the largest group(s) to ensure an n equal to that of the smaller ones.
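A minimal sketch of this multi-group procedure in lavaan is shown below (assumed names: 'sample3', the grouping variable 'tutor_gender', and 'model_syntax'; note that for ordinal items, thresholds play the role of intercepts):

```r
library(lavaan)

items <- paste0("item", 1:29)
fit_configural <- cfa(model_syntax, data = sample3, group = "tutor_gender",
                      ordered = items, estimator = "ULSMVS")
fit_metric     <- cfa(model_syntax, data = sample3, group = "tutor_gender",
                      ordered = items, estimator = "ULSMVS",
                      group.equal = "loadings")
fit_scalar     <- cfa(model_syntax, data = sample3, group = "tutor_gender",
                      ordered = items, estimator = "ULSMVS",
                      group.equal = c("loadings", "thresholds"))

# Chi-square (likelihood-ratio) difference tests between successively constrained models
lavTestLRT(fit_configural, fit_metric, fit_scalar)
```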
Software
We conducted all statistical analyses using R software 3.3.023 with specific packages. Multivariate Imputation by Chained Equations was performed using the mice package 2.2524. Exploratory factor analysis, including tests for sampling adequacy and parallel analysis, was conducted with the psych package 1.6.425. Confirmatory factor analysis was conducted with the lavaan package 0.5.2026. Proportional odds regressions were performed with the MASS package 7.3.4527.
This paper reports the results of the clinical teachers' evaluations conducted from 2004 to 2015 at the Pontificia Universidad Católica de Chile School of Medicine. This is a mandatory process overseen by the Center for Medical Education, in accordance with ethical considerations aimed at ensuring the confidential handling of the information. The evaluation forms are filled in anonymously by students. Each clinical tutor receives a yearly report of his/her results, and these results are also made available to the department head and the school authorities for purposes of career promotion.
Figure 1. Diagram of the bi-factor model of MEDUC30 scores
RESULTS
The proportion of missing responses for the whole questionnaire was low (1.79%), except for items 10 and 24 (8.72% and 5.13% missing values, respectively); to deal with this, we used multiple imputation.
In addition, the highest response category of the questionnaire ('almost always') concentrated 74.3% of the answers, indicative of a 'ceiling effect'; to deal with this, we selected polychoric correlations and a robust estimator19,28,29.
The three samples were comparable with regard to date (ANOVA p = .60), tutor gender, clinical teaching setting, semester (binomial regression p = .29, .72, and .43, respectively), year of study, and global score (POLR p = .33 and .69, respectively).
The Kaiser-Meyer-Olkin measure of sampling adequacy (.97) indicated a marvellous (≥ .90) factorability according to the Kaiser and Rice30 criteria. Bartlett's sphericity test (χ² = 129,511.72, df = 406, p < .001) also supported the suitability of the data for factor analysis.
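These factorability checks can be reproduced in R roughly as follows (illustrative code; 'sample1' is the assumed EFA sample):

```r
library(psych)

pc <- polychoric(sample1)                      # polychoric correlations of the 29 ordinal items
KMO(pc$rho)                                    # Kaiser-Meyer-Olkin measure of sampling adequacy
cortest.bartlett(pc$rho, n = nrow(sample1))    # Bartlett's test of sphericity
```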
Number of Retained Factors
We decided to retain seven factors according to two criteria: the maximum number of factors given by Horn's parallel analysis (eight factors) and the last large drop in the eigenvalues in the scree plot (between factors 7 and 8).
In addition, we tested the hypothesis that a bifactor model could better explain the data, given the large difference in eigenvalues between factors 1 and 2 (17.36 vs. 0.98), as suggested by Reise31. This implied retaining one general factor and six specific factors.
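In R, the factor-retention evidence described above can be inspected with a parallel analysis and scree plot, for instance (a sketch under the assumed object name 'sample1'):

```r
library(psych)

pa <- fa.parallel(sample1, fm = "uls", fa = "fa", cor = "poly")
pa$nfact        # number of factors suggested by Horn's parallel analysis
pa$fa.values    # observed eigenvalues, used to locate the last large drop in the scree plot
```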
Latent Structure
Exploratory analysis for a bifactor structure with six specific dimensions resulted in the following domain-specific factors, alongside the general factor (figure 1): Patient-Based Teaching (PBT; items 1 to 5), Communication of Goals (CG; items 6 to 8), Evaluation and Feedback (EVFB; items 9, 11, 12, 23, and 25), Promotion of Understanding, Retention and Self-directed Learning (PURSL; items 13 to 19), Control of the Session (CS; items 20 to 22), and Learning Climate (LC; items 10, 24, and 26 to 29). Only two items had considerable cross-loadings (items 10 and 24); both loaded more on Learning Climate than on their original theoretical factor. Communalities of the items ranged from .50 to .90. Factor loadings ranged from .59 to .83 for the general factor, and from .23 to .67 for the domain-specific factors.
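One way to obtain an exploratory bifactor-type solution of this kind in R is a bifactor rotation of a seven-factor extraction (one general plus six specific factors); the following is an illustrative sketch, not necessarily the authors' exact procedure:

```r
library(psych)

efa_bifactor <- fa(sample1, nfactors = 7, rotate = "bifactor",
                   fm = "uls", cor = "poly")
print(efa_bifactor$loadings, cutoff = .20)   # general and domain-specific loadings
```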
Confirmatory Factor Analysis (Sample 2)
Goodness-of-fit and model comparison
We compared four models in the CFA: [i] the four correlated traits model described by Bitran et al.4, [ii] the bifactor model suggested by the EFA results, [iii] a model with six correlated traits (corresponding to the six domain-specific factors of the bifactor model), and [iv] a single-factor model. According to the global fit indices evaluated (Table 1), the bifactor model was the only one with acceptable (CFI) to good (TLI and RMSEA) fit. This supports the bifactor model with six domain-specific factors as the best latent structure for the MEDUC30 data.
Table 1. CFA global goodness-of-fit indices (sample 2)

Model | SBχ² (df) | SBχ²/df | CFI | TLI | RMSEA [90% CI] | WRMR
Single factor | 4029.6 (154) | 26.1 | .771 | .979 | .078 [.076, .080] | 3.49
Four correlated traits | 2531.5 (169) | 15.0 | .86 | .988 | .058 [.057, .060] | 2.57
Six correlated traits | 2029.2 (176) | 11.6 | .89 | .991 | .051 [.049, .053] | 2.17
Bifactor | 1234.2 (3) | 8.1 | .936 | .994 | .041 [.040, .043] | 1.82

Note. SBχ² = Satorra-Bentler scaled chi-square, df = degrees of freedom, CFI = Comparative fit index, TLI = Tucker-Lewis index, RMSEA = Root-mean-square error of approximation, WRMR = Weighted root-mean-square residual. All p-values < .001.
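For readers interested in how such a comparison is set up, the bifactor specification could look roughly as follows in lavaan (an illustrative sketch; the item-to-factor assignments follow the EFA solution above, and all object names are ours):

```r
library(lavaan)

bifactor_model <- '
  g     =~ item1 + item2 + item3 + item4 + item5 + item6 + item7 + item8 + item9 +
           item10 + item11 + item12 + item13 + item14 + item15 + item16 + item17 +
           item18 + item19 + item20 + item21 + item22 + item23 + item24 + item25 +
           item26 + item27 + item28 + item29
  PBT   =~ item1 + item2 + item3 + item4 + item5
  CG    =~ item6 + item7 + item8
  EVFB  =~ item9 + item11 + item12 + item23 + item25
  PURSL =~ item13 + item14 + item15 + item16 + item17 + item18 + item19
  CS    =~ item20 + item21 + item22
  LC    =~ item10 + item24 + item26 + item27 + item28 + item29
'

fit_bifactor <- cfa(bifactor_model, data = sample2,
                    ordered = paste0("item", 1:29),
                    estimator = "ULSMVS",
                    orthogonal = TRUE)   # specific factors orthogonal to g and to each other
fitMeasures(fit_bifactor, c("cfi", "tli", "rmsea", "wrmr"))
```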
Factor structure and reliability
All factors in the bifactor model (the general factor and the six domain-specific factors) displayed good reliability coefficients: .88 to .98 for Cronbach's α, and .79 to .97 for McDonald's ω (Table 2). Hierarchical reliability (ωh/s) was considerably higher for the general factor than for the domain-specific factors, and was lowest for the PURSL factor (ωs = .08) (Table 2).
Table 2. CFA standardized factor loadings and strength indices for the bifactor model (sample 2)

Item | Theoretical factor | Loading on general factor (g) | Loading on domain-specific factor
Item 1 | PBT | .71 | PBT (.09)
Item 2 | PBT | .71 | PBT (.60)
Item 3 | PBT | .73 | PBT (.59)
Item 4 | PBT | .70 | PBT (.55)
Item 5 | PBT | .73 | PBT (.28)
Item 6 | CG | .76 | CG (.53)
Item 7 | CG | .80 | CG (.51)
Item 8 | CG | .83 | CG (.16)
Item 9 | EV | .71 | EVFB (.35)
Item 10 | EV | .83 | LC (.21)
Item 11 | EV | .70 | EVFB (.42)
Item 12 | EV | .80 | EVFB (.45)
Item 13 | PUR | .80 | PURSL (.27)
Item 14 | PUR | .82 | PURSL (.35)
Item 15 | PUR | .81 | PURSL (.36)
Item 16 | PUR | .82 | PURSL (.29)
Item 17 | PSL | .82 | PURSL (.13)
Item 18 | PSL | .72 | PURSL (.03)
Item 19 | PSL | .82 | PURSL (.15)
Item 20 | CS | .76 | CS (.47)
Item 21 | CS | .73 | CS (.49)
Item 22 | CS | .63 | CS (.42)
Item 23 | FB | .73 | EVFB (.41)
Item 24 | FB | .81 | LC (.29)
Item 25 | FB | .74 | EVFB (.40)
Item 26 | LC | .78 | LC (.49)
Item 27 | LC | .89 | LC (.15)
Item 28 | LC | .80 | LC (.48)
Item 29 | LC | .85 | LC (.89 / .40)

Reliability indices by factor:
Cronbach's α: g = .98, PBT = .91, CG = .91, EVFB = .92, PURSL = .94, CS = .88, LC = .96
McDonald's ωt: g = .97, PBT = .86, CG = .87, EVFB = .87, PURSL = .88, CS = .79, LC = .93
McDonald's ω(h/s): g = .92, PBT = .31, CG = .21, EVFB = .23, PURSL = .08, CS = .21, LC = .25

Note. g = General factor, PBT = Patient-based teaching, CG = Communication of goals, EVFB = Evaluation and feedback, PURSL = Promotion of understanding, retention and self-directed learning, CS = Control of session, LC = Learning Climate. Theoretical factor abbreviations: EV = Evaluation, PUR = Promotion of understanding and retention, PSL = Promotion of self-directed learning, FB = Feedback.
Factor loadings for the bifactor model (Table 2) were all high on the general factor, ranging from .63 (item 22) to .87 (item 27). All specific factors except PURSL had at least two salient loadings (≥ .40). With the exception of items 1, 8, 17, 18, 19 and 27 (loadings < .20), domain-specific loadings were in general large enough (≥ .20) to reflect a multidimensional structure.
Multigroup CFA (table 3) indicated that configural (form), weak (loadings) and strong (intercepts) measurement invariance could be sustained (p > .05) across tutor gender (man/woman), clinical teaching setting (inpatient/outpatient), year of study (3rd to 7th), length of clinical rotation (1 to 7-or-more weeks), date (2004 to 2015), and semester (fall/spring).
All of these variables were also sources of population heterogeneity (latent mean differences), with p < .001 except for the length of clinical rotation (p = .003), which was still significant at conventional significance levels.
Table 3. Likelihood-ratio tests (χ² difference tests) for multi-group measurement invariance of the bifactor model (sample 3)

Invariance level | χ² (df) | Δχ² [CFI] | Δdf [RMSEA] | p (>χ²)
Gender (a)
Configural | 3485 (689) | [.98] | [.034] |
Loadings | 3870 (747) | 3.7 | 3.15 | .32
Intercepts | 4058 (798) | 3.7 | 8.28 | .90
Means | 5156 (805) | 78.8 | 5.13 | <.001
Semester (b)
Configural | 3547 (689) | [.98] | [.035] |
Loadings | 4033 (747) | 4.5 | 3.01 | .22
Intercepts | 3996 (798) | -0.8 | 9.00 | >.999
Means | 4516 (805) | 38.0 | 5.37 | <.001
Clinical teaching setting (c)
Configural | 3997 (689) | [.98] | [.034] |
Loadings | 4946 (747) | 7.1 | 2.648 | .051
Intercepts | 5195 (798) | 4.5 | 8.23 | .827
Means | 6954 (805) | 124.1 | 5.29 | <.001
Year of study (d)
Configural | 2928 (1712) | [.98] | [.032] |
Loadings | 5227 (1944) | 11.0 | 5.63 | .074
Intercepts | 5078 (2148) | -0.6 | 7.27 | >.999
Means | 11544 (2176) | 107.4 | 5.00 | <.001
Length of clinical rotation (e)
Configural | 3818 (2394) | [.98] | [.032] |
Loadings | 5860 (2742) | 6.08 | 5.40 | .345
Intercepts | 6678 (3048) | 2.35 | 7.45 | .954
Means | 8226 (3090) | 17.55 | 4.79 | .003
Longitudinal (f)
Configural | 5324 (4099) | [.98] | [.032] |
Loadings | 9078 (4737) | 6.78 | 5.94 | .34
Intercepts | 9756 (5298) | 0.99 | 7.01 | >.999
Means | 15012 (5375) | 30.28 | 4.58 | <.001

Note. SBχ² = Satorra-Bentler scaled chi-square, df = degrees of freedom. CFI = Comparative fit index, configural invariance only. RMSEA = Root-mean-square error of approximation, configural invariance only. (a) n = 4,000 per tutor gender (man/woman). (b) n = 4,000 per semester (fall/spring). (c) n = 4,000 per clinical teaching setting (inpatient/outpatient). (d) n = 933 per year of study (3rd, 4th, 5th, 6th, 7th). (e) n = 780 per group (one, two, three, four, five, six, or seven-or-more weeks). (f) n = 503 per year (2004 to 2015).
DISCUSSION
We evaluated the validity of MEDUC30 to assess clinical teachers' effectiveness. According to the EFA, our data were reasonably well explained by a bifactor structure with six domain-specific factors: five of them closely related to the seven theoretical domains of the SFDP framework3, and the sixth corresponding to an added dimension named Patient-Based Teaching. The CFA showed that this model fitted the data well and accounted for the MEDUC30 scores better than a single-factor model or first-order multidimensional models with four or six factors.
Besides supporting the multidimensionality of the teaching effectiveness construct, the present results indicate that MEDUC30 behaves as a hierarchical construct, with a general factor that can be construed as 'being a good teacher' and six domain-specific factors. During the last decade, hundreds of clinical teachers at the PUC have completed a diploma in medical education32. Thus, it is conceivable that the 'being a good teacher' general factor found in this study is related to this professionalisation of teaching, which entails the acquisition of general good teaching practices in addition to domain-specific skills.
The 'good teacher' general factor found in our study is reminiscent of the 'teaching performance' latent construct proposed by Flores et al.7 to explain their results with OPINEST, an instrument used to evaluate medical teacher competences.
The potential contribution of MEDUC30 to medical education in Spanish-speaking contexts is related to its focus on evaluating the facilitating role of medical teachers in the clinical setting. MEDUC30 covers the sensitive period of transition from passive, teacher-centered, information-driven teaching to active, student-centered, patient-driven learning33. A recent study found that the attributes of an effective teacher differ between the classroom and the clinical setting34, thus supporting the importance of context specificity in teaching effectiveness ratings.
Our results are partially consistent with the initial validations of the SFDP framework construct using EFA on data obtained with the SFDP26 questionnaire3,35. In those studies, the authors deemed the data to be reasonably explained by the theoretical seven-dimension structure.
Compared to the four-factor structure proposed for MEDUC30 in the initial exploratory studies4, the bifactor structure with six domain-specific factors presented here corresponds more closely to the theoretical SFDP framework. Three of these factors corresponded exactly to the dimensions Learning Climate, Control of the Session and Communication of Goals. The other two factors gathered the items of Evaluation and Feedback, on the one hand, and those of Promoting Comprehension and Retention and Promoting Self-directed Learning, on the other.
In a psychometric evaluation of the SFDP26 performed on a relatively small sample (N = 119), Mintz et al.36 proposed a new five-factor structure for a reduced 15-item instrument. Comparisons of our results with this report are difficult to draw, since the authors did not evaluate hierarchical models. Moreover, they eliminated entire dimensions rather than redefining the structure based on substantive and statistical criteria applied to the original set of items.
In a recent study done with Middle Eastern undergraduate medical students37, a modified version of the System for Evaluation of Teaching Qualities (SETQ), an instrument also based on the SFDP educational framework, displayed a six-factor structure consistent with the main SFDP domains and with MEDUC30.
MEDUC30 is a validated, theory-based instrument in Spanish to assess clinical teachers' effectiveness by students during the clinical training years. It adds to the repertoire of instruments developed in Spanish to evaluate medical teacher performance in the basic science years5,6 and those aimed at medical specialty training8-10.
This study has strengths related to the sample size and the analytical methods used. Compared to other validation studies (for a review, see Fluit et al.2), the number of evaluations and teachers was severalfold larger, and we employed multiple and stringent criteria for the factor analyses. These features support the robustness and reliability of the results.
Unlike most validations of similar instruments, this study includes measurement invariance information. MEDUC30 can therefore be used for comparisons across several variables (e.g., date, tutor gender, year of study, and length of rotation).
One limitation of this study is that it involves a single medical school; thus, it would be necessary to confirm the questionnaire's generalizability to other medical schools or countries with different clinical teaching realities.
Regarding further improvements to the questionnaire, it seems advisable to widen the response scale to allow for a larger response range. Scores have improved systematically during the 12-year assessment period and, as a consequence, the discriminating power of the 4-point scale has diminished.
It should always be borne in mind that, while the assessment of clinical teachers by students can reveal valid and relevant information, it should be "triangulated" with information derived from other sources, including peers and self-assessment2.
Conclusions
In this report we provide evidence that MEDUC30 is a reliable and valid instrument suited to giving clinical teachers feedback on their strengths and weaknesses across multiple dimensions of clinical teaching. It has evidence of content validity, internal structure validity and use validity. This instrument should be of interest to medical schools in Spanish-speaking countries, for it adds to the repertoire of validated instruments in Spanish to evaluate medical teachers in the clinical teaching setting.
MEDUC30's internal structure validity was supported in this study by the multidimensionality of its scores and the consistency of this internal structure with the educational framework used in its development. The content validity evidence derives from the questionnaire's construction, which was based on a previously developed instrument and on the input of experts and students. Also, the MEDUC30 items cover five of the seven roles agreed to be characteristic of good teaching38-40. Finally, its use validity is supported by the widespread use and acceptance of this instrument for the assessment of clinical teachers at the PUC medical school for more than ten years.
In conclusion, MEDUC30 satisfactorily meets three of the five possible sources of validity evidence defined in the standards published by the American Psychological and Educational Research Associations41,42: internal structure, content and use. Future investigations will be needed to provide evidence for the remaining two sources of validity: relation to other variables and consequences.
We thank all the students and teachers who took part in the evaluation process during the 2004-2015 period.
None.
We acknowledge the support of the Center for Medical Education of the School of Medicine, Pontificia Universidad Católica de Chile.
None.
REFERENCES
* Corresponding author: Marcela Bitran, Centro de Educación Médica,
Escuela de Medicina, Pontificia Universidad Católica de Chile.
Avenida Diagonal Paraguay 362, Santiago, Chile
E-mail: mbitran@med.pub.cl
Peer review is the responsibility of the Universidad Nacional Autónoma de México.
2007-5057/© 2018 Universidad Nacional Autónoma de México, Facultad de Medicina.
This is an Open Access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).