This investigation aimed to determine the validity of the script concordance test (SCT), compared with clinical-case-related short-answer management problems (SAMP), in fourth-year medical students.
This retrospective study was conducted at the Medical School of Lille University. Cardiology and gynecology examinations both included 3 SCT and 2 clinical-case-related SAMP. Final score did not include SCT results, and was out of 20 points. The passing score was ≥10/20. Wilcoxon and McNemar tests were used to compare quantitative and qualitative variables, respectively. Correlation between scores was also analyzed.
A total of 519 and 521 students completed SAMP and SCT in cardiology and gynecology, respectively. Cardiology score was significantly higher in SCT than in SAMP (mean ± SD 13.5 ± 2.4 versus 11.4 ± 2.6, Wilcoxon test, p<0.001). In gynecology, SCT score was significantly lower than SAMP score (10.8 ± 2.6 versus 11.4 ± 2.7, Wilcoxon test, p=0.001). SCT and SAMP scores were significantly correlated (p<0.05, Pearson’s correlation). However, the percentage of students with an SCT score ≥ 10/20 was similar among those who passed or failed the cardiology (327 of 359 (91%) vs 146 of 160 (91%), χ2=0.004, df=1, p=0.952) or gynecology (247 of 379 (65%) vs 84 of 142 (59%), χ2=1.614, df=1, p=0.204) SAMP test. Cronbach alpha coefficient was 0.31 and 0.92 for all SCT and SAMP, respectively.
Although significantly correlated, the scores obtained in SCT and SAMP were significantly different in fourth-year medical students. These findings suggest that SCT should not be used for summative purposes in fourth-year medical students.
Script concordance tests (SCT) assess clinical reasoning expertise in a context of uncertainty.
In spite of some format similarities, SCT differ from content-enriched multiple choice questions (MCQ). Whereas MCQ deal with the end-point of clinical reasoning, or with relevant knowledge, SCT assess some parts of the cognitive process. In MCQ, one has to choose a single best answer, whereas in SCT students are evaluated by the agreement, or concordance, of their answers with those of an expert panel. Furthermore, MCQ add unnecessary complexity to factual knowledge, while SCT are a genuine simulation of patients’ clinical history without additional complexity.
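To make the idea of concordance with an expert panel concrete, the following is a minimal sketch of aggregate scoring, a scheme commonly described for SCT. It is an assumption that this scheme applies here, since the text does not specify the scoring rule used in this study. The function names, the answer scale of -2 to +2, and the panel data are all hypothetical.

```python
from collections import Counter

def item_credits(panel_answers):
    """Map each answer to partial credit based on a (hypothetical) expert panel.

    Credit = experts choosing the answer / experts choosing the modal answer,
    so the modal answer earns 1.0 and an answer no expert chose earns 0.0.
    """
    counts = Counter(panel_answers)
    modal = max(counts.values())
    return {answer: n / modal for answer, n in counts.items()}

def sct_score(student_answers, panel_by_item, out_of=20):
    """Sum the student's credits over all items and rescale to `out_of` points."""
    total = 0.0
    for answer, panel in zip(student_answers, panel_by_item):
        total += item_credits(panel).get(answer, 0.0)
    return out_of * total / len(panel_by_item)

# Made-up panel of 10 experts answering 2 items on a -2..+2 scale.
panel_by_item = [
    [1, 1, 1, 1, 1, 2, 2, 2, 0, 0],          # modal answer +1 (5 experts)
    [-1, -1, -1, -1, -2, -2, -2, -2, 0, 0],  # two modal answers (4 experts each)
]

# Student answers +2 (credit 3/5) and -2 (credit 4/4).
print(round(sct_score([2, -2], panel_by_item), 2))
```

Unlike a single-best-answer MCQ, a student can earn partial credit for an answer that only a minority of experts chose.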
Recently, universities worldwide have used SCT for clinical reasoning in various medical disciplines including pediatric medicine,
Several studies have assessed SCT feasibility and efficacy as an evaluation tool in fourth-year medical students.18–26 However, only a few have compared SCT to other examination forms in the same group of students.18,20–22 Furthermore, these studies included few students. Given the above-discussed limitations of routinely using SCT to assess medical students, we hypothesized that SCT would not be accurate for summative purposes in fourth-year medical students, independently of the domain of knowledge. Therefore, we conducted this study to evaluate SCT validity, compared with SAMP, in the assessment of fourth-year medical students.
This retrospective study was conducted in January 2013 at the Medical School of Lille University. The study was approved by the local Institutional Review Board (Comité de Protection des Personnes Nord-ouest IV). Because of the retrospective observational design of the study, and in accordance with French law, written informed consent was not required by the local IRB. All data were analyzed anonymously. Five hundred and twenty-one students attending the fourth year of medical school were included in this study.
Students had received dedicated training for SCT, including 2 hours of theory on the definition and construction of SCT, and several practice sessions during cardiology and gynecology practical teaching. SCTs were constructed according to the guidelines of Dory et al.
Cardiology and gynecology full tests lasted 2h30 each, and included 3 SCT and 2 clinical-case-related SAMP that were given to students at the beginning of the test. The cardiology and gynecology SAMP included two clinical cases with 8-10 questions, requiring open and short answers. These questions dealt with a clinical issue or the recall of factual knowledge. SAMP have been used in our Medical School for summative assessment for several years. An example of SAMP is presented in Appendix 3. The final score was out of 20 points for both cardiology and gynecology, and was calculated as the total of SAMP grades. The passing score was ≥10/20. SCT results were not included in the final score.
SPSS software (IBM Statistics 22) was used for statistical analysis. Qualitative variables are presented as number (%). Distribution of quantitative variables was tested using the Kolmogorov-Smirnov test. These data are presented as mean ± SD, as they were normally distributed. Statistical significance was set at a p-value < 0.05. Cronbach’s α coefficient was computed to assess the reliability of SCT and SAMP.
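For readers unfamiliar with Cronbach’s α, a minimal sketch of the usual formula, α = k/(k−1) × (1 − Σ item variances / variance of total scores), is given below. The analysis itself was run in SPSS; this Python version, the function name, and the toy scores are illustrative assumptions, not study data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for `scores`: one row per student, one column per item."""
    k = len(scores[0])
    # Sample variance of each item (column) across students.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each student's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Made-up scores of 5 students on 3 items (not study data).
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
print(round(cronbach_alpha(scores), 3))  # 0.933
```

Values close to 1 indicate that the items vary together, whereas low values indicate that item scores are largely unrelated to one another.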
Scores of SCT and SAMP were compared, for cardiology and gynecology, using Wilcoxon test. The percentage of students with an SCT score ≥ 50% in the 2 groups of students who passed and failed the test was compared using McNemar test. Wilcoxon and McNemar tests are usually used to compare quantitative and qualitative data in the same individuals, respectively. Correlation between SCT score and final score was analyzed with the Pearson’s coefficient.
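The paired comparison of scores can be sketched as follows. This is a simplified Wilcoxon signed-rank test using the normal approximation, without tie or continuity correction, so it will not necessarily reproduce the exact SPSS output. The function name and the ten pairs of scores are hypothetical.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Zero differences are dropped and tied absolute differences share their
    average rank. Returns (W_plus, z, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w_plus, z, p

# Made-up paired scores out of 20 for 10 students (not study data).
sct = [14, 12, 15, 13, 16, 11, 14, 13, 15, 12]
samp = [11, 12, 12, 14, 12, 10, 11, 11, 13, 9]
w_plus, z, p = wilcoxon_signed_rank(sct, samp)
print(w_plus, round(z, 2), round(p, 3))
```

Because each student sat both tests, the test works on within-student differences rather than on two independent samples.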
Among the 521 included students, 265 (50.9%) were female. Their mean (± SD) age was 23.9 (±1.5) years, and the mean study year (± SD) was 4.6 (±1.1). Cronbach α coefficient was 0.31 and 0.92 for all SCT and SAMP, respectively. All 521 students completed the SCT in gynecology, whereas 519 completed the SCT in cardiology.
A total of 519 students completed the 2 SAMP and the SCT in cardiology. Mean score was significantly higher in SCT compared with SAMP (13.5±2.4 vs 11.4 ± 2.6, Wilcoxon test, p <0.001). A score ≥ 50% of maximum score, i.e. ≥ 10/20, was significantly more frequent in SCT than in SAMP (473 students [91%] vs 359 students [69%], respectively, McNemar test, p< 0.001).
The percentage of students with an SCT score ≥ 10/20 was similar (χ2=0.004, df=1, p=0.952) in the 2 groups of students who passed (final score ≥ 10/20, 327/359 [91%]) or failed (final score < 10/20, 146/160 [91%]) the SAMP test. SCT score was significantly correlated with SAMP score (Pearson’s correlation, r2=0.57, p=0.047).
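The χ2 statistics reported for the pass/fail groups can be recomputed from the published counts. The sketch below assumes a Pearson χ2 test on a 2×2 table without continuity correction, which the text does not state explicitly. The function name is hypothetical, and the gynecology counts are those given in the gynecology results.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with df = 1
    return chi2, p

# Rows: SAMP passed / SAMP failed. Columns: SCT >= 10/20 / SCT < 10/20.
cardio = chi2_2x2(327, 359 - 327, 146, 160 - 146)
gyneco = chi2_2x2(247, 379 - 247, 84, 142 - 84)

print(round(cardio[0], 3), round(cardio[1], 3))  # 0.004 0.952
print(round(gyneco[0], 3), round(gyneco[1], 3))  # 1.614 0.204
```

Under this assumption the recomputed values match those reported, which suggests that no continuity correction was applied.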
A total of 521 students completed the 2 SAMP and the SCT in gynecology. Mean score was significantly lower in SCT compared to SAMP (10.8 ± 2.6 vs 11.4 ± 2.7, Wilcoxon test, p=0.001).
A score ≥ 50% of maximum score, i.e. ≥ 10/20, was significantly less frequent in SCT than in SAMP (331 [63%] vs 379 [72%], McNemar test, p=0.001). The percentage of students with an SCT score ≥ 10/20 was similar (χ2=1.614, df=1, p=0.204) in the 2 groups of students who passed (final score ≥ 10/20, 247/379 [65%]) and failed (final score < 10/20; 84/142 [59%]) the SAMP test.
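The McNemar comparisons of pass rates depend only on the discordant pairs, which can be derived from the counts above: students with SCT ≥ 10/20 who failed SAMP, and students with SCT < 10/20 who passed SAMP. The sketch below assumes the uncorrected χ2 form of the test, since the exact variant used in SPSS is not stated. The function name is hypothetical.

```python
import math

def mcnemar(b, c):
    """McNemar chi-square (no continuity correction) from discordant pair counts.

    b: passed the first test only; c: passed the second test only.
    Returns (chi2, p_value) for 1 degree of freedom.
    """
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with df = 1
    return chi2, p

# Cardiology: 146 students passed SCT only, 359 - 327 = 32 passed SAMP only.
cardio = mcnemar(146, 359 - 327)
# Gynecology: 84 students passed SCT only, 379 - 247 = 132 passed SAMP only.
gyneco = mcnemar(84, 379 - 247)

print(round(cardio[0], 2), cardio[1] < 0.001)
print(round(gyneco[0], 2), round(gyneco[1], 3))
```

Concordant pairs (students who passed both tests or failed both) do not enter the statistic, which is why the test suits paired pass/fail outcomes in the same students.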
SCT score was significantly correlated with SAMP score (Pearson’s correlation, r2=0.92, p=0.004).
Our results show a significant correlation between SCT and SAMP scores. However, these scores were significantly different. Furthermore, percentage of students with an SCT score ≥ 10/20 was similar in the 2 groups of students who passed and failed the examination, based on the SAMP score. These results suggest that SCT failed in differentiating strong from weak students based on SAMP scores.
To our knowledge, our study is the first to compare SCT and SAMP in a large cohort of fourth-year medical students. In a cohort of 85 fourth-year medical students, Jouneau et al. evaluated SCT as a tool for assessment of clinical reasoning and knowledge organization in pulmonology clinical cases written examination.
Another recent study evaluated the use of SCT as an assessment tool for fifth-year medical students in rheumatology. The test included 60 questions, and was administered to a panel of 19 experts and to 26 students.
Several studies compared the performance of SCT and MCQ in students’ assessment. Fournier et al. compared SCT and content-enriched MCQ performance in assessment of clinical reasoning expertise in the field of emergency medicine.
In a recent study, Kelly et al.
Despite the significant correlation found in our study between SCT and SAMP scores, the scores obtained in these tests were significantly different. This is most likely due to the different types of knowledge assessed by SCT and SAMP. In fact, SCT assess clinical reasoning expertise in a context of uncertainty, whereas SAMP assess clinical situation-based factual knowledge. One could argue that whilst SAMP are valuable for summative assessment of students, SCT would allow better ranking of students. However, our results suggest that SCT should not be used for summative assessment. Van den Broek et al.
One of the strengths of our study is that the lack of validity of SCT for summative assessment was observed in two different specialties, i.e. cardiology and gynecology. No clear difference was found in the format of SCT between cardiology and gynecology to explain the better scores obtained in cardiology. One potential explanation for this discrepancy is the clinical experience of students.
Our study has several limitations. The direct comparison of similar concepts between SCT and SAMP was not possible, as detailed learning objectives were not available. In addition, students knew that the SCT would not be taken into account in their final grade, and this might have reduced their efforts in that section of the test. However, the students knew that SCT would probably be used for their final examination in the final year of medical studies. Another limitation of our study is the low reliability of the SCT, with a Cronbach α coefficient of only 0.31. Some authors have reported adequate reliability with a minimum of 15 experts. Accordingly, the 12- and 10-member expert panels could be considered relatively small, and might have negatively affected Cronbach α.
Although significantly correlated, SCT and SAMP scores in cardiology and gynecology were significantly different in fourth-year medical students. SCT failed to differentiate strong from weak students, based on SAMP scores. These results suggest that SCT should not be used for summative purposes in fourth-year medical students.
We thank Dr Laura Ravasi for her assistance in writing and English editing the manuscript, on behalf of the University of Lille, France.
The authors declare that they have no conflict of interest.
Appendix 1. An example of a cardiology SCT case vignette
Appendix 2. An example of a Gynecology SCT case vignette
Appendix 3. Example of Gynecology short answer management problem