
International Journal of Medical Education (Int J Med Educ; IJME)

ISSN: 2042-6372

8-6369

DOI: 10.5116/ijme.5898.2f91

Original research

Script concordance tests

Accuracy of script concordance tests in fourth-year medical students

Saad Nseir¹, Ahmed Elkalioubie², Philippe Deruelle¹, Dominique Lacroix¹, Didier Gosset¹

¹University of Lille, School of Medicine, Lille, France; ²University Hospital of Lille, Critical Care Center, Lille, France

Correspondence: Saad Nseir, University Hospital of Lille, School of Medicine, Lille, France. Email: s-nseir@chru-lille.fr

Published: 25 February 2017

Int J Med Educ. 2017;8:63-69

Received: 17 August 2016; accepted: 6 February 2017

This is an Open Access article distributed under the terms of the Creative Commons Attribution License which permits unrestricted use of work provided the original work is properly cited. http://creativecommons.org/licenses/by/3.0/

Objectives

This investigation aimed to determine the validity of script concordance tests (SCT), compared with clinical-case-related short-answer management problems (SAMP), in fourth-year medical students.

Methods

This retrospective study was conducted at the Medical School of Lille University. Cardiology and gynecology examinations each included 3 SCT and 2 clinical-case-related SAMP. The final score was out of 20 points and did not include SCT results; the passing score was ≥10/20. Wilcoxon and McNemar tests were used to compare quantitative and qualitative variables, respectively. Correlation between scores was also analyzed.

Results

A total of 519 and 521 students completed the SAMP and SCT in cardiology and gynecology, respectively. In cardiology, the SCT score was significantly higher than the SAMP score (mean ± SD 13.5 ± 2.4 versus 11.4 ± 2.6, Wilcoxon test, p < 0.001). In gynecology, the SCT score was significantly lower than the SAMP score (10.8 ± 2.6 versus 11.4 ± 2.7, Wilcoxon test, p = 0.001). SCT and SAMP scores were significantly correlated (p < 0.05, Pearson’s correlation). However, the percentage of students with an SCT score ≥ 10/20 was similar among those who passed or failed the cardiology SAMP test (327 of 359 (91%) vs 146 of 160 (91%), χ² = 0.004, df = 1, p = 0.952) or the gynecology SAMP test (247 of 379 (65%) vs 84 of 142 (59%), χ² = 1.614, df = 1, p = 0.204). Cronbach’s α coefficient was 0.31 for all SCT and 0.92 for all SAMP.

Conclusions

Although significantly correlated, the scores obtained in SCT and SAMP were significantly different in fourth-year medical students. These findings suggest that SCT should not be used for summative purposes in fourth-year medical students.

Keywords: script concordance tests, evaluation, fourth medical year, France

Introduction

Script concordance tests (SCT) assess clinical reasoning expertise in a context of uncertainty.¹ Such uncertainty may result from missing information at the time of decision-making or from the absence of evidence-based medical recommendations.

Despite some similarities in format, SCT differ from content-enriched multiple-choice questions (MCQ). Whereas MCQ address the end point of clinical reasoning or the relevant knowledge, SCT assess parts of the cognitive process itself. In MCQ, examinees must choose a single best answer, whereas in SCT students are scored on the agreement, or concordance, of their answers with those of an expert panel. Furthermore, MCQ add unnecessary complexity to factual knowledge, whereas SCT genuinely simulate a patient’s clinical history without additional complexity.²

Recently, universities worldwide have used SCT to assess clinical reasoning in various medical disciplines, including pediatric medicine,³ emergency medicine,⁴ critical care,⁵ anesthesiology,⁶ surgery,⁷ radiology,⁸ and other medical specialties.⁶,⁹⁻¹¹ SCT are generally used for training and evaluation during postgraduate medical training. Previous studies have suggested that SCT could be used as a standardized instrument to evaluate growth in clinical reasoning skills.¹²,¹³ However, one limitation of using SCT in this context is the difficulty of giving clear and helpful feedback. Further, evidence supporting the validity of SCT scores with respect to examinees’ thought and response processes is still limited,¹⁴ and potential weaknesses of SCT have recently been outlined.¹⁵ See et al.¹⁶ analyzed SCT and MCQ scores on pulmonary and critical care medicine tests in 16 fellows and 10 residents. They concluded that SCT was vulnerable to the intentional avoidance of extreme responses. Another recent study compared the judgments of a panel of emergency medicine consultants with evidence-based likelihood ratios for the diagnostic value of selected clinical and para-clinical findings in the context of an SCT.⁴ The results raised concerns about whether the judgments of an expert panel are sufficiently valid as the reference standard for this test. Moreover, SCT can be very difficult to construct, administer, and score. Roberti et al.¹⁷ suggested that these difficulties might make an SCT assessment method unfeasible in units with limited resources.

Several studies have assessed the feasibility and efficacy of SCT as an evaluation tool in fourth-year medical students.¹⁸⁻²⁶ However, only a few have compared SCT with other examination formats in the same group of students,¹⁸,²⁰⁻²² and these studies included few students. Given the above-discussed limitations of using SCT to assess medical students routinely, we hypothesized that SCT would not be accurate for summative purposes in fourth-year medical students, regardless of the domain of knowledge. Therefore, we conducted this study to evaluate the validity of SCT, compared with clinical-case-related short-answer management problems (SAMP), in the assessment of fourth-year medical students.

Methods

Study design and participants

This retrospective study was conducted in January 2013 at the Medical School of Lille University. The study was approved by the local Institutional Review Board (Comité de Protection des Personnes Nord-ouest IV). Because of the retrospective observational design of the study, and in accordance with French law, the local IRB did not require written informed consent. All data were analyzed anonymously. Five hundred and twenty-one fourth-year medical students were included in this study.

Data-collection method and procedure

Students had received dedicated SCT training, including 2 hours of theory on the definition and construction of SCT and several practice sessions during cardiology and gynecology practical teaching. The SCT were constructed according to the guidelines of Dory et al.² For each discipline, two faculty members wrote the SCT. The cardiology and gynecology SCT were reviewed and answered by panels of 12 and 10 experts, respectively. Each SCT (3 in cardiology and 3 in gynecology) included a clinical vignette and 3 hypotheses (items). Additional information was provided after each hypothesis, and the question asked how this new piece of information affected the initial hypothesis. Students answered on a 5-point Likert scale (-2 to +2) (Appendices 1 and 2). The SCT were scored out of 20 (2.25 points for each of the first 8 items and 2 points for the last item).
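The article does not state how the expert panel's answers were converted into student scores. A common approach described in SCT guides (e.g., Dory et al.² and Lubarsky et al.³⁰) is aggregate scoring: for each item, a response earns partial credit equal to the number of experts who chose it divided by the number who chose the modal response. The Python sketch below assumes that method and applies the item weights stated above; the function names and the example panel are hypothetical.

```python
from collections import Counter

# Hypothetical sketch: aggregate scoring as commonly described for SCT.
# ASSUMPTION: the article does not state which scoring rule was actually used.
LIKERT = (-2, -1, 0, 1, 2)

# 9 items (3 vignettes x 3 hypotheses): 2.25 points for items 1-8, 2 points for item 9.
ITEM_WEIGHTS = [2.25] * 8 + [2.0]


def item_credits(panel_answers):
    """Credit per Likert response = experts choosing it / experts choosing the modal response."""
    counts = Counter(panel_answers)
    modal_count = max(counts.values())
    return {response: counts.get(response, 0) / modal_count for response in LIKERT}


def sct_score(student_answers, panels, weights=ITEM_WEIGHTS):
    """Weighted sum of item credits; with ITEM_WEIGHTS the maximum is 20."""
    if not (len(student_answers) == len(panels) == len(weights)):
        raise ValueError("answers, panels and weights must have the same length")
    return sum(
        weight * item_credits(panel)[answer]
        for answer, panel, weight in zip(student_answers, panels, weights)
    )


# Example: a hypothetical 12-expert panel, reused for all 9 items.
panel = [1] * 7 + [2] * 3 + [0] * 2
print(item_credits(panel))              # +1 -> 1.0, +2 -> 3/7, 0 -> 2/7, -1 and -2 -> 0.0
print(sct_score([1] * 9, [panel] * 9))  # 20.0: modal answer on every item
```

With this panel, a student answering +2 on an item receives 3/7 of that item's points, and a response chosen by no expert receives nothing.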

The full cardiology and gynecology tests lasted 2 h 30 min each and included 3 SCT and 2 clinical-case-related SAMP, all given to students at the beginning of the test. In each discipline, the SAMP comprised two clinical cases with 8-10 questions requiring short open answers. These questions dealt with a clinical issue or the recall of factual knowledge. SAMP have been used for summative assessment in our Medical School for several years. An example of SAMP is presented in Appendix 3. The final score was out of 20 points for both cardiology and gynecology and was calculated as the total of the SAMP grades. The passing score was ≥10/20. SCT results were not included in the final score.

Statistical analysis

IBM SPSS Statistics 22 was used for statistical analysis. Qualitative variables are presented as number (%). The distribution of quantitative variables was tested using the Kolmogorov-Smirnov test; these data were normally distributed and are presented as mean ± SD. Statistical significance was set at p < 0.05. Cronbach’s α coefficient was computed to assess the reliability of SCT and SAMP.
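Cronbach's α compares the sum of the item variances with the variance of the total score: α = k/(k − 1) × (1 − Σσᵢ²/σ_X²), where k is the number of items, σᵢ² the variance of item i, and σ_X² the variance of the total score. The plain-Python sketch below illustrates that formula; it is not the SPSS routine used in the study, and the function name is hypothetical. It expects one row of item scores per student.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores has one row per student and one column per item.

    Sketch of the standard formula, not the SPSS implementation used in the study.
    Population variances are used throughout; the ratio is the same with sample variances.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    sum_item_var = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Toy data (hypothetical): identical items give alpha = 1, uncorrelated items give alpha = 0.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
print(cronbach_alpha([[1, 1], [-1, 1], [1, -1], [-1, -1]]))
```

Because α rises when items covary, a low value such as the 0.31 reported for SCT indicates that students' item scores varied largely independently of one another.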

SCT and SAMP scores were compared, for cardiology and gynecology, using the Wilcoxon signed-rank test. The proportion of students with a score ≥ 10/20 (50% of the maximum) in SCT and in SAMP was compared using the McNemar test, and the proportion of students with an SCT score ≥ 10/20 was compared between students who passed and those who failed the SAMP test using the χ² test. The Wilcoxon and McNemar tests are used to compare, respectively, quantitative and qualitative data measured in the same individuals. The correlation between SCT score and final score was analyzed with Pearson’s coefficient.
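The paired Wilcoxon signed-rank test ranks the absolute SCT-minus-SAMP differences of each student and sums the ranks of the positive differences (W⁺). The sketch below computes W⁺ and a two-sided p-value from the large-sample normal approximation. It drops zero differences and assigns average ranks to ties but omits the tie correction of the variance, so its p-values may differ slightly from SPSS output. The function name and the five-student example are hypothetical; with so few pairs an exact test would be preferable.

```python
from math import erfc, sqrt


def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test (two-sided, normal approximation).

    Sketch only: zero differences are dropped and tied |differences| receive
    average ranks; the variance is not corrected for ties.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired samples of equal length")
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over the run of tied absolute differences
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, diff in zip(ranks, d) if diff > 0)
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = erfc(abs(z) / sqrt(2))  # two-sided normal p-value
    return w_plus, z, p


# Hypothetical paired scores out of 20 (SCT, SAMP) for five students.
sct = [14.0, 12.5, 15.0, 11.0, 13.5]
samp = [11.0, 12.5, 12.0, 10.5, 11.0]
print(wilcoxon_signed_rank(sct, samp))
```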

Results

Among the 521 included students, 265 (50.9%) were female. Their mean (± SD) age was 23.9 (± 1.5) years, and the mean (± SD) study year was 4.6 (± 1.1). Cronbach’s α coefficient was 0.31 for all SCT and 0.92 for all SAMP. All 521 students completed the gynecology SCT, whereas 519 completed the cardiology SCT.

Cardiology examination

A total of 519 students completed the 2 SAMP and the SCT in cardiology. The mean score was significantly higher in SCT than in SAMP (13.5 ± 2.4 vs 11.4 ± 2.6, Wilcoxon test, p < 0.001). A score ≥ 50% of the maximum score, i.e., ≥ 10/20, was significantly more frequent in SCT than in SAMP (473 students [91%] vs 359 students [69%], McNemar test, p < 0.001).

The percentage of students with an SCT score ≥ 10/20 was similar (χ² = 0.004, df = 1, p = 0.952) in students who passed (final score ≥ 10/20; 327/359 [91%]) and in those who failed (final score < 10/20; 146/160 [91%]) the SAMP test. SCT score was significantly correlated with SAMP score (Pearson’s correlation, r² = 0.57, p = 0.047).

Gynecology examination

A total of 521 students completed the 2 SAMP and the SCT in gynecology. The mean score was significantly lower in SCT than in SAMP (10.8 ± 2.6 vs 11.4 ± 2.7, Wilcoxon test, p = 0.001).

A score ≥ 50% of the maximum score, i.e., ≥ 10/20, was significantly less frequent in SCT than in SAMP (331 [63%] vs 379 [72%], McNemar test, p = 0.001). The percentage of students with an SCT score ≥ 10/20 was similar (χ² = 1.614, df = 1, p = 0.204) in students who passed (final score ≥ 10/20; 247/379 [65%]) and in those who failed (final score < 10/20; 84/142 [59%]) the SAMP test.

SCT score was significantly correlated with SAMP score (Pearson’s correlation, r²=0.92, p=0.004).
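Both 2×2 comparisons reported above can be rebuilt from the published counts: in each discipline, the SAMP outcome (passed or failed) is crossed with the SCT result (≥ 10/20 or < 10/20). The sketch below computes Pearson's χ² without continuity correction for the comparison between students who passed and failed SAMP, and McNemar's χ² (also uncorrected) for the paired SCT-versus-SAMP comparison, which uses only the two discordant cells. Under these assumptions the Pearson values agree with the reported χ² = 0.004 and χ² = 1.614, and the McNemar p-values are consistent with the reported p < 0.001 and p = 0.001. The table layout and helper names are our reconstruction, not the authors' SPSS procedure.

```python
from math import erfc, sqrt


def chi2_p_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def mcnemar_chi2(b, c):
    """McNemar chi-square without continuity correction, from the two discordant cells."""
    return (b - c) ** 2 / (b + c)


# Reconstructed tables. Rows: passed / failed SAMP; columns: SCT >= 10/20 / SCT < 10/20.
TABLES = {
    "cardiology": (327, 359 - 327, 146, 160 - 146),
    "gynecology": (247, 379 - 247, 84, 142 - 84),
}

for name, (a, b, c, d) in TABLES.items():
    # Independent groups: SCT pass rate among SAMP passers vs SAMP failers.
    x2 = pearson_chi2_2x2(a, b, c, d)
    # Paired: discordant cells are (passed SAMP, SCT < 10) and (failed SAMP, SCT >= 10).
    mc = mcnemar_chi2(b, c)
    print(f"{name}: Pearson chi2={x2:.3f}, p={chi2_p_df1(x2):.3f}; "
          f"McNemar chi2={mc:.1f}, p={chi2_p_df1(mc):.4f}")
```

The same four cells thus answer two different questions: the Pearson test asks whether SCT success depends on SAMP success, while McNemar asks whether the overall pass rates of the two tests differ in the same students.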

Discussion

Our results show a significant correlation between SCT and SAMP scores; however, these scores were significantly different. Furthermore, the percentage of students with an SCT score ≥ 10/20 was similar in students who passed and in those who failed the examination, based on the SAMP score. These results suggest that SCT failed to differentiate strong from weak students, as defined by SAMP scores.

To our knowledge, our study is the first to compare SCT and SAMP in a large cohort of fourth-year medical students. In a cohort of 85 fourth-year medical students, Jouneau et al. evaluated SCT as a tool for assessing clinical reasoning and knowledge organization in a written pulmonology clinical-case examination.¹⁸ As in our study, students’ scores in clinical cases and SCT were significantly correlated. However, the two studies differ in several aspects. Our study included a larger number of students than that of Jouneau et al. (519 vs. 85). It also covered two medical disciplines (cardiology and gynecology) rather than one (pulmonology), allowing broader generalization of its results. Furthermore, whereas SCT were compared with SAMP in our study, Jouneau et al. compared SCT with clinical cases.

Another recent study evaluated the use of SCT as an assessment tool for fifth-year medical students in rheumatology. The test included 60 questions and was administered to a panel of 19 experts and to 26 students.²⁷ Fifteen students completed the SCT in its entirety, with a mean score of 61.5. Despite the low participation rate, the study demonstrated the feasibility of this internet-based SCT.

Several studies have compared the performance of SCT and MCQ in student assessment. Fournier et al. compared SCT and content-enriched MCQ for assessing clinical reasoning expertise in emergency medicine.²⁸ Despite adequate Cronbach α coefficients (0.85-0.95), SCT and MCQ scores were not significantly correlated (r² = 0.016, p = 0.59). As the authors pointed out to explain these negative results, only a few students and physicians were included in this study (20 first-year residents, 16 sixth-year medical students, and 7 certified doctors). Collard et al.²² compared SCT with factual knowledge test scores (a true/false test with a 0-100% degree of ascertainment) in 104 third- to sixth-year medical students and found a significant correlation between the 2 tests. Brailovsky et al.²¹ also found SCT to be significantly correlated with SAMP in a cohort of 24 medical students in Quebec, followed from the end of their clerkship to the end of their residency.

In a recent study, Kelly et al.²⁰ compared the reliability, validity, and learner satisfaction of SCT, MCQ, and National Board of Medical Examiners tests. This study included 120 third- and fourth-year medical students who were given 20-item SCT and MCQ examinations. The SCT examination showed greater validity than the MCQ examination, owing to a better correlation with clinical performance. However, the SCT was initially less reliable and less preferred by students.

Despite the significant correlation between SCT and SAMP scores in our study, the scores obtained in these tests were significantly different. This is most likely due to the different types of knowledge assessed by SCT and SAMP: SCT assess clinical reasoning expertise in a context of uncertainty, whereas SAMP assess clinical situation-based factual knowledge. One could argue that, whilst SAMP are valuable for the summative assessment of students, SCT would allow better ranking of students. However, our results suggest that SCT should not be used for summative assessment. Van den Broek et al.²⁹ reported similar conclusions in final-year medical students.

One strength of our study is that SCT were not valid for summative assessment in two different specialties, cardiology and gynecology. No clear difference in the format of the cardiology and gynecology SCT could explain the higher scores obtained in cardiology than in gynecology. One potential explanation for this discrepancy is the students’ clinical experience.

Our study has several limitations. A direct comparison of similar concepts between SCT and SAMP was not possible, as detailed learning objectives were not available. In addition, students knew that the SCT would not be counted in their final grade, which might have reduced their effort on that section of the test. However, the students knew that SCT would probably be used in their final examination in the last year of medical school. Another limitation of our study is the low reliability of the SCT, with a Cronbach α coefficient of only 0.31. Some authors have reported adequate reliability with a minimum of 15 experts; accordingly, our 12- and 10-member expert panels could be considered relatively small and might have negatively affected Cronbach α.³⁰,³¹ Furthermore, for a few SCT hypotheses (n=3), the experts who answered the SCT did not reach consensus. Nevertheless, excluding these conflicting questions from the statistical analysis did not improve the Cronbach α coefficient (results not shown). Absence of consensus among experts is one of the major limitations of SCT, as no clear action is recommended when experts disagree. Some authors suggest that Cronbach’s α might not be the best way to assess SCT reliability, as clinical reasoning may not be a unitary concept. Finally, our results cannot be generalized because of the single-center design, the evaluation of SCT in cardiology and gynecology only, and the low Cronbach’s α. Further multicenter studies are required to confirm our findings.

Conclusions

Although significantly correlated, SCT and SAMP scores in cardiology and gynecology were significantly different in fourth-year medical students. SCT failed to differentiate strong from weak students, as defined by SAMP scores. These results suggest that SCT should not be used for summative purposes in fourth-year medical students.

Acknowledgements

We thank Dr Laura Ravasi for her assistance with writing and English editing of the manuscript, on behalf of the University of Lille, France.

Conflict of Interest

The authors declare that they have no conflict of interest.

Supplementary material

Supplementary file 1

Appendix 1. An example of a cardiology SCT case vignette

Supplementary file 2

Appendix 2. An example of a Gynecology SCT case vignette

Supplementary file 3

Appendix 3. Example of Gynecology short answer management problem

References

1. Giet, Massart, Gagnon, Charlin. Le test de concordance de script en 20 questions [The script concordance test in 20 questions]. Pédagogie Médicale. 2013;14:39-48. doi:10.1051/pmed/2012026

2. Dory, Gagnon, Vanpee, Charlin. How to construct and implement script concordance tests: insights from a systematic review. Med Educ. 2012;46:552-563. doi:10.1111/j.1365-2923.2011.04211.x. PMID: 22626047

3. Carrière, Gagnon, Charlin, Downing, Bordage. Assessing clinical reasoning in pediatric emergency medicine: validity evidence for a Script Concordance Test. Ann Emerg Med. 2009;53:647-652. doi:10.1016/j.annemergmed.2008.07.024. PMID: 18722694

4. Ahmadi, Khoshkish, Soltani-Arabshahi, Hafezi-Moghadam, Zahmatkesh, Heidari, Baba-Beigloo, Baradaran, Lotfipour. Challenging script concordance test reference standard by evidence: do judgments by emergency medicine consultants agree with likelihood ratios? Int J Emerg Med. 2014;7:34. doi:10.1186/s12245-014-0034-3. PMID: 25635194

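5. Gibot, Bollaert. Le test de concordance de script comme outil d’évaluation formative en réanimation médicale [The script concordance test as a formative assessment tool in medical intensive care]. Pédagogie Médicale. 2008;9:7-18. doi:10.1051/pmed:2008037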

6. Couraud, Girard, Souquet, Coiffard, Charlin, Milleron. Évaluation des connaissances sur le dépistage du cancer par test de concordance de script [Assessment of knowledge of cancer screening using a script concordance test]. Revue des Maladies Respiratoires. 2016;33:333-342. doi:10.1016/j.rmr.2015.09.001. PMID: 26603957

7. Goos, Schubach, Seifert, Boeker. Validation of undergraduate medical student script concordance test (SCT) scores on the clinical assessment of the acute abdomen. BMC Surg. 2016;16:57. doi:10.1186/s12893-016-0173-y. PMID: 27535826

8. Brazeau-Lamontagne, Charlin, Gagnon, Samson, van der Vleuten. Measurement of perception and interpretation skills during radiology training: utility of the script concordance approach. Med Teach. 2004;26:326-332. doi:10.1080/01421590410001679000. PMID: 15203845

9. Kazour, Richa, Zoghbi, El-Hage, Haddad. Using the Script Concordance Test to Evaluate Clinical Reasoning Skills in Psychiatry. Acad Psychiatry. 2017;41:86-90. doi:10.1007/s40596-016-0539-6. PMID: 27178278

10. Kow, Walters, Karram, Sarsotti, Jelovsek. Assessing intraoperative judgment using script concordance testing through the gynecology continuum of practice. Med Teach. 2014;36:724-729. doi:10.3109/0142159X.2014.910297. PMID: 24819908

11. Tan, Kandiah, Samarasekera, Ponnamperuma. Validating a script concordance test for assessing neurological localization and emergencies. Eur J Neurol. 2014;21:1419-1422. doi:10.1111/ene.12373. PMID: 24484361

12. Humbert, Miech. Measuring gains in the clinical reasoning of medical students: longitudinal results from a school-wide script concordance test. Acad Med. 2014;89:1046-1050. doi:10.1097/ACM.0000000000000267. PMID: 24979174

13. Custers. Thirty years of illness scripts: Theoretical origins and practical applications. Med Teach. 2015;37:457-462. doi:10.3109/0142159X.2014.956052. PMID: 25180878

14. Lubarsky, Charlin, Cook, Chalk, van der Vleuten. Script concordance testing: a review of published validity evidence. Med Educ. 2011;45:329-338. doi:10.1111/j.1365-2923.2010.03863.x. PMID: 21401680

15. Wan. Using the script concordance test to assess clinical reasoning skills in undergraduate and postgraduate medicine. Hong Kong Med J. 2015;21:455-461. doi:10.12809/hkmj154572. PMID: 26314569

16. See, Tan, Lim. The script concordance test for clinical reasoning: re-examining its utility and potential weakness. Med Educ. 2014;48:1069-1077. doi:10.1111/medu.12514. PMID: 25307634

17. Roberti, Mdo R, Pereira, Costa. Script concordance test in medical schools in Brazil: possibilities and limitations. Sao Paulo Med J. 2016;134:116-120. doi:10.1590/1516-3180.2015.00100108. PMID: 26786613

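18. Jouneau, Luraine, Desrues. Intérêt des tests de concordance de script pour évaluer le raisonnement et l’organisation des connaissances des étudiants de quatrième année des études médicales en France [Value of script concordance tests for assessing reasoning and knowledge organization in fourth-year medical students in France]. Pédagogie Médicale. 2012;13:225-232. doi:10.1051/pmed/2012020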

19. Duggan, Charlin. Summative assessment of 5th year medical students' clinical reasoning by Script Concordance Test: requirements and challenges. BMC Med Educ. 2012;12:29. doi:10.1186/1472-6920-12-29. PMID: 22571351

20. Kelly, Durning, Denton. Comparing a script concordance examination to a multiple-choice examination on a core internal medicine clerkship. Teach Learn Med. 2012;24:187-193. doi:10.1080/10401334.2012.692239. PMID: 22775780

21. Brailovsky, Charlin, Beausoleil, Coté, Van der Vleuten. Measurement of clinical reflective capacity early in training as a predictor of clinical reasoning performance at the end of residency: an experimental study on the script concordance test. Med Educ. 2001;35:430-436. PMID: 11328512

22. Collard, Gelaes, Vanbelle, Bredart, Defraigne, Boniver, Bourguignon. Reasoning versus knowledge retention and ascertainment throughout a problem-based learning curriculum. Med Educ. 2009;43:854-865. doi:10.1111/j.1365-2923.2009.03410.x. PMID: 19709010

23. Hornos, Pleguezuelos, Brailovsky, Harillo, Dory, Charlin. The practicum script concordance test: an online continuing professional development format to foster reflection on clinical practice. J Contin Educ Health Prof. 2013;33:59-66. doi:10.1002/chp.21166. PMID: 23512561

24. Aldekhayel, Alselaim, Magzoub, Al-Qattan, Al-Namlah, Tamim, Al-Khayal, Al-Habdan, Zamakhshary. Constructing a question bank based on script concordance approach as a novel assessment methodology in surgical education. BMC Med Educ. 2012;12:100. doi:10.1186/1472-6920-12-100. PMID: 23095569

25. Amini, Moghadami, Kojuri, Abbasi, Abadi, Molaee, Pishbin, Javadzade, Kasmaee, Vakili, Sadat, Akbari, Omidvar, Shafaghi, Dehbozorgian, Jafari, Monajemi, Arabshahi, Adibi, Charlin. An innovative method to assess clinical reasoning skills: Clinical reasoning tests in the second national medical science Olympiad in Iran. BMC Res Notes. 2011;4:418. doi:10.1186/1756-0500-4-418. PMID: 22005350

26. Piovezan, Custódio, Cendoroglo, Batista, Lubarsky, Charlin. Assessment of undergraduate clinical reasoning in geriatric medicine: application of a script concordance test. J Am Geriatr Soc. 2012;60:1946-1950. doi:10.1111/j.1532-5415.2012.04152.x. PMID: 23036106

27. Mathieu, Couderc, Glace, Tournadre, Malochet-Guinamand, Pereira, Dubost, Soubrier. Construction and utilization of a script concordance test as an assessment tool for DCEM3 (5th year) medical students in rheumatology. BMC Med Educ. 2013;13:166. doi:10.1186/1472-6920-13-166. PMID: 24330600

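28. Fournier, Thiercelin, Pulcini, Alunni-Perret, Gilbert, Minguet, Bertrand. Évaluation du raisonnement clinique en médecine d’urgence : les tests de concordance des scripts décèlent mieux l’expérience clinique que les questions à choix multiples à contexte riche [Assessment of clinical reasoning in emergency medicine: script concordance tests detect clinical experience better than context-rich multiple-choice questions]. Pédagogie Médicale. 2006;7:20-30. doi:10.1051/pmed:2006020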

29. van den Broek, van Asperen, Custers, Valk, Ten Cate. Effects of two different instructional formats on scores and reliability of a script concordance test. Perspect Med Educ. 2012;1:119-128. doi:10.1007/s40037-012-0017-0. PMID: 23316468

30. Lubarsky, Dory, Duggan, Gagnon, Charlin. Script concordance testing: from theory to practice: AMEE guide no. 75. Med Teach. 2013;35:184-193. doi:10.3109/0142159X.2013.760036. PMID: 23360487

31. Lineberry, Kreiter, Bordage. Threats to validity in the use and interpretation of script concordance test scores. Med Educ. 2013;47:1175-1183. doi:10.1111/medu.12283. PMID: 24206151