Faculty perspectives on the use of standardized versus non-standardized oral examinations to assess medical students

Objectives: To determine if faculty perceive standardized oral examinations to be more objective and useful than the non-standardized format in assessing third-year medical students' learning on the obstetrics and gynecology rotation.
Methods: Obstetrics and gynecology faculty at three teaching hospitals were sampled to complete a survey retrospectively comparing the standardized oral examination (SOE) and the non-standardized or traditional oral examination (TOE). A 5-point Likert scale was used to assess satisfaction, objectivity, and usefulness of the SOE and TOE. A Wilcoxon signed rank test was performed to compare median Likert scale scores for each survey item. Spearman's correlation coefficient was used to investigate the relationship between the perceived level of objectivity and SOE characteristics. For qualitative measures, content analysis was applied.
Results: Sixty-six percent (n=25) of eligible faculty completed the survey. Faculty perceived the standardized oral examination as significantly more objective than the non-standardized examination (z=-3.15, p=0.002). Faculty also found the SOE to be more useful in assessing overall clerkship performance (z=-2.0, p<0.05). All of the survey participants were willing to administer the standardized examination again. Faculty reported strengths of the SOE to be uniformity, fairness, and ease of use. Major weaknesses reported included inflexibility and decreased ability to assess students' higher-order reasoning skills.
Conclusions: Faculty found standardized oral examinations to be more objective in assessing third-year medical students' clinical competency when compared with a non-standardized approach. This finding can be meaningfully applied to medical education programs internationally.


Introduction
The oral examination is commonly used to assess clinical knowledge and skills in both undergraduate and postgraduate medical education. Given its apparent face validity, it is thought to be an effective way of assessing clinical competencies, including knowledge, communication skills, and critical thinking.2][3][4] In an oral examination, the trainee interacts with the examiner and is assessed based on answers provided to the questions asked. Unconscious biases may influence the trainee's scores during the examination.1][12] Zahn and colleagues reported that OB/GYN rotations commonly require medical students to take an oral examination.[13]
In the OB/GYN rotations at Brigham and Women's Hospital, Massachusetts General Hospital, and Beth Israel Deaconess Medical Center, all affiliated with Harvard Medical School, we have used a traditional non-standardized oral examination (TOE) as one summative assessment contributing to a final rotation grade for over twenty years. With the TOE, the faculty examiner asks unscripted, non-standardized questions based on the content of the trainee's submitted patient case list and the faculty examiner's knowledge of medical student educational objectives for the OB/GYN clerkship.5] From OB/GYN students' perspective, the standardized oral examination (SOE) has been reported to be an effective alternative method to assess students' clinical reasoning and to contribute to students' preparation for the written examination.[16]
Our study aimed to assess faculty perception of the objectivity of our oral examination given in a standardized fashion versus the traditional non-standardized approach. We replaced the TOE with a new SOE and implemented it in a pilot study in 2015. We investigated faculty's satisfaction with, acceptance of, and perceived usefulness of the SOE in assessing third-year medical students' learning and clinical competency during the OB/GYN rotation.

Educational contents and setting
An SOE was pilot-implemented within the OB/GYN rotations at Brigham and Women's Hospital, Massachusetts General Hospital, and Beth Israel Deaconess Medical Center in 2015. These three institutions are teaching hospitals affiliated with Harvard Medical School. OB/GYN rotations are six weeks in length for third-year Harvard Medical School students. The goals of an SOE are to assess the ability of each student to understand and discuss the pathophysiology, differential diagnosis, diagnostic evaluation, and treatment of patient cases, as well as to demonstrate the student's presentation and clinical reasoning skills. Students are asked to prepare four cases encountered during their clerkship and complete a structured case list. Students select one case from each of three categories (benign gynecology, gynecologic subspecialties, and obstetrics) and a fourth ambulatory case from any of the three categories. The categories were determined by the OB/GYN medical school leadership and correspond to the Association of Professors of Gynecology and Obstetrics (APGO) 10th Edition Medical Student Objectives. Each student has two 20-minute oral exams with trained faculty examiners.
For each case, the faculty member asks the student to briefly present the patient (2-3 minutes), followed by a 7-8 minute question and answer period. Two cases are covered in each oral exam. We also developed a new SOE toolbox for faculty examiners based on our previous study, which included the creation of standardized questions based on APGO teaching cases. The standardized questions consist of basic questions that students are expected to answer satisfactorily in order to pass the oral examination, and more advanced questions that examiners can select depending on students' performance on the basic questions.[17] Each examiner fills out an evaluation form at the conclusion of the exam. A faculty development resource was provided to all faculty members. It consists of a slide presentation introducing the SOE format, directions for asking standardized questions based on topic selection (including information on modifying standard questions to reflect the specifics of the case), the SOE grading instrument, guidance on differentiating student performance using sample questions and answers, and videos of oral examinations using the new SOE format.

Study design and data collection
Our study was a survey-based quantitative and qualitative study. In 2016, thirty-eight OB/GYN faculty who had administered at least one SOE during 2015-2016 were sampled and invited to complete an SOE survey one year after its pilot implementation.8] Expert validation was applied to ensure face and content validity: three content experts with concrete knowledge and experience in OB/GYN clerkship teaching assessed the survey items' clarity and relevance to the intended construct. We then conducted cognitive pretesting with one faculty member to ensure participants would interpret the survey items in the manner that we intended. The research team discussed the items until reaching consensus and then finalized the survey questionnaire. The final survey instrument included six items measuring the faculty members' satisfaction with and perceived usefulness of the SOE using a 5-point Likert scale, as well as two free-text questions inquiring about the strengths and weaknesses of the SOE (Appendix 1). Faculty were asked to rate the perceived objectivity, satisfaction (required time commitment and willingness to participate), and usefulness in assessing students' competencies and clerkship performance for the SOE, and to provide retrospective ratings for the TOE. Two open-ended questions were also asked to generate additional free-text commentary on the strengths and weaknesses of the oral examinations. Faculty participation was voluntary. Anonymous survey responses were collected through an online survey tool. The study was approved by the institutional review board (IRB) at all three participating institutions.

Statistical analysis
All analyses were conducted using STATA version 15.1. For the 5-point Likert scale survey items, we reported proportions. We used the Wilcoxon signed rank test to examine the differences in median Likert scale score between the SOE and the TOE for each survey question. Median values, z scores, and p values were reported for the Wilcoxon test. Multivariate analysis was used to examine the relationship between the level of objectivity and the other six variables of the SOE (assessment of communication skills, clinical knowledge, knowledge application, clinical reasoning, professionalism, and overall clerkship performance). Based on these results, we reported correlation coefficients and p values using Spearman's correlation coefficient. The non-parametric Spearman's correlation and the Wilcoxon signed rank test were used because most data did not have a normal distribution and because the Likert scale data were considered ordinal. P values less than 0.05 were considered statistically significant. Content analysis was used to examine the responses to open-ended questions about the weaknesses and strengths of the SOE. Two authors coded the comments separately to identify any patterns or evidence of change over time in faculty members' descriptions. The authors then discussed and reached consensus on the themes.
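To make the paired comparison concrete, the following is a minimal Python sketch of a Wilcoxon signed rank test with a normal approximation. It is not the authors' code (the analyses were run in STATA). The function names and the ratings are hypothetical, and the sketch assumes zero differences are dropped and ties receive average ranks, so its z values may differ from those reported by statistical packages that treat zeros differently.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def wilcoxon_signed_rank(first, second):
    """Return (z, two-sided p) for paired ordinal ratings.

    Assumptions of this sketch: zero differences are dropped, ties get
    average ranks, and the p value comes from the normal approximation.
    """
    diffs = [a - b for a, b in zip(first, second) if a != b]
    if not diffs:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under the null hypothesis each rank is added to W+ with probability 1/2.
    mean = sum(ranks) / 2
    variance = sum(r * r for r in ranks) / 4
    z = (w_plus - mean) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical paired objectivity ratings (1-5), NOT the study data.
toe_ratings = [4, 3, 4, 4, 5, 3, 4, 2, 4, 3]
soe_ratings = [5, 5, 4, 5, 5, 4, 5, 4, 5, 4]
z, p = wilcoxon_signed_rank(toe_ratings, soe_ratings)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With the TOE ratings passed first, a negative z indicates that the SOE ratings tend to be higher, which is the sign convention of the z values reported in this study.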

Results
Among the 38 faculty members who were eligible for participation, 66% (n=25) completed the SOE survey. The majority of faculty participants (80%) reported administering the SOE 1-5 times (as compared with >5 times) during 2015-2016. Twenty of the 25 participants (80%) had administered the TOE before the SOE was launched.
Table 1 shows that, overall, 88% of faculty reported being "satisfied or very satisfied" with the SOE, as compared to 85% with the TOE. All faculty participants (100%, n=25) indicated they would like to administer an SOE again. The level of objectivity reported as "objective or very objective" was 92% for the SOE versus 60% for the TOE. When asked if the SOE was a more objective way to assess students' clinical knowledge and skills, 88% said yes. Faculty rated the level of usefulness higher for the SOE on all items except communication and professionalism, where the TOE slightly outperformed the SOE.
As shown in Table 2, the median objectivity score was significantly higher for the SOE (median 5) than for the TOE (median 4) (z=-3.15, p=0.002). The SOE also scored significantly higher than the TOE in assessing overall clerkship performance (z=-2.00, p=0.046).
We conducted a secondary data analysis to investigate whether a higher level of objectivity perceived by faculty would be associated with a higher perceived level of usefulness of the SOE. As shown in Table 3, faculty participants' perceived level of objectivity of the SOE correlated strongly with the SOE's perceived usefulness in assessing students' clinical knowledge (correlation coefficient=0.75, p=0.03) and knowledge application (correlation coefficient=0.77, p<0.001). However, the perceived level of objectivity of the SOE demonstrated only a weak-to-moderate association with the usefulness of the other four SOE items (assessment of communication skills, clinical reasoning, professionalism, and overall clerkship performance).
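As an illustration of how such a coefficient is obtained, the sketch below computes Spearman's correlation as the Pearson correlation of average ranks. It is a hedged example rather than the authors' analysis: the function names and the ratings are hypothetical, and no p value is computed.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sample is constant")
    return cov / (spread_x * spread_y)


# Hypothetical Likert ratings (1-5) from ten examiners, NOT the study data.
objectivity = [5, 4, 5, 3, 4, 5, 2, 4, 5, 3]
knowledge_usefulness = [5, 4, 4, 3, 4, 5, 2, 3, 5, 4]
rho = spearman_rho(objectivity, knowledge_usefulness)
print(f"Spearman rho = {rho:.2f}")
```

Because Likert ratings contain many ties, the rank-based Pearson form is used here instead of the shortcut formula based on squared rank differences, which is exact only when there are no ties.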
The majority of faculty participants completed the two free-text questions about the weaknesses and strengths of the SOE (96%, n=24).
Responses to the open-ended question "What do you think the strengths of the current oral examination are?" described specific strengths of the SOE, as well as of the TOE (Table 4). Specific strengths of the SOE described by many faculty members included uniformity, fairness, and ease of use. Faculty participants reported the SOE's major weaknesses as inflexibility and a decreased ability to assess students' higher-order reasoning skills because of the need to focus on standardized questions. Faculty participants noted strengths of the TOE including the ability to assess students' reasoning skills, as well as to demonstrate students' learning. When asked about the weaknesses of the current oral exam, faculty participants described the TOE as lacking grading standards, varying strongly between individual examiners, and being subject to time pressure.

Discussion
Our findings suggest that, compared to the TOE, the OB/GYN faculty perceived the SOE as a more objective tool for evaluating medical students' learning on the clerkship, without requiring additional time commitment. OB/GYN faculty examiners perceive the SOE as more objective and useful in assessing clerkship students' clinical knowledge, knowledge application, clinical reasoning, and overall clerkship performance than the TOE.[9] Standardization and comparability are critically important in medical schools that have multi-site clerkships, such as ours, to ensure compliance with the Liaison Committee on Medical Education (LCME) Standard 8.7, Comparability of Education/Assessment.[19] The introduction of the Association of American Medical Colleges (AAMC) Entrustable Professional Activities (EPAs) is another effort to standardize student assessments.[20] The SOE not only satisfies one of the LCME Assessment Standards (Standard 9.6, Setting Standards of Achievement), but can also be used as a tool to assess EPAs (specifically EPA 2, prioritize a differential diagnosis following a clinical encounter; EPA 3, recommend and interpret common diagnostic and screening tests; EPA 6, provide an oral presentation of a clinical encounter; and EPA 7, form clinical questions and retrieve evidence to advance patient care).
Results from our study illustrate that faculty members' perceived level of objectivity of the SOE strongly correlates with its perceived ability to assess students' clinical knowledge and knowledge application. However, the correlations between the level of objectivity and the SOE's ability to assess communication skills and clinical reasoning were not statistically significant. One possible reason is that standardization of questions limits faculty examiners' flexibility to adjust questions based on the case scenario. As some faculty participants commented on the weaknesses of an SOE, "It's not always possible to stay with the 'standard' questions, as the clinical case does not always lend itself to that." Another possible reason is that standardization of questions restricts faculty examiners' style of asking questions. In our study, faculty participants perceived the TOE as slightly more useful than the SOE in assessing students' communication skills; some of them described the flexibility of tailoring the oral exam questions to how students respond throughout the case presentation as valuable. This exemplifies the need to build a certain level of flexibility into the SOE (e.g., instructions for faculty to modify standardized questions to reflect the specifics of the cases) and to provide guidelines that empower faculty examiners to assess students' clinical reasoning skills further when appropriate. Future study is needed to explore an optimal solution to fulfill this need. In addition, few faculty reported that the TOE was extremely useful in assessing students' overall clerkship performance. This reflects the grading rubric at our institutions, where students' clinical performance over the 6-week rotation accounts for 70% of the clerkship grade and is the major determinant of overall performance.
Table 4 (excerpt). Faculty comments on weaknesses of the SOE: "The flow of the oral exam as a 'conversation' is made more challenging by the standardized questions -- this is a minor weakness." The ability of assessing students' higher-order reasoning skills by having to focus on standardized questions: "Focusing on standardized questions left less opportunity to assess (students') higher-order clinical reasoning." *Faculty comments from pre-implementation survey; ¶ Faculty comments from post-implementation survey.
Our study has several limitations. First, the sample size of the SOE survey is small. In addition, participants are from three teaching hospitals affiliated with a single medical school in the same geographic region; results may therefore not be generalizable to all OB/GYN clerkships. Second, the survey response rate was somewhat low (<70%), and non-response bias might affect the results. Third, these results represent faculty examiners' self-reported opinions and do not measure clerkship students' actual competencies or academic outcomes. Future research would benefit from combining clerkship students' perspectives and performance with faculty members' perspectives of the SOE to further refine its design and implementation. Students' performance on the SOE relative to other clerkship performance metrics would also be interesting to study.

Conclusions
OB/GYN faculty examiners perceive the SOE as more objective than the TOE and as outperforming it in assessing medical students' clinical knowledge, knowledge application, clinical reasoning, and overall clerkship performance. Programs in medical education are encouraged to introduce the standardized oral examination to their faculty and/or replace traditional non-standardized oral examinations with the SOE to increase objectivity in assessing medical students' learning and performance. Future studies should include evaluation of this assessment tool in surgery and other medical fields that routinely administer oral examinations in undergraduate and graduate education. This finding can be applied internationally in the assessment of medical students' clinical competency and critical thinking skills.

Table 1. Faculty perspective of characteristics of the traditional oral examination (TOE) and the standardized oral examination (SOE)
* There were missing data for TOE and SOE for the item asking about overall clerkship performance.

Table 2. Faculty perspective of the standardized oral examination (SOE) compared to the traditional oral examination (TOE)
* n=20 for paired data analysis. Ratings were given as median; 5 = very satisfied/extremely useful, 1 = very dissatisfied/not at all useful.
** Standard normal distributed z value to test for significance of median Likert scale scores between TOE and SOE.

Table 3. Correlation between the level of perceived objectivity and the characteristics of the standardized oral examination reported in the post-implementation SOE survey by the faculty (N=20)

Table 4. Themes of the strengths and weaknesses of the traditional oral examination and the standardized oral examination