Faculty perspectives on the use of standardized versus non-standardized oral examinations to assess medical students

Natasha Johnson, Holly Khachadoorian-Elia, Celeste Royce, Carey York-Best, Katharyn Atkins, Xiaodong P. Chen and Andrea Pelletier

Department of Obstetrics, Gynecology and Reproductive Biology, Harvard Medical School, USA

Submitted: 18/07/2017; Accepted: 10/09/2018; Published: 29/09/2018

Int J Med Educ. 2018; 9:255-261; doi: 10.5116/ijme.5b96.17ca

© 2018 Natasha Johnson et al. This is an Open Access article distributed under the terms of the Creative Commons Attribution License which permits unrestricted use of work provided the original work is properly cited. http://creativecommons.org/licenses/by/3.0

Objectives: To determine if faculty perceive standardized oral examinations to be more objective and useful than the non-standardized format in assessing third-year medical students’ learning on the obstetrics and gynecology rotation.

Methods: Obstetrics and gynecology faculty at three teaching hospitals were sampled to complete a survey retrospectively comparing the standardized oral examination (SOE) and non-standardized or traditional oral examinations (TOE).  A Likert scale (0-5) was used to assess satisfaction, objectivity, and usefulness of SOE and TOE.  Wilcoxon signed rank test was performed to compare median Likert scale scores for each survey item. A Spearman’s correlation coefficient was used to investigate the relationship between the perceived level of objectivity and SOE characteristics. For qualitative measures, content analysis was applied.

Results: Sixty-six percent (n=25) of eligible faculty completed the survey. Faculty perceived the standardized oral examination as significantly more objective compared with the non-standardized (z=-3.15, p=0.002). Faculty also found SOE to be more useful in assessing overall clerkship performance (z=-2.0, p<0.05). All of the survey participants were willing to administer the standardized examination again.  Faculty reported strengths of the SOE to be uniformity, fairness, and ease of use. Major weaknesses reported included inflexibility and decreased ability to assess students’ higher order reasoning skills.

Conclusions: Faculty found standardized oral examinations to be more objective in assessing third-year medical students’ clinical competency when compared with a non-standardized approach.  This finding can be meaningfully applied to medical education programs internationally.

The oral examination is commonly used to assess clinical knowledge and skills in both undergraduate and postgraduate medical education. Given its apparent face validity, it is thought to be an effective way of assessing clinical competencies, including knowledge, communication skills, and critical thinking. Although the oral examination has a long history in the professional development of physicians, concerns and fundamental questions remain about its use, content validity, and inter-rater reliability.1-4 In an oral examination, the trainee interacts with the examiner and is assessed based on answers provided to the questions asked. Unconscious biases may influence the trainee's scores during the examination. Common criticisms of oral examinations are the inherent variability and inconsistency associated with their subjective nature.5-6 Standardization of the oral examination content and grading rubric has been statistically shown to improve the objectivity of the oral examination.7-9

The effect of oral examinations on medical trainees (medical students and residents) and their academic performance has been investigated in many medical specialties, including surgery and internal medicine, but little has been reported in obstetrics and gynecology (OB/GYN) undergraduate medical education.10-12 Zahn and colleagues reported that OB/GYN rotations commonly require medical students to take an oral examination.13 In the OB/GYN rotations at Brigham and Women’s Hospital, Massachusetts General Hospital, and Beth Israel Deaconess Medical Center, all affiliated with Harvard Medical School, we have used a traditional non-standardized oral examination (TOE) as one summative assessment contributing to a final rotation grade for over twenty years. With the TOE, the faculty examiner asks unscripted, non-standardized questions based on the content of the trainee’s submitted patient case list and the faculty examiner’s knowledge of medical student educational objectives for the OB/GYN clerkship. Although the TOE format has been shown to evaluate clinical knowledge and application among OB/GYN medical students, the Accreditation Council for Graduate Medical Education (ACGME) has suggested the standardized oral examination (SOE) as one of the assessment tools for graduate medical education.14-15 From OB/GYN students’ perspective, the SOE has been reported to be an effective alternative method to assess students’ clinical reasoning and to contribute to students’ preparation for the written examination.16

Our study aimed to assess faculty perception of objectivity of our oral examination given in a standardized fashion versus the traditional non-standardized approach.  We replaced the TOE with a new SOE and implemented it in a pilot study in 2015. We investigated faculty’s satisfaction, acceptance of and perceived usefulness of using an SOE to assess third-year medical students’ learning and clinical competency during the OB/GYN rotation.

Educational contents and setting

An SOE was pilot-implemented within the OB/GYN rotations at Brigham and Women’s Hospital, Massachusetts General Hospital, and Beth Israel Deaconess Medical Center in 2015. These three institutions are teaching hospitals affiliated with Harvard Medical School. OB/GYN rotations are six weeks in length for third-year Harvard Medical School students. The goals of an SOE are to assess the ability of each student to understand and discuss the pathophysiology, differential diagnosis, diagnostic evaluation and treatment of patient cases, as well as to demonstrate the student’s presentation and clinical reasoning skills. Students are asked to prepare four cases encountered during their clerkship and complete a structured case list. Students select one case from each of three categories (benign gynecology, gynecologic subspecialties, and obstetrics) and a fourth ambulatory case from any of the three categories. The categories were determined by the OB/GYN medical school leadership and correspond to the Association of Professors of Gynecology and Obstetrics (APGO) 10th Edition Medical Student Objectives. Each student has two 20-minute oral exams with trained faculty examiners.

Each faculty member asks the student to briefly present the patient (2-3 minutes), followed by a 7-8 minute question and answer period. Two cases are covered in each oral exam. We also developed a new SOE toolbox for faculty examiners based on our previous study, which included the creation of standardized questions based on APGO teaching cases. The standardized questions consist of basic questions that students are expected to answer satisfactorily in order to pass the oral examination and more advanced questions that examiners can select depending on students’ performance on the basic questions.17 Each examiner fills out an evaluation form at the conclusion of the exam. A faculty development resource was provided to all faculty members. It consists of a slide presentation introducing the SOE format, directions for asking standardized questions based on topic selection (including information on modifying standard questions to reflect the specifics of the case), the SOE grading instrument, guidance on differentiating student performance using sample questions and answers, and videos of oral examinations using the new SOE format.

Study design and data collection

Our study was a survey-based quantitative and qualitative study. In 2016, thirty-eight OB/GYN faculty who had administered at least one SOE during 2015-2016 were sampled and invited to complete an SOE survey one year after its pilot implementation. We developed the survey questionnaire based on current best practices in survey design and our study objective.17-18 Expert validation was applied to ensure face and content validity: three content experts who had concrete knowledge and experience in OB/GYN clerkship teaching assessed the survey items’ clarity and relevance to ensure the construct. We then conducted cognitive pretesting with one faculty member to ensure participants would interpret the survey items in the manner that we intended. The research team discussed the items until reaching consensus and then finalized the survey questionnaire. The final survey instrument included six items measuring the faculty members’ satisfaction and perceived usefulness of the SOE using a 5-point Likert scale, as well as two free-text questions inquiring about the strengths and weaknesses of the SOE (Appendix 1). Faculty were asked to rate the perceived objectivity, satisfaction (required time commitment and willingness to participate), and usefulness in assessing students’ competencies and clerkship performance using the SOE, as well as to provide retrospective ratings for the TOE. The two open-ended questions were intended to generate additional free-text commentary on the strengths and weaknesses of the oral examinations. Faculty participation was voluntary. Anonymous survey responses were collected through an online survey tool. The study was approved by the institutional review board (IRB) at all three participating institutions.

Statistical analysis

All analyses were conducted using STATA version 15.1. For the 5-point Likert scale survey items, we reported proportions. We used the Wilcoxon signed rank test to compare the median Likert scale scores of the SOE and the TOE for each survey question. Median values, z scores, and p values were reported for the Wilcoxon test. Multivariate analysis was used to examine the relationship between the level of objectivity and the other six variables of the SOE (assessment of communication skills, clinical knowledge, knowledge application, clinical reasoning, professionalism, and overall clerkship performance). Based on these results, we reported correlation coefficients and p values using Spearman’s correlation coefficient. The non-parametric Spearman’s correlation and the Wilcoxon signed rank test were used because most data did not have a normal distribution and because the Likert scale data were considered ordinal. P values less than 0.05 were considered statistically significant. Content analysis was used to examine the responses to open-ended questions about the weaknesses and strengths of the SOE. Two authors coded the comments separately to identify any patterns or evidence of change over time in faculty members’ descriptions. The authors then discussed and reached consensus on the themes.
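For readers who want to see the mechanics of the paired comparison described above, the following is a minimal sketch of a Wilcoxon signed rank test with a normal-approximation z score, written in plain Python. It is not the authors' STATA procedure: the function names, the choice to drop zero differences, and the example ratings are all assumptions made for illustration, and statistical packages differ in how they treat zeros and ties, so this sketch will not necessarily reproduce the z values reported in this paper.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (z, two-sided p) for paired samples x and y.

    Assumptions of this sketch: zero differences are dropped, ties in the
    absolute differences get average ranks, and the p value comes from the
    normal approximation (no continuity correction).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    # Under the null, each rank enters W+ with probability 1/2,
    # so Var(W+) = sum(rank^2) / 4 (this handles ties automatically).
    variance = sum(r * r for r in ranks) / 4
    z = (w_plus - mean) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical paired ratings (NOT the study data): one TOE and one SOE
# rating per faculty member on a 1-5 scale.
toe = [4, 3, 4, 2, 4, 3, 5, 4, 3, 4]
soe = [5, 4, 5, 4, 4, 5, 5, 5, 4, 5]
z, p = wilcoxon_signed_rank(toe, soe)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Passing the TOE ratings first and the SOE ratings second yields a negative z when the SOE ratings are higher, which matches the sign convention of the z scores reported in the Results.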

Among the 38 faculty members who were eligible for participation, 66% (n=25) completed the SOE survey. The majority of faculty participants (80%) reported administering SOE 1-5 times (as compared with >5 times) during 2015-2016. Twenty of the 25 participants (80%) had administered the TOE before launching the SOE.

Table 1 shows that, overall, 88% of faculty reported being “satisfied or very satisfied” with the SOE as compared to 85% with the TOE and 100% of faculty participants (n=25) indicated they would like to administer an SOE again. The level of objectivity reported as “objective or very objective” was 92% for SOE versus 60% for the TOE. When asked if SOE was a more objective way to assess students’ clinical knowledge and skills, 88% said yes. Faculty rated the level of usefulness higher for SOE on all items except communication and professionalism, where TOE slightly outperformed SOE.
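The percentages above collapse the two highest Likert categories (for example, "objective or very objective") into a single proportion. A minimal sketch of that calculation follows; the function name, the assumption that responses are coded 1 to 5, and the example ratings are illustrative only and are not taken from the study data.

```python
def top_two_share(responses, scale_max=5):
    """Proportion of responses falling in the top two scale categories."""
    if not responses:
        raise ValueError("no responses supplied")
    hits = sum(1 for r in responses if r >= scale_max - 1)
    return hits / len(responses)


# Hypothetical ratings on a 1-5 scale (NOT the study data).
ratings = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]
print(f"{top_two_share(ratings):.0%} rated the item 4 or 5")
```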

As shown in Table 2, the median score for objectivity was significantly higher for the SOE (median 5) than for the TOE (median 4) (z=-3.15, p=0.002). The SOE also scored significantly higher than the TOE in assessing overall clerkship performance (z=-2.00, p=0.046).

We conducted a secondary data analysis to investigate whether a higher level of objectivity perceived by faculty would be associated with a higher perceived level of usefulness of the SOE. As shown in Table 3, faculty participants’ perceived level of objectivity of the SOE strongly correlated with the SOE’s perceived usefulness in assessing students’ clinical knowledge (correlation coefficient=0.75, p=0.03) and knowledge application (correlation coefficient=0.77, p<0.001). However, the faculty participants’ perceived level of objectivity of the SOE demonstrated only a weak-to-moderate association with the usefulness of the other four SOE items (assessment of communication skills, clinical reasoning, professionalism, and overall clerkship performance).
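As an illustration of how such a coefficient is obtained, Spearman’s correlation can be computed as the Pearson correlation of the rank-transformed ratings, with tied ratings given average ranks. The sketch below is a plain-Python illustration under that definition, not the authors' STATA code; the function names and the example ratings are hypothetical, and the p value calculation is omitted.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and have at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical ratings (NOT the study data): perceived objectivity and
# perceived usefulness for assessing clinical knowledge, 1-5 scale.
objectivity = [5, 4, 5, 3, 4, 5, 2, 4]
usefulness = [5, 4, 4, 3, 3, 5, 2, 4]
print(f"rho = {spearman_rho(objectivity, usefulness):.2f}")
```

Because Likert ratings contain many ties, the rank-based Pearson form is used here rather than the shortcut formula based on squared rank differences, which is exact only when there are no ties.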

The majority of faculty participants completed the two free-text questions about the weaknesses and strengths of SOE (96%, n=24). Responses to the open-ended question “What do you think the strengths of the current oral examination are?” described specific strengths of an SOE, as well as a TOE (Table 4). Specific strengths of the SOE described by many faculty members included uniformity, fairness, and ease of use. Faculty participants reported the SOE’s major weaknesses as inflexibility and decreased ability to assess students’ higher-order reasoning skills by having to focus on standardized questions. Faculty participants noted strengths of the TOE including the ability to assess students’ reasoning skills, as well as to demonstrate students’ learning.  When asked about the weaknesses of the current oral exam, faculty participants described the TOE as lacking standards on grading, having strong individual variations, and time pressure.

Our findings suggest that, compared with the TOE, OB/GYN faculty perceived the SOE to be a more objective tool for evaluating medical students’ learning on the clerkship, without requiring additional time commitment. OB/GYN faculty examiners perceive the SOE as more objective and useful in assessing clerkship students’ clinical knowledge, knowledge application, clinical reasoning, and overall clerkship performance than the TOE. The perceived improvement in the SOE’s level of objectivity is consistent with results reported by Crisostomo and others that standardization of the oral exam content and grading rubric could reduce the subjective nature of oral examinations.7-9

Standardization and comparability are critically important in medical schools that have multi-site clerkships, such as ours, to ensure compliance with the Liaison Committee on Medical Education (LCME) Standard 8.7, Comparability of Education/Assessment.19 The introduction of the Association of American Medical Colleges (AAMC) Entrustable Professional Activities (EPAs) is another effort to standardize student assessments.20 The SOE not only satisfies one of the LCME Assessment Standards (Standard 9.6, Setting Standards of Achievement), but can also be used as a tool to assess EPAs (specifically EPA 2, prioritize a differential diagnosis following a clinical encounter; EPA 3, recommend and interpret common diagnostic and screening tests; EPA 6, provide an oral presentation of a clinical encounter; and EPA 7, form clinical questions and retrieve evidence to advance patient care).

Results from our study illustrate that faculty members’ perceived level of objectivity of the SOE strongly correlates with their perception of the SOE’s ability to assess students’ clinical knowledge and knowledge application. However, the correlations between the level of objectivity and the SOE’s ability to assess communication skills and clinical reasoning were not statistically significant.

Table 1. Faculty perspective of characteristics of the traditional oral examination (TOE) and the standardized oral examination (SOE)
Table 2. Faculty perspective of the standardized oral examination (SOE) compared to the traditional oral examination (TOE)

One possible reason is that standardization of questions limits faculty examiners’ flexibility to adjust questions based on the case scenario. As some faculty participants commented on the weaknesses of an SOE, “It's not always possible to stay with the ‘standard’ questions, as the clinical case does not always lend itself to that.” Another possible reason is that standardization of questions restricts faculty examiners’ style of asking questions. In our study, faculty participants perceived the TOE was slightly more useful in assessing students’ communication skills than the SOE; some of them described the flexibility of tailoring the oral exam questions based on how students respond throughout the case presentation as valuable. 

Table 3. Correlation between the level of perceived objectivity and the characteristics of the standardized oral examination reported in the post-implementation SOE survey by the faculty (N=20)
Table 4. Themes of the strengths and weaknesses of the traditional oral examination and the standardized oral examination

This exemplifies the need to build a certain level of flexibility into the SOE (e.g., instructions for faculty to modify standardized questions to reflect specifics of the cases) and to provide guidelines that empower faculty examiners to assess students’ clinical reasoning skills further when appropriate. Future study is needed to explore an optimal solution to fulfill this need. In addition, few faculty reported the TOE was extremely useful in assessing students’ overall clerkship performance. This reflects the grading rubric at our institutions, where students’ clinical performance over the 6-week rotation accounts for 70% of the clerkship grade and is the major determinant of overall performance.

Our study has several limitations. First, the sample size of our SOE survey is small. In addition, participants are from three teaching hospitals affiliated with a single medical school in the same geographic region; results may therefore not be generalizable to all OB/GYN clerkships. Second, the survey response rate was somewhat low (<70%), and non-response bias might affect the current results. Third, these results represent faculty examiners’ self-reported opinions and do not measure clerkship students’ actual competencies or academic outcomes. Future research would benefit from incorporating clerkship students’ perspectives and performance alongside faculty members’ perspectives of the SOE to further refine its design and implementation. Students’ performance on the SOE relative to other clerkship performance metrics would also be interesting to study.

OB/GYN faculty examiners perceive the SOE as more objective and outperforming the TOE in assessing medical students’ clinical knowledge, knowledge application, clinical reasoning, and overall clerkship performance. Programs in medical education are encouraged to introduce the standardized oral examination to their faculty and/or replace the traditional non-standardized oral examinations with the SOE to increase objectivity in assessing medical students’ learning and performance.  Future studies should include evaluation of this assessment tool in surgery and other medical fields that routinely administer oral examinations in undergraduate and graduate education.  This finding can be applied internationally in the assessment of medical students’ clinical competency and critical thinking skills.

Conflict of Interest

The authors declare that they have no conflict of interest.

  1. Roberts C, Sarangi S, Southgate L, Wakeford R and Wass V. Oral examinations-equal opportunities, ethnicity, and fairness in the MRCGP. BMJ. 2000; 320: 370-375.
  2. Morley S, Snaith P. Principles of psychological assessment. In: Freeman C, Tyrer P, editors. Research methods in psychiatry: a beginner's guide. Gaskell, London: Royal College of Psychiatrists; 1989.
  3. Burchard KW, Rowland-Morin PA, Coe NP, Garb JL. A surgery oral examination: interrater agreement and the influence of rater characteristics. Acad Med. 1995;70(11):1044-6.
  4. Jacobsohn E, Klock PA, Avidan M. Poor inter-rater reliability on mock anesthesia oral examinations. Can J Anaesth. 2006;53(7):659-68.
  5. EF R. The oral examination as an educational assessment procedure. Evaluating the skills of medical specialties. Chicago: American Board of Medical Specialties; 1983.
  6. Craig LB, Smith C, Crow SM, Driver W, Wallace M and Thompson BM. Obstetrics and gynecology clerkship for males and females: similar curriculum, different outcomes? Med Educ Online. 2013; 18: 21506.
  7. Crisostomo AC. The effect of standardization on the reliability of the Philippine Board of Surgery oral examinations. J Surg Educ. 2011; 68: 138-142.
  8. Daelmans HE, Scherpbier AJ, Van Der Vleuten CP and Donker AJ. Reliability of clinical oral examinations re-examined. Med Teach. 2001; 23: 422-424.
  9. EF R. The oral examination as an educational assessment procedure. Evaluating the skills of medical specialties. Chicago: American Board of Medical Specialties; 1983.
  10. Awad SS, Liscum KR, Aoki N, Awad SH and Berger DH. Does the subjective evaluation of medical student surgical knowledge correlate with written and oral exam performance? J Surg Res. 2002; 104: 36-39.
  11. Cifu A and Altkorn D. Designing and implementing standardized oral examinations in internal medicine clerkships. MedEdPORTAL Publications. 2011; 7: 9034.
  12. Mount CA, Short PA, Mount GR and Schofield CM. An end-of-year oral examination for internal medicine residents: an assessment tool for the clinical competency committee. J Grad Med Educ. 2014; 6: 551-554.
  13. Zahn CM, Nalesnik SW, Armstrong AY, Satin AJ and Haffner WH. Variation in medical student grading criteria: a survey of clerkships in obstetrics and gynecology. Am J Obstet Gynecol. 2004; 190: 1388-1393.
  14. Nahum GG. Evaluating medical student obstetrics and gynecology clerkship performance: which assessment tools are most reliable? Am J Obstet Gynecol. 2004; 191: 1762-1771.
  15. American Council on Graduate Medical Education. Toolbox of assessment methods. Available from: http://njms.rutgers.edu/culweb/medical/documents/ToolboxofAssessmentMethods.pdf
  16. Musindi W, Way D, Strafford K, Keder L, Katz N. Evaluation of a standardized oral examination for assessment in an obstetrics and gynecology clerkship. Available from: https://www.apgo.org/2014/ASL16_Evaluation.pdf.
  17. Atkins K, Johnson N, Khachadoorian-Elia H, Mackenzie M, Wosu U, York-Best C and Ricciotti HA. A standardized oral examination for obstetrics and gynecology clerkships. MedEdPORTAL Publications. 2016; 12: 10393.
  18. Gehlbach H, Artino AR and Durning S. AM last page: survey development guidance for medical education researchers. Acad Med. 2010; 85: 925.
  19. Liaison Committee on Medical Education (LCME) Standards: structure and function of a medical. Available from: http://lcme.org/publications/.
  20. AAMC Core Entrustable Professional Activities for Entering Residency: curriculum developer's guide 2014. Available from: https://members.aamc.org/eweb/upload/core%20EPA%20Curric lum%20Dev%20Guide.pdf.