Published version: PDF (984 kB). License: Creative Commons Attribution Non-commercial No Derivatives 4.0
Artificial Intelligence Augmentation: Performance of GPT-4 and GPT-3.5 on the Plastic Surgery In-service Examination
Najafali, Daniel, Reiche, Erik, Araya, Sthefano, Orellana, Manuel, Liu, Farrah C., Camacho, Justin M., Patel, Sameer A., Broyles, Justin M., Dorafshar, Amir H., Morrison, Shane D., Knoedler, Leonard and Fox, Paige M.
(2025)
Artificial Intelligence Augmentation: Performance of GPT-4 and GPT-3.5 on the Plastic Surgery In-service Examination.
Plastic and Reconstructive Surgery - Global Open 13 (4), e6645.
Date of publication of this fulltext: 24 Oct 2025 15:04
Article
DOI to cite this document: 10.5283/epub.78024
Abstract
Background: ChatGPT-3.5 scored in the 52nd percentile of the Plastic Surgery In-service Examination, making its knowledge equivalent to a first-year integrated resident. The updated GPT-4 may have improved performance given its more expansive training set. We hypothesized that GPT-4 would outperform its predecessor, making it a more valuable potential asset to surgical education.
Methods: Questions from the 2022 Plastic Surgery In-service Examination were given to GPT-4 and GPT-3.5. Both were prompted using 3 different structures. The 2022 American Society of Plastic Surgeons Norm Tables were used to compare the performance of the chatbot to national metrics from plastic surgery residents.
Results: GPT-4 answered a total of 237 questions with an overall accuracy of 63% across all 3 strategies. The accuracy was as follows for the prompting schemes: 54% for open ended, 67% for multiple choice (MC), and 68% for MC with explanation. The section with the highest accuracy (74%) among all strategies was Section 4: Breast and Cosmetic. GPT-4’s highest scoring methodology (MC with explanation, 68%) placed it in the following national integrated percentiles: 93rd percentile for the first year, 76th percentile for the second year, 52nd percentile for the third year, 34th percentile for the fourth year, 17th percentile for the fifth year, and 15th percentile for the sixth year. GPT-3.5 scored 58% overall.
Conclusions: GPT-4 outperformed its predecessor but only scored in the 15th percentile compared with postgraduate year-6 residents. More refinement is needed to achieve performance metrics equivalent to an attending plastic surgeon and become a valuable tool for surgical education.
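The figures reported in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes the 63% overall accuracy is the unweighted mean of the three prompting strategies, which the abstract does not state explicitly, and the variable names are hypothetical.

```python
# Accuracy (%) per prompting strategy, as reported in the abstract.
gpt4_accuracy = {
    "open ended": 54,
    "multiple choice": 67,
    "multiple choice with explanation": 68,
}

# Assumption: the overall figure is the unweighted mean of the three strategies.
overall = sum(gpt4_accuracy.values()) / len(gpt4_accuracy)

# National integrated-resident percentiles reported for the best strategy (68%),
# keyed by postgraduate year.
percentile_by_pgy = {1: 93, 2: 76, 3: 52, 4: 34, 5: 17, 6: 15}

best_strategy = max(gpt4_accuracy, key=gpt4_accuracy.get)

# First postgraduate year at which the score falls below the 50th percentile.
first_below_median = min(y for y, p in percentile_by_pgy.items() if p < 50)

print(f"overall accuracy: {overall:.0f}%")
print(f"best strategy: {best_strategy}")
print(f"below the 50th percentile from year {first_below_median}")
```

Under that assumption the mean of 54, 67, and 68 reproduces the reported 63%, and the percentile table shows the best-scoring strategy dropping below the 50th percentile from the fourth postgraduate year onward.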
Details
| Field | Value |
|---|---|
| Item type | Article |
| Journal or Publication Title | Plastic and Reconstructive Surgery - Global Open |
| Publisher | Wolters Kluwer |
| Volume | 13 |
| Number of Issue or Book Chapter | 4 |
| Page Range | e6645 |
| Date | 25 April 2025 |
| Institutions | Medicine > Zentren des Universitätsklinikums Regensburg > Zentrum für Plastische-, Hand- und Wiederherstellungschirurgie |
| Identification Number | |
| Dewey Decimal Classification | 600 Technology > 610 Medical sciences Medicine |
| Status | Published |
| Refereed | Yes, this version has been refereed |
| Created at the University of Regensburg | Yes |
| URN of the UB Regensburg | urn:nbn:de:bvb:355-epub-780242 |
| Item ID | 78024 |