
Najafali, Daniel ; Reiche, Erik ; Araya, Sthefano ; Orellana, Manuel ; Liu, Farrah C. ; Camacho, Justin M. ; Patel, Sameer A. ; Broyles, Justin M. ; Dorafshar, Amir H. ; Morrison, Shane D. ; Knoedler, Leonard ; Fox, Paige M.

Artificial Intelligence Augmentation: Performance of GPT-4 and GPT-3.5 on the Plastic Surgery In-service Examination

Najafali, Daniel, Reiche, Erik, Araya, Sthefano, Orellana, Manuel, Liu, Farrah C., Camacho, Justin M., Patel, Sameer A., Broyles, Justin M., Dorafshar, Amir H., Morrison, Shane D., Knoedler, Leonard and Fox, Paige M. (2025) Artificial Intelligence Augmentation: Performance of GPT-4 and GPT-3.5 on the Plastic Surgery In-service Examination. Plastic and Reconstructive Surgery - Global Open 13 (4), e6645.

Publication date of this full text: 24 Oct 2025 15:04
Article
DOI for citing this document: 10.5283/epub.78024


Abstract

Background: ChatGPT-3.5 scored in the 52nd percentile of the Plastic Surgery In-service Examination, making its knowledge equivalent to that of a first-year integrated resident. The updated GPT-4 may perform better given its more expansive training set. We hypothesized that GPT-4 would outperform its predecessor, making it a more valuable potential asset to surgical education.

Methods: Questions from the 2022 Plastic Surgery In-service Examination were given to GPT-4 and GPT-3.5. Both models were prompted using 3 different structures. The 2022 American Society of Plastic Surgeons Norm Tables were used to compare the chatbot's performance with national metrics from plastic surgery residents.

Results: GPT-4 answered a total of 237 questions with an overall accuracy of 63% across all 3 strategies. Accuracy by prompting scheme was 54% for open-ended, 67% for multiple choice (MC), and 68% for MC with explanation. The section with the highest accuracy (74%) across all strategies was Section 4: Breast and Cosmetic. GPT-4’s highest-scoring methodology (MC with explanation, 68%) placed it in the following national integrated percentiles: 93rd percentile for the first year, 76th percentile for the second year, 52nd percentile for the third year, 34th percentile for the fourth year, 17th percentile for the fifth year, and 15th percentile for the sixth year. GPT-3.5 scored 58% overall.

Conclusions: GPT-4 outperformed its predecessor but scored only in the 15th percentile compared with postgraduate year-6 residents. Further refinement is needed for it to achieve performance equivalent to that of an attending plastic surgeon and to become a valuable tool for surgical education.




Details

Document type: Article
Journal title: Plastic and Reconstructive Surgery - Global Open
Publisher: Wolters Kluwer
Volume: 13
Issue: 4
Page range: e6645
Date: 25 April 2025
Institutions: Medicine > Centers of the University Hospital Regensburg > Center for Plastic, Hand and Reconstructive Surgery
Identification number: 10.1097/GOX.0000000000006645 (DOI)
Dewey Decimal Classification: 600 Technology, medicine, applied sciences > 610 Medicine
Status: Published
Peer reviewed: Yes, this version has been peer reviewed
Produced at the University of Regensburg: Yes
URN of the University Library Regensburg: urn:nbn:de:bvb:355-epub-780242
Document ID: 78024