Najafali, Daniel; Reiche, Erik; Araya, Sthefano; Orellana, Manuel; Liu, Farrah C.; Camacho, Justin M.; Patel, Sameer A.; Broyles, Justin M.; Dorafshar, Amir H.; Morrison, Shane D.; Knoedler, Leonard; Fox, Paige M.

Artificial Intelligence Augmentation: Performance of GPT-4 and GPT-3.5 on the Plastic Surgery In-service Examination

Najafali, Daniel, Reiche, Erik, Araya, Sthefano, Orellana, Manuel, Liu, Farrah C., Camacho, Justin M., Patel, Sameer A., Broyles, Justin M., Dorafshar, Amir H., Morrison, Shane D., Knoedler, Leonard and Fox, Paige M. (2025) Artificial Intelligence Augmentation: Performance of GPT-4 and GPT-3.5 on the Plastic Surgery In-service Examination. Plastic and Reconstructive Surgery - Global Open 13 (4), e6645.

Publication date of this full text: 24 Oct 2025 15:04
Article
DOI for citing this document: 10.5283/epub.78024

Published version
Download (PDF, 984 kB)

License: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International

Abstract

Background: ChatGPT-3.5 scored in the 52nd percentile of the Plastic Surgery In-service Examination, making its knowledge equivalent to a first-year integrated resident. The updated GPT-4 may have improved performance given its more expansive training set. We hypothesized that GPT-4 would outperform its predecessor, making it a more valuable potential asset to surgical education.
Methods: Questions from the 2022 Plastic Surgery In-service Examination were given to GPT-4 and GPT-3.5. Both were prompted using 3 different structures. The 2022 American Society of Plastic Surgeons Norm Tables were used to compare the performance of the chatbot to national metrics from plastic surgery residents.
Results: GPT-4 answered a total of 237 questions with an overall accuracy of 63% across all 3 strategies. The accuracy was as follows for the prompting schemes: 54% for open ended, 67% for multiple choice (MC), and 68% for MC with explanation. The section with the highest accuracy (74%) among all strategies was Section 4: Breast and Cosmetic. GPT-4’s highest scoring methodology (MC with explanation, 68%) placed it in the following national integrated percentiles: 93rd percentile for the first year, 76th percentile for the second year, 52nd percentile for the third year, 34th percentile for the fourth year, 17th percentile for the fifth year, and 15th percentile for the sixth year. GPT-3.5 scored 58% overall.
Conclusions: GPT-4 outperformed its predecessor but only scored in the 15th percentile compared with postgraduate year-6 residents. More refinement is needed to achieve performance metrics equivalent to an attending plastic surgeon and become a valuable tool for surgical education.
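The reported figures can be cross-checked with a little arithmetic. The following is a minimal sketch, not the authors' analysis: it assumes each prompting strategy covered the same number of questions (the abstract does not say so), so that an unweighted mean is a reasonable summary. The variable names are illustrative only.

```python
from statistics import mean

# GPT-4 accuracy per prompting strategy, in percent, as reported in the abstract.
accuracy_by_strategy = {
    "open ended": 54,
    "multiple choice": 67,
    "multiple choice with explanation": 68,
}

# GPT-3.5 overall accuracy, in percent, as reported in the abstract.
gpt35_overall = 58

# Assumption (not stated in the abstract): every strategy was run on the same
# number of questions, so the strategies can be averaged without weights.
gpt4_overall = mean(list(accuracy_by_strategy.values()))

# Strategy with the highest reported accuracy.
best_strategy = max(accuracy_by_strategy, key=accuracy_by_strategy.get)

# Difference between the two models in percentage points.
gap_points = gpt4_overall - gpt35_overall

print(f"GPT-4 unweighted mean accuracy: {gpt4_overall:.0f}%")
print(f"Best strategy: {best_strategy} ({accuracy_by_strategy[best_strategy]}%)")
print(f"GPT-4 minus GPT-3.5: {gap_points:.0f} percentage points")
```

Under that assumption, the unweighted mean of the three strategy accuracies matches the 63% overall figure given in the abstract.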

Participating institutions

Medicine > Centers of the Universitätsklinikum Regensburg > Center for Plastic, Hand, and Reconstructive Surgery (Zentrum für Plastische-, Hand- und Wiederherstellungschirurgie)

Details

Document type

Article

Journal or periodical title

Plastic and Reconstructive Surgery - Global Open

Publisher:

Wolters Kluwer

Open access type:

Gold (with APC, paid by UR)

Volume:

13

Issue or chapter number:

4

Page range:

e6645

Date

25 April 2025

Institutions

Medicine > Centers of the Universitätsklinikum Regensburg > Center for Plastic, Hand, and Reconstructive Surgery (Zentrum für Plastische-, Hand- und Wiederherstellungschirurgie)

Identifier

Value	Type
10.1097/GOX.0000000000006645	DOI

Dewey Decimal Classification

600 Technology, medicine, applied sciences > 610 Medicine

Status

Published

Peer reviewed

Yes, this version has been peer reviewed

Created at the University of Regensburg

URN of the Regensburg University Library

urn:nbn:de:bvb:355-epub-780242

Document ID

78024
