Published version: Download (PDF, 984 kB) | License: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
Artificial Intelligence Augmentation: Performance of GPT-4 and GPT-3.5 on the Plastic Surgery In-service Examination
Najafali, Daniel, Reiche, Erik, Araya, Sthefano, Orellana, Manuel, Liu, Farrah C., Camacho, Justin M., Patel, Sameer A., Broyles, Justin M., Dorafshar, Amir H., Morrison, Shane D., Knoedler, Leonard and Fox, Paige M.
(2025)
Artificial Intelligence Augmentation: Performance of GPT-4 and GPT-3.5 on the Plastic Surgery In-service Examination.
Plastic and Reconstructive Surgery - Global Open 13 (4), e6645.
Publication date of this full text: 24 Oct 2025 15:04
Article
DOI for citing this document: 10.5283/epub.78024
Abstract
Background: ChatGPT-3.5 scored in the 52nd percentile of the Plastic Surgery In-service Examination, making its knowledge equivalent to a first-year integrated resident. The updated GPT-4 may have improved performance given its more expansive training set. We hypothesized that GPT-4 would outperform its predecessor, making it a more valuable potential asset to surgical education.
Methods: Questions from the 2022 Plastic Surgery In-service Examination were given to GPT-4 and GPT-3.5. Both were prompted using 3 different structures. The 2022 American Society of Plastic Surgeons Norm Tables were used to compare the performance of the chatbot to national metrics from plastic surgery residents.
Results: GPT-4 answered a total of 237 questions with an overall accuracy of 63% across all 3 strategies. The accuracy was as follows for the prompting schemes: 54% for open ended, 67% for multiple choice (MC), and 68% for MC with explanation. The section with the highest accuracy (74%) among all strategies was Section 4: Breast and Cosmetic. GPT-4’s highest scoring methodology (MC with explanation, 68%) placed it in the following national integrated percentiles: 93rd percentile for the first year, 76th percentile for the second year, 52nd percentile for the third year, 34th percentile for the fourth year, 17th percentile for the fifth year, and 15th percentile for the sixth year. GPT-3.5 scored 58% overall.
Conclusions: GPT-4 outperformed its predecessor but only scored in the 15th percentile compared with postgraduate year-6 residents. More refinement is needed to achieve performance metrics equivalent to an attending plastic surgeon and become a valuable tool for surgical education.
Details
| Field | Value |
|---|---|
| Document type | Article |
| Journal or periodical title | Plastic and Reconstructive Surgery - Global Open |
| Publisher | Wolters Kluwer |
| Volume | 13 |
| Issue or chapter number | 4 |
| Page range | e6645 |
| Date | 25 April 2025 |
| Institutions | Medicine > Centers of the University Hospital Regensburg > Zentrum für Plastische-, Hand- und Wiederherstellungschirurgie (Center for Plastic, Hand and Reconstructive Surgery) |
| Identification number | |
| Dewey Decimal Classification | 600 Technology, medicine, applied sciences > 610 Medicine |
| Status | Published |
| Peer-reviewed | Yes, this version has been peer-reviewed |
| Created at the University of Regensburg | Yes |
| URN of the University Library of Regensburg | urn:nbn:de:bvb:355-epub-780242 |
| Document ID | 78024 |