Lang, Siegmund; Vitale, Jacopo; Galbusera, Fabio; Fekete, Tamás; Boissiere, Louis; Charles, Yann Philippe; Yucekul, Altug; Yilgor, Caglar; Núñez-Pereira, Susana; Haddad, Sleiman; Gomez-Rice, Alejandro; Mehta, Jwalant; Pizones, Javier; Pellisé, Ferran; Obeid, Ibrahim; Alanay, Ahmet; Kleinstück, Frank; Loibl, Markus

Is the information provided by large language models valid in educating patients about adolescent idiopathic scoliosis? An evaluation of content, clarity, and empathy

Lang, Siegmund; Vitale, Jacopo; Galbusera, Fabio; Fekete, Tamás; Boissiere, Louis; Charles, Yann Philippe; Yucekul, Altug; Yilgor, Caglar; Núñez-Pereira, Susana; Haddad, Sleiman; Gomez-Rice, Alejandro; Mehta, Jwalant; Pizones, Javier; Pellisé, Ferran; Obeid, Ibrahim; Alanay, Ahmet; Kleinstück, Frank and Loibl, Markus (2024) Is the information provided by large language models valid in educating patients about adolescent idiopathic scoliosis? An evaluation of content, clarity, and empathy. Spine Deformity.

Publication date of this full text: 12 Nov 2024 13:04
Article
DOI for citing this document: 10.5283/epub.59588

Published version (PDF, 2 MB)

License: Creative Commons Attribution 4.0 International

Abstract

Purpose
Large language models (LLMs) have the potential to bridge knowledge gaps in patient education and enrich patient-surgeon interactions. This study evaluated how well three chatbots deliver empathetic and precise information and management advice related to adolescent idiopathic scoliosis (AIS). Specifically, we assessed the accuracy, clarity, and relevance of the information provided, aiming to determine how effectively LLMs address common patient queries and improve patients' understanding of AIS.
Methods
We sourced 20 webpages for the most frequently asked questions (FAQs) about AIS and formulated 10 critical questions based on them. Three advanced LLMs (ChatGPT 3.5, ChatGPT 4.0, and Google Bard) were selected to answer these questions, with responses limited to 200 words. The LLMs' responses were evaluated by a blinded group of experienced deformity surgeons (members of the European Spine Study Group) from seven European spine centers. A pre-established 4-level rating system from excellent to unsatisfactory was used, with additional ratings for clarity, comprehensiveness, and empathy on a 5-point Likert scale. For each response not rated 'excellent', the raters were asked to report the reasons for their decision. Lastly, six questions asked the raters for their general opinion of AI in healthcare.
Results
Across all LLMs, 26% of responses were rated 'excellent', with ChatGPT 4.0 leading (39%), followed by Bard (17%). ChatGPT 4.0 was rated superior to Bard and ChatGPT 3.5 (p = 0.003). Discrepancies among raters were significant (p < 0.0001), calling inter-rater reliability into question. No substantial differences were noted in answer distribution by question (p = 0.43). The answers on diagnosis (Q2) and causes (Q4) of AIS were top-rated. The most dissatisfaction was seen in the answers regarding definitions (Q1) and long-term results (Q7). Exhaustiveness, clarity, empathy, and length of the answers were positively rated (> 3.0 out of 5.0) and did not differ among LLMs. However, ChatGPT 3.5 struggled with language suitability and empathy, while Bard's responses were overly detailed and less empathetic. Overall, raters found that 9% of answers were off-topic and 22% contained clear mistakes.
Conclusion
Our study offers crucial insights into the strengths and weaknesses of current LLMs in AIS patient and parent education, highlighting the promise of advancements like ChatGPT-4.o and Gemini alongside the need for continuous improvement in empathy, contextual understanding, and language appropriateness.

Participating institutions

Medicine > Chair of Trauma Surgery (Lehrstuhl für Unfallchirurgie)

Details

Document type

Article

Title of journal or periodical

Spine Deformity

Publisher:

Springer

Date

4 November 2024

Institutions

Medicine > Chair of Trauma Surgery (Lehrstuhl für Unfallchirurgie)

Identification number

Value	Type
10.1007/s43390-024-00955-3	DOI

Keywords

Adolescent idiopathic scoliosis (AIS) · Large language models (LLMs) · Patient education · Spine surgery · Artificial intelligence (AI)

Dewey Decimal Classification

600 Technology, medicine, applied sciences > 610 Medicine

Status

Published

Peer-reviewed

Yes, this version has been peer-reviewed

Created at the University of Regensburg

Partially

URN of the Regensburg University Library

urn:nbn:de:bvb:355-epub-595886

Document ID

59588
