Abstract
Introduction/background:
Spinal cord injuries (SCI) present complex challenges for patients, who increasingly turn to online resources for supplementary information. Large language models (LLMs) like ChatGPT and Google Gemini have emerged as potential tools for patient education. However, concerns about the accuracy, clarity, and comprehensiveness of their responses remain, particularly in specialized fields such as SCI. This study aimed to evaluate the performance of ChatGPT 4, ChatGPT 3.5, and Google Gemini in addressing common patient questions about SCI.
Material and methods:
A systematic process was used to identify 10 key patient questions related to SCI from online sources, PubMed, and Google Trends. These questions were submitted to ChatGPT 4, ChatGPT 3.5, and Google Gemini using a standardized prompt and a 150-word response cap to elicit expert-like responses. Eight blinded spine surgeons evaluated the chatbot-generated answers for quality, clarity, empathy, and comprehensiveness using a validated rating system. Responses were categorized as “excellent,” “satisfactory with minimal clarification,” “satisfactory with moderate clarification,” or “unsatisfactory.”
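The abstract does not disclose the exact prompt wording or query tooling. As a minimal Python sketch, assuming the OpenAI API is used for the ChatGPT models, a standardized single-turn query with a 150-word cap might look as follows (the prompt text, model identifier, and helper function are illustrative assumptions, not the study's actual materials):

# Illustrative sketch only: prompt wording, model identifier, and tooling
# are assumptions; the abstract does not specify the study's materials.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical standardized prompt enforcing the 150-word response cap
SYSTEM_PROMPT = (
    "You are a spine surgeon answering a patient's question about "
    "spinal cord injury (SCI). Answer in no more than 150 words."
)

def ask_model(question: str, model: str = "gpt-4") -> str:
    # Single-turn query: one system prompt plus one user question, no history
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask_model("What are the long-term effects of a spinal cord injury?"))

A single-turn design like this keeps each question independent of the others, which matches the standardized, non-conversational setting described above.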
Results:
Across all three models, the majority of responses were rated as either excellent or requiring only minimal clarification. ChatGPT 4 achieved the highest proportion of high-quality responses, with nearly 90% rated as “excellent” or “minimal clarification required.” ChatGPT 3.5 and Google Gemini performed similarly, with slightly lower proportions of high-quality responses. No statistically significant differences in overall performance were observed between the models.
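The abstract does not name the statistical test used to compare the models. As a sketch under that caveat, a chi-square test of independence on the rating contingency table is one plausible approach; the counts below are hypothetical placeholders, not study data:

# Illustrative sketch: the counts are hypothetical, and the abstract does
# not state which test the study actually used.
from scipy.stats import chi2_contingency

# Rows: ChatGPT 4, ChatGPT 3.5, Google Gemini
# Columns: excellent, minimal, moderate, unsatisfactory
# (8 raters x 10 questions = 80 ratings per model)
ratings = [
    [52, 18, 8, 2],   # hypothetical ChatGPT 4 counts
    [45, 22, 10, 3],  # hypothetical ChatGPT 3.5 counts
    [44, 23, 10, 3],  # hypothetical Google Gemini counts
]

chi2, p, dof, expected = chi2_contingency(ratings)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # p > 0.05: no significant difference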
Conclusion:
In a standardized, single-turn, 150-word setting, publicly available LLMs produced largely satisfactory answers to common SCI questions, with comparable performance across models. LLMs can be recommended as adjuncts for general patient education, provided their outputs are reviewed within clinical care. Further studies should test multi-turn interactions, include patient and multidisciplinary evaluators, compare chatbot responses with clinician-authored answers, and evaluate the performance of domain-specific medical LLMs.