Meyer, Selina ; Elsweiler, David ; Ludwig, Bernd ; Fernández-Pichel, Marcos ; Losada, David E.

Do We Still Need Human Assessors? Prompt-Based GPT-3 User Simulation in Conversational AI

Meyer, Selina, Elsweiler, David, Ludwig, Bernd, Fernández-Pichel, Marcos and Losada, David E. (2022) Do We Still Need Human Assessors? Prompt-Based GPT-3 User Simulation in Conversational AI. In: CUI 2022: 4th Conference on Conversational User Interfaces, July 26 - 28, 2022, Glasgow, United Kingdom.

Publication date of this full text: 15 Feb 2023 08:24
Conference or workshop contribution


Abstract

Scarcity of user data continues to be a problem in research on conversational user interfaces and often hinders or slows down technical innovation. In the past, different ways of synthetically generating data, such as data augmentation techniques, have been explored. With the rise of ever-improving pre-trained language models, we ask if we can go beyond such methods by simply providing appropriate prompts to these general-purpose models to generate data. We explore the feasibility and cost-benefit trade-offs of using non-fine-tuned synthetic data to train classification algorithms for conversational agents. We compare this synthetically generated data with real user data and evaluate the performance of classifiers trained on different combinations of synthetic and real data. We conclude that, although classifiers trained on such synthetic data perform much better than random baselines, they do not match the performance of classifiers trained on even very small amounts of real user data, largely because such data lacks much of the variability found in user-generated data. Nevertheless, we show that in situations where very little data and few resources are available, classifiers trained on such synthetically generated data might be preferable to the collection and annotation of naturalistic data.



Details

Document type: Conference or workshop contribution (Paper)
ISBN: 978-1-4503-9739-1
Book title: CUI '22: Proceedings of the 4th Conference on Conversational User Interfaces
Publisher: Association for Computing Machinery
Place of publication: New York, United States
Number of the journal issue or chapter: 8
Page range: pp. 1-6
Date: 2022
Institutions: Sprach- und Literatur- und Kulturwissenschaften > Institut für Information und Medien, Sprache und Kultur (I:IMSK) > Professur für Informationslinguistik (Prof. Dr. Bernd Ludwig)
Institutions: Informatik und Data Science > Fachbereich Menschzentrierte Informatik > Professur für Informationslinguistik (Prof. Dr. Bernd Ludwig)
Identification number (DOI): 10.1145/3543829.3544529
Keywords: datasets, nlp, text generation, conversational ai
Dewey Decimal Classification: 000 Computer science, information science, general works > 004 Computer science
Status: Published
Peer-reviewed: Yes, this version has been peer-reviewed
Created at the University of Regensburg: In part
URN of the UB Regensburg: urn:nbn:de:bvb:355-epub-537688
Document ID: 53768
