Meyer, Selina ; Elsweiler, David ; Ludwig, Bernd ; Fernández-Pichel, Marcos ; Losada, David E.

Do We Still Need Human Assessors? Prompt-Based GPT-3 User Simulation in Conversational AI

Meyer, Selina; Elsweiler, David; Ludwig, Bernd; Fernández-Pichel, Marcos and Losada, David E. (2022) Do We Still Need Human Assessors? Prompt-Based GPT-3 User Simulation in Conversational AI. In: CUI 2022: 4th Conference on Conversational User Interfaces, July 26-28, 2022, Glasgow, United Kingdom.

Publication date of this full text: 15 Feb 2023 08:24
Conference or workshop item

Published version
Download (PDF | 557kB)

License: Creative Commons Attribution 4.0 International

Abstract

Scarcity of user data continues to be a problem in research on conversational user interfaces and often hinders or slows down technical innovation. In the past, different ways of synthetically generating data, such as data augmentation techniques, have been explored. With the rise of ever-improving pre-trained language models, we ask if we can go beyond such methods by simply providing appropriate prompts to these general-purpose models to generate data. We explore the feasibility and cost-benefit trade-offs of using non-fine-tuned synthetic data to train classification algorithms for conversational agents. We compare this synthetically generated data with real user data and evaluate the performance of classifiers trained on different combinations of synthetic and real data. We conclude that, although classifiers trained on such synthetic data perform much better than random baselines, they do not match the performance of classifiers trained on even very small amounts of real user data, largely because such data lacks much of the variability found in user-generated data. Nevertheless, we show that in situations where very little data and few resources are available, classifiers trained on such synthetically generated data might be preferable to the collection and annotation of naturalistic data.

Participating institutions

Sprach- und Literatur- und Kulturwissenschaften > Institut für Information und Medien, Sprache und Kultur (I:IMSK) > Professur für Informationslinguistik (Prof. Dr. Bernd Ludwig)
Informatik und Data Science > Fachbereich Menschzentrierte Informatik > Professur für Informationslinguistik (Prof. Dr. Bernd Ludwig)

Details

Document type

Conference or workshop item (Paper)

ISBN

978-1-4503-9739-1

Book title:

CUI '22: Proceedings of the 4th Conference on Conversational User Interfaces

Publisher:

Association for Computing Machinery

Open access type:

Assoc. of Comp. Machinery (ACM)

Place of publication:

New York, United States

Page range:

pp. 1-6

Date

2022

Institutions

Sprach- und Literatur- und Kulturwissenschaften > Institut für Information und Medien, Sprache und Kultur (I:IMSK) > Professur für Informationslinguistik (Prof. Dr. Bernd Ludwig)
Informatik und Data Science > Fachbereich Menschzentrierte Informatik > Professur für Informationslinguistik (Prof. Dr. Bernd Ludwig)

Identification number

Value	Type
10.1145/3543829.3544529	DOI

Keywords

datasets, nlp, text generation, conversational ai

Dewey Decimal Classification

000 Computer science, information and general works > 004 Computer science

Status

Published

Refereed

Yes, this version has been refereed

Created at the University of Regensburg

Partially

URN of the Regensburg University Library

urn:nbn:de:bvb:355-epub-537688

Document ID

53768
