Lexicon-based Sentiment Analysis in German: Systematic Evaluation of Resources and Preprocessing Techniques

Fehle, Jakob, Schmidt, Thomas and Wolff, Christian

(2021) Lexicon-based Sentiment Analysis in German: Systematic Evaluation of Resources and Preprocessing Techniques. In: Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021), September 6-9, 2021, Düsseldorf, Germany.

Publication date of this full text: 19 Oct 2021 05:26
Conference or workshop contribution
DOI for citing this document: 10.5283/epub.50833

Abstract

We present the results of an evaluation study of lexicon-based sentiment analysis resources for German texts. We compiled 19 sentiment lexicons and 20 sentiment-annotated corpora available for German across multiple domains. In addition to evaluating the sentiment lexicons, we investigate the influence of the following preprocessing steps and modifiers: stemming and lemmatization, part-of-speech tagging, usage of emoticons, stop word removal, and usage of valence shifters, intensifiers, and diminishers. We report the best-performing lexicons as well as the influence of preprocessing steps and other modifications on average performance across all corpora. We show that larger lexicons with continuous values, such as SentiWS and SentiMerge, perform best across the domains. Considering F1 score and accuracy averaged across all corpora, the best-performing configuration of lexicon and modifications achieves around 67%. Preprocessing, especially stemming or lemmatization, consistently increases performance, by around 6% on average and by up to 16.5% for certain lexicons and configurations, while methods such as valence shifters, intensifiers, or diminishers rarely influence overall performance. We discuss domain-specific differences and give recommendations for the selection of lexicons, preprocessing, and modifications.
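
To make the terms in the abstract concrete, the following is a minimal sketch of lexicon-based scoring with stop word removal, a negation-style valence shifter, intensifiers, and diminishers. It is not the authors' implementation. The lexicon entries, modifier lists, weights, and the function names `tokenize`, `score`, and `classify` are all hypothetical and chosen only for illustration; none of the values come from SentiWS, SentiMerge, or any other evaluated resource. Stemming and lemmatization are left out, so only exact word forms match.

```python
import re

# Toy lexicon with continuous polarity values (hypothetical, for illustration only).
LEXICON = {"gut": 0.5, "schlecht": -0.5, "toll": 0.75, "langweilig": -0.25}
NEGATIONS = {"nicht", "kein", "keine"}       # valence shifters: flip the sign
INTENSIFIERS = {"sehr": 1.5, "extrem": 2.0}  # scale the next sentiment word up
DIMINISHERS = {"etwas": 0.5, "kaum": 0.25}   # scale the next sentiment word down
STOP_WORDS = {"der", "die", "das", "ist", "war", "und"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def score(text):
    """Sum lexicon values; modifiers apply to the next non-stop-word token only."""
    total, sign, weight = 0.0, 1.0, 1.0
    for tok in tokenize(text):
        if tok in STOP_WORDS:
            continue  # stop word removal: skipped, pending modifiers stay active
        if tok in NEGATIONS:
            sign = -sign
            continue
        if tok in INTENSIFIERS:
            weight *= INTENSIFIERS[tok]
            continue
        if tok in DIMINISHERS:
            weight *= DIMINISHERS[tok]
            continue
        if tok in LEXICON:
            total += sign * weight * LEXICON[tok]
        # Reset modifiers after any content word, whether or not it was in the lexicon.
        sign, weight = 1.0, 1.0
    return total


def classify(text):
    """Map the summed score to a polarity label."""
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"
```

For example, `classify("Der Film war nicht gut")` flips the value of "gut" and yields a negative label. Resetting modifiers after every content word is one simple scoping choice among several; a window of several tokens would be an equally plausible assumption.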

Participating institutions

Sprach- und Literatur- und Kulturwissenschaften > Institut für Information und Medien, Sprache und Kultur (I:IMSK) > Lehrstuhl für Medieninformatik (Prof. Dr. Christian Wolff)
Informatik und Data Science > Fachbereich Menschzentrierte Informatik > Lehrstuhl für Medieninformatik (Prof. Dr. Christian Wolff)

Details

Document type

Conference or workshop contribution (Paper)

Publisher:

KONVENS 2021 Organizers

Place of publication:

Düsseldorf, Germany

Page range:

pp. 86-103

Date

September 2021

Related URLs

URL	URL type
https://github.com/JakobFehle/Lexicon-based-SentA-German	Supplementary Material

Keywords

Sentiment Analysis, German, Lexicon-based Sentiment Analysis, Corpus, Evaluation

Dewey Decimal Classification

000 Computer science, information science, general works > 004 Computer science
000 Computer science, information science, general works > 020 Library and information science
400 Language > 400 Linguistics

Status

Published

Peer-reviewed

Yes, this version has been peer-reviewed

Created at the University of Regensburg

URN of the Regensburg University Library

urn:nbn:de:bvb:355-epub-508339

Document ID

50833
