
Fehle, Jakob; Schmidt, Thomas; Wolff, Christian

Lexicon-based Sentiment Analysis in German: Systematic Evaluation of Resources and Preprocessing Techniques

Fehle, Jakob, Schmidt, Thomas and Wolff, Christian (2021) Lexicon-based Sentiment Analysis in German: Systematic Evaluation of Resources and Preprocessing Techniques. In: Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021), September 6-9, 2021, Düsseldorf, Germany.

Date of publication of this fulltext: 19 Oct 2021 05:26
Conference or workshop item
DOI to cite this document: 10.5283/epub.50833


Abstract


We present the results of an evaluation study of lexicon-based sentiment analysis resources for German texts. We compiled a comprehensive collection of 19 sentiment lexicon resources and 20 sentiment-annotated corpora available for German across multiple domains. In addition to evaluating the sentiment lexicons, we investigate the influence of the following preprocessing steps and modifiers: stemming and lemmatization, part-of-speech tagging, usage of emoticons, stop-word removal, and the usage of valence shifters, intensifiers, and diminishers. We report the best-performing lexicons as well as the influence of preprocessing steps and other modifications on average performance across all corpora. We show that larger lexicons with continuous values, such as SentiWS and SentiMerge, perform best across domains. The best-performing configuration of lexicon and modifications achieves around 67% in terms of F1 score and accuracy averaged across all corpora. Preprocessing, especially stemming or lemmatization, consistently increases performance by around 6% on average and by up to 16.5% for certain lexicons and configurations, while methods such as valence shifters, intensifiers, or diminishers rarely influence overall performance. We discuss domain-specific differences and give recommendations for the selection of lexicons, preprocessing, and modifications.
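The abstract lists the components of a lexicon-based classifier: a lexicon with continuous polarity values, normalization of inflected forms, stop-word removal, emoticons, and modifiers (valence shifters, intensifiers, diminishers). The Python sketch below illustrates one plausible way these pieces fit together. It is not the authors' implementation. The toy lexicon and its values, the modifier and stop-word lists, the crude suffix stripping (a stand-in for a real stemmer or lemmatizer), and the function names `tokenize`, `lookup`, `score`, and `classify` are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import re

# HYPOTHETICAL toy lexicon with continuous polarity values, loosely in the
# spirit of SentiWS; entries and weights are invented for illustration.
TOY_LEXICON = {"gut": 0.37, "schön": 0.49, "schlecht": -0.77, "langweilig": -0.34}
EMOTICONS = {":)": 0.5, ":-)": 0.5, ":(": -0.5, ":-(": -0.5}

# HYPOTHETICAL modifier and stop-word lists (not taken from the paper).
VALENCE_SHIFTERS = {"nicht", "kein", "keine", "nie"}
INTENSIFIERS = {"sehr": 1.5, "extrem": 2.0}
DIMINISHERS = {"etwas": 0.5, "kaum": 0.3}
STOP_WORDS = {"der", "die", "das", "ist", "war", "ein", "eine", "und"}

# Crude stand-in for stemming/lemmatization: strip one common German ending.
SUFFIXES = ("en", "es", "er", "em", "e", "s", "n")


def tokenize(text: str) -> list[str]:
    """Lowercase the text; keep simple emoticons and word tokens."""
    return re.findall(r":-?[()]|\w+", text.lower())


def lookup(token: str) -> float | None:
    """Return a polarity for an emoticon or lexicon word, else None."""
    if token in EMOTICONS:
        return EMOTICONS[token]
    if token in TOY_LEXICON:
        return TOY_LEXICON[token]
    # Retry after stripping a single suffix, so "gutes" matches "gut".
    for suffix in SUFFIXES:
        stem = token[: -len(suffix)]
        if token.endswith(suffix) and stem in TOY_LEXICON:
            return TOY_LEXICON[stem]
    return None


def score(text: str) -> float:
    """Sum polarities; pending modifiers apply to the next sentiment token."""
    total = 0.0
    weight = 1.0    # product of pending intensifier/diminisher factors
    negate = False  # pending valence shifter (flips sign)
    for token in tokenize(text):
        if token in STOP_WORDS:
            continue
        if token in VALENCE_SHIFTERS:
            negate = not negate
            continue
        if token in INTENSIFIERS:
            weight *= INTENSIFIERS[token]
            continue
        if token in DIMINISHERS:
            weight *= DIMINISHERS[token]
            continue
        value = lookup(token)
        if value is not None:
            total += -value * weight if negate else value * weight
            weight, negate = 1.0, False  # modifiers are consumed
    return total


def classify(text: str) -> str:
    """Map the summed score to a three-way polarity label."""
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ["Das Essen war nicht gut", "Ein schöner Film :)"]:
        print(sentence, "->", classify(sentence))
```

The stop-word list in this sketch deliberately omits negation words. Generic German stop-word lists (for example NLTK's) include words such as "nicht", and removing them before modifier handling would silently disable valence shifting, so the order of preprocessing steps matters in a pipeline like this.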





Details

Item type: Conference or workshop item (Paper)
Publisher: KONVENS 2021 Organizers
Place of Publication: Düsseldorf, Germany
Page Range: pp. 86-103
Date: September 2021
Institutions:
Languages and Literatures > Institut für Information und Medien, Sprache und Kultur (I:IMSK) > Lehrstuhl für Medieninformatik (Prof. Dr. Christian Wolff)
Informatics and Data Science > Department Human-Centered Computing > Lehrstuhl für Medieninformatik (Prof. Dr. Christian Wolff)
Related URLs:
https://github.com/JakobFehle/Lexicon-based-SentA-German (Supplementary Material)
Keywords: Sentiment Analysis, German, Lexicon-based Sentiment Analysis, Corpus, Evaluation
Dewey Decimal Classification:
000 Computer science, information & general works > 004 Computer science
000 Computer science, information & general works > 020 Library & information sciences
400 Language > 400 Language, Linguistics
Status: Published
Refereed: Yes, this version has been refereed
Created at the University of Regensburg: Yes
URN of the UB Regensburg: urn:nbn:de:bvb:355-epub-508339
Item ID: 50833
