Schütz, Mina

Detection and Identification of Fake News: Binary Content Classification with Pre-trained Language Models

Schütz, Mina (2021) Detection and Identification of Fake News: Binary Content Classification with Pre-trained Language Models. In: Information between Data and Knowledge. Schriften zur Informationswissenschaft, 74. Werner Hülsbusch, Glückstadt, pp. 422-431. ISBN 978-3-86488-172-5.

Publication date of this full text: 18 Apr 2021 13:29
Book chapter
DOI for citing this document: 10.5283/epub.44959


Abstract

Fake news has emerged as a critical problem for society and professional journalism. Many individuals consume their news via online media, such as social networks and news websites. Therefore, the demand for automatic fake news detection is increasing. There is still no agreed-upon definition of fake news, since it can include various concepts, such as clickbait, propaganda, satire, hoaxes, and rumors. This results in a broad landscape of machine learning approaches, which vary in their accuracy at detecting fake news. This master's thesis focused on a binary content-based classification approach, using a bidirectional Transformer (BERT), to detect fake news in online articles. BERT is pre-trained as a language model and then fine-tuned on a labeled dataset. The FakeNewsNet dataset is used to test two variants of the model (cased / uncased) on articles, using only the body text, only the title, and a concatenation of both. Additionally, both models were tested with different preprocessing steps. The models achieved high accuracy in all 29 experiments, without overfitting. Using the body text and the concatenation resulted in five models with an accuracy of 87% after testing, whereas using only titles resulted in 84%. This shows that short statements may already be enough for fake news detection using language models. Also, the preprocessing steps seem to have no major impact on the predictions. It is concluded that transformer models, such as BERT, are a promising approach to detecting fake news, since they achieve notable results even without a large dataset.
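The three input variants named in the abstract (title only, body text only, and a concatenation of both) and the accuracy measure can be illustrated with a minimal sketch. This is not the thesis code: the `Article` class, the `build_input` and `accuracy` helpers, the label encoding, and the single-space separator used for concatenation are all assumptions made for illustration. The resulting strings would be what gets passed to a tokenizer before fine-tuning.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Article:
    """Hypothetical record for one news article (not from the thesis)."""
    title: str
    body: str
    label: int  # assumed encoding: 1 = fake, 0 = real


def build_input(article: Article, variant: str) -> str:
    """Return the text fed to the classifier for one input variant."""
    if variant == "title":
        return article.title
    if variant == "body":
        return article.body
    if variant == "title+body":
        # Assumption: title and body are joined with a single space.
        return f"{article.title} {article.body}"
    raise ValueError(f"unknown variant: {variant!r}")


def accuracy(predictions: List[int], labels: List[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

Keeping the variant selection in one function makes it easy to run the same fine-tuning setup once per variant and compare the resulting accuracies.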



Details

Document type: Book chapter
ISBN: 978-3-86488-172-5
Book title: Information between Data and Knowledge
Publisher: Werner Hülsbusch
Place of publication: Glückstadt
Other series: Schriften zur Informationswissenschaft
Volume: 74
Page range: pp. 422-431
Date: 2021
Additional information (public): Gerhard Lustig Award Papers
Institutions: Sprach- und Literatur- und Kulturwissenschaften > Institut für Information und Medien, Sprache und Kultur (I:IMSK) > Lehrstuhl für Medieninformatik (Prof. Dr. Christian Wolff)
Informatik und Data Science > Fachbereich Menschzentrierte Informatik > Lehrstuhl für Medieninformatik (Prof. Dr. Christian Wolff)
Keywords: fake news; fake news detection; BERT; transformer; pre-trained language model; binary classification
Dewey Decimal Classification: 000 Computer science, information science, general works > 020 Library and information science
Status: Published
Peer-reviewed: Yes, this version has been peer-reviewed
Created at the University of Regensburg: No
URN of the UB Regensburg: urn:nbn:de:bvb:355-epub-449597
Document ID: 44959
