Detection and Identification of Fake News: Binary Content Classification with Pre-trained Language Models

Schütz, Mina

(2021) Detection and Identification of Fake News: Binary Content Classification with Pre-trained Language Models. In: Information between Data and Knowledge. Schriften zur Informationswissenschaft, 74. Werner Hülsbusch, Glückstadt, pp. 422-431. ISBN 978-3-86488-172-5.

Publication date of this full text: 18 Apr 2021 13:29
Book chapter
DOI for citing this document: 10.5283/epub.44959


Abstract

Fake news has emerged as a critical problem for society and for professional journalism. Many people consume their news through online media, such as social networks and news websites, so the demand for automatic fake news detection is growing. There is still no agreed-upon definition of fake news, since the term can cover various concepts, such as clickbait, propaganda, satire, hoaxes, and rumors. This has led to a broad landscape of machine learning approaches whose accuracy in detecting fake news varies. This master's thesis focused on a binary, content-based classification approach that uses a bidirectional Transformer (BERT) to detect fake news in online articles. BERT provides a language model pre-trained on large amounts of unlabeled text, which is then fine-tuned on a labeled dataset. The FakeNewsNet dataset was used to test two variants of the model (cased and uncased) on articles, using only the body text, only the title, or a concatenation of both. Additionally, both models were tested with different preprocessing steps. All 29 experiments yielded high accuracy without overfitting. Using the body text or the concatenation produced five models with a test accuracy of 87%, whereas using only the titles yielded 84%. This suggests that short statements may already be enough for fake news detection with language models. The preprocessing steps also seem to have no major impact on the predictions. It is concluded that transformer models such as BERT are a promising approach to fake news detection, since they achieve notable results even without a large dataset.
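The experimental design summarized above (two model variants, three input variants, accuracy as the metric) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis code: the `Article` record, the 1 = fake / 0 = real label encoding, the single space used to join title and body, and all helper names are hypothetical, and the actual BERT fine-tuning step is omitted.

```python
from dataclasses import dataclass
from itertools import product


# Hypothetical record; label encoding assumed: 1 = fake, 0 = real.
@dataclass
class Article:
    title: str
    body: str
    label: int


MODELS = ("cased", "uncased")               # the two BERT variants compared
VARIANTS = ("body", "title", "title+body")  # the three input variants


def build_input(article: Article, variant: str) -> str:
    """Return the raw text a classifier would receive for one input variant."""
    if variant == "body":
        return article.body
    if variant == "title":
        return article.title
    if variant == "title+body":
        # Assumption: a single space joins title and body; the source does not give the separator.
        return f"{article.title} {article.body}"
    raise ValueError(f"unknown variant: {variant!r}")


def normalize_case(text: str, model: str) -> str:
    """Lowercase text for the uncased variant, keep capitalization for the cased one.

    Uncased BERT tokenizers also strip accents; this sketch only lowercases.
    """
    return text.lower() if model == "uncased" else text


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Share of predictions that match the gold labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def experiment_grid() -> list[tuple[str, str]]:
    """Enumerate model x input-variant pairs (preprocessing options not included)."""
    return list(product(MODELS, VARIANTS))


if __name__ == "__main__":
    article = Article(title="Celebrity Endorses Miracle Cure",
                      body="Unnamed sources claim the product cures every disease.",
                      label=1)
    for model, variant in experiment_grid():
        print(model, variant, "->", normalize_case(build_input(article, variant), model))
    # Toy predictions for illustration only, not results from the chapter.
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

The grid yields six base configurations; the 29 experiments reported in the abstract additionally vary the preprocessing steps, which this sketch does not model.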

Participating institutions

Faculty of Languages, Literature and Cultural Studies > Institute for Information and Media, Language and Culture (I:IMSK) > Chair of Media Informatics (Prof. Dr. Christian Wolff)
Informatics and Data Science > Department of Human-Centered Computing > Chair of Media Informatics (Prof. Dr. Christian Wolff)

Details

Document type	Book chapter
ISBN	978-3-86488-172-5
Book title	Information between Data and Knowledge
Publisher	Werner Hülsbusch
Open access type	Individual author agreement
Place of publication	Glückstadt
Series	Schriften zur Informationswissenschaft
Volume	74
Page range	pp. 422-431
Date	2021
Additional information (public)	Gerhard Lustig Award Papers
Keywords	fake news; fake news detection; BERT; transformer; pre-trained language model; binary classification
Dewey Decimal Classification	000 Computer science, information and general works > 020 Library and information sciences
Status	Published
Peer-reviewed	Yes, this version has been peer-reviewed
Produced at the University of Regensburg	No
URN (Regensburg University Library)	urn:nbn:de:bvb:355-epub-449597
Document ID	44959
