Detection and Identification of Fake News: Binary Content Classification with Pre-trained Language Models
Schütz, Mina (2021). Detection and Identification of Fake News: Binary Content Classification with Pre-trained Language Models. In: Information between Data and Knowledge. Schriften zur Informationswissenschaft, 74. Werner Hülsbusch, Glückstadt, pp. 422-431. ISBN 978-3-86488-172-5.
Publication date of this full text: 18 Apr 2021 13:29
Book chapter
DOI for citing this document: 10.5283/epub.44959
Abstract
Fake news has emerged as a critical problem for society and professional journalism. Many individuals consume their news via online media, such as social networks and news websites, so the demand for automatic fake news detection is increasing. There is still no agreed-upon definition of fake news, since it can include various concepts, such as clickbait, propaganda, satire, hoaxes, and rumors. This results in a broad landscape of machine learning approaches with varying accuracy in detecting fake news. This master's thesis focused on a binary content-based classification approach, using a bidirectional Transformer (BERT), to detect fake news in online articles. BERT is pre-trained as a language model and then fine-tuned on a labeled dataset. The FakeNewsNet dataset is used to test two variants of the model (cased / uncased) on articles, using only the body text, only the title, and a concatenation of both. Additionally, both models were tested with different preprocessing steps. The models achieved high accuracy in all 29 experiments, without overfitting. Using the body text and the concatenation resulted in five models with an accuracy of 87% after testing, whereas using only titles resulted in 84%. This shows that short statements may already be enough for fake news detection using language models. Also, the preprocessing steps seem to have no major impact on the predictions. It is concluded that transformer models, such as BERT, are a promising approach to detect fake news, since they achieve notable results even without a large dataset.
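The input variants described in the abstract (title only, body only, concatenation of both, each for a cased and an uncased model) can be illustrated with a minimal sketch. This is not the author's code: the record layout, the function names `build_input`, `normalize` and `accuracy`, and the single-space separator used for concatenation are assumptions made for illustration. The preprocessing variations mentioned in the abstract are not modeled, so the grid below does not reproduce the 29 experiments.

```python
from itertools import product

# Hypothetical record layout; the real FakeNewsNet schema may differ.
ARTICLES = [
    {"title": "Example Headline", "body": "Example body text.", "label": 1},
    {"title": "Another Headline", "body": "More body text.", "label": 0},
]

INPUT_FIELDS = ("title", "body", "title+body")
MODEL_VARIANTS = ("cased", "uncased")


def build_input(article, field):
    """Return the text fed to the classifier for one input variant."""
    if field == "title":
        return article["title"]
    if field == "body":
        return article["body"]
    if field == "title+body":
        # Assumption: title and body are joined with a single space.
        return article["title"] + " " + article["body"]
    raise ValueError(f"unknown input field: {field}")


def normalize(text, variant):
    """Lower-case text for the uncased variant; leave it unchanged otherwise."""
    return text.lower() if variant == "uncased" else text


def accuracy(predictions, labels):
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# One configuration per (model variant, input field) pair.
CONFIGS = list(product(MODEL_VARIANTS, INPUT_FIELDS))

if __name__ == "__main__":
    for variant, field in CONFIGS:
        sample = normalize(build_input(ARTICLES[0], field), variant)
        print(f"{variant:8s} {field:11s} {sample}")
```

In an actual experiment, each configuration would feed its texts to a fine-tuned classifier and report `accuracy` on a held-out test split; the classifier itself is deliberately left out of this sketch.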
Details
| Field | Value |
|---|---|
| Document type | Book chapter |
| ISBN | 978-3-86488-172-5 |
| Book title | Information between Data and Knowledge |
| Publisher | Werner Hülsbusch |
| Place of publication | Glückstadt |
| Other series | Schriften zur Informationswissenschaft |
| Volume | 74 |
| Page range | pp. 422-431 |
| Date | 2021 |
| Additional information (public) | Gerhard Lustig Award Papers |
| Institutions | Sprach- und Literatur- und Kulturwissenschaften > Institut für Information und Medien, Sprache und Kultur (I:IMSK) > Lehrstuhl für Medieninformatik (Prof. Dr. Christian Wolff); Informatik und Data Science > Fachbereich Menschzentrierte Informatik > Lehrstuhl für Medieninformatik (Prof. Dr. Christian Wolff) |
| Keywords | fake news; fake news detection; BERT; transformer; pre-trained language model; binary classification |
| Dewey Decimal Classification | 000 Computer science, information and general works > 020 Library and information sciences |
| Status | Published |
| Refereed | Yes, this version has been refereed |
| Created at the University of Regensburg | No |
| URN of the University Library of Regensburg | urn:nbn:de:bvb:355-epub-449597 |
| Document ID | 44959 |