Wehrheim, Lino; Liebl, Bernhard; Burghardt, Manuel

Extracting Textual Data from Historical Newspaper Scans and its Challenges for “Guerilla-Projects”

Wehrheim, Lino, Liebl, Bernhard and Burghardt, Manuel (2022) Extracting Textual Data from Historical Newspaper Scans and its Challenges for “Guerilla-Projects”. Regensburg Economic and Social History (RESH) Discussion Paper Series 08, Discussion Paper, Universitätsbibliothek Regensburg, Regensburg.

Publication date of this full text: 02 Dec 2022 05:29
Monograph
DOI for citing this document: 10.5283/epub.53259


Abstract

In 2022, it is a commonplace that digital historical newspapers (DHN) have become increasingly available. Despite the undeniable progress in the supply of DHN and in the methods for rigorous quantitative analysis, however, working with DHN still poses various pitfalls, especially when scholars use data provided by third parties, such as libraries or commercial providers. Reporting from a current project, we want to share our experiences and describe the various problems we faced while working with DHN. After a short project summary, we present the main problems we encountered in our project that we think might also be relevant for other scholars, particularly those who work in small research groups. We arrange these problems according to an archetypal workflow divided into three steps: corpus acquisition, corpus evaluation, and corpus preparation. By raising some red flags, we want to call attention to what we consider common DHN-related problems, raise awareness of potential pitfalls, and thereby provide some guidelines for scholars who consider using DHN for their research.



Details

Document type: Monograph (Discussion Paper)
Publisher: Universitätsbibliothek Regensburg
Place of publication: Regensburg
Series of the University of Regensburg: Regensburg Economic and Social History (RESH) Discussion Paper Series
Volume: 08
Date: 23 November 2022
Institutions: Philosophy, Art, History and Social Sciences > Institute of History > Economic and Social History (Prof. Dr. Mark Spoerer)
Identification number: 10.5283/epub.53259 (DOI)
Classification: C80, C02 (Journal of Economic Literature classification)
Keywords: Historical newspapers, OCR, layout detection, troubleshooting
Dewey Decimal Classification: 900 History and geography > 900 History
900 History and geography > 943 History of Germany
Status: Published
Peer-reviewed: No, this document will not be peer-reviewed
Produced at the University of Regensburg: Yes
URN of the University Library Regensburg: urn:nbn:de:bvb:355-epub-532598
Document ID: 53259