Event-Driven Duplicate Detection: A probability based Approach

Heinrich, Bernd, Klier, Mathias, Obermeier, Andreas und Schiller, Alexander (2018) Event-Driven Duplicate Detection: A probability based Approach. In: European Conference on Information Systems (ECIS), June 23-28, 2018, Portsmouth, United Kingdom.

Veröffentlichungsdatum dieses Volltextes: 07 Aug 2018 12:52
Konferenz- oder Workshop-Beitrag
DOI zum Zitieren dieses Dokuments: 10.5283/epub.37587

Download ( PDF | 417kB)

Zusammenfassung

The importance of probability-based approaches for duplicate detection has been recognized in both research and practice. However, existing approaches do not aim to consider the underlying real-world events resulting in duplicates (e.g., that a relocation may lead to the storage of two records for the same customer, once before and after the relocation). Duplicates resulting from real-world events exhibit specific characteristics. For instance, duplicates resulting from relocations tend to have significantly different attribute values for all address-related attributes. Hence, existing approaches focusing on high similarity with respect to attribute values are hardly able to identify possible duplicates resulting from such real-world events. To address this issue, we propose an approach for event-driven duplicate de-tection based on probability theory. Our approach assigns the probability of being a duplicate resulting from real-world events to each analysed pair of records while avoiding limiting assumptions (of existing approaches). We demonstrate the practical applicability and effectiveness of our approach in a real-world setting by analysing customer master data of a German insurer. The evaluation shows that the results provided by the approach are reliable and useful for decision support and can outperform well-known state-of-the-art approaches for duplicate detection.

Beteiligte Einrichtungen

Wirtschaftswissenschaften > Institut für Wirtschaftsinformatik > Lehrstuhl für Wirtschaftsinformatik II (Prof. Dr. Bernd Heinrich) Informatik und Data Science > Fachbereich Wirtschaftsinformatik > Lehrstuhl für Wirtschaftsinformatik II (Prof. Dr. Bernd Heinrich)
Browse Publikationen

Details

Dokumentenart	Konferenz- oder Workshop-Beitrag (Paper)
Datum	September 2018
Institutionen	Wirtschaftswissenschaften > Institut für Wirtschaftsinformatik > Lehrstuhl für Wirtschaftsinformatik II (Prof. Dr. Bernd Heinrich) Informatik und Data Science > Fachbereich Wirtschaftsinformatik > Lehrstuhl für Wirtschaftsinformatik II (Prof. Dr. Bernd Heinrich)
Stichwörter / Keywords	Duplicate detection, Record linkage, Entity Resolution, Data quality
Dewey-Dezimal-Klassifikation	300 Sozialwissenschaften > 330 Wirtschaft
Status	Veröffentlicht
Begutachtet	Ja, diese Version wurde begutachtet
An der Universität Regensburg entstanden	Ja
URN der UB Regensburg	urn:nbn:de:bvb:355-epub-375871
Dokumenten-ID	37587

Bibliographische Daten exportieren

Nur für Besitzer und Autoren: Kontrollseite des Eintrags

Downloadstatistik

Weitere Literatur (mittels CORE)

nach oben