Direkt zum Inhalt

Heinrich, Bernd ; Klier, Mathias ; Obermeier, Andreas ; Schiller, Alexander

Event-Driven Duplicate Detection: A probability based Approach

Heinrich, Bernd, Klier, Mathias, Obermeier, Andreas und Schiller, Alexander (2018) Event-Driven Duplicate Detection: A probability based Approach. In: European Conference on Information Systems (ECIS), June 23-28, 2018, Portsmouth, United Kingdom.

Veröffentlichungsdatum dieses Volltextes: 07 Aug 2018 12:52
Konferenz- oder Workshop-Beitrag
DOI zum Zitieren dieses Dokuments: 10.5283/epub.37587


Zusammenfassung

The importance of probability-based approaches for duplicate detection has been recognized in both research and practice. However, existing approaches do not aim to consider the underlying real-world events resulting in duplicates (e.g., that a relocation may lead to the storage of two records for the same customer, once before and after the relocation). Duplicates resulting from real-world ...

The importance of probability-based approaches for duplicate detection has been recognized in both research and practice. However, existing approaches do not aim to consider the underlying real-world events resulting in duplicates (e.g., that a relocation may lead to the storage of two records for the same customer, once before and after the relocation). Duplicates resulting from real-world events exhibit specific characteristics. For instance, duplicates resulting from relocations tend to have significantly different attribute values for all address-related attributes. Hence, existing approaches focusing on high similarity with respect to attribute values are hardly able to identify possible duplicates resulting from such real-world events. To address this issue, we propose an approach for event-driven duplicate de-tection based on probability theory. Our approach assigns the probability of being a duplicate resulting from real-world events to each analysed pair of records while avoiding limiting assumptions (of existing approaches). We demonstrate the practical applicability and effectiveness of our approach in a real-world setting by analysing customer master data of a German insurer. The evaluation shows that the results provided by the approach are reliable and useful for decision support and can outperform well-known state-of-the-art approaches for duplicate detection.


Beteiligte Einrichtungen


Details

DokumentenartKonferenz- oder Workshop-Beitrag (Paper)
DatumSeptember 2018
InstitutionenWirtschaftswissenschaften > Institut für Wirtschaftsinformatik > Lehrstuhl für Wirtschaftsinformatik II (Prof. Dr. Bernd Heinrich)
Informatik und Data Science > Fachbereich Wirtschaftsinformatik > Lehrstuhl für Wirtschaftsinformatik II (Prof. Dr. Bernd Heinrich)
Stichwörter / KeywordsDuplicate detection, Record linkage, Entity Resolution, Data quality
Dewey-Dezimal-Klassifikation300 Sozialwissenschaften > 330 Wirtschaft
StatusVeröffentlicht
BegutachtetJa, diese Version wurde begutachtet
An der Universität Regensburg entstandenJa
URN der UB Regensburgurn:nbn:de:bvb:355-epub-375871
Dokumenten-ID37587

Bibliographische Daten exportieren

Nur für Besitzer und Autoren: Kontrollseite des Eintrags

nach oben