| Download ( PDF | 417kB) |
Event-Driven Duplicate Detection: A probability based Approach
Heinrich, Bernd, Klier, Mathias, Obermeier, Andreas und Schiller, Alexander (2018) Event-Driven Duplicate Detection: A probability based Approach. In: European Conference on Information Systems (ECIS), June 23-28, 2018, Portsmouth, United Kingdom.Veröffentlichungsdatum dieses Volltextes: 07 Aug 2018 12:52
Konferenz- oder Workshop-Beitrag
DOI zum Zitieren dieses Dokuments: 10.5283/epub.37587
Zusammenfassung
The importance of probability-based approaches for duplicate detection has been recognized in both research and practice. However, existing approaches do not aim to consider the underlying real-world events resulting in duplicates (e.g., that a relocation may lead to the storage of two records for the same customer, once before and after the relocation). Duplicates resulting from real-world ...
The importance of probability-based approaches for duplicate detection has been recognized in both research and practice. However, existing approaches do not aim to consider the underlying real-world events resulting in duplicates (e.g., that a relocation may lead to the storage of two records for the same customer, once before and after the relocation). Duplicates resulting from real-world events exhibit specific characteristics. For instance, duplicates resulting from relocations tend to have significantly different attribute values for all address-related attributes. Hence, existing approaches focusing on high similarity with respect to attribute values are hardly able to identify possible duplicates resulting from such real-world events. To address this issue, we propose an approach for event-driven duplicate de-tection based on probability theory. Our approach assigns the probability of being a duplicate resulting from real-world events to each analysed pair of records while avoiding limiting assumptions (of existing approaches). We demonstrate the practical applicability and effectiveness of our approach in a real-world setting by analysing customer master data of a German insurer. The evaluation shows that the results provided by the approach are reliable and useful for decision support and can outperform well-known state-of-the-art approaches for duplicate detection.
Beteiligte Einrichtungen
Details
| Dokumentenart | Konferenz- oder Workshop-Beitrag (Paper) |
| Datum | September 2018 |
| Institutionen | Wirtschaftswissenschaften > Institut für Wirtschaftsinformatik > Lehrstuhl für Wirtschaftsinformatik II (Prof. Dr. Bernd Heinrich) Informatik und Data Science > Fachbereich Wirtschaftsinformatik > Lehrstuhl für Wirtschaftsinformatik II (Prof. Dr. Bernd Heinrich) |
| Stichwörter / Keywords | Duplicate detection, Record linkage, Entity Resolution, Data quality |
| Dewey-Dezimal-Klassifikation | 300 Sozialwissenschaften > 330 Wirtschaft |
| Status | Veröffentlicht |
| Begutachtet | Ja, diese Version wurde begutachtet |
| An der Universität Regensburg entstanden | Ja |
| URN der UB Regensburg | urn:nbn:de:bvb:355-epub-375871 |
| Dokumenten-ID | 37587 |
Downloadstatistik
Downloadstatistik