
Strasser, Sebastian; Klettke, Meike

Transparent Data Preprocessing for Machine Learning

Strasser, Sebastian and Klettke, Meike (2024) Transparent Data Preprocessing for Machine Learning. In: HILDA 24: 2024 Workshop on Human-In-the-Loop Data Analytics, June 14, 2024, AA, Santiago, Chile.

Publication date of this full text: 13 Jan 2025 14:44
Conference or workshop contribution
DOI for citing this document: 10.5283/epub.59825


Abstract


Data preprocessing is an important task in machine learning that can significantly improve model outcomes. However, evaluating the impact of data preprocessing is often difficult. There is a need for tools that make transparent to the user how the transformations conducted during preprocessing affect the data. Thus, we propose a vision of a transparency system for data preprocessing that provides insights into data preparation pipelines. Our envisioned system consists of a Python library that enables users to log transformations and processed data. Subsequently, the system generates summaries of the data processed in the pipeline and so-called change profiles, which capture the changes conducted in each processing step. These abstractions offer insight into the transformations and their effects on the data. Additionally, the system includes a user interface where users can interactively explore the implemented pipeline and the changes made during preprocessing. This paper presents an initial concept of such a system. It also examines further challenges related to making preprocessing transparent and discusses potential solutions to address these challenges.
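The abstract describes the logging and profiling workflow only at a high level. The following is a minimal sketch of the general idea, not the authors' library: it assumes each pipeline step produces a list of dict rows, and the names `PipelineLog`, `summarize`, and `change_profile` are hypothetical. Here a "summary" records only the row count, the column set, and missing values per column, and a "change profile" is the difference between the summaries of two consecutive steps; the system envisioned in the paper is likely considerably richer.

```python
from dataclasses import dataclass, field
from typing import Any

Row = dict[str, Any]


def summarize(rows: list[Row]) -> dict[str, Any]:
    """Hypothetical data summary: row count, columns, missing values per column."""
    columns = sorted({col for row in rows for col in row})
    # A value counts as missing if it is None or the key is absent from the row.
    missing = {col: sum(1 for row in rows if row.get(col) is None) for col in columns}
    return {"rows": len(rows), "columns": columns, "missing": missing}


def change_profile(before: dict[str, Any], after: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical change profile: what differs between two consecutive summaries."""
    cols_before, cols_after = set(before["columns"]), set(after["columns"])
    shared = sorted(cols_before & cols_after)
    return {
        "row_delta": after["rows"] - before["rows"],
        "added_columns": sorted(cols_after - cols_before),
        "removed_columns": sorted(cols_before - cols_after),
        # Only report shared columns whose missing-value count actually changed.
        "missing_delta": {
            col: after["missing"][col] - before["missing"][col]
            for col in shared
            if after["missing"][col] != before["missing"][col]
        },
    }


@dataclass
class PipelineLog:
    """Hypothetical logger: stores one (step name, summary) pair per step."""

    steps: list[tuple[str, dict[str, Any]]] = field(default_factory=list)

    def log(self, name: str, rows: list[Row]) -> list[Row]:
        self.steps.append((name, summarize(rows)))
        return rows  # pass data through so logging can wrap each step inline

    def change_profiles(self) -> list[tuple[str, dict[str, Any]]]:
        # Pair each step with its predecessor and diff their summaries.
        return [
            (name, change_profile(prev, cur))
            for (_, prev), (name, cur) in zip(self.steps, self.steps[1:])
        ]


if __name__ == "__main__":
    log = PipelineLog()
    raw = log.log("load", [
        {"age": 31, "city": "Regensburg"},
        {"age": None, "city": "Santiago"},
        {"age": 45, "city": None},
    ])
    cleaned = log.log("drop_missing_age", [r for r in raw if r["age"] is not None])
    encoded = log.log("encode_city", [
        {"age": r["age"], "city_Regensburg": int(r["city"] == "Regensburg")}
        for r in cleaned
    ])
    # drop_missing_age: one row removed, 'age' loses its missing value.
    # encode_city: 'city' replaced by 'city_Regensburg', row count unchanged.
    for name, profile in log.change_profiles():
        print(name, profile)
```

Returning the rows from `log` lets the logging call wrap each step without restructuring the pipeline; a real system would also need to record the transformation itself, not just its effect on the data.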





Details

Document type: Conference or workshop contribution (Paper)
Publisher: ACM
Page range: pp. 1-6
Date: 18 June 2024
Institutions: Computer Science and Data Science > General Computer Science > Data Engineering (Prof. Dr.-Ing. Meike Klettke)
Identification number: 10.1145/3665939.3665960 (DOI)
Keywords: data preprocessing, data profiles, change profiles, transparency
Dewey Decimal Classification: 000 Computer science, information science, general works > 004 Computer science
Status: Published
Peer-reviewed: Yes, this version has been peer-reviewed
Produced at the University of Regensburg: Yes
URN of the University Library Regensburg: urn:nbn:de:bvb:355-epub-598259
Document ID: 59825


