| Published version: Download (PDF, 690 kB) | License: Creative Commons Attribution 4.0 International |
Transparent Data Preprocessing for Machine Learning
Strasser, Sebastian and Klettke, Meike
(2024)
Transparent Data Preprocessing for Machine Learning.
In: HILDA 24: 2024 Workshop on Human-In-the-Loop Data Analytics, June 14, 2024, AA, Santiago, Chile.
Publication date of this full text: 13 Jan 2025 14:44
Conference or workshop contribution
DOI for citing this document: 10.5283/epub.59825
Abstract
Data preprocessing is an important task in machine learning that can significantly improve model outcomes. However, evaluating the impact of data preprocessing is often difficult. There is a need for tools that make transparent to the user how the transformations conducted in preprocessing affect the data. Thus, we propose a vision of a transparency system for data preprocessing that provides insights into data preparation pipelines. Our envisioned system consists of a Python library that enables users to log transformations and processed data. Subsequently, the system generates summaries of the data processed in the pipeline and so-called change profiles, which capture the changes made in each processing step. These abstractions offer insight into the transformations and their effects on the data. Additionally, the system includes a user interface where users can interactively explore the implemented pipeline and the changes made during preprocessing. This paper presents an initial concept of such a system. It also examines further challenges related to making preprocessing transparent and discusses potential solutions to address these challenges.
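To make the idea of logging transformations and deriving change profiles concrete, the following is a minimal sketch in plain Python. It is not the authors' library: the names `PipelineLog`, `ChangeProfile`, `apply`, `count_missing`, and `column_names` are hypothetical, and the choice of what a change profile records (row counts, added and removed columns, missing values) is an assumption made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Row = dict[str, Any]  # one record; None marks a missing value (assumption)


@dataclass
class ChangeProfile:
    """Hypothetical summary of what one preprocessing step changed."""
    step: str
    rows_before: int
    rows_after: int
    columns_added: list[str]
    columns_removed: list[str]
    missing_before: int
    missing_after: int


def count_missing(rows: list[Row]) -> int:
    """Count cells whose value is None across all rows."""
    return sum(1 for row in rows for value in row.values() if value is None)


def column_names(rows: list[Row]) -> set[str]:
    """Collect every column name that occurs in any row."""
    return {name for row in rows for name in row}


@dataclass
class PipelineLog:
    """Hypothetical logger: runs each step and records a change profile."""
    profiles: list[ChangeProfile] = field(default_factory=list)

    def apply(self, step: str,
              transform: Callable[[list[Row]], list[Row]],
              rows: list[Row]) -> list[Row]:
        result = transform(rows)
        before, after = column_names(rows), column_names(result)
        self.profiles.append(ChangeProfile(
            step=step,
            rows_before=len(rows),
            rows_after=len(result),
            columns_added=sorted(after - before),
            columns_removed=sorted(before - after),
            missing_before=count_missing(rows),
            missing_after=count_missing(result),
        ))
        return result


# Illustrative data, invented for this sketch.
data = [
    {"age": 31, "city": "Regensburg"},
    {"age": None, "city": "Santiago"},
    {"age": 45, "city": None},
]

log = PipelineLog()
data = log.apply("drop_missing_age",
                 lambda rows: [r for r in rows if r["age"] is not None],
                 data)
data = log.apply("drop_city",
                 lambda rows: [{k: v for k, v in r.items() if k != "city"}
                               for r in rows],
                 data)

for profile in log.profiles:
    print(profile)
```

Each call to `apply` runs one step and stores a before/after comparison, so the list in `log.profiles` can later be inspected step by step. A real system along the lines of the abstract would presumably record richer data summaries and feed them into a user interface; this sketch only shows the logging pattern.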
Details
| Document type | Conference or workshop contribution (Paper) |
|---|---|
| Publisher | ACM |
| Page range | pp. 1-6 |
| Date | 18 June 2024 |
| Institutions | Informatics and Data Science > General Computer Science > Data Engineering (Prof. Dr.-Ing. Meike Klettke) |
| Identification number | |
| Keywords | data preprocessing, data profiles, change profiles, transparency |
| Dewey Decimal Classification | 000 Computer science, information and general works > 004 Computer science |
| Status | Published |
| Peer-reviewed | Yes, this version has been peer-reviewed |
| Created at the University of Regensburg | Yes |
| URN of the University Library of Regensburg | urn:nbn:de:bvb:355-epub-598259 |
| Document ID | 59825 |