Klessinger, Stefan ; Klettke, Meike ; Störl, Uta ; Scherzinger, Stefanie

Extracting JSON Schemas with tagged union

Klessinger, Stefan, Klettke, Meike, Störl, Uta and Scherzinger, Stefanie (2022) Extracting JSON Schemas with tagged union. In: First International Workshop on Data Ecosystems co-located with 48th International Conference on Very Large Databases (VLDB 2022), September 5, 2022, Sydney, Australia.

Publication date of this full text: 28 Aug 2025 08:20
Conference or workshop contribution
DOI for citing this document: 10.5283/epub.77288


Abstract

With data lakes and schema-free NoSQL document stores, extracting a descriptive schema from JSON data collections is an acute challenge. In this paper, we target the discovery of tagged unions, a JSON Schema design pattern where the value of one property of an object (the tag) conditionally implies subschemas for sibling properties. We formalize these implications as conditional functional dependencies and capture them using the JSON Schema operators if-then-else. We further motivate our heuristics to avoid overfitting. Experiments with our prototype implementation are promising, and show that this form of tagged unions can successfully be detected in real-world GeoJSON and TopoJSON datasets. In discussing future work, we outline how our approach can be extended further.
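
To make the idea of a tagged union concrete, the following is a minimal sketch, not the authors' prototype. It assumes a flat collection of GeoJSON-like geometry objects in which the property "type" acts as the tag and "coordinates" is the sibling property. All function names (`json_type`, `infer`, `tagged_union_schema`) and the sample documents are hypothetical. For each tag value, the sketch checks whether all observed sibling values share one inferred subschema, which is the dependency "tag = v implies subschema S", and emits one if-then clause per such value, combined under allOf rather than as a nested if-then-else chain. Array item schemas are inferred from the first element only, and no heuristics against overfitting are included.

```python
import json
from collections import defaultdict


def json_type(value):
    """Map a Python value to a JSON Schema type name."""
    if isinstance(value, bool):  # must be checked before int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        return "array"
    return "object"


def infer(value):
    """Infer a coarse subschema; arrays are described by their first item only."""
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            schema["items"] = infer(value[0])
        return schema
    return {"type": json_type(value)}


def tagged_union_schema(docs, tag, sibling):
    """Emit one if-then clause per tag value that implies a single subschema."""
    seen = defaultdict(set)
    for doc in docs:
        # Assumption: tag values are strings, as in GeoJSON geometry types.
        if isinstance(doc.get(tag), str) and sibling in doc:
            seen[doc[tag]].add(json.dumps(infer(doc[sibling]), sort_keys=True))

    clauses = []
    for value in sorted(seen):
        subschemas = seen[value]
        if len(subschemas) != 1:
            continue  # the tag value does not determine the sibling's subschema
        clauses.append({
            "if": {"properties": {tag: {"const": value}}, "required": [tag]},
            "then": {"properties": {sibling: json.loads(next(iter(subschemas)))}},
        })
    return {"type": "object", "allOf": clauses}


# Hypothetical sample documents in the style of GeoJSON geometries.
docs = [
    {"type": "Point", "coordinates": [1.0, 2.0]},
    {"type": "Point", "coordinates": [3.0, 4.0]},
    {"type": "LineString", "coordinates": [[0.0, 0.0], [1.0, 1.0]]},
]

schema = tagged_union_schema(docs, "type", "coordinates")
print(json.dumps(schema, indent=2))
```

In this sample, "Point" implies an array of numbers and "LineString" implies an array of arrays of numbers, so each tag value yields its own clause. A tag value observed with two different sibling subschemas would be skipped, because the dependency does not hold for it.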




Details

Document type: Conference or workshop contribution (Paper)
Book title: Proceedings of the First International Workshop on Data Ecosystems co-located with 48th International Conference on Very Large Databases (VLDB 2022)
Publisher: CEUR-WS.org
Other series: CEUR Workshop Proceedings
Volume: 3306
Page range: pp. 27-40
Date: 2022
Institutions: Informatics and Data Science > General Computer Science > Data Engineering (Prof. Dr.-Ing. Meike Klettke)
Projects: Funded by: Deutsche Forschungsgemeinschaft (DFG) (385808805)
Dewey Decimal Classification: 000 Computer science, information science, general works > 004 Computer science
Status: Published
Peer-reviewed: Yes, this version has been peer-reviewed
Created at the University of Regensburg: Partially
URN of the Regensburg University Library: urn:nbn:de:bvb:355-epub-772883
Document ID: 77288
