| Published version Download (PDF | 1MB) | License: Creative Commons Attribution 4.0 International |
Extracting JSON Schemas with tagged union
Klessinger, Stefan, Klettke, Meike, Störl, Uta and Scherzinger, Stefanie (2022)
Extracting JSON Schemas with tagged union.
In: First International Workshop on Data Ecosystems co-located with 48th International Conference on Very Large Databases (VLDB 2022), September 5, 2022, Sydney, Australia.
Publication date of this full text: 28 Aug 2025 08:20
Conference or workshop contribution
DOI for citing this document: 10.5283/epub.77288
Abstract
With data lakes and schema-free NoSQL document stores, extracting a descriptive schema from JSON data collections is an acute challenge. In this paper, we target the discovery of tagged unions, a JSON Schema design pattern where the value of one property of an object (the tag) conditionally implies subschemas for sibling properties. We formalize these implications as conditional functional dependencies and capture them using the JSON Schema operators if-then-else. We further motivate our heuristics to avoid overfitting. Experiments with our prototype implementation are promising, and show that this form of tagged unions can successfully be detected in real-world GeoJSON and TopoJSON datasets. In discussing future work, we outline how our approach can be extended further.
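To illustrate the tagged-union pattern described in the abstract, the following is a minimal sketch and not the authors' prototype. It assumes a hypothetical helper `infer_tagged_union` that takes a candidate tag property and one sibling property, checks whether every tag value implies exactly one array-nesting depth for the sibling, and, if so, emits JSON Schema `if`/`then` fragments. The sample documents are made-up GeoJSON-like geometries, and the heuristics against overfitting mentioned in the abstract are not modelled here.

```python
import json
from collections import defaultdict


def nesting_depth(value):
    """Array nesting depth: 1 for [x, y], 2 for [[x, y], ...], 0 for non-arrays."""
    depth = 0
    while isinstance(value, list) and value:
        depth += 1
        value = value[0]
    return depth


def array_schema(depth):
    """JSON Schema describing numbers nested `depth` arrays deep."""
    schema = {"type": "number"}
    for _ in range(depth):
        schema = {"type": "array", "items": schema}
    return schema


def infer_tagged_union(docs, tag, sibling):
    """Hypothetical helper: return if/then subschemas when each tag value
    implies a single shape for the sibling property, else None."""
    shapes = defaultdict(set)
    for doc in docs:
        shapes[doc[tag]].add(nesting_depth(doc[sibling]))
    # The dependency holds only if every tag value maps to exactly one shape.
    if any(len(depths) != 1 for depths in shapes.values()):
        return None
    return {
        "allOf": [
            {
                "if": {"properties": {tag: {"const": value}}, "required": [tag]},
                "then": {"properties": {sibling: array_schema(next(iter(depths)))}},
            }
            for value, depths in sorted(shapes.items())
        ]
    }


# Made-up sample data in the style of GeoJSON geometries.
docs = [
    {"type": "Point", "coordinates": [11.0, 49.0]},
    {"type": "LineString", "coordinates": [[11.0, 49.0], [12.0, 48.0]]},
    {"type": "Point", "coordinates": [13.4, 52.5]},
]
schema = infer_tagged_union(docs, "type", "coordinates")
print(json.dumps(schema, indent=2))
```

In this sketch the value of `type` acts as the tag: `"Point"` implies a flat array of numbers for `coordinates`, while `"LineString"` implies an array of such arrays. If one tag value occurred with two different shapes, the helper would return `None` rather than emit a dependency.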
Details
| Document type | Conference or workshop contribution (Paper) |
|---|---|
| Book title | Proceedings of the First International Workshop on Data Ecosystems co-located with 48th International Conference on Very Large Databases (VLDB 2022) |
| Publisher | CEUR-WS.org |
| Other series | CEUR Workshop Proceedings |
| Volume | 3306 |
| Page range | pp. 27-40 |
| Date | 2022 |
| Institutions | Computer Science and Data Science > General Computer Science > Data Engineering (Prof. Dr.-Ing. Meike Klettke) |
| Projects | Funded by: Deutsche Forschungsgemeinschaft (DFG) (385808805) |
| Dewey Decimal Classification | 000 Computer science, information science, general works > 004 Computer science |
| Status | Published |
| Peer-reviewed | Yes, this version has been peer-reviewed |
| Created at the University of Regensburg | In part |
| URN of the UB Regensburg | urn:nbn:de:bvb:355-epub-772883 |
| Document ID | 77288 |