Compound Schema Registry
- URL: http://arxiv.org/abs/2406.11227v1
- Date: Mon, 17 Jun 2024 05:50:46 GMT
- Title: Compound Schema Registry
- Authors: Silvery D. Fu, Xuewei Chen
- Abstract summary: We propose the use of generalized schema evolution (GSE) facilitated by a compound AI system.
This system employs Large Language Models (LLMs) to interpret the semantics of schema changes.
Our approach includes developing a task-specific language, Schema Transformation Language (STL), to generate schema mappings as an intermediate representation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Schema evolution is critical in managing database systems to ensure compatibility across different data versions. A schema registry typically addresses the challenges of schema evolution in real-time data streaming by managing, validating, and ensuring schema compatibility. However, current schema registries struggle with complex syntactic alterations like field renaming or type changes, which often require significant manual intervention and can disrupt service. To enhance the flexibility of schema evolution, we propose the use of generalized schema evolution (GSE) facilitated by a compound AI system. This system employs Large Language Models (LLMs) to interpret the semantics of schema changes, supporting a broader range of syntactic modifications without interrupting data streams. Our approach includes developing a task-specific language, Schema Transformation Language (STL), to generate schema mappings as an intermediate representation (IR), simplifying the integration of schema changes across different data processing platforms. Initial results indicate that this approach can improve schema mapping accuracy and efficiency, demonstrating the potential of GSE in practical applications.
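To make the idea of a schema mapping as an intermediate representation concrete, here is a minimal Python sketch of a field rename combined with a type change, the two alterations the abstract names as hard for current registries. The abstract does not show STL syntax, so the `Mapping` format, the `apply_mapping` helper, and the field names `uid`, `user_id`, and `email` are all hypothetical stand-ins, not the paper's language or implementation.

```python
from typing import Any, Callable

# Hypothetical mapping format (NOT the paper's STL): each target field is
# paired with the source field it comes from and a converter for its value.
Mapping = dict[str, tuple[str, Callable[[Any], Any]]]


def identity(value: Any) -> Any:
    """Converter for fields whose type does not change."""
    return value


def apply_mapping(record: dict[str, Any], mapping: Mapping) -> dict[str, Any]:
    """Translate one record from the old schema to the new schema."""
    out: dict[str, Any] = {}
    for target, (source, convert) in mapping.items():
        if source not in record:
            raise KeyError(f"source field {source!r} missing from record")
        out[target] = convert(record[source])
    return out


# Assumed example: v1 stores the id as a string under "uid";
# v2 renames it to "user_id" and stores it as an integer.
V1_TO_V2: Mapping = {
    "user_id": ("uid", int),       # rename plus type change (str -> int)
    "email": ("email", identity),  # carried over unchanged
}

old_record = {"uid": "42", "email": "a@example.com"}
new_record = apply_mapping(old_record, V1_TO_V2)
print(new_record)  # {'user_id': 42, 'email': 'a@example.com'}
```

Keeping the mapping as plain data, separate from the code that applies it, is what would let one mapping be reused across different processing platforms. In the system the abstract describes, an LLM would propose such a mapping; here it is written by hand.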
Related papers
- Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z)
- ReMatch: Retrieval Enhanced Schema Matching with LLMs [0.874967598360817]
We present a novel method, named ReMatch, for matching schemas using retrieval-enhanced Large Language Models (LLMs).
Our experimental results on large real-world schemas demonstrate that ReMatch is an effective matcher.
arXiv Detail & Related papers (2024-03-03T17:14:40Z)
- Grounding Description-Driven Dialogue State Trackers with Knowledge-Seeking Turns [54.56871462068126]
Augmenting the training set with human or synthetic schema paraphrases improves the model robustness to these variations but can be either costly or difficult to control.
We propose to circumvent these issues by grounding the state tracking model in knowledge-seeking turns collected from the dialogue corpus as well as the schema.
arXiv Detail & Related papers (2023-09-23T18:33:02Z)
- Open-Domain Hierarchical Event Schema Induction by Incremental Prompting and Verification [81.17473088621209]
We treat event schemas as a form of commonsense knowledge that can be derived from large language models (LLMs).
We design an incremental prompting and verification method to break down the construction of a complex event graph into three stages.
Compared to directly using LLMs to generate a linearized graph, our method can generate large and complex schemas with 7.2% F1 improvement in temporal relations and 31.0% F1 improvement in hierarchical relations.
arXiv Detail & Related papers (2023-07-05T01:00:44Z)
- Drafting Event Schemas using Language Models [48.81285141287434]
We look at the process of creating such schemas to describe complex events.
Our focus is on whether we can achieve sufficient diversity and recall of key events.
We show that large language models are able to achieve moderate recall against schemas taken from two different datasets.
arXiv Detail & Related papers (2023-05-24T07:57:04Z)
- Schema-adaptable Knowledge Graph Construction [47.772335354080795]
Conventional Knowledge Graph Construction (KGC) approaches typically follow the static information extraction paradigm with a closed set of pre-defined schema.
We propose a new task called schema-adaptable KGC, which aims to continually extract entity, relation, and event based on a dynamically changing schema graph without re-training.
arXiv Detail & Related papers (2023-05-15T15:06:20Z)
- Universal Information Extraction as Unified Semantic Matching [54.19974454019611]
We decouple information extraction into two abilities, structuring and conceptualizing, which are shared by different tasks and schemas.
Based on this paradigm, we propose to universally model various IE tasks with the Unified Semantic Matching (USM) framework.
In this way, USM can jointly encode schema and input text, uniformly extract substructures in parallel, and controllably decode target structures on demand.
arXiv Detail & Related papers (2023-01-09T11:51:31Z)
- SGD-X: A Benchmark for Robust Generalization in Schema-Guided Dialogue Systems [26.14268488547028]
We release SGD-X, a benchmark for measuring robustness of dialogue systems to linguistic variations in schemas.
We evaluate two dialogue state tracking models on SGD-X and observe that neither generalizes well across schema variations.
We present a simple model-agnostic data augmentation method to improve schema robustness and zero-shot generalization to unseen services.
arXiv Detail & Related papers (2021-10-13T15:38:29Z)
- Automated Metadata Harmonization Using Entity Resolution & Contextual Embedding [0.0]
We demonstrate automation of this step with the help of Cognitive Database's Db2Vec embedding approach.
Apart from matching schemas, we demonstrate that it can also infer the correct ontological structure of the target data model.
arXiv Detail & Related papers (2020-10-17T02:14:15Z)
- Survive the Schema Changes: Integration of Unmanaged Data Using Deep Learning [2.6464841907587004]
We propose to use deep learning to automatically deal with schema changes through a super cell representation and automatic injection of perturbations to the training data.
Our experimental results demonstrate that our proposed approach is effective for two real-world data integration scenarios.
arXiv Detail & Related papers (2020-10-15T08:10:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.