Related papers: Compound Schema Registry

URL: http://arxiv.org/abs/2406.11227v1
Date: Mon, 17 Jun 2024 05:50:46 GMT
Title: Compound Schema Registry
Authors: Silvery D. Fu, Xuewei Chen
Abstract summary: We propose the use of generalized schema evolution (GSE) facilitated by a compound AI system. This system employs Large Language Models (LLMs) to interpret the semantics of schema changes. Our approach includes developing a task-specific language, Schema Transformation Language (STL), to generate schema mappings as an intermediate representation.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Schema evolution is critical in managing database systems to ensure compatibility across different data versions. A schema registry typically addresses the challenges of schema evolution in real-time data streaming by managing, validating, and ensuring schema compatibility. However, current schema registries struggle with complex syntactic alterations like field renaming or type changes, which often require significant manual intervention and can disrupt service. To enhance the flexibility of schema evolution, we propose the use of generalized schema evolution (GSE) facilitated by a compound AI system. This system employs Large Language Models (LLMs) to interpret the semantics of schema changes, supporting a broader range of syntactic modifications without interrupting data streams. Our approach includes developing a task-specific language, Schema Transformation Language (STL), to generate schema mappings as an intermediate representation (IR), simplifying the integration of schema changes across different data processing platforms. Initial results indicate that this approach can improve schema mapping accuracy and efficiency, demonstrating the potential of GSE in practical applications.
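To make the two changes named in the abstract concrete (field renaming and type changes), here is a minimal sketch of a schema mapping applied to a single record. It is not the paper's STL, whose syntax is not given here; the `Mapping` format, the `apply_mapping` helper, and the example field names are all hypothetical and chosen only for illustration.

```python
from typing import Any, Callable

# Hypothetical mapping format (not STL): target field -> (source field, converter).
Mapping = dict[str, tuple[str, Callable[[Any], Any]]]


def apply_mapping(record: dict[str, Any], mapping: Mapping) -> dict[str, Any]:
    """Rewrite a record from the old schema into the new schema."""
    return {
        target: convert(record[source])
        for target, (source, convert) in mapping.items()
    }


# Assumed old schema: {"user_name": str, "age": str}
# Assumed new schema: {"username": str, "age": int}
V1_TO_V2: Mapping = {
    "username": ("user_name", str),  # field rename, value unchanged
    "age": ("age", int),             # type change, string to integer
}

migrated = apply_mapping({"user_name": "ada", "age": "36"}, V1_TO_V2)
print(migrated)  # {'username': 'ada', 'age': 36}
```

Keeping the mapping as plain data, separate from the code that applies it, loosely mirrors the idea of an intermediate representation: the same mapping could in principle be translated for different data processing platforms.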

Related papers

Dynamic Generation of Multi-LLM Agents Communication Topologies with Graph Diffusion Models [99.85131798240808]
We introduce a novel generative framework called Guided Topology Diffusion (GTD). Inspired by conditional discrete graph diffusion models, GTD formulates topology synthesis as an iterative construction process. At each step, the generation is steered by a lightweight proxy model that predicts multi-objective rewards. Experiments show that GTD can generate highly task-adaptive, sparse, and efficient communication topologies.
arXiv Detail & Related papers (2025-10-09T05:28:28Z)
Data Dependency-Aware Code Generation from Enhanced UML Sequence Diagrams [54.528185120850274]
We propose a novel step-by-step code generation framework named API2Dep. First, we introduce an enhanced Unified Modeling Language (UML) API diagram tailored for service-oriented architectures. Second, recognizing the critical role of data flow, we introduce a dedicated data dependency inference task.
arXiv Detail & Related papers (2025-08-05T12:28:23Z)
Towards Scalable Schema Mapping using Large Language Models [14.028425711746513]
We identify three core issues with using large language models (LLMs) for schema mapping. We propose methods to address some of these through sampling and aggregation techniques. We propose to mitigate others through strategies like data type prefiltering.
arXiv Detail & Related papers (2025-05-30T15:36:56Z)
SchemaAgent: A Multi-Agents Framework for Generating Relational Database Schema [35.57815867567431]
Existing efforts are mostly based on customized rules or conventional deep learning models, often producing relational schema. We propose a unified LLM-based multi-agent framework for the automated generation of high-quality database schema. We incorporate dedicated roles for reflection and inspection, alongside an innovative error detection and correction mechanism to identify and rectify issues across various phases.
arXiv Detail & Related papers (2025-03-31T09:39:19Z)
Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring. Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations. Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z)
ReMatch: Retrieval Enhanced Schema Matching with LLMs [0.874967598360817]
We present a novel method, named ReMatch, for matching schemas using retrieval-enhanced Large Language Models (LLMs). Our experimental results on large real-world schemas demonstrate that ReMatch is an effective matcher.
arXiv Detail & Related papers (2024-03-03T17:14:40Z)
Grounding Description-Driven Dialogue State Trackers with Knowledge-Seeking Turns [54.56871462068126]
Augmenting the training set with human or synthetic schema paraphrases improves the model robustness to these variations but can be either costly or difficult to control. We propose to circumvent these issues by grounding the state tracking model in knowledge-seeking turns collected from the dialogue corpus as well as the schema.
arXiv Detail & Related papers (2023-09-23T18:33:02Z)
Open-Domain Hierarchical Event Schema Induction by Incremental Prompting and Verification [81.17473088621209]
We treat event schemas as a form of commonsense knowledge that can be derived from large language models (LLMs). We design an incremental prompting and verification method to break down the construction of a complex event graph into three stages. Compared to directly using LLMs to generate a linearized graph, our method can generate large and complex schemas with 7.2% F1 improvement in temporal relations and 31.0% F1 improvement in hierarchical relations.
arXiv Detail & Related papers (2023-07-05T01:00:44Z)
Drafting Event Schemas using Language Models [48.81285141287434]
We look at the process of creating such schemas to describe complex events. Our focus is on whether we can achieve sufficient diversity and recall of key events. We show that large language models are able to achieve moderate recall against schemas taken from two different datasets.
arXiv Detail & Related papers (2023-05-24T07:57:04Z)
Schema-adaptable Knowledge Graph Construction [47.772335354080795]
Conventional Knowledge Graph Construction (KGC) approaches typically follow the static information extraction paradigm with a closed set of pre-defined schema. We propose a new task called schema-adaptable KGC, which aims to continually extract entity, relation, and event based on a dynamically changing schema graph without re-training.
arXiv Detail & Related papers (2023-05-15T15:06:20Z)
Universal Information Extraction as Unified Semantic Matching [54.19974454019611]
We decouple information extraction into two abilities, structuring and conceptualizing, which are shared by different tasks and schemas. Based on this paradigm, we propose to universally model various IE tasks with Unified Semantic Matching framework. In this way, USM can jointly encode schema and input text, uniformly extract substructures in parallel, and controllably decode target structures on demand.
arXiv Detail & Related papers (2023-01-09T11:51:31Z)
SGD-X: A Benchmark for Robust Generalization in Schema-Guided Dialogue Systems [26.14268488547028]
We release SGD-X, a benchmark for measuring robustness of dialogue systems to linguistic variations in schemas. We evaluate two dialogue state tracking models on SGD-X and observe that neither generalizes well across schema variations. We present a simple model-agnostic data augmentation method to improve schema robustness and zero-shot generalization to unseen services.
arXiv Detail & Related papers (2021-10-13T15:38:29Z)
Automated Metadata Harmonization Using Entity Resolution & Contextual Embedding [0.0]
We demonstrate automation of this step with the help of Cognitive Database's Db2Vec embedding approach. Apart from matching schemas, we demonstrate that it can also infer the correct ontological structure of the target data model.
arXiv Detail & Related papers (2020-10-17T02:14:15Z)
Survive the Schema Changes: Integration of Unmanaged Data Using Deep Learning [2.6464841907587004]
We propose to use deep learning to automatically deal with schema changes through a super cell representation and automatic injection of perturbations to the training data. Our experimental results demonstrate that our proposed approach is effective for two real-world data integration scenarios.
arXiv Detail & Related papers (2020-10-15T08:10:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.