Towards Agentic Schema Refinement
- URL: http://arxiv.org/abs/2412.07786v1
- Date: Mon, 25 Nov 2024 19:57:16 GMT
- Title: Towards Agentic Schema Refinement
- Authors: Agapi Rissaki, Ilias Fountalis, Nikolaos Vasiloglou, Wolfgang Gatterbauer,
- Abstract summary: We propose a semantic layer in-between the database and the user as a set of small and easy-to-interpret database views.<n>Our approach paves the way for LLM-powered exploration of unwieldy databases.
- Score: 3.7173623393215287
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large enterprise databases can be complex and messy, obscuring the data semantics needed for analytical tasks. We propose a semantic layer in-between the database and the user as a set of small and easy-to-interpret database views, effectively acting as a refined version of the schema. To discover these views, we introduce a multi-agent Large Language Model (LLM) simulation where LLM agents collaborate to iteratively define and refine views with minimal input. Our approach paves the way for LLM-powered exploration of unwieldy databases.
Related papers
- Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB [44.057784044659726]
Large language models (LLMs) have made it easier to prototype such retrieval and reasoning data pipelines.
This often involves orchestrating data systems, managing data movement, and handling low-level details.
We introduce FlockMTL: an extension for abstractions that integrates deeply LLM capabilities and retrieval-augmented generation.
arXiv Detail & Related papers (2025-04-01T19:48:17Z) - SchemaAgent: A Multi-Agents Framework for Generating Relational Database Schema [35.57815867567431]
Existing efforts are mostly based on customized rules or conventional deep learning models, often producing relational schema.
We propose a unified LLM-based multi-agent framework for the automated generation of high-quality database schema.Agent.
We incorporate dedicated roles for reflection and inspection, alongside an innovative error detection and correction mechanism to identify rectify issues across various phases.
arXiv Detail & Related papers (2025-03-31T09:39:19Z) - An LLM-Based Approach for Insight Generation in Data Analysis [9.077654650104055]
This paper introduces a novel approach using Large Language Models (LLMs) to automatically generate textual insights.
Given a multi-table database as input, our method leverages LLMs to produce concise, text-based insights that reflect interesting patterns in the tables.
The insights are evaluated for both correctness and subjective insightfulness using a hybrid model of human judgment and automated metrics.
arXiv Detail & Related papers (2025-02-20T17:09:59Z) - MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs [78.5013630951288]
This paper introduces techniques for advancing information retrieval with multimodal large language models (MLLMs)
We first study fine-tuning an MLLM as a bi-encoder retriever on 10 datasets with 16 retrieval tasks.
Our model, MM-Embed, achieves state-of-the-art performance on the multimodal retrieval benchmark M-BEIR.
arXiv Detail & Related papers (2024-11-04T20:06:34Z) - Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z) - The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective [53.48484062444108]
We find that the development of models and data is not two separate paths but rather interconnected.
On the one hand, vaster and higher-quality data contribute to better performance of MLLMs; on the other hand, MLLMs can facilitate the development of data.
To promote the data-model co-development for MLLM community, we systematically review existing works related to MLLMs from the data-model co-development perspective.
arXiv Detail & Related papers (2024-07-11T15:08:11Z) - Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning [10.731045939849125]
We focus on Text-to- semantic parsing from the perspective of retrieval-augmented generation.
Motivated by challenges related to the size of commercial database schemata and the deployability of business intelligence solutions, we propose $textASTReS$ that dynamically retrieves input database information.
arXiv Detail & Related papers (2024-07-03T15:55:14Z) - Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z) - TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [55.33939289989238]
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively.
It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
arXiv Detail & Related papers (2023-12-14T15:37:04Z) - Recommender AI Agent: Integrating Large Language Models for Interactive
Recommendations [53.76682562935373]
We introduce an efficient framework called textbfInteRecAgent, which employs LLMs as the brain and recommender models as tools.
InteRecAgent achieves satisfying performance as a conversational recommender system, outperforming general-purpose LLMs.
arXiv Detail & Related papers (2023-08-31T07:36:44Z) - Querying Large Language Models with SQL [16.383179496709737]
In many use-cases, information is stored in text but not available in structured data.
With the rise of pre-trained Large Language Models (LLMs), there is now an effective solution to store and use information extracted from massive corpora of text documents.
We present Galois, a prototype based on a traditional database architecture, but with new physical operators for querying the underlying LLM.
arXiv Detail & Related papers (2023-04-02T06:58:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.