Text2Schema: Filling the Gap in Designing Database Table Structures based on Natural Language
- URL: http://arxiv.org/abs/2503.23886v2
- Date: Fri, 17 Oct 2025 07:09:40 GMT
- Title: Text2Schema: Filling the Gap in Designing Database Table Structures based on Natural Language
- Authors: Qin Wang, Youhuan Li, Yansong Feng, Si Chen, Ziming Li, Pan Zhang, Zihui Si, Yixuan Chen, Zhichao Shi, Zebin Huang, Guo Chen, Wenqiang Jin,
- Abstract summary: People without a database background usually rely on file systems or tools such as Excel for data management. Database systems possess strong management capabilities, but require a high level of professional expertise from users.
- Score: 22.15408079332362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: People without a database background usually rely on file systems or tools such as Excel for data management, which often lead to redundancy and data inconsistency. Relational databases possess strong data management capabilities, but require a high level of professional expertise from users. Although there are already many works on Text2SQL to automate the translation of natural language into SQL queries for data manipulation, all of them presuppose that the database schema is pre-designed. In practice, schema design itself demands domain expertise, and research on directly generating schemas from textual requirements remains unexplored. In this paper, we systematically define a new problem, called Text2Schema, to convert a natural language text requirement into a relational database schema. With an effective Text2Schema technique, users can effortlessly create database table structures using natural language, and subsequently leverage existing Text2SQL techniques to perform data manipulations, which significantly narrows the gap between non-technical personnel and highly efficient, versatile relational database systems. We propose SchemaAgent, an LLM-based multi-agent framework for Text2Schema. We emulate the workflow of manual schema design by assigning specialized roles to agents and enabling effective collaboration to refine their respective subtasks. We also incorporate dedicated roles for reflection and inspection, along with an innovative error detection and correction mechanism to identify and rectify issues across various phases. Moreover, we build and open source a benchmark containing 381 pairs of requirement description and schema. Experimental results demonstrate the superiority of our approach over comparative work.
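To make the task in the abstract concrete, the following is a minimal sketch of what one requirement/schema pair could look like. The requirement text, the table and column names, and the `build_schema` helper are all hypothetical illustrations; they are not taken from the paper's benchmark and do not represent SchemaAgent's output. The schema is written by hand and loaded into an in-memory SQLite database only to check that it is valid SQL.

```python
import sqlite3

# Hypothetical natural-language requirement (not from the paper's benchmark).
requirement = (
    "A library lends books to members. Each book has a title and an author. "
    "Each loan records which member borrowed which book and on what date."
)

# Hand-written schema of the kind a Text2Schema system might be asked to
# produce for the requirement above (an assumption for illustration only).
schema_sql = """
CREATE TABLE member (
    member_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE book (
    book_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    author  TEXT NOT NULL
);
CREATE TABLE loan (
    loan_id   INTEGER PRIMARY KEY,
    member_id INTEGER NOT NULL REFERENCES member(member_id),
    book_id   INTEGER NOT NULL REFERENCES book(book_id),
    loan_date TEXT NOT NULL
);
"""


def build_schema(sql: str) -> list[str]:
    """Create the tables in an in-memory database and return their names."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(sql)
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


tables = build_schema(schema_sql)
print(tables)
```

Once such a schema exists, existing Text2SQL techniques can in principle be applied on top of it, which is the workflow the abstract describes.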
Related papers
- AskDB: An LLM Agent for Natural Language Interaction with Relational Databases [0.06524460254566904]
We introduce AskDB, a large language model powered agent for interacting with databases through natural language. AskDB supports both data analysis and administrative operations over SQL databases through natural language. Our results highlight the potential of AskDB as a unified and intelligent agent for relational database systems, offering an intuitive and accessible experience for end users.
arXiv Detail & Related papers (2025-11-20T08:06:09Z) - From Queries to Insights: Agentic LLM Pipelines for Spatio-Temporal Text-to-SQL [8.496933324334167]
We present a naive text-to-Act baseline (Rellama-sqlcoder-8b) with orchestration by a Mistral-based Rellama-sqlcoder-8b. We evaluate on 35 natural-language queries over the NYC and Tokyo check-in, covering spatial, temporal multi-dataset reasoning. The agent achieves substantially higher accuracy than the dataset 91.4% vs. 28.6% and enhances usability through maps, and plots structured natural-language summaries.
arXiv Detail & Related papers (2025-10-29T22:18:57Z) - TEXT2DB: Integration-Aware Information Extraction with Large Language Model Agents [64.11547566154947]
We propose a new formulation of IE TEXT2DB that emphasizes the integration of IE output and the target database. We introduce a new benchmark featuring common demands such as data infilling, row population, and column addition. Experiments show that OPAL can successfully adapt to diverse database schemas by generating different code plans and calling the required IE models.
arXiv Detail & Related papers (2025-10-28T02:49:40Z) - RASL: Retrieval Augmented Schema Linking for Massive Database Text-to-SQL [1.3654846342364308]
We introduce a component-based retrieval architecture that decomposes database schemas and metadata into discrete semantic units. Our solution enables practical text-to- interfaces across diverse enterprise settings without specialized fine-tuning.
arXiv Detail & Related papers (2025-07-30T21:09:47Z) - Assemble Your Crew: Automatic Multi-agent Communication Topology Design via Autoregressive Graph Generation [72.44384066166147]
Multi-agent systems (MAS) based on large language models (LLMs) have emerged as a powerful solution for dealing with complex problems across diverse domains. Existing approaches are fundamentally constrained by their reliance on a template graph modification paradigm with a predefined set of agents and hard-coded interaction structures. We propose ARG-Designer, a novel autoregressive model that operationalizes this paradigm by constructing the collaboration graph from scratch.
arXiv Detail & Related papers (2025-07-24T09:17:41Z) - An advanced AI driven database system [0.0]
This paper presents a new database system supported by Artificial Intelligence (AI). It is intended to improve the management of data using natural language processing (NLP)-based intuitive interfaces. The system is intended to strengthen the potential of databases through the integration of Large Language Models (LLMs) and advanced machine learning algorithms.
arXiv Detail & Related papers (2025-07-22T16:10:45Z) - Large Language Models are Good Relational Learners [55.40941576497973]
We introduce Rel-LLM, a novel architecture that utilizes a graph neural network (GNN)-based encoder to generate structured relational prompts for large language models (LLMs). Unlike traditional text-based serialization approaches, our method preserves the inherent relational structure of databases while enabling LLMs to process and reason over complex entity relationships.
arXiv Detail & Related papers (2025-06-06T04:07:55Z) - Structuring the Unstructured: A Multi-Agent System for Extracting and Querying Financial KPIs and Guidance [54.25184684077833]
We propose an efficient and scalable method for extracting quantitative insights from unstructured financial documents. Our proposed system consists of two specialized agents: the Extraction Agent and the Text-to-Agent
arXiv Detail & Related papers (2025-05-25T15:45:46Z) - SchemaGraphSQL: Efficient Schema Linking with Pathfinding Graph Algorithms for Text-to-SQL on Large-Scale Databases [1.6544167074080365]
We present a zero-shot, training-free schema linking approach that first constructs a schema graph based on foreign key relations. We apply classical path-finding algorithms and post-processing to identify the optimal sequence of tables and columns that should be joined. Our method achieves state-of-the-art results on the BIRD benchmark, outperforming previous specialized, fine-tuned, and complex multi-step LLM-based approaches.
arXiv Detail & Related papers (2025-05-23T20:42:36Z) - UNJOIN: Enhancing Multi-Table Text-to-SQL Generation via Schema Simplification [50.59009084277447]
We introduce UNJOIN, a framework that decouples the retrieval of schema elements from logic generation. In the first stage, we merge the column names of all tables in the database into a single-table representation by prefixing each column with its table name. In the second stage, the query is generated on this simplified schema and mapped back to the original schema by reconstructing JOINs, UNIONs, and relational logic.
arXiv Detail & Related papers (2025-05-23T17:28:43Z) - HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation [11.53083922927901]
HM-RAG is a novel Hierarchical Multi-agent Multimodal RAG framework.
It pioneers collaborative intelligence for dynamic knowledge synthesis across structured, unstructured, and graph-based data.
arXiv Detail & Related papers (2025-04-13T06:55:33Z) - DB-Explore: Automated Database Exploration and Instruction Synthesis for Text-to-SQL [18.915121803834698]
We propose DB-Explore, a novel framework that systematically aligns large language models with database knowledge. Our framework enables comprehensive database understanding through diverse sampling strategies and automated instruction generation.
arXiv Detail & Related papers (2025-03-06T20:46:43Z) - A Collaborative Multi-Agent Approach to Retrieval-Augmented Generation Across Diverse Data [0.0]
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs). Traditional RAG systems typically use a single-agent architecture to handle query generation, data retrieval, and response synthesis. This paper proposes a multi-agent RAG system to address these limitations.
arXiv Detail & Related papers (2024-12-08T07:18:19Z) - Towards Agentic Schema Refinement [3.7173623393215287]
We propose a semantic layer in-between the database and the user as a set of small and easy-to-interpret database views. Our approach paves the way for LLM-powered exploration of unwieldy databases.
arXiv Detail & Related papers (2024-11-25T19:57:16Z) - Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z) - AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline.
Recent works have started exploiting large language models (LLM) to lessen such burden.
This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z) - Multi-Agent Planning Using Visual Language Models [2.2369578015657954]
Large Language Models (LLMs) and Visual Language Models (VLMs) are attracting increasing interest due to their improving performance and applications across various domains and tasks. LLMs and VLMs can produce erroneous results, especially when a deep understanding of the problem domain is required. We propose a multi-agent architecture for embodied task planning that operates without the need for specific data structures as input.
arXiv Detail & Related papers (2024-08-10T08:10:17Z) - DBCopilot: Natural Language Querying over Massive Databases via Schema Routing [47.009638761948466]
We present DBCopilot, a framework that addresses challenges by employing a compact and flexible copilot model for routing over massive databases. This framework utilizes a single lightweight differentiable search index to construct semantic mappings for massive database schemata, and navigates natural language questions to their target databases and tables in a relation joint retrieval manner.
arXiv Detail & Related papers (2023-12-06T12:37:28Z) - Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations [53.76682562935373]
We introduce an efficient framework called InteRecAgent, which employs LLMs as the brain and recommender models as tools.
InteRecAgent achieves satisfying performance as a conversational recommender system, outperforming general-purpose LLMs.
arXiv Detail & Related papers (2023-08-31T07:36:44Z) - Benchmarking Diverse-Modal Entity Linking with Generative Models [78.93737257356784]
We construct a benchmark for diverse-modal EL (DMEL) from existing EL datasets.
To approach the DMEL task, we proposed a generative diverse-modal model (GDMM) following a multimodal-encoder-decoder paradigm.
GDMM builds a stronger DMEL baseline, outperforming state-of-the-art task-specific EL models by 8.51 F1 score on average.
arXiv Detail & Related papers (2023-05-27T02:38:46Z) - SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the framework for enhancing Text-to- filtering using large language models (LLMs)
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z) - Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on the Poincaré distance metric.
Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences.
Our framework sets new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2022-06-28T14:05:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.