Related papers: MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL

MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL

URL: http://arxiv.org/abs/2408.07930v4
Date: Thu, 7 Nov 2024 03:37:51 GMT
Title: MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL
Authors: Wenxuan Xie, Gaochen Wu, Bowen Zhou,
Abstract summary: Recent In-Context Learning based methods have achieved remarkable success in Text-to-Context task. There is still a large gap between the performance of these models and human performance on datasets with complex database schema and difficult questions, such as. In our framework, an entity-based method with tables' summary is used to select the columns in database, and a novel targets-conditions decomposition method is introduced to decompose those complex questions.
Score: 15.824894030016187
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent In-Context Learning based methods have achieved remarkable success in Text-to-SQL task. However, there is still a large gap between the performance of these models and human performance on datasets with complex database schema and difficult questions, such as BIRD. Besides, existing work has neglected to supervise intermediate steps when solving questions iteratively with question decomposition methods, and the schema linking methods used in these works are very rudimentary. To address these issues, we propose MAG-SQL, a multi-agent generative approach with soft schema linking and iterative Sub-SQL refinement. In our framework, an entity-based method with tables' summary is used to select the columns in database, and a novel targets-conditions decomposition method is introduced to decompose those complex questions. Additionally, we build a iterative generating module which includes a Sub-SQL Generator and Sub-SQL Refiner, introducing external oversight for each step of generation. Through a series of ablation studies, the effectiveness of each agent in our framework has been demonstrated. When evaluated on the BIRD benchmark with GPT-4, MAG-SQL achieves an execution accuracy of 61.08%, compared to the baseline accuracy of 46.35% for vanilla GPT-4 and the baseline accuracy of 57.56% for MAC-SQL. Besides, our approach makes similar progress on Spider. The codes are available at https://github.com/LancelotXWX/MAG-SQL.

Related papers

Weaver: Interweaving SQL and LLM for Table Reasoning [63.09519234853953]
Weaver generates a flexible, step-by-step plan that combinessql for structured data retrieval with LLMs for semantic processing.<n>Weaver consistently outperforms state-of-the-art methods across four TableQA datasets, reducing both API calls and error rates.
arXiv Detail & Related papers (2025-05-25T03:27:37Z)
SchemaGraphSQL: Efficient Schema Linking with Pathfinding Graph Algorithms for Text-to-SQL on Large-Scale Databases [1.6544167074080365]
We present a zero-shot, training-free schema linking approach that first constructs a schema graph based on foreign key relations.<n>We apply classical path-finding algorithms and post-processing to identify the optimal sequence of tables and columns that should be joined.<n>Our method achieves state-of-the-art results on the BIRD benchmark, outperforming previous specialized, fine-tuned, and complex multi-step LLM-based approaches.
arXiv Detail & Related papers (2025-05-23T20:42:36Z)
UNJOIN: Enhancing Multi-Table Text-to-SQL Generation via Schema Simplification [50.59009084277447]
We introduce UNJOIN, a framework that decouples the retrieval of schema elements from logic generation.<n>In the first stage, we merge the column names of all tables in the database into a single-table representation by prefixing each column with its table name.<n>In the second stage, the query is generated on this simplified schema and mapped back to the original schema by reconstructing JOINs, UNIONs, and relational logic.
arXiv Detail & Related papers (2025-05-23T17:28:43Z)
Bridging the Gap: Transforming Natural Language Questions into SQL Queries via Abstract Query Pattern and Contextual Schema Markup [6.249316460506702]
We identify two important gaps: the structural mapping gap and the lexical mapping gap. PAS-related achieves an execution accuracy of 87.9%, and leading results on the BIRD dataset with an execution accuracy of 64.67%. Results on the Spider benchmark set a new state-of-the-art on the Spider benchmark with an execution accuracy of 87.9%, and leading results on the BIRD dataset with an execution accuracy of 64.67%.
arXiv Detail & Related papers (2025-02-20T16:11:27Z)
OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment [6.2089733671434875]
We propose OpenSearch-, which divides the Text-to-agent task into four main modules: Preprocessing, Extraction, Generation, and Refinement, along with an Alignment module based on consistency alignment mechanism. These methods have significantly improved the performance of LLMs in the Text-to-agent task. Experimental results show that OpenSearch- achieves an execution accuracy(EX) of 69.3% on the BIRD development set, 72.28% on the test set, and a reward-based efficiency score (R-VES) of 69.3, with all three metrics ranking first at the time of submission.
arXiv Detail & Related papers (2025-02-19T07:51:50Z)
MCTS-SQL: An Effective Framework for Text-to-SQL with Monte Carlo Tree Search [3.521199751827158]
We present a novel approach to converting natural language queries into database queries. We use Monte Carlo Tree Search (MCTS) and a self-refinement mechanism to enhance accuracy and reliability. Experimental results show that MCTS-IDER achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-01-28T00:52:23Z)
RSL-SQL: Robust Schema Linking in Text-to-SQL Generation [51.00761167842468]
We propose a novel framework called RSL- that combines bidirectional schema linking, contextual information augmentation, binary selection strategy, and multi-turn self-correction. benchmarks demonstrate that our approach achieves SOTA execution accuracy among open-source solutions, with 67.2% on BIRD and 87.9% on GPT-4ocorrection. Our approach outperforms a series of GPT-4 based Text-to-Seek systems when adopting DeepSeek (much cheaper) with same intact prompts.
arXiv Detail & Related papers (2024-10-31T16:22:26Z)
Enhancing LLM Fine-tuning for Text-to-SQLs by SQL Quality Measurement [1.392448435105643]
Text-to-s enables non-expert users to effortlessly retrieve desired information from databases using natural language queries. Current state-of-the-art (SOTA) models like GPT4 and T5 have shown impressive performance on large-scale benchmarks like BIRD. This paper proposed a novel approach that only needs SQL Quality to enhance Text-to-s performance.
arXiv Detail & Related papers (2024-10-02T17:21:51Z)
DAC: Decomposed Automation Correction for Text-to-SQL [51.48239006107272]
We introduce De Automation Correction (DAC), which corrects text-to-composed by decomposing entity linking and skeleton parsing. We show that our method improves performance by $3.7%$ on average of Spider, Bird, and KaggleDBQA compared with the baseline method.
arXiv Detail & Related papers (2024-08-16T14:43:15Z)
MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation [10.726734105960924]
Large language models (LLMs) have enabled in-context learning (ICL)-based methods that significantly outperform fine-tuning approaches for text-to- tasks. This study considers the sensitivity of LLMs to the prompts and introduces a novel approach that leverages multiple prompts to explore a broader search space for possible answers. We establish a new SOTA performance on the BIRD in terms of both the accuracy and efficiency of the generated queries.
arXiv Detail & Related papers (2024-05-13T04:59:32Z)
Schema-Aware Multi-Task Learning for Complex Text-to-SQL [4.913409359995421]
We present a schema-aware multi-task learning framework (named MT) for complicatedsql queries. Specifically, we design a schema linking discriminator module to distinguish the valid question-schema linkings. On the decoder side, we define 6-type relationships to describe the connections between tables and columns.
arXiv Detail & Related papers (2024-03-09T01:13:37Z)
MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL [47.120862170230566]
Recent Text-to-yourself methods usually suffer from significant performance degradation on "huge" databases. We introduce MAC, a novel Text-to-yourself LLM-based multi-agent collaborative framework. In our framework, we leverage GPT-4 as the strong backbone for all agent tasks to determine the upper bound of our framework. We then fine-tune an open-sourced instruction-followed model,sql-Llama, by leveraging Code 7B, to accomplish all tasks as GPT-4 does.
arXiv Detail & Related papers (2023-12-18T14:40:20Z)
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the framework for enhancing Text-to- filtering using large language models (LLMs) With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses. With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
S$^2$SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers [66.78665327694625]
We propose S$2$, injecting Syntax to question- encoder graph for Text-to- relational parsing. We also employ the decoupling constraint to induce diverse edge embedding, which further improves the network's performance. Experiments on the Spider and robustness setting Spider-Syn demonstrate that the proposed approach outperforms all existing methods when pre-training models are used.
arXiv Detail & Related papers (2022-03-14T09:49:15Z)
Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing [110.97778888305506]
BRIDGE represents the question and DB schema in a tagged sequence where a subset of the fields are augmented with cell values mentioned in the question. BRIDGE attained state-of-the-art performance on popular cross-DB text-to- relational benchmarks. Our analysis shows that BRIDGE effectively captures the desired cross-modal dependencies and has the potential to generalize to more text-DB related tasks.
arXiv Detail & Related papers (2020-12-23T12:33:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.