MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL
- URL: http://arxiv.org/abs/2511.01008v1
- Date: Sun, 02 Nov 2025 16:55:30 GMT
- Title: MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL
- Authors: Haolin Yang, Jipeng Zhang, Zhitao He, Yi R. Fung
- Abstract summary: We introduce MARS-SQL, a novel multi-agent framework that combines principled task decomposition and interactive reinforcement learning (RL). Experiments show that MARS-SQL achieves state-of-the-art Execution Accuracy of 77.84% on the BIRD dev set and 89.75% on the Spider test set.
- Score: 22.59453421744114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Translating natural language to SQL remains difficult for complex queries. Such queries often need environmental interaction and self-correction. To address this, we introduce MARS-SQL, a novel multi-agent framework that combines principled task decomposition and interactive reinforcement learning (RL). Our system comprises three specialized agents: a Grounding Agent for schema linking, a Generation Agent for query generation, and a Validation Agent for final selection. The core of our framework is the Generation Agent, which is trained via a multi-turn RL policy. Adopting a ReAct-style Think-Act-Observe loop, the agent iteratively generates thoughts, executes SQL actions against a live database, and revises its strategy based on execution feedback, enabling dynamic, stateful reasoning and self-correction. At inference time, we generate multiple interaction trajectories to explore diverse reasoning paths. The Validation Agent then selects the optimal trajectory by modeling verification as a next-token prediction task and choosing the solution with the highest generation probability. This structured workflow pipelines specialized agents. It combines interactive RL for generation with generative modeling for verification. The approach proves highly effective for robust and accurate SQL generation. Experiments show that MARS-SQL achieves state-of-the-art Execution Accuracy of 77.84% on the BIRD dev set and 89.75% on the Spider test set. Our code is available at https://github.com/YangHaolin0526/MARS-SQL.
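The Think-Act-Observe loop and the final selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: `run_trajectory`, `select_best`, `toy_policy`, the `singer` table, and the verifier scores are all hypothetical names and values chosen for illustration, the policy is a hand-written stand-in for the RL-trained Generation Agent, and stopping at the first query that executes is a simplifying assumption.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

# Hypothetical policy type: maps the interaction history to (thought, sql).
Policy = Callable[[List[str]], Tuple[str, str]]


def run_trajectory(conn: sqlite3.Connection, question: str,
                   policy: Policy, max_turns: int = 3) -> Tuple[str, List[str]]:
    """Run one Think-Act-Observe trajectory against a live database.

    Simplifying assumption: the loop stops at the first query that executes
    without an error, or after max_turns attempts.
    """
    history = [f"Question: {question}"]
    final_sql = ""
    for _ in range(max_turns):
        thought, sql = policy(history)            # Think
        history.append(f"Thought: {thought}")
        history.append(f"Action: {sql}")
        final_sql = sql
        try:
            rows = conn.execute(sql).fetchall()   # Act
            history.append(f"Observation: {rows}")  # Observe (success)
            break
        except sqlite3.Error as exc:
            history.append(f"Observation: ERROR {exc}")  # Observe (failure)
    return final_sql, history


def select_best(candidates: List[str], scores: Dict[str, float]) -> str:
    """Pick the candidate with the highest (hypothetical) verifier probability."""
    return max(candidates, key=lambda sql: scores[sql])


def toy_policy(history: List[str]) -> Tuple[str, str]:
    """Hand-written stand-in for a trained agent: fixes a bad column name."""
    if any(line.startswith("Observation: ERROR") for line in history):
        return ("The column is 'name', not 'title'.",
                "SELECT name FROM singer WHERE age > 30")
    return ("Try the title column.",
            "SELECT title FROM singer WHERE age > 30")


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database (illustrative schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO singer VALUES (?, ?)",
                     [("Ann", 41), ("Bob", 25)])
    return conn


if __name__ == "__main__":
    sql, history = run_trajectory(make_demo_db(),
                                  "Which singers are older than 30?",
                                  toy_policy)
    print("\n".join(history))
    # Illustrative verifier probabilities for two sampled candidates.
    scores = {sql: 0.9, "SELECT name FROM singer": 0.4}
    print(select_best(list(scores), scores))
```

In this toy run the first query fails because the column does not exist, the error text is appended to the history as an observation, and the second turn issues a corrected query. A real system would sample several such trajectories and score each with a learned verifier rather than a fixed dictionary.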
Related papers
- AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent [57.10083973844841]
AgentArk is a novel framework to distill multi-agent dynamics into the weights of a single model. We investigate three hierarchical distillation strategies across various models, tasks, scaling, and scenarios. By shifting the burden of computation from inference to training, the distilled models preserve the efficiency of one agent while exhibiting strong reasoning and self-correction performance of multiple agents.
arXiv Detail & Related papers (2026-02-03T19:18:28Z)
- SQL-Trail: Multi-Turn Reinforcement Learning with Interleaved Feedback for Text-to-SQL [20.49395306069103]
We introduce a multi-turn reinforcement learning (RL) agentic framework for Text-to-SQL generation. Rather than producing a query in one shot, SQL-Trail interacts with the database environment and uses execution feedback to iteratively refine its predictions. Our approach centers on two key ideas: (i) an adaptive turn-budget allocation mechanism that scales the agent's interaction depth to match question difficulty, and (ii) a composite reward panel that jointly incentivizes SQL correctness and efficient exploration.
arXiv Detail & Related papers (2026-01-25T05:16:52Z)
- AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress [71.02263260394261]
Large language models (LLMs) still encounter challenges in multi-turn decision-making tasks. We build process reward models (PRMs) to evaluate each decision and guide the agent's decision-making process. AgentPRM captures both the interdependence between sequential decisions and their contribution to the final goal.
arXiv Detail & Related papers (2025-11-11T14:57:54Z)
- From Queries to Insights: Agentic LLM Pipelines for Spatio-Temporal Text-to-SQL [8.496933324334167]
We extend a naive text-to-SQL baseline (llama-3-sqlcoder-8b) with orchestration by a Mistral-based ReAct agent. We evaluate on 35 natural-language queries over the NYC and Tokyo check-in dataset, covering spatial, temporal, and multi-dataset reasoning. The agent achieves substantially higher accuracy than the naive baseline (91.4% vs. 28.6%) and enhances usability through maps, plots, and structured natural-language summaries.
arXiv Detail & Related papers (2025-10-29T22:18:57Z)
- MTSQL-R1: Towards Long-Horizon Multi-Turn Text-to-SQL via Agentic Training [31.290164208264745]
We present MTSQL-R1, an agentic training framework for multi-turn Text-to-SQL. We cast the task as a Markov Decision Process (MDP) in which an agent interacts with (i) a database for execution feedback and (ii) a persistent dialogue memory for verification. Experiments demonstrate that MTSQL-R1 consistently outperforms strong baselines, highlighting the importance of environment-driven verification and memory-guided refinement for conversational semantic parsing.
arXiv Detail & Related papers (2025-10-12T16:12:05Z)
- AGENTIQL: An Agent-Inspired Multi-Expert Framework for Text-to-SQL Generation [0.509780930114934]
AGENTIQL is an agent-inspired framework that combines a reasoning agent for question decomposition, a coding agent for sub-query generation, and a refinement step for column selection. We evaluate AGENTIQL on the Spider benchmark, achieving up to 86.07% EX with 14B models using the Planner&Executor merging strategy. Beyond accuracy, AGENTIQL enhances transparency by exposing intermediate reasoning steps, offering a robust, scalable, and interpretable approach to semantic parsing.
arXiv Detail & Related papers (2025-10-12T15:35:05Z)
- Agentic-R1: Distilled Dual-Strategy Reasoning [58.73951532294446]
Current long chain-of-thought (long-CoT) models excel at mathematical reasoning but rely on slow and error-prone natural language traces. We introduce a fine-tuning framework, DualDistill, that distills complementary reasoning strategies from multiple teachers into a unified student model. Our method improves accuracy across a range of tasks, including both computation-intensive and standard benchmarks.
arXiv Detail & Related papers (2025-07-08T06:35:16Z)
- RAISE: Reasoning Agent for Interactive SQL Exploration [47.77323087050061]
We propose a novel framework that unifies schema linking, query generation, and iterative refinement within a single, end-to-end component. Our method emulates how humans answer questions when working with unfamiliar databases.
arXiv Detail & Related papers (2025-06-02T03:07:08Z)
- Cooperative SQL Generation for Segmented Databases By Using Multi-functional LLM Agents [48.25853644159186]
We propose a Cooperative SQL Generation framework based on Multi-functional Agents (CSMA). Inspired by the collaboration in human teamwork, CSMA consists of three stages. Experiments on the Spider and BIRD benchmarks demonstrate that CSMA achieves a high performance level comparable to the state of the art.
arXiv Detail & Related papers (2024-12-08T08:16:19Z)
- MALT: Improving Reasoning with Multi-Agent LLM Training [67.76186488361685]
MALT (Multi-Agent LLM Training) is a novel post-training strategy that divides the reasoning process into generation, verification, and refinement steps. On MATH, GSM8K, and CSQA, MALT surpasses the same baseline LLM with a relative improvement of 15.66%, 7.42%, and 9.40% respectively.
arXiv Detail & Related papers (2024-12-02T19:30:36Z)
- MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL [47.120862170230566]
Recent Text-to-SQL methods usually suffer from significant performance degradation on "huge" databases. We introduce MAC-SQL, a novel LLM-based multi-agent collaborative framework for Text-to-SQL. In our framework, we leverage GPT-4 as the strong backbone for all agent tasks to determine the upper bound of our framework. We then fine-tune an open-source instruction-following model, SQL-Llama, by leveraging Code Llama 7B, to accomplish all tasks as GPT-4 does.
arXiv Detail & Related papers (2023-12-18T14:40:20Z)