MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL
- URL: http://arxiv.org/abs/2510.25510v1
- Date: Wed, 29 Oct 2025 13:34:27 GMT
- Title: MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL
- Authors: Zekun Xu, Siyu Xia, Chuhuai Yue, Jiajun Chai, Mingxue Tian, Xiaohan Wang, Wei Lin, Haoxuan Li, Guojun Yin
- Abstract summary: Large language models (LLMs) are increasingly used in Text-to-SQL tasks. Existing methods rely on static execution feedback, which restricts real-time error correction. We propose MTIR-SQL, an innovative Multi-turn Tool-Integrated Reasoning reinforcement learning framework.
- Score: 46.37961458768655
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: As large language models (LLMs) are increasingly used in Text-to-SQL tasks, Reinforcement Learning (RL) has become a common method for improving performance. Existing methods primarily rely on static execution feedback, which restricts real-time error correction. However, integrating multi-turn tool invocation along with dynamic feedback could significantly improve adaptability and robustness, ultimately enhancing model performance. To address these issues, we propose MTIR-SQL, an innovative Multi-turn Tool-Integrated Reasoning reinforcement learning framework for Text-to-SQL. Our approach introduces an execution-aware multi-turn reasoning paradigm that seamlessly incorporates database execution feedback at each reasoning step, enabling context-sensitive query generation and progressive refinement throughout the reasoning process. The framework extends the GRPO algorithm to accommodate complex multi-turn interaction scenarios. Considering the training instability characteristics of MTIR and the potential for significant deviation of the model distribution from the initial model, we enhance the GRPO algorithm by adding a trajectory filtering mechanism and removing KL loss constraints. Experimental results demonstrate that MTIR-SQL, with 4B parameters, achieves 64.4% accuracy on BIRD Dev and 84.6% execution accuracy on SPIDER Dev, significantly outperforming existing approaches.
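The abstract describes two mechanisms: feeding database execution results back into each reasoning turn, and a GRPO variant with trajectory filtering and no KL term. The following is only a minimal sketch of those ideas under stated assumptions, not the authors' implementation. The names `execute_sql`, `multi_turn_rollout`, `group_advantages`, and `scripted_policy` are hypothetical, the stop-on-first-success rule is an assumption, and the filtering criterion (a boolean `valid` flag per trajectory) is a placeholder because the abstract does not specify one.

```python
import sqlite3
from statistics import mean, pstdev


def execute_sql(conn, sql):
    """Run a query; return (ok, feedback) where feedback is rows or an error string."""
    try:
        return True, conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return False, f"error: {exc}"


def multi_turn_rollout(conn, propose_sql, question, max_turns=3):
    """Alternate between proposing SQL and observing execution feedback.

    Assumption: the rollout stops at the first query that executes without error.
    """
    history = []  # one (sql, ok, feedback) tuple per turn
    for _ in range(max_turns):
        sql = propose_sql(question, history)
        ok, feedback = execute_sql(conn, sql)
        history.append((sql, ok, feedback))
        if ok:
            break
    return history


def group_advantages(rewards, valid):
    """Group-relative advantages in the style of GRPO, with no KL penalty term.

    Assumption: trajectories flagged invalid are filtered out of the group
    statistics and receive zero advantage.
    """
    kept = [r for r, v in zip(rewards, valid) if v]
    if not kept:
        return [0.0] * len(rewards)
    mu = mean(kept)
    sigma = pstdev(kept)
    scale = sigma if sigma > 0 else 1.0  # avoid division by zero
    return [(r - mu) / scale if v else 0.0 for r, v in zip(rewards, valid)]


# Toy database and a scripted stand-in for the model (both hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 30), ("Bob", 41)])


def scripted_policy(question, history):
    # The first attempt misspells a column; the retry fixes it after seeing the error.
    if not history:
        return "SELECT nme FROM singer WHERE age > 35"
    return "SELECT name FROM singer WHERE age > 35"


history = multi_turn_rollout(conn, scripted_policy, "Which singers are older than 35?")
advantages = group_advantages([1.0, 0.0, 1.0, 0.0], [True, True, False, True])
```

In this toy run the first turn returns a SQLite error message, which the scripted policy uses to repair the column name on the second turn. In a real setup the policy would be the language model and the reward would come from comparing execution results against a reference query.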
Related papers
- SQL-Trail: Multi-Turn Reinforcement Learning with Interleaved Feedback for Text-to-SQL [20.49395306069103]
We introduce a multi-turn reinforcement learning (RL) agentic framework for Text-to-SQL generation. Rather than producing a query in one shot, SQL-Trail interacts with the database environment and uses execution feedback to iteratively refine its predictions. Our approach centers on two key ideas: (i) an adaptive turn-budget allocation mechanism that scales the agent's interaction depth to match question difficulty, and (ii) a composite reward panel that jointly incentivizes SQL correctness and efficient exploration.
arXiv Detail & Related papers (2026-01-25T05:16:52Z)
- Fine-tuned LLM-based Code Migration Framework [0.0]
The study presents the outcomes of research and experimental validation in the domain of automated sampling migration. The proposed method for migration essentially appears as a framework that leverages the best aspects of traditional software engineering techniques.
arXiv Detail & Related papers (2025-12-15T16:42:51Z)
- Co-Training Vision Language Models for Remote Sensing Multi-task Learning [68.15604397741753]
Vision language models (VLMs) have achieved promising results in remote sensing (RS) image understanding, grounding, and ultra-high-resolution (UHR) image reasoning. We present RSCoVLM, a simple yet flexible VLM baseline for RS multi-task learning (MTL). We propose a unified dynamic-resolution strategy to address the diverse image scales inherent in RS imagery.
arXiv Detail & Related papers (2025-11-26T10:55:07Z)
- Sample-Efficient Online Learning in LM Agents via Hindsight Trajectory Rewriting [92.57796055887995]
We introduce ECHO, a prompting framework that adapts hindsight experience replay from reinforcement learning for language model agents. ECHO generates optimized trajectories for alternative goals that could have been achieved during failed attempts. We evaluate ECHO on stateful versions of XMiniGrid, a text-based navigation and planning benchmark, and PeopleJoinQA, a collaborative information-gathering enterprise simulation.
arXiv Detail & Related papers (2025-10-11T18:11:09Z)
- HES-SQL: Hybrid Reasoning for Efficient Text-to-SQL with Structural Skeleton Guidance [6.653834890554154]
We present HES-SQL, a novel hybrid training framework that advances Text-to-SQL generation through the integration of thinking-mode-fused supervised fine-tuning. This framework enables switching between reasoning and non-reasoning modes while improving query accuracy and execution efficiency.
arXiv Detail & Related papers (2025-10-10T01:15:57Z)
- VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use [78.29315418819074]
We introduce VerlTool, a unified and modular framework that addresses limitations through systematic design principles. Our framework formalizes ARLT as multi-turn trajectories with multi-modal observation tokens (text/image/video), extending beyond single-turn RLVR paradigms. The modular plugin architecture enables rapid tool integration requiring only lightweight Python definitions.
arXiv Detail & Related papers (2025-09-01T01:45:18Z)
- Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments [70.42705564227548]
We propose an automated environment construction pipeline for large language models (LLMs). This enables the creation of high-quality training environments that provide detailed and measurable feedback without relying on external tools. We also introduce a verifiable reward mechanism that evaluates both the precision of tool use and the completeness of task execution.
arXiv Detail & Related papers (2025-08-12T09:45:19Z)
- MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering [57.156093929365255]
MLE-Dojo is a Gym-style framework for systematic reinforcement learning, evaluation, and improvement of autonomous large language model (LLM) agents. MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic engineering scenarios. Its fully executable environment supports comprehensive agent training via both supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-05-12T17:35:43Z)
- Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards [25.810871864483076]
External Process Reward Models (PRMs) can be introduced during training to provide fine-grained supervision. We propose Reward-SQL, a framework that explores how to incorporate PRMs into the Text-to-SQL reasoning process effectively.
arXiv Detail & Related papers (2025-05-07T08:32:22Z)
- Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL [13.215512957681185]
Existing approaches often rely on handcrafted reasoning paths with inductive biases that can limit their overall effectiveness. Motivated by the recent success of reasoning-enhanced models such as OpenAI o1, we propose a novel set of partial rewards tailored specifically for the Text-to-SQL task. We demonstrate that RL-only training with our proposed rewards consistently achieves higher accuracy and superior generalization compared to supervised fine-tuning.
arXiv Detail & Related papers (2025-03-29T17:29:30Z)
- Reliable Text-to-SQL with Adaptive Abstention [21.07332675929629]
We present a novel framework that enhances query generation reliability by incorporating abstention and human-in-the-loop mechanisms. We validate our approach through comprehensive experiments on the BIRD benchmark, demonstrating significant improvements in robustness and reliability.
arXiv Detail & Related papers (2025-01-18T19:36:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.