Related papers: MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL

URL: http://arxiv.org/abs/2510.25510v1
Date: Wed, 29 Oct 2025 13:34:27 GMT
Title: MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL
Authors: Zekun Xu, Siyu Xia, Chuhuai Yue, Jiajun Chai, Mingxue Tian, Xiaohan Wang, Wei Lin, Haoxuan Li, Guojun Yin
Abstract summary: Large language models (LLMs) are increasingly used in Text-to-SQL tasks. Existing methods rely on static execution feedback, which restricts real-time error correction. We propose MTIR-SQL, an innovative Multi-turn Tool-Integrated Reasoning reinforcement learning framework.
Score: 46.37961458768655
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: As large language models (LLMs) are increasingly used in Text-to-SQL tasks, Reinforcement Learning (RL) has become a common method for improving performance. Existing methods primarily rely on static execution feedback, which restricts real-time error correction. However, integrating multi-turn tool invocation along with dynamic feedback could significantly improve adaptability and robustness, ultimately enhancing model performance. To address these issues, we propose MTIR-SQL, an innovative Multi-turn Tool-Integrated Reasoning reinforcement learning framework for Text-to-SQL. Our approach introduces an execution-aware multi-turn reasoning paradigm that seamlessly incorporates database execution feedback at each reasoning step, enabling context-sensitive query generation and progressive refinement throughout the reasoning process. The framework extends the GRPO algorithm to accommodate complex multi-turn interaction scenarios. Considering the training instability characteristics of MTIR and the potential for significant deviation of the model distribution from the initial model, we enhance the GRPO algorithm by adding a trajectory filtering mechanism and removing KL loss constraints. Experimental results demonstrate that MTIR-SQL, with 4B parameters, achieves 64.4% accuracy on BIRD Dev and 84.6% execution accuracy on SPIDER Dev, significantly outperforming existing approaches.
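
Illustrative sketch (not from the paper): the abstract describes GRPO extended with trajectory filtering and no KL constraint. The Python below shows one way group-relative advantages could be computed after dropping filtered rollouts. The `Trajectory` class, the `valid` flag, and the filtering criterion are assumptions made for illustration; the abstract does not specify how trajectories are filtered.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Trajectory:
    """One multi-turn rollout for a question (hypothetical structure)."""
    reward: float       # e.g. 1.0 if the final SQL matches the gold result
    valid: bool = True  # assumed filter flag: False for malformed rollouts


def group_relative_advantages(group, eps=1e-6):
    """Return one advantage per trajectory; filtered ones get None.

    Advantages are rewards standardized within the group of kept
    trajectories, as in GRPO. No KL penalty term is computed here,
    mirroring the abstract's removal of the KL constraint.
    """
    kept = [t for t in group if t.valid]
    if len(kept) < 2:
        # Too few rollouts left to form a meaningful group baseline.
        return [None] * len(group)
    rewards = [t.reward for t in kept]
    mu, sigma = mean(rewards), pstdev(rewards)
    return [
        (t.reward - mu) / (sigma + eps) if t.valid else None
        for t in group
    ]


group = [
    Trajectory(1.0),
    Trajectory(0.0),
    Trajectory(1.0, valid=False),  # dropped by the assumed filter
    Trajectory(0.0),
]
advantages = group_relative_advantages(group)
```

In this sketch a filtered trajectory contributes nothing to the group mean or to the policy update, so a single malformed rollout cannot skew the baseline for the remaining ones.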

Related papers

SQL-Trail: Multi-Turn Reinforcement Learning with Interleaved Feedback for Text-to-SQL [20.49395306069103]
We introduce a multi-turn reinforcement learning (RL) agentic framework for Text-to-SQL generation. Rather than producing a query in one shot, SQL-Trail interacts with the database environment and uses execution feedback to iteratively refine its predictions. Our approach centers on two key ideas: (i) an adaptive turn-budget allocation mechanism that scales the agent's interaction depth to match question difficulty, and (ii) a composite reward panel that jointly incentivizes SQL correctness and efficient exploration.
arXiv Detail & Related papers (2026-01-25T05:16:52Z)
Fine-tuned LLM-based Code Migration Framework [0.0]
The study presents the outcomes of research and experimental validation in the domain of automated sampling migration. The proposed migration method is a framework that leverages the best aspects of traditional software engineering techniques.
arXiv Detail & Related papers (2025-12-15T16:42:51Z)
Co-Training Vision Language Models for Remote Sensing Multi-task Learning [68.15604397741753]
Vision language models (VLMs) have achieved promising results in RS image understanding, grounding, and ultra-high-resolution (UHR) image reasoning. We present RSCoVLM, a simple yet flexible VLM baseline for RS MTL. We propose a unified dynamic-resolution strategy to address the diverse image scales inherent in RS imagery.
arXiv Detail & Related papers (2025-11-26T10:55:07Z)
Sample-Efficient Online Learning in LM Agents via Hindsight Trajectory Rewriting [92.57796055887995]
We introduce ECHO, a prompting framework that adapts hindsight experience replay from reinforcement learning for language model agents. ECHO generates optimized trajectories for alternative goals that could have been achieved during failed attempts. We evaluate ECHO on stateful versions of XMiniGrid, a text-based navigation and planning benchmark, and PeopleJoinQA, a collaborative information-gathering enterprise simulation.
arXiv Detail & Related papers (2025-10-11T18:11:09Z)
HES-SQL: Hybrid Reasoning for Efficient Text-to-SQL with Structural Skeleton Guidance [6.653834890554154]
We present HES-SQL, a novel hybrid training framework that advances Text-to-SQL generation through the integration of thinking-mode-fused supervised fine-tuning. This framework enables switching between reasoning and non-reasoning modes while improving query accuracy and execution efficiency.
arXiv Detail & Related papers (2025-10-10T01:15:57Z)
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use [78.29315418819074]
We introduce VerlTool, a unified and modular framework that addresses limitations through systematic design principles. Our framework formalizes ARLT as multi-turn trajectories with multi-modal observation tokens (text/image/video), extending beyond single-turn RLVR paradigms. The modular plugin architecture enables rapid tool integration requiring only lightweight Python definitions.
arXiv Detail & Related papers (2025-09-01T01:45:18Z)
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments [70.42705564227548]
We propose an automated environment construction pipeline for large language models (LLMs). This enables the creation of high-quality training environments that provide detailed and measurable feedback without relying on external tools. We also introduce a verifiable reward mechanism that evaluates both the precision of tool use and the completeness of task execution.
arXiv Detail & Related papers (2025-08-12T09:45:19Z)
MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering [57.156093929365255]
MLE-Dojo is a Gym-style framework for systematically training (via reinforcement learning), evaluating, and improving autonomous large language model (LLM) agents. MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic engineering scenarios. Its fully executable environment supports comprehensive agent training via both supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-05-12T17:35:43Z)
Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards [25.810871864483076]
External Process Reward Models (PRMs) can be introduced during training to provide fine-grained supervision. We propose Reward-SQL, a framework that explores how to incorporate PRMs into the Text-to-SQL reasoning process effectively.
arXiv Detail & Related papers (2025-05-07T08:32:22Z)
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL [13.215512957681185]
Existing approaches often rely on handcrafted reasoning paths with inductive biases that can limit their overall effectiveness. Motivated by the recent success of reasoning-enhanced models such as OpenAI o1, we propose a novel set of partial rewards tailored specifically for the Text-to-SQL task. We demonstrate that RL-only training with our proposed rewards consistently achieves higher accuracy and superior generalization compared to supervised fine-tuning.
arXiv Detail & Related papers (2025-03-29T17:29:30Z)
Reliable Text-to-SQL with Adaptive Abstention [21.07332675929629]
We present a novel framework that enhances query generation reliability by incorporating abstention and human-in-the-loop mechanisms. We validate our approach through comprehensive experiments on the BIRD benchmark, demonstrating significant improvements in robustness and reliability.
arXiv Detail & Related papers (2025-01-18T19:36:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.