Sparks of Tabular Reasoning via Text2SQL Reinforcement Learning
- URL: http://arxiv.org/abs/2505.00016v2
- Date: Fri, 02 May 2025 11:34:00 GMT
- Title: Sparks of Tabular Reasoning via Text2SQL Reinforcement Learning
- Authors: Josefa Lia Stoisser, Marc Boubnovski Martell, Julien Fauqueur,
- Abstract summary: This work reframes the Text-to-SQL task as a pathway for teaching large language models (LLMs) to reason over and manipulate data. We propose a two-stage framework that teaches a model how to traverse, filter, and aggregate table fields. Empirically, our approach achieves substantial gains on reasoning-intensive datasets such as BIRD and CRT-QA.
- Score: 0.12289361708127876
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work reframes the Text-to-SQL task as a pathway for teaching large language models (LLMs) to reason over and manipulate tabular data, moving beyond the traditional focus on query generation. We propose a two-stage framework that leverages SQL supervision to develop transferable table reasoning capabilities. First, we synthesize detailed chain-of-thought (CoT) traces from real-world SQL queries, providing step-by-step, clause-level supervision that teaches the model how to traverse, filter, and aggregate table fields. Second, we introduce a Group Relative Policy Optimization (GRPO) reinforcement learning objective that connects SQL execution accuracy to generalizable reasoning by encouraging steps that extend beyond task-specific syntax and transfer across datasets. Empirically, our approach improves performance on standard Text-to-SQL benchmarks and achieves substantial gains on reasoning-intensive datasets such as BIRD and CRT-QA, demonstrating enhanced generalization and interpretability. Specifically, the distilled-quantized LLaMA model achieved a relative 33.9% increase in accuracy when trained on Text-to-SQL tasks, while Qwen achieved a relative 14.5% increase. These results suggest that SQL can serve not only as a target formalism but also as an effective scaffold for learning robust, transferable reasoning over structured data.
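The abstract ties a GRPO objective to SQL execution accuracy but does not spell out the reward. The sketch below is one plausible reading, not the authors' implementation: a binary reward that compares the executed result of a candidate query against a gold query, followed by the group-relative normalization that gives GRPO its name. The function names, the toy `trials` table, the binary reward, and the use of the population standard deviation are all assumptions made for illustration.

```python
import sqlite3
from statistics import mean, pstdev


def execution_reward(conn, predicted_sql, gold_sql):
    """Hypothetical reward: 1.0 if both queries return the same rows, else 0.0."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return 0.0  # SQL that fails to execute earns no reward
    gold = conn.execute(gold_sql).fetchall()
    # Compare row sets order-insensitively.
    return 1.0 if sorted(predicted) == sorted(gold) else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy in-memory database (illustrative schema, not from the paper).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trials (id INTEGER, phase TEXT, enrolled INTEGER)")
conn.executemany(
    "INSERT INTO trials VALUES (?, ?, ?)",
    [(1, "I", 40), (2, "II", 120), (3, "II", 80)],
)

gold = "SELECT SUM(enrolled) FROM trials WHERE phase = 'II'"
# One group of sampled candidates for the same question.
candidates = [
    "SELECT SUM(enrolled) FROM trials WHERE phase = 'II'",  # matches gold
    "SELECT SUM(enrolled) FROM trials",                      # missing filter
    "SELECT SUM(enrolled) FROM trials WHERE id > 1",         # different SQL, same result
    "SELEC enrolled FROM trials",                            # syntax error
]
rewards = [execution_reward(conn, sql, gold) for sql in candidates]
advantages = group_relative_advantages(rewards)
```

Because the reward depends on the executed result rather than the query string, the third candidate is credited even though its SQL differs from the gold query; candidates scoring above the group mean receive positive advantages and the rest receive negative ones.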
Related papers
- STRuCT-LLM: Unifying Tabular and Graph Reasoning with Reinforcement Learning for Semantic Parsing [2.8977258426533115]
We propose STRuCT-LLM, a unified framework for training large language models (LLMs). Our approach jointly optimizes Text-to-SQL and Text-to-Cypher tasks using reinforcement learning (RL) combined with Chain-of-Thought (CoT) supervision.
arXiv Detail & Related papers (2025-06-15T22:40:36Z)
- Arctic-Text2SQL-R1: Simple Rewards, Strong Reasoning in Text-to-SQL [35.21185734929167]
We present Arctic-Text2SQL-R1, a reinforcement learning (RL) framework and model family designed to generate accurate, executable SQL. Our approach avoids curated intermediate supervision and complex reward shaping, promoting stable training and alignment with the end task. Notably, our 7B model outperforms prior 70B-class systems, highlighting the framework's scalability and efficiency.
arXiv Detail & Related papers (2025-05-22T23:33:47Z)
- Bridging the Gap: Transforming Natural Language Questions into SQL Queries via Abstract Query Pattern and Contextual Schema Markup [6.249316460506702]
We identify two important gaps: the structural mapping gap and the lexical mapping gap. Our results set a new state-of-the-art on the Spider benchmark with an execution accuracy of 87.9%, and leading results on the BIRD dataset with an execution accuracy of 64.67%.
arXiv Detail & Related papers (2025-02-20T16:11:27Z)
- STaR-SQL: Self-Taught Reasoner for Text-to-SQL [20.719165038519744]
"chain-of-thought" rationales have proven effective for improving the performance of large language models on complex reasoning tasks. Applying such techniques to structured tasks, such as text-to-SQL, remains largely unexplored. In this paper, we introduce Self-Taught Reasoner for text-to-SQL (STaR-SQL), a novel approach that reframes query generation as a reasoning process. Experimental results on the challenging Spider benchmark demonstrate that STaR-SQL significantly improves text-to-SQL performance, achieving an execution accuracy of 86.6%. These findings underscore the potential of reasoning-augmented training for
arXiv Detail & Related papers (2025-02-19T08:58:44Z)
- OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment [6.2089733671434875]
We propose OpenSearch-SQL, which divides the Text-to-SQL task into four main modules: Preprocessing, Extraction, Generation, and Refinement, along with an Alignment module based on a consistency alignment mechanism. These methods have significantly improved the performance of LLMs in the Text-to-SQL task. Experimental results show that OpenSearch-SQL achieves an execution accuracy (EX) of 69.3% on the BIRD development set, 72.28% on the test set, and a reward-based efficiency score (R-VES) of 69.3, with all three metrics ranking first at the time of submission.
arXiv Detail & Related papers (2025-02-19T07:51:50Z)
- Enhancing LLM Fine-tuning for Text-to-SQLs by SQL Quality Measurement [1.392448435105643]
Text-to-SQL enables non-expert users to effortlessly retrieve desired information from databases using natural language queries.
Current state-of-the-art (SOTA) models like GPT4 and T5 have shown impressive performance on large-scale benchmarks like BIRD.
This paper proposes a novel approach that only needs SQL Quality to enhance Text-to-SQL performance.
arXiv Detail & Related papers (2024-10-02T17:21:51Z)
- PTD-SQL: Partitioning and Targeted Drilling with LLMs in Text-to-SQL [54.304872649870575]
Large Language Models (LLMs) have emerged as powerful tools for Text-to-SQL tasks.
In this study, we propose that employing query group partitioning allows LLMs to focus on learning the thought processes specific to a single problem type.
arXiv Detail & Related papers (2024-09-21T09:33:14Z)
- RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL [48.516004807486745]
Large language models (LLMs) with in-context learning have significantly improved the performance of the text-to-SQL task.
We propose RB-SQL, a novel retrieval-based framework for in-context prompt engineering.
Experiment results demonstrate that our model achieves better performance than several competitive baselines on public datasets BIRD and Spider.
arXiv Detail & Related papers (2024-07-11T08:19:58Z)
- SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the SQL-PaLM framework for enhancing Text-to-SQL using large language models (LLMs).
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
- STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing [64.80483736666123]
We propose a novel pre-training framework STAR for context-dependent text-to-SQL parsing.
In addition, we construct a large-scale context-dependent text-to-SQL conversation corpus to pre-train STAR.
Extensive experiments show that STAR achieves new state-of-the-art performance on two downstream benchmarks.
arXiv Detail & Related papers (2022-10-21T11:30:07Z)
- Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on the Poincaré distance metric.
Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences.
Our framework sets new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2022-06-28T14:05:25Z)
- Weakly Supervised Text-to-SQL Parsing through Question Decomposition [53.22128541030441]
We take advantage of the recently proposed question meaning representation called QDMR.
Given questions, their QDMR structures (annotated by non-experts or automatically predicted) and the answers, we are able to automatically synthesize SQL queries.
Our results show that the weakly supervised models perform competitively with those trained on NL-SQL benchmark data.
arXiv Detail & Related papers (2021-12-12T20:02:42Z)
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z)