Arctic-Text2SQL-R1: Simple Rewards, Strong Reasoning in Text-to-SQL
- URL: http://arxiv.org/abs/2505.20315v1
- Date: Thu, 22 May 2025 23:33:47 GMT
- Title: Arctic-Text2SQL-R1: Simple Rewards, Strong Reasoning in Text-to-SQL
- Authors: Zhewei Yao, Guoheng Sun, Lukasz Borchmann, Zheyu Shen, Minghang Deng, Bohan Zhai, Hao Zhang, Ang Li, Yuxiong He
- Abstract summary: We present Arctic-Text2SQL-R1, a reinforcement learning (RL) framework and model family designed to generate accurate, executable SQL. Our approach avoids brittle intermediate supervision and complex reward shaping, promoting stable training and alignment with the end task. Notably, our 7B model outperforms prior 70B-class systems, highlighting the framework's scalability and efficiency.
- Score: 35.21185734929167
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Translating natural language into SQL (Text2SQL) is a longstanding challenge at the intersection of natural language understanding and structured data access. While large language models (LLMs) have significantly improved fluency in SQL generation, producing correct and executable SQL, particularly for complex queries, remains a bottleneck. We present Arctic-Text2SQL-R1, a reinforcement learning (RL) framework and model family designed to generate accurate, executable SQL using a lightweight reward signal based solely on execution correctness. Our approach avoids brittle intermediate supervision and complex reward shaping, promoting stable training and alignment with the end task. Combined with carefully curated data, strong supervised initialization, and effective training practices, Arctic-Text2SQL-R1 achieves state-of-the-art execution accuracy across six diverse Text2SQL benchmarks, including the top position on the BIRD leaderboard. Notably, our 7B model outperforms prior 70B-class systems, highlighting the framework's scalability and efficiency. We further demonstrate inference-time robustness through simple extensions like value retrieval and majority voting. Extensive experiments and ablation studies offer both positive and negative insights, providing practical guidance for future Text2SQL research.
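The abstract describes a reward based solely on execution correctness and an inference-time majority-voting extension, but gives no formulas. The following is a minimal sketch of how such a scheme might look, not the authors' implementation. It assumes SQLite as the execution engine, a binary 0/1 reward, and order-insensitive comparison of result rows; the helper names (`make_demo_db`, `run_query`, `execution_reward`, `majority_vote`) and the demo table are hypothetical.

```python
import sqlite3
from collections import Counter


def make_demo_db():
    """Build a tiny in-memory database (hypothetical demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(1, "Ada", 36), (2, "Bob", 25), (3, "Cy", 41)],
    )
    return conn


def run_query(conn, sql):
    """Execute sql and return its rows in a canonical order, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None  # non-executable SQL
    # Sort by repr so the comparison ignores row order (an assumption).
    return sorted(rows, key=repr)


def execution_reward(conn, predicted_sql, gold_sql):
    """Assumed binary reward: 1.0 if both queries return the same rows, else 0.0."""
    predicted = run_query(conn, predicted_sql)
    if predicted is None:
        return 0.0
    return 1.0 if predicted == run_query(conn, gold_sql) else 0.0


def majority_vote(conn, candidates):
    """Return the first candidate whose execution result is the most common one."""
    counts = Counter()
    first_sql_for_result = {}
    for sql in candidates:
        rows = run_query(conn, sql)
        if rows is None:
            continue  # skip candidates that fail to execute
        key = repr(rows)
        counts[key] += 1
        first_sql_for_result.setdefault(key, sql)
    if not counts:
        return None
    best_key, _ = counts.most_common(1)[0]
    return first_sql_for_result[best_key]


if __name__ == "__main__":
    conn = make_demo_db()
    gold = "SELECT name FROM users WHERE age >= 31"
    print(execution_reward(conn, "SELECT name FROM users WHERE age > 30", gold))
    print(majority_vote(conn, [gold, "SELECT name FROM users", "SELECT name FROM users WHERE age > 30"]))
```

Grouping candidates by execution result rather than by SQL text lets syntactically different but equivalent queries vote together; whether the paper's voting works this way is not stated in the abstract.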
Related papers
- CogniSQL-R1-Zero: Lightweight Reinforced Reasoning for Efficient SQL Generation [1.169202600932732]
We introduce CogniSQL-R1-Zero, a reinforcement learning (RL) framework and model. We use a lightweight reward signal based on execution correctness and format-tag compliance. Our method achieves state-of-the-art execution accuracy on the Text2SQL benchmark. To support further research in efficient and interpretable Text-to-code modeling, we release two curated datasets.
arXiv Detail & Related papers (2025-07-08T14:17:07Z)
- Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward [12.196626575891546]
Reinforcement learning (RL) has been widely adopted to enhance the performance of large language models (LLMs) on Text-to-SQL tasks. Existing methods often rely on execution-based or LLM-based Bradley-Terry reward models. We propose a novel Text-to-SQL RL fine-tuning framework named Graph-Reward-SQL, which employs the GMNScore outcome reward model.
arXiv Detail & Related papers (2025-05-18T11:53:01Z)
- Sparks of Tabular Reasoning via Text2SQL Reinforcement Learning [0.12289361708127876]
This work reframes the Text-to-SQL task as a pathway for teaching large language models (LLMs) to reason over and manipulate data. We propose a two-stage framework that teaches a model how to traverse, filter, and aggregate table fields. Empirically, our approach achieves substantial gains on reasoning-intensive datasets such as BIRD and CRT-QA.
arXiv Detail & Related papers (2025-04-23T19:02:04Z)
- MCTS-SQL: Light-Weight LLMs can Master the Text-to-SQL through Monte Carlo Tree Search [1.166711394125328]
Text-to-SQL is a fundamental yet challenging task in the NLP area. We propose MCTS-SQL, a novel framework that uses Monte Carlo Tree Search. We propose a token-level prefix-cache mechanism that stores prior information during iterations.
arXiv Detail & Related papers (2025-01-28T00:52:23Z)
- Learning from Imperfect Data: Towards Efficient Knowledge Distillation of Autoregressive Language Models for Text-to-SQL [83.99974309930072]
Knowledge distillation (KD) is a common approach, which aims to distill the larger teacher model into a smaller student model.
We propose to improve the KD with Imperfect Data, namely KID, which effectively boosts the performance without introducing much training budget.
KID can not only achieve consistent and significant performance gains across all model types and sizes, but also effectively improve the training efficiency.
arXiv Detail & Related papers (2024-10-15T07:51:00Z)
- Enhancing LLM Fine-tuning for Text-to-SQLs by SQL Quality Measurement [1.392448435105643]
Text-to-SQL enables non-expert users to effortlessly retrieve desired information from databases using natural language queries.
Current state-of-the-art (SOTA) models like GPT4 and T5 have shown impressive performance on large-scale benchmarks like BIRD.
This paper proposes a novel approach that only needs SQL Quality to enhance Text-to-SQL performance.
arXiv Detail & Related papers (2024-10-02T17:21:51Z)
- E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL [1.187832944550453]
We introduce E-SQL, a novel pipeline specifically designed to address these challenges through direct schema linking and candidate predicate augmentation. E-SQL enhances the natural language query by incorporating relevant database items (i.e., tables, columns, and values) and conditions directly into the question and SQL construction plan, bridging the gap between the query and the database structure. Comprehensive evaluations illustrate that E-SQL achieves competitive performance, particularly excelling in complex queries with a 66.29% execution accuracy on the test set.
arXiv Detail & Related papers (2024-09-25T09:02:48Z)
- SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the SQL-PaLM framework for enhancing Text-to-SQL using large language models (LLMs).
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deeply into understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
- UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z)
- Wav2SQL: Direct Generalizable Speech-To-SQL Parsing [55.10009651476589]
Speech-to-SQL (S2SQL) aims to convert spoken questions into SQL queries given databases.
We propose the first direct speech-to-SQL parsing model, Wav2SQL, which avoids error compounding across cascaded systems.
Experimental results demonstrate that Wav2SQL avoids error compounding and achieves state-of-the-art results by up to 2.5% accuracy improvement over the baseline.
arXiv Detail & Related papers (2023-05-21T19:26:46Z)
- Towards Generalizable and Robust Text-to-SQL Parsing [77.18724939989647]
We propose a novel TKK framework consisting of Task decomposition, Knowledge acquisition, and Knowledge composition to learn text-to-SQL parsing in stages.
We show that our framework is effective in all scenarios and achieves state-of-the-art performance on the Spider, SParC, and CoSQL datasets.
arXiv Detail & Related papers (2022-10-23T09:21:27Z)
- Weakly Supervised Text-to-SQL Parsing through Question Decomposition [53.22128541030441]
We take advantage of the recently proposed question meaning representation called QDMR.
Given questions, their QDMR structures (annotated by non-experts or automatically predicted) and the answers, we are able to automatically synthesize SQL queries.
Our results show that the weakly supervised models perform competitively with those trained on NL-SQL benchmark data.
arXiv Detail & Related papers (2021-12-12T20:02:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.