Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL
- URL: http://arxiv.org/abs/2504.15077v2
- Date: Sun, 27 Apr 2025 14:25:09 GMT
- Title: Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL
- Authors: Simone Papicchio, Simone Rossi, Luca Cagliero, Paolo Papotti
- Abstract summary: This paper investigates the influence of reasoning on Text2SQL performance on four benchmark datasets. It considers the following LLM settings: (1) ZSL, with or without general-purpose reasoning; (2) SFT, with and without task-specific reasoning traces; (3) RL, exploring the use of different reward functions. The results show that general-purpose reasoning under ZSL proves ineffective in tackling complex Text2SQL cases.
- Score: 16.02851357789021
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large Language Models (LLMs) have shown impressive capabilities in transforming natural language questions about relational databases into SQL queries. Despite recent improvements, small LLMs struggle to handle questions involving multiple tables and complex SQL patterns under a Zero-Shot Learning (ZSL) setting. Supervised Fine-Tuning (SFT) partially compensates for the knowledge deficits in pretrained models but falls short when dealing with queries involving multi-hop reasoning. To bridge this gap, different LLM training strategies to reinforce reasoning capabilities have been proposed, ranging from leveraging a thinking process within ZSL, to including reasoning traces in SFT, to adopting Reinforcement Learning (RL) strategies. However, the influence of reasoning on Text2SQL performance is still largely unexplored. This paper investigates to what extent LLM reasoning capabilities influence Text2SQL performance on four benchmark datasets. To this end, it considers the following LLM settings: (1) ZSL, with or without general-purpose reasoning; (2) SFT, with and without task-specific reasoning traces; (3) RL, exploring the use of different reward functions, both the established EXecution accuracy (EX) and a mix with fine-grained ones that also account for the precision, recall, and cardinality of partially correct answers; (4) SFT+RL, i.e., a two-stage approach that combines SFT and RL. The results show that general-purpose reasoning under ZSL proves ineffective in tackling complex Text2SQL cases. Small LLMs benefit from SFT with reasoning much more than larger ones. RL is generally beneficial across all tested models and datasets. The use of the fine-grained metrics turns out to be the most effective RL strategy. Thanks to RL and the novel Text2SQL rewards, the 7B Qwen-Coder-2.5 model performs on par with models of 400+ billion parameters (including GPT-4o) on the Bird dataset.
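As an illustration of the reward design described in the abstract, here is a minimal sketch of a mixed reward, assuming the fine-grained terms compare the multisets of rows returned by the predicted and gold queries; the function name, weights, and exact formulation are illustrative assumptions, not the paper's released code.

```python
# Hypothetical sketch: execution accuracy (EX) blended with fine-grained
# precision/recall (as F1) and cardinality terms over partially correct results.
# Weights and formulation are assumptions for illustration only.
from collections import Counter

def text2sql_reward(pred_rows, gold_rows, w_ex=0.5, w_f1=0.3, w_card=0.2):
    """Reward in [0, 1] giving partial credit to partially correct answers."""
    pred = Counter(map(tuple, pred_rows))   # result of the predicted query
    gold = Counter(map(tuple, gold_rows))   # result of the gold query
    ex = 1.0 if pred == gold else 0.0       # classic execution accuracy
    overlap = sum((pred & gold).values())   # rows common to both results
    precision = overlap / max(sum(pred.values()), 1)
    recall = overlap / max(sum(gold.values()), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    # Cardinality term: penalize returning too many or too few rows.
    card = min(sum(pred.values()), sum(gold.values())) / max(
        sum(pred.values()), sum(gold.values()), 1)
    return w_ex * ex + w_f1 * f1 + w_card * card
```

Under this sketch, a predicted query that returns the single gold row plus one spurious row earns partial credit from the F1 and cardinality terms while scoring zero on EX.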
Related papers
- PaVeRL-SQL: Text-to-SQL via Partial-Match Rewards and Verbal Reinforcement Learning [10.353862232815844]
We present PaVeRL-SQL, a framework that combines Partial-Match Rewards and Verbal Reinforcement Learning to drive self-evaluation in reasoning language models (RLMs) for Text-to-SQL. The pipelines achieve state-of-the-art (SOTA) results on popular Text-to-SQL benchmarks: Spider, Spider 2.0, and BIRD.
arXiv Detail & Related papers (2025-09-08T19:15:38Z) - CogniSQL-R1-Zero: Lightweight Reinforced Reasoning for Efficient SQL Generation [1.169202600932732]
We introduce CogniSQL-R1-Zero, a reinforcement learning (RL) framework and model. We use a lightweight reward signal based on execution correctness and format-tag compliance (see the sketch below). Our method achieves state-of-the-art execution accuracy on the Text2SQL benchmark. To support further research in efficient and interpretable Text-to-SQL modeling, we release two curated datasets.
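A rough sketch of a reward in this spirit, assuming the model must wrap its final query in tags and that correctness is judged by comparing execution results; the `<sql>` tag and the 0.1 format bonus are hypothetical choices, not taken from the paper.

```python
import re

def execution_format_reward(completion: str, pred_rows, gold_rows) -> float:
    """Illustrative reward mixing format-tag compliance with execution
    correctness; the <sql> tag and the 0.1 format bonus are assumptions."""
    # Format-tag compliance: the answer must appear inside the expected tags.
    well_formed = re.search(r"<sql>.+?</sql>", completion, re.DOTALL) is not None
    # Execution correctness: predicted and gold queries return the same rows.
    correct = sorted(map(tuple, pred_rows)) == sorted(map(tuple, gold_rows))
    if well_formed and correct:
        return 1.0  # full reward only when both conditions hold
    return 0.1 if well_formed else 0.0
```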
arXiv Detail & Related papers (2025-07-08T14:17:07Z) - Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem [53.3188041952701]
We show that Critique Fine-Tuning (CFT) on only one problem can effectively unleash the reasoning potential of LLMs. With just 5 GPU hours of training, Qwen-Math-7B-CFT shows an average improvement of 15% on six math benchmarks and 16% on three logic reasoning benchmarks. These results are comparable to, or even surpass, those from RL with 20x less compute.
arXiv Detail & Related papers (2025-06-03T18:35:52Z) - Learning Together to Perform Better: Teaching Small-Scale LLMs to Collaborate via Preferential Rationale Tuning [20.784944581469205]
COLLATE is a framework that tunes a (small) LLM to generate outputs from a pool of diverse rationales that selectively improve the downstream task. We show the efficacy of COLLATE on LLMs from different model families across varying parameter scales (1B to 8B) and demonstrate the benefit of multiple rationale providers guided by the end task through ablations.
arXiv Detail & Related papers (2025-06-03T06:50:08Z) - SHARE: An SLM-based Hierarchical Action CorREction Assistant for Text-to-SQL [18.493226915913638]
We propose SHARE, an SLM-based Hierarchical Action corREction assistant for text-to-SQL. SHARE orchestrates three specialized Small Language Models (SLMs) in a sequential pipeline. Experimental results demonstrate that SHARE effectively enhances self-correction capabilities while proving robust across various LLMs.
arXiv Detail & Related papers (2025-05-31T04:51:12Z) - OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles [91.88062410741833]
We introduce OpenVLThinker, one of the first open-source large vision-language models (LVLMs) to exhibit sophisticated chain-of-thought reasoning. We show that OpenVLThinker-7B consistently advances performance across six benchmarks demanding mathematical and general reasoning.
arXiv Detail & Related papers (2025-03-21T17:52:43Z) - Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning [50.419872452397684]
Search-R1 extends reinforcement learning-based reasoning frameworks. It generates search queries during step-by-step reasoning with real-time retrieval. It improves performance by 41% (Qwen2.5-7B) and 20% (Qwen2.5-3B) over various RAG baselines.
arXiv Detail & Related papers (2025-03-12T16:26:39Z) - MCTS-SQL: Light-Weight LLMs can Master the Text-to-SQL through Monte Carlo Tree Search [1.166711394125328]
Text-to-SQL is a fundamental yet challenging task in the NLP area. We propose MCTS-SQL, a novel framework that uses Monte Carlo Tree Search. We propose a token-level prefix-cache mechanism that stores prior information during iterations (see the sketch below).
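The blurb above only names the mechanism, so the following is a speculative sketch of a token-level prefix cache, under the assumption that MCTS rollouts sharing a prompt prefix can reuse a cached model state (e.g., a KV cache) instead of recomputing it; all class and method names are hypothetical.

```python
class TokenPrefixCache:
    """Hypothetical token-level prefix cache for MCTS rollouts: memoizes
    model state keyed by the exact token prefix."""

    def __init__(self):
        self._store = {}  # tuple of token ids -> cached state

    def put(self, token_ids, state):
        self._store[tuple(token_ids)] = state

    def longest_prefix(self, token_ids):
        """Return (matched_prefix, state) for the longest cached prefix,
        so decoding can resume from there instead of from scratch."""
        for end in range(len(token_ids), 0, -1):
            state = self._store.get(tuple(token_ids[:end]))
            if state is not None:
                return list(token_ids[:end]), state
        return [], None
```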
arXiv Detail & Related papers (2025-01-28T00:52:23Z) - Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text [3.4688186440441893]
Large Language Models (LLMs) have demonstrated remarkable performance in various NLP tasks. The reverse process, translating code into natural language, termed semantic captioning, has received less attention. In this paper, we focus on the captioning of SQL queries (SQL2Text) to address the critical need for understanding and explaining SQL queries.
arXiv Detail & Related papers (2025-01-06T17:36:09Z) - Exploring the Use of LLMs for SQL Equivalence Checking [15.42143912008553]
Equivalence checking of two SQL queries is an intractable problem. Existing methods can handle only a small subset of SQL, even for bounded equivalence checking. This paper explores whether large language models (LLMs) can also demonstrate the ability to reason with SQL queries.
arXiv Detail & Related papers (2024-12-07T06:50:12Z) - Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval [55.63711219190506]
Large language models (LLMs) often struggle with posing the right search queries.
We introduce Learning to Retrieve by Trying (LeReT).
LeReT can improve the absolute retrieval accuracy by up to 29% and the downstream generator evaluations by 17%.
arXiv Detail & Related papers (2024-10-30T17:02:54Z) - PTD-SQL: Partitioning and Targeted Drilling with LLMs in Text-to-SQL [54.304872649870575]
Large Language Models (LLMs) have emerged as powerful tools for Text-to-SQL tasks.
In this study, we propose that employing query group partitioning allows LLMs to focus on learning the thought processes specific to a single problem type.
arXiv Detail & Related papers (2024-09-21T09:33:14Z) - Relational Database Augmented Large Language Model [59.38841050766026]
Large language models (LLMs) excel in many natural language processing (NLP) tasks.
They can only incorporate new knowledge through training or supervised fine-tuning processes.
This precise, up-to-date, and private information is typically stored in relational databases.
arXiv Detail & Related papers (2024-07-21T06:19:10Z) - RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL [48.516004807486745]
Large language models (LLMs) with in-context learning have significantly improved the performance of the text-to-SQL task.
We propose RB-SQL, a novel retrieval-based framework for in-context prompt engineering.
Experiment results demonstrate that our model achieves better performance than several competitive baselines on public datasets BIRD and Spider.
arXiv Detail & Related papers (2024-07-11T08:19:58Z) - Lucy: Think and Reason to Solve Text-to-SQL [12.52968634440807]
Large Language Models (LLMs) have made significant progress in assisting users to query databases in natural language.
LLMs provide state-of-the-art results on many standard benchmarks, but their performance significantly drops when applied to large enterprise databases.
We propose a new solution that combines the power of LLMs in understanding questions with automated reasoning techniques to handle complex database constraints.
arXiv Detail & Related papers (2024-07-06T18:56:42Z) - Before Generation, Align it! A Novel and Effective Strategy for Mitigating Hallucinations in Text-to-SQL Generation [21.58204328067628]
Large Language Models (LLMs) driven by In-Context Learning (ICL) have significantly improved the performance of text-to-SQL.
Previous methods generally employ a two-stage reasoning framework, namely 1) schema linking and 2) logical synthesis, making the framework not only effective but also interpretable.
Despite these advancements, the inherently poor generalization of LLMs often results in hallucinations, which limits their full potential.
In this work, we first identify and categorize the common types of hallucinations at each stage of text-to-SQL.
We then introduce a novel strategy, Task Alignment (TA), to mitigate these hallucinations.
arXiv Detail & Related papers (2024-05-24T07:51:08Z) - PURPLE: Making a Large Language Model a Better SQL Writer [14.627323505405327]
We propose PURPLE, which improves accuracy by retrieving demonstrations containing the requisite logical operator composition for the NL2SQL task.
PURPLE achieves a new state-of-the-art performance of 80.5% exact-set match accuracy and 87.8% execution match accuracy on the validation set of the popular NL2SQL benchmark.
arXiv Detail & Related papers (2024-03-29T07:01:29Z) - Optimizing LLM Queries in Relational Data Analytics Workloads [50.95919232839785]
Batch data analytics is a growing application for Large Language Models (LLMs). LLMs enable users to perform a wide range of natural language tasks, such as classification, entity extraction, and translation, over large datasets. We propose novel techniques that can significantly reduce the cost of LLM calls for relational data analytics workloads.
arXiv Detail & Related papers (2024-03-09T07:01:44Z) - TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [55.33939289989238]
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively.
It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
arXiv Detail & Related papers (2023-12-14T15:37:04Z) - Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [76.76046657162306]
Large language models (LLMs) have emerged as a new paradigm for the Text-to-SQL task.
arXiv Detail & Related papers (2023-08-29T14:59:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.