Query and Conquer: Execution-Guided SQL Generation
- URL: http://arxiv.org/abs/2503.24364v1
- Date: Mon, 31 Mar 2025 17:43:36 GMT
- Title: Query and Conquer: Execution-Guided SQL Generation
- Authors: Ćukasz Borchmann, Marek Wydmuch,
- Abstract summary: We propose a novel approach for generating complex outputs that significantly improves accuracy in text-to- tasks.<n>Our method leverages execution results to select the most semantically consistent query from multiple candidates.
- Score: 2.07180164747172
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel approach for generating complex outputs that significantly improves accuracy in text-to-SQL tasks. Our method leverages execution results to select the most semantically consistent query from multiple candidates, enabling smaller, cost-effective models to surpass computationally intensive reasoning methods such as o1, o3-mini, and DeepSeek R1 while reducing inference cost by as much as 30 times. It integrates effortlessly with existing models, offering a practical and scalable pathway to state-of-the-art SQL generation.
Related papers
- XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL [48.45491386478092]
We present XiYan-, an innovative framework effectively generating and utilizing multiplesql candidates.<n>Overall, XiYan- achieves a new SOTA performance of 75.63% on the notable BIRD benchmark.<n>It also attains SOTA performance on the Spider test set with an accuracy of 89.65%.
arXiv Detail & Related papers (2025-07-07T06:50:46Z) - HI-SQL: Optimizing Text-to-SQL Systems through Dynamic Hint Integration [1.3927943269211591]
Text-to-generation bridges the gap between natural language and databases, enabling users to query data without requiringsql expertise.<n>We propose HI-the, a pipeline that incorporates a novel hint generation mechanism utilizing historical query logs.<n>By analyzing prior queries, our method generates contextual hints that focus on handling the complexities of multi-table and nested operations.<n>Our approach significantly improves query accuracy of LLM-generated queries while ensuring efficiency in terms of calls and latency.
arXiv Detail & Related papers (2025-06-11T12:07:55Z) - SchemaGraphSQL: Efficient Schema Linking with Pathfinding Graph Algorithms for Text-to-SQL on Large-Scale Databases [1.6544167074080365]
We present a zero-shot, training-free schema linking approach that first constructs a schema graph based on foreign key relations.<n>We apply classical path-finding algorithms and post-processing to identify the optimal sequence of tables and columns that should be joined.<n>Our method achieves state-of-the-art results on the BIRD benchmark, outperforming previous specialized, fine-tuned, and complex multi-step LLM-based approaches.
arXiv Detail & Related papers (2025-05-23T20:42:36Z) - STaR-SQL: Self-Taught Reasoner for Text-to-SQL [20.719165038519744]
"chain-of-thought" rationales have proven effective for improving the performance of large language models on complex reasoning tasks.
Applying such techniques to structured tasks, such as text-to-driven, remains largely unexplored.
In this paper, we introduce Self-Taughter for text-to-driven (STaR-), a novel approach that reframes query generation as a reasoning process.
Experimental results on the challenging Spider benchmark demonstrate that STaR- significantly improves text-to-performance, achieving an execution accuracy of 86.6%.
These findings underscore the potential of reasoning-augmented training for
arXiv Detail & Related papers (2025-02-19T08:58:44Z) - Rationalization Models for Text-to-SQL [13.792561265515003]
We introduce a framework for generating Chain-of-Thought (CoT) rationales to enhance text-to-thought model fine-tuning.<n>The process begins with manually annotating a small set of examples, which are then used to prompt a large language model.<n>A rationalization model is subsequently trained on the validated queries, enabling extensive synthetic CoT annotations.
arXiv Detail & Related papers (2025-02-10T18:38:57Z) - MCTS-SQL: Light-Weight LLMs can Master the Text-to-SQL through Monte Carlo Tree Search [1.166711394125328]
Text-to-OTA is a fundamental yet challenging task in the NLP area.<n>We propose MCTS-OTA, a novel framework that uses Monte Carlo Tree Search.<n>We propose a token-level prefixcache mechanism that stores prior information during iterations.
arXiv Detail & Related papers (2025-01-28T00:52:23Z) - Leveraging Foundation Language Models (FLMs) for Automated Cohort Extraction from Large EHR Databases [50.552056536968166]
We propose and evaluate an algorithm for automating column matching on two large, popular and publicly-accessible EHR databases.
Our approach achieves a high top-three accuracy of $92%$, correctly matching $12$ out of the $13$ columns of interest, when using a small, pre-trained general purpose language model.
arXiv Detail & Related papers (2024-12-16T06:19:35Z) - CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL [9.47170756607886]
CHASE- is a new framework that employs innovative strategies, using test-time compute in multi-agent modeling to improve candidate generation and selection.
To identify the best candidate, a selection agent is employed to rank the candidates through pairwise comparisons with a fine-tuned binary-candidates selection LLM.
Overall, our proposed CHASE- achieves the state-of-the-art execution accuracy of 73.0% and 73.01% on the test set and development set of the notable BIRD Text-to- dataset benchmark.
arXiv Detail & Related papers (2024-10-02T18:41:35Z) - CHESS: Contextual Harnessing for Efficient SQL Synthesis [1.9506402593665235]
We introduce CHESS, a framework for efficient and scalable text-to- queries.
It comprises four specialized agents, each targeting one of the aforementioned challenges.
Our framework offers features that adapt to various deployment constraints.
arXiv Detail & Related papers (2024-05-27T01:54:16Z) - MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation [10.726734105960924]
Large language models (LLMs) have enabled in-context learning (ICL)-based methods that significantly outperform fine-tuning approaches for text-to- tasks.
This study considers the sensitivity of LLMs to the prompts and introduces a novel approach that leverages multiple prompts to explore a broader search space for possible answers.
We establish a new SOTA performance on the BIRD in terms of both the accuracy and efficiency of the generated queries.
arXiv Detail & Related papers (2024-05-13T04:59:32Z) - SQLPrompt: In-Context Text-to-SQL with Minimal Labeled Data [54.69489315952524]
"Prompt" is designed to improve the few-shot prompting capabilities of Text-to-LLMs.
"Prompt" outperforms previous approaches for in-context learning with few labeled data by a large margin.
We show that emphPrompt outperforms previous approaches for in-context learning with few labeled data by a large margin.
arXiv Detail & Related papers (2023-11-06T05:24:06Z) - JoinGym: An Efficient Query Optimization Environment for Reinforcement
Learning [58.71541261221863]
Join order selection (JOS) is the problem of ordering join operations to minimize total query execution cost.
We present JoinGym, a query optimization environment for bushy reinforcement learning (RL)
Under the hood, JoinGym simulates a query plan's cost by looking up intermediate result cardinalities from a pre-computed dataset.
arXiv Detail & Related papers (2023-07-21T17:00:06Z) - SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the framework for enhancing Text-to- filtering using large language models (LLMs)
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z) - Wav2SQL: Direct Generalizable Speech-To-SQL Parsing [55.10009651476589]
Speech-to-Spider (S2Spider) aims to convert spoken questions intosql queries given databases.
We propose the first direct speech-to-speaker parsing model Wav2 which avoids error compounding across cascaded systems.
Experimental results demonstrate that Wav2 avoids error compounding and achieves state-of-the-art results by up to 2.5% accuracy improvement over the baseline.
arXiv Detail & Related papers (2023-05-21T19:26:46Z) - Proton: Probing Schema Linking Information from Pre-trained Language
Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on Poincar'e distance metric.
Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences.
Our framework sets new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2022-06-28T14:05:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.