Related papers: Query and Conquer: Execution-Guided SQL Generation

Related papers

XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL [48.45491386478092]
We present XiYan-, an innovative framework effectively generating and utilizing multiplesql candidates.<n>Overall, XiYan- achieves a new SOTA performance of 75.63% on the notable BIRD benchmark.<n>It also attains SOTA performance on the Spider test set with an accuracy of 89.65%.
arXiv Detail & Related papers (2025-07-07T06:50:46Z)
HI-SQL: Optimizing Text-to-SQL Systems through Dynamic Hint Integration [1.3927943269211591]
Text-to-generation bridges the gap between natural language and databases, enabling users to query data without requiringsql expertise.<n>We propose HI-the, a pipeline that incorporates a novel hint generation mechanism utilizing historical query logs.<n>By analyzing prior queries, our method generates contextual hints that focus on handling the complexities of multi-table and nested operations.<n>Our approach significantly improves query accuracy of LLM-generated queries while ensuring efficiency in terms of calls and latency.
arXiv Detail & Related papers (2025-06-11T12:07:55Z)
SchemaGraphSQL: Efficient Schema Linking with Pathfinding Graph Algorithms for Text-to-SQL on Large-Scale Databases [1.6544167074080365]
We present a zero-shot, training-free schema linking approach that first constructs a schema graph based on foreign key relations.<n>We apply classical path-finding algorithms and post-processing to identify the optimal sequence of tables and columns that should be joined.<n>Our method achieves state-of-the-art results on the BIRD benchmark, outperforming previous specialized, fine-tuned, and complex multi-step LLM-based approaches.
arXiv Detail & Related papers (2025-05-23T20:42:36Z)
STaR-SQL: Self-Taught Reasoner for Text-to-SQL [20.719165038519744]
"chain-of-thought" rationales have proven effective for improving the performance of large language models on complex reasoning tasks. Applying such techniques to structured tasks, such as text-to-driven, remains largely unexplored. In this paper, we introduce Self-Taughter for text-to-driven (STaR-), a novel approach that reframes query generation as a reasoning process. Experimental results on the challenging Spider benchmark demonstrate that STaR- significantly improves text-to-performance, achieving an execution accuracy of 86.6%. These findings underscore the potential of reasoning-augmented training for
arXiv Detail & Related papers (2025-02-19T08:58:44Z)
Rationalization Models for Text-to-SQL [13.792561265515003]
We introduce a framework for generating Chain-of-Thought (CoT) rationales to enhance text-to-thought model fine-tuning.<n>The process begins with manually annotating a small set of examples, which are then used to prompt a large language model.<n>A rationalization model is subsequently trained on the validated queries, enabling extensive synthetic CoT annotations.
arXiv Detail & Related papers (2025-02-10T18:38:57Z)
MCTS-SQL: Light-Weight LLMs can Master the Text-to-SQL through Monte Carlo Tree Search [1.166711394125328]
Text-to-OTA is a fundamental yet challenging task in the NLP area.<n>We propose MCTS-OTA, a novel framework that uses Monte Carlo Tree Search.<n>We propose a token-level prefixcache mechanism that stores prior information during iterations.
arXiv Detail & Related papers (2025-01-28T00:52:23Z)
Leveraging Foundation Language Models (FLMs) for Automated Cohort Extraction from Large EHR Databases [50.552056536968166]
We propose and evaluate an algorithm for automating column matching on two large, popular and publicly-accessible EHR databases. Our approach achieves a high top-three accuracy of $92%$, correctly matching $12$ out of the $13$ columns of interest, when using a small, pre-trained general purpose language model.
arXiv Detail & Related papers (2024-12-16T06:19:35Z)
CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL [9.47170756607886]
CHASE- is a new framework that employs innovative strategies, using test-time compute in multi-agent modeling to improve candidate generation and selection. To identify the best candidate, a selection agent is employed to rank the candidates through pairwise comparisons with a fine-tuned binary-candidates selection LLM. Overall, our proposed CHASE- achieves the state-of-the-art execution accuracy of 73.0% and 73.01% on the test set and development set of the notable BIRD Text-to- dataset benchmark.
arXiv Detail & Related papers (2024-10-02T18:41:35Z)
CHESS: Contextual Harnessing for Efficient SQL Synthesis [1.9506402593665235]
We introduce CHESS, a framework for efficient and scalable text-to- queries. It comprises four specialized agents, each targeting one of the aforementioned challenges. Our framework offers features that adapt to various deployment constraints.
arXiv Detail & Related papers (2024-05-27T01:54:16Z)
MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation [10.726734105960924]
Large language models (LLMs) have enabled in-context learning (ICL)-based methods that significantly outperform fine-tuning approaches for text-to- tasks. This study considers the sensitivity of LLMs to the prompts and introduces a novel approach that leverages multiple prompts to explore a broader search space for possible answers. We establish a new SOTA performance on the BIRD in terms of both the accuracy and efficiency of the generated queries.
arXiv Detail & Related papers (2024-05-13T04:59:32Z)
SQLPrompt: In-Context Text-to-SQL with Minimal Labeled Data [54.69489315952524]
"Prompt" is designed to improve the few-shot prompting capabilities of Text-to-LLMs. "Prompt" outperforms previous approaches for in-context learning with few labeled data by a large margin. We show that emphPrompt outperforms previous approaches for in-context learning with few labeled data by a large margin.
arXiv Detail & Related papers (2023-11-06T05:24:06Z)
JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning [58.71541261221863]
Join order selection (JOS) is the problem of ordering join operations to minimize total query execution cost. We present JoinGym, a query optimization environment for bushy reinforcement learning (RL) Under the hood, JoinGym simulates a query plan's cost by looking up intermediate result cardinalities from a pre-computed dataset.
arXiv Detail & Related papers (2023-07-21T17:00:06Z)
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the framework for enhancing Text-to- filtering using large language models (LLMs) With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses. With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
Wav2SQL: Direct Generalizable Speech-To-SQL Parsing [55.10009651476589]
Speech-to-Spider (S2Spider) aims to convert spoken questions intosql queries given databases. We propose the first direct speech-to-speaker parsing model Wav2 which avoids error compounding across cascaded systems. Experimental results demonstrate that Wav2 avoids error compounding and achieves state-of-the-art results by up to 2.5% accuracy improvement over the baseline.
arXiv Detail & Related papers (2023-05-21T19:26:46Z)
Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on Poincar'e distance metric. Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences. Our framework sets new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2022-06-28T14:05:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.