Related papers: LitE-SQL: A Lightweight and Efficient Text-to-SQL Framework with Vector-based Schema Linking and Execution-Guided Self-Correction

LitE-SQL: A Lightweight and Efficient Text-to-SQL Framework with Vector-based Schema Linking and Execution-Guided Self-Correction

URL: http://arxiv.org/abs/2510.09014v1
Date: Fri, 10 Oct 2025 05:27:47 GMT
Title: LitE-SQL: A Lightweight and Efficient Text-to-SQL Framework with Vector-based Schema Linking and Execution-Guided Self-Correction
Authors: Shengmin Piao, Jieun Lee, Sanghyun Park,
Abstract summary: We introduce LitE-, a Lightweight and Efficient framework with two components.<n>On BIRD, LitE- achieves 72.10% execution accuracy, and on Spider it reaches 88.45%, demonstrating comparable or superior performance to Retriever.<n>Our findings demonstrate that high-quality Text-to-correction generation is feasible with lightweight models, offering a practical solution for privacy-sensitive and resource-constrained settings.
Score: 5.123751486259634
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Text-to-SQL task translates natural language questions into SQL queries, enabling intuitive database interaction for non-experts. While recent methods leveraging Large Language Models (LLMs) achieve strong performance, their reliance on proprietary models raise concerns about deployment feasibility and data privacy. In this work, we introduce LitE-SQL, a Lightweight and Efficient framework with two components: (i) a Schema Retriever that performs efficient schema linking using a vector database of pre-computed schema embeddings, and (ii) a SQL Generator fine-tuned in two stages-supervised fine-tuning followed by execution-guided reinforcement-enabling self-correction without costly multi-candidate generation. On BIRD, LitE-SQL achieves 72.10% execution accuracy, and on Spider 1.0 it reaches 88.45%, demonstrating comparable or superior performance to LLM-based methods despite using 2x to 30x fewer parameters. Our findings demonstrate that high-quality Text-to-SQL generation is feasible with lightweight models, offering a practical solution for privacy-sensitive and resource-constrained settings.

Related papers

Text-to-SQL as Dual-State Reasoning: Integrating Adaptive Context and Progressive Generation [54.53145282349042]
We introduce DSR-sourced, a textbfDual-textbfS textbfReasoning framework that models Text-to-context as an interaction between an adaptive context state and a progressive generation state.<n>Without any post-training or in-context examples, DSR-sourced achieves competitive performance, reaching 35.28% execution accuracy on Spider 2.0-Snow and 68.32% on BIRD development set.
arXiv Detail & Related papers (2025-11-26T13:52:50Z)
SING-SQL: A Synthetic Data Generation Framework for In-Domain Text-to-SQL Translation [2.0799061948689306]
SING-a is a fully automated two-stage framework for generating high-quality, high-coverage synthetic Text-to-data.<n>SING-LM is a family of compact language models fine-tuned on the synthetic data.
arXiv Detail & Related papers (2025-09-30T02:14:49Z)
X-SQL: Expert Schema Linking and Understanding of Text-to-SQL with Multi-LLMs [0.0]
We find that database schema information plays a significant or even dominant role in the Text-to-Admin task.<n>We introduce X-Supervised Finetuning (SFT) that achieves superior Linking results compared to existing open-source Text-to-Admin methods.<n>In addition, we propose an X- component that focuses on Understanding by bridging the gap between abstract information and the user's natural language question.
arXiv Detail & Related papers (2025-09-07T02:51:43Z)
XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL [48.45491386478092]
We present XiYan-, an innovative framework effectively generating and utilizing multiplesql candidates.<n>Overall, XiYan- achieves a new SOTA performance of 75.63% on the notable BIRD benchmark.<n>It also attains SOTA performance on the Spider test set with an accuracy of 89.65%.
arXiv Detail & Related papers (2025-07-07T06:50:46Z)
OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment [6.2089733671434875]
We propose OpenSearch-, which divides the Text-to-agent task into four main modules: Preprocessing, Extraction, Generation, and Refinement, along with an Alignment module based on consistency alignment mechanism.<n>These methods have significantly improved the performance of LLMs in the Text-to-agent task.<n> Experimental results show that OpenSearch- achieves an execution accuracy(EX) of 69.3% on the BIRD development set, 72.28% on the test set, and a reward-based efficiency score (R-VES) of 69.3, with all three metrics ranking first at the time of submission.
arXiv Detail & Related papers (2025-02-19T07:51:50Z)
MCTS-SQL: Light-Weight LLMs can Master the Text-to-SQL through Monte Carlo Tree Search [1.166711394125328]
Text-to-OTA is a fundamental yet challenging task in the NLP area.<n>We propose MCTS-OTA, a novel framework that uses Monte Carlo Tree Search.<n>We propose a token-level prefixcache mechanism that stores prior information during iterations.
arXiv Detail & Related papers (2025-01-28T00:52:23Z)
RSL-SQL: Robust Schema Linking in Text-to-SQL Generation [51.00761167842468]
We propose a novel framework called RSL- that combines bidirectional schema linking, contextual information augmentation, binary selection strategy, and multi-turn self-correction. benchmarks demonstrate that our approach achieves SOTA execution accuracy among open-source solutions, with 67.2% on BIRD and 87.9% on GPT-4ocorrection. Our approach outperforms a series of GPT-4 based Text-to-Seek systems when adopting DeepSeek (much cheaper) with same intact prompts.
arXiv Detail & Related papers (2024-10-31T16:22:26Z)
RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL [48.516004807486745]
Large language models (LLMs) with in-context learning have significantly improved the performance of text-to- task. We propose RB-, a novel retrieval-based framework for in-context prompt engineering. Experiment results demonstrate that our model achieves better performance than several competitive baselines on public datasets BIRD and Spider.
arXiv Detail & Related papers (2024-07-11T08:19:58Z)
CHESS: Contextual Harnessing for Efficient SQL Synthesis [1.9506402593665235]
We introduce CHESS, a framework for efficient and scalable text-to- queries. It comprises four specialized agents, each targeting one of the aforementioned challenges. Our framework offers features that adapt to various deployment constraints.
arXiv Detail & Related papers (2024-05-27T01:54:16Z)
Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [76.76046657162306]
Large language models (LLMs) have emerged as a new paradigm for Text-to- task. Large language models (LLMs) have emerged as a new paradigm for Text-to- task.
arXiv Detail & Related papers (2023-08-29T14:59:54Z)
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the framework for enhancing Text-to- filtering using large language models (LLMs) With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses. With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.