Prompt Tuning for Natural Language to SQL with Embedding Fine-Tuning and RAG
- URL: http://arxiv.org/abs/2511.08245v1
- Date: Wed, 12 Nov 2025 01:48:28 GMT
- Title: Prompt Tuning for Natural Language to SQL with Embedding Fine-Tuning and RAG
- Authors: Jisoo Jang, Tien-Cuong Bui, Yunjun Choi, Wen-Syan Li
- Abstract summary: We propose a novel framework integrating an error correction mechanism that diagnoses error types and identifies their causes. We demonstrate that our framework achieves a significant 12 percent accuracy improvement over existing baselines.
- Score: 1.988259704465628
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces an Error Correction through Prompt Tuning framework for NL-to-SQL, leveraging the latest advancements in generative pre-training-based LLMs and RAG. Our work addresses the crucial need for efficient and accurate translation of natural language queries into SQL expressions in various settings, given the growing use of natural language interfaces. We explore the evolution of NLIDBs from early rule-based systems to advanced neural network-driven approaches. Drawing inspiration from the medical diagnostic process, we propose a novel framework integrating an error correction mechanism that diagnoses error types, identifies their causes, provides fixing instructions, and applies these corrections to SQL queries. This approach is further enriched by embedding fine-tuning and RAG, which harnesses external knowledge bases for improved accuracy and transparency. Through comprehensive experiments, we demonstrate that our framework achieves a significant 12 percent accuracy improvement over existing baselines, highlighting its potential to revolutionize data access and handling in contemporary data-driven environments.
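The abstract describes a diagnose-then-repair loop enriched with retrieval. The Python sketch below is only a rough illustration of that flow under stated assumptions: the `llm_complete` and `retrieve_examples` callables, the prompt wording, and the use of SQLite are hypothetical stand-ins, not the authors' implementation.

```python
# Minimal sketch of a diagnose -> retrieve -> repair loop for NL-to-SQL.
# llm_complete and retrieve_examples are hypothetical helpers (assumptions),
# standing in for an LLM call and a RAG lookup over a knowledge base of fixes.
import sqlite3
from typing import Callable, List, Optional, Tuple

def execute_sql(db_path: str, sql: str) -> Tuple[Optional[list], Optional[str]]:
    """Run a candidate query; return (rows, None) on success or (None, error)."""
    try:
        with sqlite3.connect(db_path) as conn:
            return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)

def correct_sql(question: str, sql: str, db_path: str,
                llm_complete: Callable[[str], str],
                retrieve_examples: Callable[[str], List[str]],
                max_rounds: int = 3) -> str:
    """Iteratively diagnose the error type, retrieve similar past fixes,
    and ask the model to apply the fixing instruction to the query."""
    for _ in range(max_rounds):
        rows, error = execute_sql(db_path, sql)
        if error is None:
            return sql  # executes cleanly; semantic checks are omitted here
        # 1) Diagnose: name the error type and its likely cause.
        diagnosis = llm_complete(
            f"Question: {question}\nSQL: {sql}\nDB error: {error}\n"
            "Name the error type, its likely cause, and a fixing instruction."
        )
        # 2) Retrieve: pull similar errors and fixes from an external knowledge base (RAG).
        examples = "\n".join(retrieve_examples(diagnosis))
        # 3) Repair: rewrite the query with the diagnosis and retrieved examples as context.
        sql = llm_complete(
            f"Similar fixes:\n{examples}\n\nDiagnosis:\n{diagnosis}\n\n"
            f"Rewrite this SQL so it answers the question correctly:\n{sql}"
        )
    return sql
```
The loop mirrors the diagnostic metaphor in the abstract (diagnose, explain, prescribe, apply); how the actual system performs embedding fine-tuning and prompt tuning is not shown here.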
Related papers
- Fine-tuned LLM-based Code Migration Framework [0.0]
The study presents the outcomes of research and experimental validation in the domain of automated SQL migration. The proposed migration method essentially takes the form of a framework that leverages the best aspects of traditional software engineering techniques.
arXiv Detail & Related papers (2025-12-15T16:42:51Z) - MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL [46.37961458768655]
Large language models (LLMs) are increasingly used in Text-to-SQL tasks. Existing methods rely on static execution feedback, which restricts real-time error correction. We propose MTIR-SQL, an innovative Multi-turn Tool-Integrated Reasoning reinforcement learning framework.
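For context, the multi-turn tool-integrated loop that such methods build on can be sketched as below; this is not the paper's reinforcement-learning training code, and the `llm_complete` chat helper and the `EXECUTE:` convention are illustrative assumptions.

```python
# Illustrative multi-turn loop: the model may emit "EXECUTE: <sql>" to test a query,
# receives live execution feedback, and keeps reasoning until it commits to an answer.
# llm_complete is a hypothetical chat-style LLM call (assumption).
import sqlite3
from typing import Callable, List

def multi_turn_sql(question: str, db_path: str,
                   llm_complete: Callable[[List[str]], str],
                   max_turns: int = 5) -> str:
    history = [
        "Answer with SQL. You may write 'EXECUTE: <sql>' to test a query first.",
        question,
    ]
    last_sql = ""
    with sqlite3.connect(db_path) as conn:
        for _ in range(max_turns):
            reply = llm_complete(history).strip()
            history.append(reply)
            if reply.startswith("EXECUTE:"):
                last_sql = reply.split("EXECUTE:", 1)[1].strip()
                try:
                    preview = conn.execute(last_sql).fetchmany(5)
                    history.append(f"Result preview: {preview}")
                except sqlite3.Error as exc:
                    history.append(f"Execution error: {exc}")
            else:
                return reply  # the model committed to a final query or answer
    return last_sql
```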
arXiv Detail & Related papers (2025-10-29T13:34:27Z) - Sample-Efficient Online Learning in LM Agents via Hindsight Trajectory Rewriting [92.57796055887995]
We introduce ECHO, a prompting framework that adapts hindsight experience replay from reinforcement learning for language model agents. ECHO generates optimized trajectories for alternative goals that could have been achieved during failed attempts. We evaluate ECHO on stateful versions of XMiniGrid, a text-based navigation and planning benchmark, and PeopleJoinQA, a collaborative information-gathering enterprise simulation.
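A toy sketch of the hindsight idea follows: a failed trajectory is relabeled with a goal it did achieve and kept as an in-context demonstration for later prompts. The `rewrite_goal` callable (in practice an LLM call) and the buffer layout are assumptions, not ECHO's actual interface.

```python
# Hindsight trajectory rewriting for a prompting agent (illustrative only).
# rewrite_goal is a hypothetical helper that returns the goal a trajectory
# actually achieved plus a cleaned-up version of its steps.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class HindsightBuffer:
    rewrite_goal: Callable[[List[str]], Tuple[str, List[str]]]
    demos: List[str] = field(default_factory=list)

    def add_failure(self, trajectory: List[str]) -> None:
        # Relabel the failed attempt with an alternative goal it did accomplish,
        # and store an optimized trajectory for that goal as a demonstration.
        achieved_goal, cleaned_steps = self.rewrite_goal(trajectory)
        self.demos.append(f"Goal: {achieved_goal}\n" + "\n".join(cleaned_steps))

    def build_prompt(self, new_goal: str, k: int = 3) -> str:
        # Prepend the most recent hindsight demonstrations to the new task.
        return "\n\n".join(self.demos[-k:] + [f"Goal: {new_goal}"])
```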
arXiv Detail & Related papers (2025-10-11T18:11:09Z) - End-to-End Text-to-SQL with Dataset Selection: Leveraging LLMs for Adaptive Query Generation [6.5390580456423555]
Traditional approaches model text-to-SQL as a direct translation task. Recent advances in large language models (LLMs) have significantly improved translation accuracy. We propose a three-stage end-to-end text-to-SQL framework to identify the user's intended database.
arXiv Detail & Related papers (2025-08-08T15:16:36Z) - RAISE: Reasoning Agent for Interactive SQL Exploration [47.77323087050061]
We propose a novel framework that unifies schema linking, query generation, and iterative refinement within a single, end-to-end component. Our method emulates how humans answer questions when working with unfamiliar databases.
arXiv Detail & Related papers (2025-06-02T03:07:08Z) - Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation [73.9145653659403]
We show that Generative Error Correction models struggle to generalize beyond the specific types of errors encountered during training.
We propose DARAG, a novel approach designed to improve GEC for ASR in in-domain (ID) and OOD scenarios.
Our approach is simple, scalable, and both domain- and language-agnostic.
arXiv Detail & Related papers (2024-10-17T04:00:29Z) - DFIN-SQL: Integrating Focused Schema with DIN-SQL for Superior Accuracy in Large-Scale Databases [0.0]
This paper introduces DFIN, an innovative extension of DIN-SQL (Decomposed-In-Context).
DFIN enhances Text-to-SQL conversion by addressing schema linking errors, which are a major source of inaccuracies.
Our evaluation on the BIRD dataset, a challenging real-world benchmark, demonstrates that DFIN not only scales efficiently but also improves accuracy, achieving a score of 51.69.
arXiv Detail & Related papers (2024-03-01T07:14:45Z) - SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces a framework for enhancing Text-to-SQL using large language models (LLMs).
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deep into understanding the critical paradigms that influence the performance of tuned LLMs.
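The execution-based consistency decoding mentioned here can be approximated by sampling several candidate queries and voting on their execution results; the sketch below assumes a hypothetical `sample_sql` generator and a SQLite database, and is not SQL-PaLM's implementation.

```python
# Execution-consistency decoding (illustrative): sample N candidates, execute each,
# and return the query whose result set occurs most often among executable candidates.
# sample_sql is a hypothetical stand-in for sampling the LLM with temperature > 0.
import sqlite3
from collections import Counter
from typing import Callable, Optional

def consistency_decode(question: str, db_path: str,
                       sample_sql: Callable[[str], str], n: int = 8) -> Optional[str]:
    first_sql = {}      # result fingerprint -> first SQL that produced it
    votes = Counter()   # result fingerprint -> number of candidates agreeing
    with sqlite3.connect(db_path) as conn:
        for _ in range(n):
            sql = sample_sql(question)
            try:
                rows = frozenset(conn.execute(sql).fetchall())
            except sqlite3.Error:
                continue  # failed candidates get no vote
            first_sql.setdefault(rows, sql)
            votes[rows] += 1
    if not votes:
        return None
    best, _ = votes.most_common(1)[0]
    return first_sql[best]
```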
arXiv Detail & Related papers (2023-05-26T21:39:05Z) - SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers [61.48159785138462]
This paper aims to improve the performance of text-to-SQL parsing by exploring the intrinsic uncertainties in the neural network-based approaches (called SUN).
Extensive experiments on five benchmark datasets demonstrate that our method significantly outperforms competitors and achieves new state-of-the-art results.
arXiv Detail & Related papers (2022-09-14T06:27:51Z) - SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z)