ORANGE: An Online Reflection ANd GEneration framework with Domain Knowledge for Text-to-SQL
- URL: http://arxiv.org/abs/2511.00985v2
- Date: Tue, 04 Nov 2025 17:28:21 GMT
- Title: ORANGE: An Online Reflection ANd GEneration framework with Domain Knowledge for Text-to-SQL
- Authors: Yiwen Jiao, Tonghui Ren, Yuche Gao, Zhenying He, Yinan Jing, Kai Zhang, X. Sean Wang
- Abstract summary: Large Language Models (LLMs) have demonstrated remarkable progress in translating natural language to SQL. A significant semantic gap persists between their general knowledge and domain-specific semantics of databases. We introduce ORANGE, an online self-evolutionary framework that constructs database-specific knowledge bases by parsing SQL queries from translation logs.
- Score: 8.241433772695018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable progress in translating natural language to SQL, but a significant semantic gap persists between their general knowledge and domain-specific semantics of databases. Historical translation logs constitute a rich source of this missing in-domain knowledge, where SQL queries inherently encapsulate real-world usage patterns of database schema. Existing methods primarily enhance the reasoning process for individual translations but fail to accumulate in-domain knowledge from past translations. We introduce ORANGE, an online self-evolutionary framework that constructs database-specific knowledge bases by parsing SQL queries from translation logs. By accumulating in-domain knowledge that contains schema and data semantics, ORANGE progressively reduces the semantic gap and enhances the accuracy of subsequent SQL translations. To ensure reliability, we propose a novel nested Chain-of-Thought SQL-to-Text strategy with tuple-semantic tracking, which reduces semantic errors during knowledge generation. Experiments on multiple benchmarks confirm the practicality of ORANGE, demonstrating its effectiveness for real-world Text-to-SQL deployment, particularly in handling complex and domain-specific queries.
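The idea of accumulating in-domain knowledge from translation logs can be illustrated with a toy sketch. This is not ORANGE's method: the abstract describes a nested Chain-of-Thought SQL-to-Text strategy with tuple-semantic tracking, whereas the code below only counts join pairs and literal filters with regular expressions. The log contents, table names, and the helpers `build_knowledge` and `hints` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical translation log: SQL previously generated for one database.
LOG = [
    "SELECT name FROM patient JOIN visit ON patient.id = visit.patient_id "
    "WHERE visit.status = 'closed'",
    "SELECT COUNT(*) FROM patient JOIN visit ON patient.id = visit.patient_id",
    "SELECT name FROM patient WHERE patient.sex = 'F'",
]

# table.column = table.column  (a join condition)
JOIN_RE = re.compile(r"(\w+\.\w+)\s*=\s*(\w+\.\w+)")
# table.column = 'literal'     (a value actually used for that column)
FILTER_RE = re.compile(r"(\w+\.\w+)\s*=\s*'([^']*)'")


def build_knowledge(log):
    """Count join pairs and column/value filters seen in logged SQL."""
    joins, filters = Counter(), Counter()
    for sql in log:
        for left, right in JOIN_RE.findall(sql):
            # Sort so that a = b and b = a count as the same join.
            joins[tuple(sorted((left, right)))] += 1
        for column, value in FILTER_RE.findall(sql):
            filters[(column, value)] += 1
    return joins, filters


def hints(joins, filters, top=2):
    """Render the most frequent observations as plain-text prompt hints."""
    lines = [f"join {a} with {b} (seen {n}x)"
             for (a, b), n in joins.most_common(top)]
    lines += [f"{c} takes value '{v}' (seen {n}x)"
              for (c, v), n in filters.most_common(top)]
    return lines


if __name__ == "__main__":
    for line in hints(*build_knowledge(LOG)):
        print(line)
```

In a setup like this, the rendered hints would be appended to the prompt for later questions on the same database, so that knowledge grows as the log grows.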
Related papers
- Bridging Global Intent with Local Details: A Hierarchical Representation Approach for Semantic Validation in Text-to-SQL [30.78817492504152]
HERO is a hierarchical representation approach that integrates global intent and local details. We employ a Nested Message Passing Neural Network (NMPNN) to capture inherent information in relational schema-guided semantics. Our approach outperforms existing state-of-the-art methods, achieving an average 9.40% improvement of AUPRC and 12.35% of AUROC in identifying semantic inconsistencies. It excels at detecting fine-grained semantic errors, provides large language models with more granular feedback, and ultimately enhances the reliability and interpretability of data querying platforms.
arXiv Detail & Related papers (2025-12-28T02:25:33Z) - Companion Agents: A Table-Information Mining Paradigm for Text-to-SQL [8.159121916366727]
Large-scale Text-to-SQL benchmarks such as BIRD typically assume complete and accurate database annotations as well as available external knowledge. This mismatch substantially limits the real-world applicability of state-of-the-art Text-to-SQL systems. We propose a database-centric approach that leverages intrinsic, fine-grained information residing in relational databases to construct missing evidence.
arXiv Detail & Related papers (2025-12-17T07:11:55Z) - Retrieval and Augmentation of Domain Knowledge for Text-to-SQL Semantic Parsing [28.56221748194599]
We propose a systematic framework for associating structured domain statements at the database level. We present retrieval of relevant structured domain statements given a user query using sub-string level match.
arXiv Detail & Related papers (2025-10-01T04:01:17Z) - End-to-End Text-to-SQL with Dataset Selection: Leveraging LLMs for Adaptive Query Generation [6.5390580456423555]
Traditional approaches model text-to-SQL as a direct translation task. Recent advances in large language models (LLMs) have significantly improved translation accuracy. We propose a three-stage end-to-end text-to-SQL framework to identify the user's intended database.
arXiv Detail & Related papers (2025-08-08T15:16:36Z) - RASL: Retrieval Augmented Schema Linking for Massive Database Text-to-SQL [1.3654846342364308]
We introduce a component-based retrieval architecture that decomposes database schemas and metadata into discrete semantic units. Our solution enables practical text-to-SQL interfaces across diverse enterprise settings without specialized fine-tuning.
arXiv Detail & Related papers (2025-07-30T21:09:47Z) - Large Language Models are Good Relational Learners [55.40941576497973]
We introduce Rel-LLM, a novel architecture that utilizes a graph neural network (GNN)-based encoder to generate structured relational prompts for large language models (LLMs). Unlike traditional text-based serialization approaches, our method preserves the inherent relational structure of databases while enabling LLMs to process and reason over complex entity relationships.
arXiv Detail & Related papers (2025-06-06T04:07:55Z) - Knowledge Base Construction for Knowledge-Augmented Text-to-SQL [37.87911346522774]
We propose constructing a knowledge base for text-to-SQL, a foundational source of knowledge, from which we generate necessary knowledge for given queries. Our knowledge base is comprehensive: it is constructed from a combination of all available questions and their associated database schemas. We validate our approach on multiple text-to-SQL datasets, considering both overlapping and non-overlapping database scenarios.
arXiv Detail & Related papers (2025-05-28T08:17:58Z) - Datrics Text2SQL: A Framework for Natural Language to SQL Query Generation [0.0]
This paper introduces a Retrieval-Augmented Generation (RAG)-based framework designed to generate accurate SQL queries by leveraging structured documentation, example-based learning, and domain-specific rules. The paper details the architecture, training methodology, and retrieval logic, highlighting how the system bridges the gap between user intent and database structure without requiring SQL expertise.
arXiv Detail & Related papers (2025-04-03T21:09:59Z) - SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the SQL-PaLM framework for enhancing Text-to-SQL using large language models (LLMs).
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z) - Improving Text-to-SQL Semantic Parsing with Fine-grained Query Understanding [84.04706075621013]
We present a general-purpose, modular neural semantic parsing framework based on token-level fine-grained query understanding.
Our framework consists of three modules: a named entity recognizer (NER), a neural entity linker (NEL), and a neural semantic parser (NSP).
arXiv Detail & Related papers (2022-09-28T21:00:30Z) - SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers [61.48159785138462]
This paper aims to improve the performance of text-to-SQL parsing by exploring the intrinsic uncertainties in the neural network based approaches (called SUN).
Extensive experiments on five benchmark datasets demonstrate that our method significantly outperforms competitors and achieves new state-of-the-art results.
arXiv Detail & Related papers (2022-09-14T06:27:51Z) - Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on the Poincaré distance metric.
Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences.
Our framework sets new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2022-06-28T14:05:25Z) - Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing [110.97778888305506]
BRIDGE represents the question and DB schema in a tagged sequence where a subset of the fields are augmented with cell values mentioned in the question.
BRIDGE attained state-of-the-art performance on popular cross-DB text-to-SQL benchmarks.
Our analysis shows that BRIDGE effectively captures the desired cross-modal dependencies and has the potential to generalize to more text-DB related tasks.
arXiv Detail & Related papers (2020-12-23T12:33:52Z)