ODIN: A NL2SQL Recommender to Handle Schema Ambiguity
- URL: http://arxiv.org/abs/2505.19302v1
- Date: Sun, 25 May 2025 20:22:32 GMT
- Title: ODIN: A NL2SQL Recommender to Handle Schema Ambiguity
- Authors: Kapil Vaidya, Abishek Sankararaman, Jialin Ding, Chuan Lei, Xiao Qin, Balakrishnan Narayanaswamy, Tim Kraska
- Abstract summary: ODIN generates queries based on different interpretations of ambiguous schema components. Our evaluation shows that ODIN improves the likelihood of generating the correct query by 1.5-2$\times$ compared to baselines.
- Score: 21.483551391764944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: NL2SQL (natural language to SQL) systems translate natural language into SQL queries, allowing users with no technical background to interact with databases and create tools like reports or visualizations. While recent advancements in large language models (LLMs) have significantly improved NL2SQL accuracy, schema ambiguity remains a major challenge in enterprise environments with complex schemas, where multiple tables and columns with semantically similar names often co-exist. To address schema ambiguity, we introduce ODIN, a NL2SQL recommendation engine. Instead of producing a single SQL query given a natural language question, ODIN generates a set of potential SQL queries by accounting for different interpretations of ambiguous schema components. ODIN dynamically adjusts the number of suggestions based on the level of ambiguity, and ODIN learns from user feedback to personalize future SQL query recommendations. Our evaluation shows that ODIN improves the likelihood of generating the correct SQL query by 1.5-2$\times$ compared to baselines.
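The recommendation idea in the abstract (several candidate queries, more of them when the schema is more ambiguous) can be illustrated with a minimal sketch. This is not ODIN's actual algorithm: the function name `recommend`, the per-column weights, and the coverage threshold `mass` are all hypothetical and chosen only for illustration.

```python
from itertools import product
from math import prod


def recommend(template, candidates, mass=0.9, max_k=5):
    """Enumerate SQL candidates for a template with ambiguous slots.

    `candidates` maps each slot name to a list of (schema_name, weight)
    pairs. The weights are hypothetical plausibility scores and are
    assumed to be positive. Queries are returned best-first until they
    cover `mass` of the total score, so more ambiguity yields more
    suggestions (capped at `max_k`).
    """
    slots = list(candidates)
    scored = []
    for choice in product(*(candidates[s] for s in slots)):
        score = prod(weight for _, weight in choice)
        mapping = {s: name for s, (name, _) in zip(slots, choice)}
        scored.append((score, template.format(**mapping)))

    total = sum(score for score, _ in scored)
    scored.sort(key=lambda item: item[0], reverse=True)

    picked, covered = [], 0.0
    for score, sql in scored:
        if len(picked) >= max_k or covered >= mass:
            break
        picked.append(sql)
        covered += score / total
    return picked


# Unambiguous schema: a single interpretation, so a single suggestion.
clear = recommend(
    "SELECT SUM({col}) FROM {tbl}",
    {"col": [("revenue", 1.0)], "tbl": [("sales", 1.0)]},
)

# Ambiguous schema: similar column and table names yield more suggestions.
ambiguous = recommend(
    "SELECT SUM({col}) FROM {tbl}",
    {
        "col": [("gross_revenue", 0.5), ("net_revenue", 0.5)],
        "tbl": [("sales", 0.6), ("sales_2024", 0.4)],
    },
)
```

In this sketch, user feedback could be folded in by raising the weight of whichever column or table the user picked, though how ODIN itself personalizes recommendations is not described in the abstract.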
Related papers
- TailorSQL: An NL2SQL System Tailored to Your Query Workload [16.48291142955493]
State-of-the-art NL2SQL techniques typically perform translation by retrieving database-specific information. We introduce TailorSQL, an NL2SQL system that takes advantage of information in the past query workload. TailorSQL achieves up to 2$\times$ improvement in execution accuracy on standardized benchmarks.
arXiv Detail & Related papers (2025-05-29T03:27:22Z) - Text-to-SQL Domain Adaptation via Human-LLM Collaborative Data Annotation [26.834687657847454]
Text-to-SQL models are increasingly adopted in real-world applications. Deploying such models in the real world often requires adapting them to the highly specialized database schemas used in specific applications. We find that existing text-to-SQL models experience significant performance drops when applied to new schemas. Continuously obtaining high-quality text-to-SQL data for evolving schemas is prohibitively expensive in real-world scenarios.
arXiv Detail & Related papers (2025-02-21T22:32:35Z) - Is Long Context All You Need? Leveraging LLM's Extended Context for NL2SQL [1.1694928565998557]
Large Language Models (LLMs) have demonstrated impressive capabilities across a range of natural language processing tasks. One approach to the semantic ambiguity problem is to provide more contextual information. We show that long-context LLMs are robust and do not get lost in the extended contextual information.
arXiv Detail & Related papers (2025-01-21T18:52:15Z) - A Survey of NL2SQL with Large Language Models: Where are we, and where are we going? [32.84561352339466]
We provide a review of NL2SQL techniques powered by Large Language Models (LLMs). We discuss the research challenges and open problems of NL2SQL in the LLM era.
arXiv Detail & Related papers (2024-08-09T14:59:36Z) - DBCopilot: Natural Language Querying over Massive Databases via Schema Routing [47.009638761948466]
We present DBCopilot, a framework that addresses challenges by employing a compact and flexible copilot model for routing over massive databases. This framework utilizes a single lightweight differentiable search index to construct semantic mappings for massive database schemata, and navigates natural language questions to their target databases and tables in a relation joint retrieval manner.
arXiv Detail & Related papers (2023-12-06T12:37:28Z) - UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z) - Prompting GPT-3.5 for Text-to-SQL with De-semanticization and Skeleton Retrieval [17.747079214502673]
Text-to-SQL is a task that converts a natural language question into a structured query language (SQL) to retrieve information from a database.
In this paper, we propose an LLM-based framework for Text-to-SQL that retrieves helpful demonstration examples to prompt LLMs.
We design a de-semanticization mechanism that extracts question skeletons, allowing us to retrieve similar examples based on their structural similarity.
arXiv Detail & Related papers (2023-04-26T06:02:01Z) - STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing [64.80483736666123]
We propose a novel pre-training framework, STAR, for context-dependent text-to-SQL parsing.
In addition, we construct a large-scale context-dependent text-to-SQL conversation corpus to pre-train STAR.
Extensive experiments show that STAR achieves new state-of-the-art performance on two downstream benchmarks.
arXiv Detail & Related papers (2022-10-21T11:30:07Z) - A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions [102.8606542189429]
The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) based on the evidence provided by databases.
Deep neural networks have significantly advanced this task by neural generation models, which automatically learn a mapping function from an input NL question to an output query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z) - Weakly Supervised Text-to-SQL Parsing through Question Decomposition [53.22128541030441]
We take advantage of the recently proposed question meaning representation called QDMR.
Given questions, their QDMR structures (annotated by non-experts or automatically predicted) and the answers, we are able to automatically synthesize SQL queries.
Our results show that the weakly supervised models perform competitively with those trained on NL-SQL benchmark data.
arXiv Detail & Related papers (2021-12-12T20:02:42Z) - Relation Aware Semi-autoregressive Semantic Parsing for NL2SQL [17.605904256822786]
We present a Relation aware Semi-autoregressive Semantic Parsing (MODN) framework, which is more adaptable for NL2SQL backbone.
From empirical results and a case study, our model shows its effectiveness in learning better word representation in NL2SQL.
arXiv Detail & Related papers (2021-08-02T12:21:08Z) - Photon: A Robust Cross-Domain Text-to-SQL System [189.1405317853752]
We present Photon, a robust, modular, cross-domain NLIDB that can flag natural language input to which a mapping cannot be immediately determined.
The proposed method effectively improves the robustness of text-to-SQL systems against untranslatable user input.
arXiv Detail & Related papers (2020-07-30T07:44:48Z)