Related papers: E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL

E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL

URL: http://arxiv.org/abs/2409.16751v2
Date: Tue, 28 Jan 2025 09:45:41 GMT
Title: E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL
Authors: Hasan Alp Caferoğlu, Özgür Ulusoy,
Abstract summary: We introduce E-Seek, a novel pipeline specifically designed to address these challenges through direct schema linking and candidate predicate augmentation.<n>E-Seek enhances the natural language query by incorporating relevant database items (i.e., tables, columns, and values) and conditions directly into the question andsql construction plan, bridging the gap between the query and the database structure.<n> Comprehensive evaluations illustrate that E-Seek achieves competitive performance, particularly excelling in complex queries with a 66.29% execution accuracy on the test set.
Score: 1.187832944550453
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Translating Natural Language Queries into Structured Query Language (Text-to-SQL or NLQ-to-SQL) is a critical task extensively studied by both the natural language processing and database communities, aimed at providing a natural language interface to databases (NLIDB) and lowering the barrier for non-experts. Despite recent advancements made through the use of Large Language Models (LLMs), significant challenges remain. These include handling complex database schemas, resolving ambiguity in user queries, and generating SQL queries with intricate structures that accurately reflect the user's intent. In this work, we introduce E-SQL, a novel pipeline specifically designed to address these challenges through direct schema linking and candidate predicate augmentation. E-SQL enhances the natural language query by incorporating relevant database items (i.e., tables, columns, and values) and conditions directly into the question and SQL construction plan, bridging the gap between the query and the database structure. The pipeline leverages candidate predicate augmentation to mitigate erroneous or incomplete predicates in generated SQLs. Comprehensive evaluations on the BIRD benchmark illustrate that E-SQL achieves competitive performance, particularly excelling in complex queries with a 66.29% execution accuracy on the test set. A further observation from our experiments reveals that incorporating schema filtering into the translation pipeline does not have a positive impact on performance when the most advanced proprietary LLMs are used. Additionally, our experiments with small LLMs highlight the importance and positive impact of enriched questions on their performance. Without fine-tuning, single-prompt SQL generation using enriched questions with DeepSeek Coder 7B Instruct 1.5v achieves 56.45% execution accuracy on the BIRD development set.

Related papers

DB-Explore: Automated Database Exploration and Instruction Synthesis for Text-to-SQL [18.915121803834698]
We propose DB-Explore, a novel framework for database understanding using large language models (LLMs) Our framework enables comprehensive database understanding through diverse sampling strategies and automated instruction generation. Our open-source implementation, based on Qwen2.5-coder-7B model, outperforms multiple GPT-4-driven text-to-coder systems in comparative evaluations.
arXiv Detail & Related papers (2025-03-06T20:46:43Z)
Bridging the Gap: Transforming Natural Language Questions into SQL Queries via Abstract Query Pattern and Contextual Schema Markup [6.249316460506702]
We identify two important gaps: the structural mapping gap and the lexical mapping gap. PAS-related achieves an execution accuracy of 87.9%, and leading results on the BIRD dataset with an execution accuracy of 64.67%. Results on the Spider benchmark set a new state-of-the-art on the Spider benchmark with an execution accuracy of 87.9%, and leading results on the BIRD dataset with an execution accuracy of 64.67%.
arXiv Detail & Related papers (2025-02-20T16:11:27Z)
Bridging the Gap: Enabling Natural Language Queries for NoSQL Databases through Text-to-NoSQL Translation [25.638927795540454]
We introduce the Text-to-No task, which aims to convert natural language queries into accessible queries. To promote research in this area, we released a large-scale and open-source dataset for this task, named TEND (short interfaces for Text-to-No dataset) We also designed a SLM (Small Language Model)-assisted and RAG (Retrieval-augmented Generation)-assisted multi-step framework called SMART, which is specifically designed for Text-to-No conversion.
arXiv Detail & Related papers (2025-02-16T17:01:48Z)
Enhancing LLM Fine-tuning for Text-to-SQLs by SQL Quality Measurement [1.392448435105643]
Text-to-s enables non-expert users to effortlessly retrieve desired information from databases using natural language queries. Current state-of-the-art (SOTA) models like GPT4 and T5 have shown impressive performance on large-scale benchmarks like BIRD. This paper proposed a novel approach that only needs SQL Quality to enhance Text-to-s performance.
arXiv Detail & Related papers (2024-10-02T17:21:51Z)
SelECT-SQL: Self-correcting ensemble Chain-of-Thought for Text-to-SQL [3.422309388045878]
We introduce SelECT-, a novel in-context learning solution that uses an algorithmic combination of chain-of-thought, self-correction, and ensemble methods. Specifically, when configured using GPT as the base LLM, SelECT-Turbo achieves 84.2% execution accuracy on the Spider leaderboard's development set.
arXiv Detail & Related papers (2024-09-16T05:40:18Z)
SQLucid: Grounding Natural Language Database Queries with Interactive Explanations [28.10727203675818]
SQLucid is a novel user interface that bridges the gap between non-expert users and complex database querying processes. Our code is available at https://github.com/magic-YuanTian/ucid.
arXiv Detail & Related papers (2024-09-10T03:14:09Z)
DAC: Decomposed Automation Correction for Text-to-SQL [51.48239006107272]
We introduce De Automation Correction (DAC), which corrects text-to-composed by decomposing entity linking and skeleton parsing. We show that our method improves performance by $3.7%$ on average of Spider, Bird, and KaggleDBQA compared with the baseline method.
arXiv Detail & Related papers (2024-08-16T14:43:15Z)
RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL [48.516004807486745]
Large language models (LLMs) with in-context learning have significantly improved the performance of text-to- task. We propose RB-, a novel retrieval-based framework for in-context prompt engineering. Experiment results demonstrate that our model achieves better performance than several competitive baselines on public datasets BIRD and Spider.
arXiv Detail & Related papers (2024-07-11T08:19:58Z)
SQLformer: Deep Auto-Regressive Query Graph Generation for Text-to-SQL Translation [16.07396492960869]
We introduce a novel Transformer architecture specifically crafted to perform text-to-gressive translation tasks. Our model predicts queries as abstract syntax trees (ASTs) in an autore way, incorporating structural inductive bias in the executable and decoder layers.
arXiv Detail & Related papers (2023-10-27T00:13:59Z)
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the framework for enhancing Text-to- filtering using large language models (LLMs) With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses. With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-domain systems. It is composed of publicly available text-to-domain datasets and 29K databases. Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z)
Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs [89.68522473384522]
We present Bird, a big benchmark for large-scale database grounded in text-to-efficient tasks. Our emphasis on database values highlights the new challenges of dirty database contents. Even the most effective text-to-efficient models, i.e. ChatGPT, achieves only 40.08% in execution accuracy.
arXiv Detail & Related papers (2023-05-04T19:02:29Z)
xDBTagger: Explainable Natural Language Interface to Databases Using Keyword Mappings and Schema Graph [0.17188280334580192]
Translating natural language queries into structured query language (NLQ) in interfaces to relational databases is a challenging task. We propose xDBTagger, an explainable hybrid translation pipeline that explains the decisions made along the way to the user both textually and visually. xDBTagger is effective in terms of accuracy and translates the queries more efficiently compared to other state-of-the-art pipeline-based systems up to 10000 times.
arXiv Detail & Related papers (2022-10-07T18:17:09Z)
Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on Poincar'e distance metric. Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences. Our framework sets new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2022-06-28T14:05:25Z)
Weakly Supervised Text-to-SQL Parsing through Question Decomposition [53.22128541030441]
We take advantage of the recently proposed question meaning representation called QDMR. Given questions, their QDMR structures (annotated by non-experts or automatically predicted) and the answers, we are able to automatically synthesizesql queries. Our results show that the weakly supervised models perform competitively with those trained on NL- benchmark data.
arXiv Detail & Related papers (2021-12-12T20:02:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.