E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL
- URL: http://arxiv.org/abs/2409.16751v1
- Date: Wed, 25 Sep 2024 09:02:48 GMT
- Title: E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL
- Authors: Hasan Alp Caferoğlu, Özgür Ulusoy,
- Abstract summary: We introduce E- repository, a novel pipeline designed to address challenges through direct schema linking and candidate predicate augmentation.
E- enhances the natural language query by incorporating relevant database items (i.e. tables, columns, and values) and conditions directly into the question, bridging the gap between the query and the database structure.
We investigate the impact of schema filtering, a technique widely explored in previous work, and demonstrate its diminishing returns when applied alongside advanced large language models.
- Score: 1.187832944550453
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Translating Natural Language Queries into Structured Query Language (Text-to-SQL or NLQ-to-SQL) is a critical task extensively studied by both the natural language processing and database communities, aimed at providing a natural language interface to databases (NLIDB) and lowering the barrier for non-experts. Despite recent advancements made through the use of Large Language Models (LLMs), significant challenges remain. These include handling complex database schemas, resolving ambiguity in user queries, and generating SQL queries with intricate structures that accurately reflect the user's intent. In this work, we introduce E-SQL, a novel pipeline specifically designed to address these challenges through direct schema linking and candidate predicate augmentation. E-SQL enhances the natural language query by incorporating relevant database items (i.e., tables, columns, and values) and conditions directly into the question, bridging the gap between the query and the database structure. The pipeline leverages candidate predicate augmentation to mitigate erroneous or incomplete predicates in generated SQLs. We further investigate the impact of schema filtering, a technique widely explored in previous work, and demonstrate its diminishing returns when applied alongside advanced large language models. Comprehensive evaluations on the BIRD benchmark illustrate that E-SQL achieves competitive performance, particularly excelling in complex queries with a 66.29% execution accuracy on the test set. All code required to reproduce the reported results is publicly available on our GitHub repository.
Related papers
- SelECT-SQL: Self-correcting ensemble Chain-of-Thought for Text-to-SQL [3.422309388045878]
We introduce SelECT-, a novel in-context learning solution that uses an algorithmic combination of chain-of-thought, self-correction, and ensemble methods.
Specifically, when configured using GPT as the base LLM, SelECT-Turbo achieves 84.2% execution accuracy on the Spider leaderboard's development set.
arXiv Detail & Related papers (2024-09-16T05:40:18Z) - SQLucid: Grounding Natural Language Database Queries with Interactive Explanations [28.10727203675818]
SQLucid is a novel user interface that bridges the gap between non-expert users and complex database querying processes.
Our code is available at https://github.com/magic-YuanTian/ucid.
arXiv Detail & Related papers (2024-09-10T03:14:09Z) - RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL [48.516004807486745]
Large language models (LLMs) with in-context learning have significantly improved the performance of text-to- task.
We propose RB-, a novel retrieval-based framework for in-context prompt engineering.
Experiment results demonstrate that our model achieves better performance than several competitive baselines on public datasets BIRD and Spider.
arXiv Detail & Related papers (2024-07-11T08:19:58Z) - SQLformer: Deep Auto-Regressive Query Graph Generation for Text-to-SQL Translation [16.07396492960869]
We introduce a novel Transformer architecture specifically crafted to perform text-to-gressive translation tasks.
Our model predicts queries as abstract syntax trees (ASTs) in an autore way, incorporating structural inductive bias in the executable and decoder layers.
arXiv Detail & Related papers (2023-10-27T00:13:59Z) - SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the framework for enhancing Text-to- filtering using large language models (LLMs)
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z) - UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-domain systems.
It is composed of publicly available text-to-domain datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z) - Can LLM Already Serve as A Database Interface? A BIg Bench for
Large-Scale Database Grounded Text-to-SQLs [89.68522473384522]
We present Bird, a big benchmark for large-scale database grounded in text-to-efficient tasks.
Our emphasis on database values highlights the new challenges of dirty database contents.
Even the most effective text-to-efficient models, i.e. ChatGPT, achieves only 40.08% in execution accuracy.
arXiv Detail & Related papers (2023-05-04T19:02:29Z) - xDBTagger: Explainable Natural Language Interface to Databases Using
Keyword Mappings and Schema Graph [0.17188280334580192]
Translating natural language queries into structured query language (NLQ) in interfaces to relational databases is a challenging task.
We propose xDBTagger, an explainable hybrid translation pipeline that explains the decisions made along the way to the user both textually and visually.
xDBTagger is effective in terms of accuracy and translates the queries more efficiently compared to other state-of-the-art pipeline-based systems up to 10000 times.
arXiv Detail & Related papers (2022-10-07T18:17:09Z) - Proton: Probing Schema Linking Information from Pre-trained Language
Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on Poincar'e distance metric.
Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences.
Our framework sets new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2022-06-28T14:05:25Z) - Weakly Supervised Text-to-SQL Parsing through Question Decomposition [53.22128541030441]
We take advantage of the recently proposed question meaning representation called QDMR.
Given questions, their QDMR structures (annotated by non-experts or automatically predicted) and the answers, we are able to automatically synthesizesql queries.
Our results show that the weakly supervised models perform competitively with those trained on NL- benchmark data.
arXiv Detail & Related papers (2021-12-12T20:02:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.