Related papers: Bridging the Gap: Transforming Natural Language Questions into SQL Queries via Abstract Query Pattern and Contextual Schema Markup

Bridging the Gap: Transforming Natural Language Questions into SQL Queries via Abstract Query Pattern and Contextual Schema Markup

URL: http://arxiv.org/abs/2502.14682v1
Date: Thu, 20 Feb 2025 16:11:27 GMT
Title: Bridging the Gap: Transforming Natural Language Questions into SQL Queries via Abstract Query Pattern and Contextual Schema Markup
Authors: Yonghui Kong, Hongbing Hu, Dan Zhang, Siyuan Chai, Fan Zhang, Wei Wang,
Abstract summary: We identify two important gaps: the structural mapping gap and the lexical mapping gap.<n> PAS-related achieves an execution accuracy of 87.9%, and leading results on the BIRD dataset with an execution accuracy of 64.67%.<n>Results on the Spider benchmark set a new state-of-the-art on the Spider benchmark with an execution accuracy of 87.9%, and leading results on the BIRD dataset with an execution accuracy of 64.67%.
Score: 6.249316460506702
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models have demonstrated excellent performance in many tasks, including Text-to-SQL, due to their powerful in-context learning capabilities. They are becoming the mainstream approach for Text-to-SQL. However, these methods still have a significant gap compared to human performance, especially on complex questions. As the complexity of questions increases, the gap between questions and SQLs increases. We identify two important gaps: the structural mapping gap and the lexical mapping gap. To tackle these two gaps, we propose PAS-SQL, an efficient SQL generation pipeline based on LLMs, which alleviates gaps through Abstract Query Pattern (AQP) and Contextual Schema Markup (CSM). AQP aims to obtain the structural pattern of the question by removing database-related information, which enables us to find structurally similar demonstrations. CSM aims to associate database-related text span in the question with specific tables or columns in the database, which alleviates the lexical mapping gap. Experimental results on the Spider and BIRD datasets demonstrate the effectiveness of our proposed method. Specifically, PAS-SQL + GPT-4o sets a new state-of-the-art on the Spider benchmark with an execution accuracy of 87.9\%, and achieves leading results on the BIRD dataset with an execution accuracy of 64.67\%.

Related papers

X-SQL: Expert Schema Linking and Understanding of Text-to-SQL with Multi-LLMs [0.0]
We find that database schema information plays a significant or even dominant role in the Text-to-Admin task.<n>We introduce X-Supervised Finetuning (SFT) that achieves superior Linking results compared to existing open-source Text-to-Admin methods.<n>In addition, we propose an X- component that focuses on Understanding by bridging the gap between abstract information and the user's natural language question.
arXiv Detail & Related papers (2025-09-07T02:51:43Z)
UNJOIN: Enhancing Multi-Table Text-to-SQL Generation via Schema Simplification [50.59009084277447]
We introduce UNJOIN, a framework that decouples the retrieval of schema elements from logic generation.<n>In the first stage, we merge the column names of all tables in the database into a single-table representation by prefixing each column with its table name.<n>In the second stage, the query is generated on this simplified schema and mapped back to the original schema by reconstructing JOINs, UNIONs, and relational logic.
arXiv Detail & Related papers (2025-05-23T17:28:43Z)
DB-Explore: Automated Database Exploration and Instruction Synthesis for Text-to-SQL [18.915121803834698]
We propose DB-Explore, a novel framework for database understanding using large language models (LLMs) Our framework enables comprehensive database understanding through diverse sampling strategies and automated instruction generation. Our open-source implementation, based on Qwen2.5-coder-7B model, outperforms multiple GPT-4-driven text-to-coder systems in comparative evaluations.
arXiv Detail & Related papers (2025-03-06T20:46:43Z)
ReFoRCE: A Text-to-SQL Agent with Self-Refinement, Consensus Enforcement, and Column Exploration [32.83579488224367]
We present ReFoRCE, a Text-to-confidence agent that tops the Spider 2.0 leaderboard.<n>ReFoRCE achieves state-of-the-art results, with scores of 35.83 Spider 2.0-Snow and 36.56 on Spider 2.0-Lite.
arXiv Detail & Related papers (2025-02-02T05:25:03Z)
RSL-SQL: Robust Schema Linking in Text-to-SQL Generation [51.00761167842468]
We propose a novel framework called RSL- that combines bidirectional schema linking, contextual information augmentation, binary selection strategy, and multi-turn self-correction. benchmarks demonstrate that our approach achieves SOTA execution accuracy among open-source solutions, with 67.2% on BIRD and 87.9% on GPT-4ocorrection. Our approach outperforms a series of GPT-4 based Text-to-Seek systems when adopting DeepSeek (much cheaper) with same intact prompts.
arXiv Detail & Related papers (2024-10-31T16:22:26Z)
E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL [1.187832944550453]
We introduce E-Seek, a novel pipeline specifically designed to address these challenges through direct schema linking and candidate predicate augmentation.<n>E-Seek enhances the natural language query by incorporating relevant database items (i.e., tables, columns, and values) and conditions directly into the question andsql construction plan, bridging the gap between the query and the database structure.<n> Comprehensive evaluations illustrate that E-Seek achieves competitive performance, particularly excelling in complex queries with a 66.29% execution accuracy on the test set.
arXiv Detail & Related papers (2024-09-25T09:02:48Z)
DAC: Decomposed Automation Correction for Text-to-SQL [51.48239006107272]
We introduce De Automation Correction (DAC), which corrects text-to-composed by decomposing entity linking and skeleton parsing. We show that our method improves performance by $3.7%$ on average of Spider, Bird, and KaggleDBQA compared with the baseline method.
arXiv Detail & Related papers (2024-08-16T14:43:15Z)
MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL [15.824894030016187]
Recent In-Context Learning based methods have achieved remarkable success in Text-to-Context task. There is still a large gap between the performance of these models and human performance on datasets with complex database schema and difficult questions, such as. In our framework, an entity-based method with tables' summary is used to select the columns in database, and a novel targets-conditions decomposition method is introduced to decompose those complex questions.
arXiv Detail & Related papers (2024-08-15T04:57:55Z)
MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL [47.120862170230566]
Recent Text-to-yourself methods usually suffer from significant performance degradation on "huge" databases. We introduce MAC, a novel Text-to-yourself LLM-based multi-agent collaborative framework. In our framework, we leverage GPT-4 as the strong backbone for all agent tasks to determine the upper bound of our framework. We then fine-tune an open-sourced instruction-followed model,sql-Llama, by leveraging Code 7B, to accomplish all tasks as GPT-4 does.
arXiv Detail & Related papers (2023-12-18T14:40:20Z)
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the framework for enhancing Text-to- filtering using large language models (LLMs) With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses. With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-domain systems. It is composed of publicly available text-to-domain datasets and 29K databases. Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z)
Prompting GPT-3.5 for Text-to-SQL with De-semanticization and Skeleton Retrieval [17.747079214502673]
Text-to- is a task that converts a natural language question into a structured query language () to retrieve information from a database. In this paper, we propose an LLM-based framework for Text-to- which retrieves helpful demonstration examples to prompt LLMs. We design a de-semanticization mechanism that extracts question skeletons, allowing us to retrieve similar examples based on their structural similarity.
arXiv Detail & Related papers (2023-04-26T06:02:01Z)
S$^2$SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers [66.78665327694625]
We propose S$2$, injecting Syntax to question- encoder graph for Text-to- relational parsing. We also employ the decoupling constraint to induce diverse edge embedding, which further improves the network's performance. Experiments on the Spider and robustness setting Spider-Syn demonstrate that the proposed approach outperforms all existing methods when pre-training models are used.
arXiv Detail & Related papers (2022-03-14T09:49:15Z)
Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing [110.97778888305506]
BRIDGE represents the question and DB schema in a tagged sequence where a subset of the fields are augmented with cell values mentioned in the question. BRIDGE attained state-of-the-art performance on popular cross-DB text-to- relational benchmarks. Our analysis shows that BRIDGE effectively captures the desired cross-modal dependencies and has the potential to generalize to more text-DB related tasks.
arXiv Detail & Related papers (2020-12-23T12:33:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.