Related papers: LinkAlign: Scalable Schema Linking for Real-World Large-Scale Multi-Database Text-to-SQL

LinkAlign: Scalable Schema Linking for Real-World Large-Scale Multi-Database Text-to-SQL

URL: http://arxiv.org/abs/2503.18596v2
Date: Tue, 25 Mar 2025 11:04:18 GMT
Title: LinkAlign: Scalable Schema Linking for Real-World Large-Scale Multi-Database Text-to-SQL
Authors: Yihan Wang, Peiyu Liu,
Abstract summary: LinkAlign is a novel framework that can effectively adapt existing baselines to real-world environments.<n>We evaluate our method performance on the SPIDER and BIRD benchmarks.<n>LinkAlign ranks highest among models excluding those using long chain-of-thought reasoning LLMs.
Score: 14.677024710675838
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Schema linking is a critical bottleneck in achieving human-level performance in Text-to-SQL tasks, particularly in real-world large-scale multi-database scenarios. Addressing schema linking faces two major challenges: (1) Database Retrieval: selecting the correct database from a large schema pool in multi-database settings, while filtering out irrelevant ones. (2) Schema Item Grounding: accurately identifying the relevant tables and columns from within a large and redundant schema for SQL generation. To address this, we introduce LinkAlign, a novel framework that can effectively adapt existing baselines to real-world environments by systematically addressing schema linking. Our framework comprises three key steps: multi-round semantic enhanced retrieval and irrelevant information isolation for Challenge 1, and schema extraction enhancement for Challenge 2. We evaluate our method performance of schema linking on the SPIDER and BIRD benchmarks, and the ability to adapt existing Text-to-SQL models to real-world environments on the SPIDER 2.0-lite benchmark. Experiments show that LinkAlign outperforms existing baselines in multi-database settings, demonstrating its effectiveness and robustness. On the other hand, our method ranks highest among models excluding those using long chain-of-thought reasoning LLMs. This work bridges the gap between current research and real-world scenarios, providing a practical solution for robust and scalable schema linking. The codes are available at https://github.com/Satissss/LinkAlign.

Related papers

Knapsack Optimization-based Schema Linking for LLM-based Text-to-SQL Generation [15.888784472807775]
We introduce Knapsack optimization-based Linking Agent (KaSLA)<n>KaSLA is a plug-in schema linking agent designed to prevent the missing of relevant schema elements while minimizing the inclusion of redundant ones.<n>Experiments on Spider and BIRD benchmarks verify that KaSLA can significantly improve the generation performance of SOTA models.
arXiv Detail & Related papers (2025-02-18T14:53:45Z)
PSM-SQL: Progressive Schema Learning with Multi-granularity Semantics for Text-to-SQL [8.416319689644556]
It is challenging to convert tasks due to the vast number of database schemas with redundancy. We propose a progressive schema linking with multi-granularity semantics (PSM-) PSM- learns the schema semantics at the column, table, and database levels.
arXiv Detail & Related papers (2025-02-07T08:31:57Z)
Extractive Schema Linking for Text-to-SQL [17.757832644216446]
Text-to-one is emerging as a practical interface for real world databases.<n>We introduce a new approach to adapt decoder-only LLMs to schema linking.
arXiv Detail & Related papers (2025-01-23T19:57:08Z)
V-SQL: A View-based Two-stage Text-to-SQL Framework [0.9719868595277401]
Text-to-coupling methods based on large language models (LLMs) have garnered significant attention. The core of mainstream text-to-coupling frameworks is schema linking, which aligns user queries with relevant tables and columns in the database. Previous methods focused on schema linking while to enhance LLMs' understanding of database schema.
arXiv Detail & Related papers (2024-12-17T02:27:50Z)
RSL-SQL: Robust Schema Linking in Text-to-SQL Generation [51.00761167842468]
We propose a novel framework called RSL- that combines bidirectional schema linking, contextual information augmentation, binary selection strategy, and multi-turn self-correction. benchmarks demonstrate that our approach achieves SOTA execution accuracy among open-source solutions, with 67.2% on BIRD and 87.9% on GPT-4ocorrection. Our approach outperforms a series of GPT-4 based Text-to-Seek systems when adopting DeepSeek (much cheaper) with same intact prompts.
arXiv Detail & Related papers (2024-10-31T16:22:26Z)
MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL [15.824894030016187]
Recent In-Context Learning based methods have achieved remarkable success in Text-to-Context task. There is still a large gap between the performance of these models and human performance on datasets with complex database schema and difficult questions, such as. In our framework, an entity-based method with tables' summary is used to select the columns in database, and a novel targets-conditions decomposition method is introduced to decompose those complex questions.
arXiv Detail & Related papers (2024-08-15T04:57:55Z)
RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL [48.516004807486745]
Large language models (LLMs) with in-context learning have significantly improved the performance of text-to- task. We propose RB-, a novel retrieval-based framework for in-context prompt engineering. Experiment results demonstrate that our model achieves better performance than several competitive baselines on public datasets BIRD and Spider.
arXiv Detail & Related papers (2024-07-11T08:19:58Z)
CHESS: Contextual Harnessing for Efficient SQL Synthesis [1.9506402593665235]
We introduce CHESS, a framework for efficient and scalable text-to- queries. It comprises four specialized agents, each targeting one of the aforementioned challenges. Our framework offers features that adapt to various deployment constraints.
arXiv Detail & Related papers (2024-05-27T01:54:16Z)
DBCopilot: Natural Language Querying over Massive Databases via Schema Routing [47.009638761948466]
We present DBCopilot, a framework that addresses challenges by employing a compact and flexible copilot model for routing over massive databases. This framework utilizes a single lightweight differentiable search index to construct semantic mappings for massive database schemata, and navigates natural language questions to their target databases and tables in a relation joint retrieval manner.
arXiv Detail & Related papers (2023-12-06T12:37:28Z)
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the framework for enhancing Text-to- filtering using large language models (LLMs) With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses. With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on Poincar'e distance metric. Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences. Our framework sets new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2022-06-28T14:05:25Z)
Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing [110.97778888305506]
BRIDGE represents the question and DB schema in a tagged sequence where a subset of the fields are augmented with cell values mentioned in the question. BRIDGE attained state-of-the-art performance on popular cross-DB text-to- relational benchmarks. Our analysis shows that BRIDGE effectively captures the desired cross-modal dependencies and has the potential to generalize to more text-DB related tasks.
arXiv Detail & Related papers (2020-12-23T12:33:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.