UNJOIN: Enhancing Multi-Table Text-to-SQL Generation via Schema Simplification
- URL: http://arxiv.org/abs/2505.18122v1
- Date: Fri, 23 May 2025 17:28:43 GMT
- Title: UNJOIN: Enhancing Multi-Table Text-to-SQL Generation via Schema Simplification
- Authors: Poojah Ganesan, Rajat Aayush Jha, Dan Roth, Vivek Gupta
- Abstract summary: We introduce UNJOIN, a framework that decouples the retrieval of schema elements from SQL logic generation. In the first stage, we merge the column names of all tables in the database into a single-table representation by prefixing each column with its table name. In the second stage, the query is generated on this simplified schema and mapped back to the original schema by reconstructing JOINs, UNIONs, and relational logic.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in large language models (LLMs) have greatly improved Text-to-SQL performance for single-table queries, but it remains challenging for multi-table databases due to complex schemas and relational operations. Existing methods often struggle with retrieving the right tables and columns, generating accurate JOINs and UNIONs, and generalizing across diverse schemas. To address these issues, we introduce UNJOIN, a two-stage framework that decouples the retrieval of schema elements from SQL logic generation. In the first stage, we merge the column names of all tables in the database into a single-table representation by prefixing each column with its table name. This allows the model to focus purely on accurate retrieval without being distracted by the need to write complex SQL logic. In the second stage, the SQL query is generated on this simplified schema and mapped back to the original schema by reconstructing JOINs, UNIONs, and relational logic. Evaluations on the SPIDER and BIRD datasets show that UNJOIN matches or exceeds state-of-the-art baselines. UNJOIN uses only schema information and requires no data access or fine-tuning, making it scalable and adaptable across databases.
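To make the two stages concrete, below is a minimal, illustrative Python sketch of stage 1 (merging every table's columns into a single-table view by prefixing each column with its table name) together with the reverse map that stage 2 would consult when rewriting the generated query against the original schema. The function names, the table_column naming convention, and the placeholder all_columns table are assumptions made for illustration; in the paper, the JOIN/UNION reconstruction of stage 2 is carried out by an LLM rather than by the purely mechanical rewrite hinted at here.

```python
# Hypothetical sketch of UNJOIN-style schema simplification (stage 1) and the
# bookkeeping needed to map a query back to the original schema (stage 2).
from typing import Dict, List, Tuple
import re

def flatten_schema(schema: Dict[str, List[str]]) -> Tuple[List[str], Dict[str, Tuple[str, str]]]:
    """Merge all tables into a single-table view; also return a reverse map
    from each flattened name back to (table, column) for stage 2."""
    flat_columns, reverse_map = [], {}
    for table, columns in schema.items():
        for col in columns:
            flat = f"{table}_{col}"           # assumed flattening convention
            flat_columns.append(flat)
            reverse_map[flat] = (table, col)
    return flat_columns, reverse_map

def map_back(flat_sql: str, reverse_map: Dict[str, Tuple[str, str]]) -> Tuple[str, List[str]]:
    """Rewrite flattened column references as table.column and collect the
    tables the final query must touch; JOIN/UNION reconstruction itself is
    delegated to a second LLM call in the actual framework."""
    tables_needed = set()

    def repl(match: re.Match) -> str:
        table, col = reverse_map[match.group(0)]
        tables_needed.add(table)
        return f"{table}.{col}"

    # Longest names first so e.g. concert_singer_id is not split as singer_id.
    pattern = re.compile("|".join(sorted(map(re.escape, reverse_map), key=len, reverse=True)))
    return pattern.sub(repl, flat_sql), sorted(tables_needed)

if __name__ == "__main__":
    schema = {"singer": ["id", "name", "country"],
              "concert": ["id", "singer_id", "year"]}
    flat_cols, rev = flatten_schema(schema)
    # An LLM would generate SQL over the single flattened "table":
    flat_sql = "SELECT singer_name FROM all_columns WHERE concert_year = 2020"
    mapped_sql, tables = map_back(flat_sql, rev)
    print(flat_cols)   # ['singer_id', 'singer_name', ..., 'concert_year']
    print(mapped_sql)  # SELECT singer.name FROM all_columns WHERE concert.year = 2020
    print(tables)      # ['concert', 'singer'] -> a JOIN on singer_id is still needed
```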
Related papers
- Weaver: Interweaving SQL and LLM for Table Reasoning [63.09519234853953]
Weaver generates a flexible, step-by-step plan that combines SQL for structured data retrieval with LLMs for semantic processing. Weaver consistently outperforms state-of-the-art methods across four TableQA datasets, reducing both API calls and error rates.
arXiv Detail & Related papers (2025-05-25T03:27:37Z) - SchemaGraphSQL: Efficient Schema Linking with Pathfinding Graph Algorithms for Text-to-SQL on Large-Scale Databases [1.6544167074080365]
We present a zero-shot, training-free schema linking approach that first constructs a schema graph based on foreign key relations. We apply classical path-finding algorithms and post-processing to identify the optimal sequence of tables and columns that should be joined (a minimal sketch of this graph-based idea appears after the list below). Our method achieves state-of-the-art results on the BIRD benchmark, outperforming previous specialized, fine-tuned, and complex multi-step LLM-based approaches.
arXiv Detail & Related papers (2025-05-23T20:42:36Z) - Extractive Schema Linking for Text-to-SQL [17.757832644216446]
Text-to-SQL is emerging as a practical interface for real-world databases. We introduce a new approach to adapt decoder-only LLMs to schema linking.
arXiv Detail & Related papers (2025-01-23T19:57:08Z) - V-SQL: A View-based Two-stage Text-to-SQL Framework [0.9719868595277401]
Text-to-SQL methods based on large language models (LLMs) have garnered significant attention. The core of mainstream text-to-SQL frameworks is schema linking, which aligns user queries with relevant tables and columns in the database. Previous methods focused on schema linking while neglecting to enhance LLMs' understanding of the database schema.
arXiv Detail & Related papers (2024-12-17T02:27:50Z) - RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL [48.516004807486745]
Large language models (LLMs) with in-context learning have significantly improved the performance of the text-to-SQL task.
We propose RB-SQL, a novel retrieval-based framework for in-context prompt engineering.
Experiment results demonstrate that our model achieves better performance than several competitive baselines on public datasets BIRD and Spider.
arXiv Detail & Related papers (2024-07-11T08:19:58Z) - SQL-to-Schema Enhances Schema Linking in Text-to-SQL [15.6857201570992]
In text-to-SQL methods, there is a need to filter out unnecessary tables and columns.
Previous approaches have involved sorting tables and columns based on their relevance to the question.
We propose an inventive schema linking method in two steps.
arXiv Detail & Related papers (2024-05-15T12:22:48Z) - Schema-Aware Multi-Task Learning for Complex Text-to-SQL [4.913409359995421]
We present a schema-aware multi-task learning framework (named MTSQL) for complicated SQL queries.
Specifically, we design a schema linking discriminator module to distinguish the valid question-schema linkings.
On the decoder side, we define six types of relationships to describe the connections between tables and columns.
arXiv Detail & Related papers (2024-03-09T01:13:37Z) - MURRE: Multi-Hop Table Retrieval with Removal for Open-Domain Text-to-SQL [51.48239006107272]
Multi-hop table retrieval with removal (MURRE) removes previously retrieved information from the question to guide towards unretrieved relevant tables.
Experiments on two open-domain text-to-SQL retrieval datasets demonstrate an average improvement of 5.7% over the previous state-of-the-art results.
arXiv Detail & Related papers (2024-02-16T13:14:35Z) - Proton: Probing Schema Linking Information from Pre-trained Language
Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on the Poincaré distance metric.
Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences.
Our framework sets new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2022-06-28T14:05:25Z) - UniSAr: A Unified Structure-Aware Autoregressive Language Model for
Text-to-SQL [48.21638676148253]
We present UniSAr (Unified Structure-Aware Autoregressive Language Model), which benefits from using an off-the-shelf language model.
Specifically, UniSAr extends existing autoregressive models to incorporate three non-invasive extensions to make them structure-aware.
arXiv Detail & Related papers (2022-03-15T11:02:55Z) - Retrieving Complex Tables with Multi-Granular Graph Representation
Learning [20.72341939868327]
The task of natural language table retrieval seeks to retrieve semantically relevant tables based on natural language queries.
Existing learning systems treat tables as plain text based on the assumption that tables are structured as dataframes.
We propose Graph-based Table Retrieval (GTR), a generalizable NLTR framework with multi-granular graph representation learning.
arXiv Detail & Related papers (2021-05-04T20:19:03Z) - Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic
Parsing [110.97778888305506]
BRIDGE represents the question and DB schema in a tagged sequence where a subset of the fields are augmented with cell values mentioned in the question.
BRIDGE attained state-of-the-art performance on popular cross-DB text-to-SQL benchmarks.
Our analysis shows that BRIDGE effectively captures the desired cross-modal dependencies and has the potential to generalize to more text-DB related tasks.
arXiv Detail & Related papers (2020-12-23T12:33:52Z)