SQL-to-Schema Enhances Schema Linking in Text-to-SQL
- URL: http://arxiv.org/abs/2405.09593v1
- Date: Wed, 15 May 2024 12:22:48 GMT
- Title: SQL-to-Schema Enhances Schema Linking in Text-to-SQL
- Authors: Sun Yang, Qiong Su, Zhishuai Li, Ziyue Li, Hangyu Mao, Chenxi Liu, Rui Zhao,
- Abstract summary: In text-to-speech methods, there is a need to filter out unnecessary tables and columns.
Previous approaches have involved sorting tables and columns based on their relevance to the question.
We propose an inventive schema linking method in two steps.
- Score: 15.6857201570992
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In sophisticated existing Text-to-SQL methods exhibit errors in various proportions, including schema-linking errors (incorrect columns, tables, or extra columns), join errors, nested errors, and group-by errors. Consequently, there is a critical need to filter out unnecessary tables and columns, directing the language models attention to relevant tables and columns with schema-linking, to reduce errors during SQL generation. Previous approaches have involved sorting tables and columns based on their relevance to the question, selecting the top-ranked ones for sorting, or directly identifying the necessary tables and columns for SQL generation. However, these methods face challenges such as lengthy model training times, high consumption of expensive GPT-4 tokens in few-shot prompts, or suboptimal performance in schema linking. Therefore, we propose an inventive schema linking method in two steps: Firstly, generate an initial SQL query by utilizing the complete database schema. Subsequently, extract tables and columns from the initial SQL query to create a concise schema. Using CodeLlama-34B, when comparing the schemas obtained by mainstream methods with ours for SQL generation, our schema performs optimally. Leveraging GPT4, our SQL generation method achieved results that are comparable to mainstream Text-to-SQL methods on the Spider dataset.
Related papers
- RSL-SQL: Robust Schema Linking in Text-to-SQL Generation [12.765849111313614]
We propose a novel framework called RSL- that combines bidirectional schema linking, contextual information augmentation, binary selection strategy, and multi-turn self-correction.
Experiments on the BIRD and Spider benchmarks demonstrate that our approach achieves state-of-the-art execution accuracy among open-source solutions.
arXiv Detail & Related papers (2024-10-31T16:22:26Z) - DAC: Decomposed Automation Correction for Text-to-SQL [51.48239006107272]
We introduce De Automation Correction (DAC), which corrects text-to-composed by decomposing entity linking and skeleton parsing.
We show that our method improves performance by $3.7%$ on average of Spider, Bird, and KaggleDBQA compared with the baseline method.
arXiv Detail & Related papers (2024-08-16T14:43:15Z) - The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models [0.9149661171430259]
We revisit schema linking when using the latest generation of large language models (LLMs)
We find empirically that newer models are adept at utilizing relevant schema elements during generation even in the presence of large numbers of irrelevant ones.
Instead of filtering contextual information, we highlight techniques such as augmentation, selection, and correction, and adopt them to improve the accuracy of our Text-to-BIRD pipeline.
arXiv Detail & Related papers (2024-08-14T17:59:04Z) - Schema-Aware Multi-Task Learning for Complex Text-to-SQL [4.913409359995421]
We present a schema-aware multi-task learning framework (named MT) for complicatedsql queries.
Specifically, we design a schema linking discriminator module to distinguish the valid question-schema linkings.
On the decoder side, we define 6-type relationships to describe the connections between tables and columns.
arXiv Detail & Related papers (2024-03-09T01:13:37Z) - SQLPrompt: In-Context Text-to-SQL with Minimal Labeled Data [54.69489315952524]
"Prompt" is designed to improve the few-shot prompting capabilities of Text-to-LLMs.
"Prompt" outperforms previous approaches for in-context learning with few labeled data by a large margin.
We show that emphPrompt outperforms previous approaches for in-context learning with few labeled data by a large margin.
arXiv Detail & Related papers (2023-11-06T05:24:06Z) - CRUSH4SQL: Collective Retrieval Using Schema Hallucination For Text2SQL [47.14954737590405]
Existing text-to-text generators require the entire schema to be encoded with user text.
Standard dense retrieval techniques are inadequate for schema subsetting a large structured database.
We introduce three benchmarks for schema subsetting on large databases.
arXiv Detail & Related papers (2023-11-02T12:13:52Z) - Semantic Enhanced Text-to-SQL Parsing via Iteratively Learning Schema
Linking Graph [6.13728903057727]
The generalizability to new databases is of vital importance to Text-to- systems which aim to parse human utterances intosql statements.
In this paper, we propose a framework named IS ESL to iteratively build a enhanced semantic schema-linking graph between question tokens and database schemas.
Extensive experiments on three benchmarks demonstrate that IS ESL could consistently outperform the baselines and further investigations show its generalizability and robustness.
arXiv Detail & Related papers (2022-08-08T03:59:33Z) - Proton: Probing Schema Linking Information from Pre-trained Language
Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on Poincar'e distance metric.
Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences.
Our framework sets new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2022-06-28T14:05:25Z) - UniSAr: A Unified Structure-Aware Autoregressive Language Model for
Text-to-SQL [48.21638676148253]
We present UniSAr (Unified Structure-Aware Autoregressive Language Model), which benefits from using an off-the-shelf language model.
Specifically, UniSAr extends existing autoregressive models to incorporate three non-invasive extensions to make them structure-aware.
arXiv Detail & Related papers (2022-03-15T11:02:55Z) - S$^2$SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder
for Text-to-SQL Parsers [66.78665327694625]
We propose S$2$, injecting Syntax to question- encoder graph for Text-to- relational parsing.
We also employ the decoupling constraint to induce diverse edge embedding, which further improves the network's performance.
Experiments on the Spider and robustness setting Spider-Syn demonstrate that the proposed approach outperforms all existing methods when pre-training models are used.
arXiv Detail & Related papers (2022-03-14T09:49:15Z) - Mention Extraction and Linking for SQL Query Generation [6.186311061181687]
On the Wiki benchmark, state-of-the-art text-to-text systems typically take a slot-filling approach by building several dedicated models for each type of slots.
This paper proposes a novel extraction-linking approach, where a unified extractor recognizes all types of slot mentions appearing in the question sentence.
Trained with automatically generated annotations, the proposed method achieves the first place on the Wiki benchmark.
arXiv Detail & Related papers (2020-12-18T06:51:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.