DFIN-SQL: Integrating Focused Schema with DIN-SQL for Superior Accuracy
in Large-Scale Databases
- URL: http://arxiv.org/abs/2403.00872v1
- Date: Fri, 1 Mar 2024 07:14:45 GMT
- Title: DFIN-SQL: Integrating Focused Schema with DIN-SQL for Superior Accuracy
in Large-Scale Databases
- Authors: Shai Volvovsky, Marco Marcassa, Mustafa Panbiharwala
- Abstract summary: This paper introduces DFIN, an innovative extension of DIN-composed (Decomposed-In-Context)
DFIN enhances Text-to-composed conversion by addressing schema linking errors, which are a major source of inaccuracies.
Our evaluation on the BIRD dataset, a challenging real-world benchmark, demonstrates that DFIN not only efficiently but also improves accuracy, achieving a score of 51.69.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of converting natural language queries into SQL queries is
intricate, necessitating a blend of precise techniques for an accurate
translation. The DIN-SQL (Decomposed-In-Context SQL) methodology represents a
significant development in this domain. This paper introduces DFIN (Decomposed
Focused-In-Context), an innovative extension of DIN-SQL that enhances
Text-to-SQL conversion by addressing schema linking errors, which are a major
source of inaccuracies. DFIN uniquely alternates between prompting techniques
and Retrieval-Augmented Generation (RAG), adapting to the size and complexity
of the database schema. A preprocessing phase embeds database definitions and
leverages annotated files, akin to those in the BIRD dataset, facilitating the
runtime retrieval of pertinent schema information. This strategy significantly
reduces the token count for schema linking prompts, enabling the use of a
standard GPT-4 model over its larger context variant, thus handling large-scale
databases more effectively and economically. Our evaluation on the BIRD
dataset, a challenging real-world benchmark, demonstrates that DFIN not only
scales efficiently but also improves accuracy, achieving a score of 51.69. This
improvement surpasses DIN-SQL method (the current third-place), which is the
highest-ranked model employing in-context learning rather than fine-tuning,
previously scoring 50.72. The advancement of DFIN underscores the evolving
capabilities of in-context learning methodologies combined with advanced
language models, offering a promising avenue for future research in complex
Text-to-SQL conversion tasks.
Related papers
- RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL [48.516004807486745]
Large language models (LLMs) with in-context learning have significantly improved the performance of text-to- task.
We propose RB-, a novel retrieval-based framework for in-context prompt engineering.
Experiment results demonstrate that our model achieves better performance than several competitive baselines on public datasets BIRD and Spider.
arXiv Detail & Related papers (2024-07-11T08:19:58Z) - RH-SQL: Refined Schema and Hardness Prompt for Text-to-SQL [1.734218686180302]
This paper introduces a method for Text-to- Execute based on Refined Execution Model and Hardness Prompt.
It reduces storage and training costs while maintaining performance.
Our experiments on the Spider dataset, specifically with large-scale LMs, achieved an exceptional accuracy (EX) of 82.6%.
arXiv Detail & Related papers (2024-06-13T14:04:34Z) - CHESS: Contextual Harnessing for Efficient SQL Synthesis [1.9506402593665235]
We propose a new pipeline that retrieves relevant data and context, selects an efficient schema, and synthesizes correct and efficient queries.
Our method achieves new state-of-the-art performance on the cross-domain challenging BIRD dataset.
arXiv Detail & Related papers (2024-05-27T01:54:16Z) - CodeS: Towards Building Open-source Language Models for Text-to-SQL [42.11113113574589]
We introduce CodeS, a series of pre-trained language models with parameters ranging from 1B to 15B.
CodeS is a fully open language model, which achieves superior accuracy with much smaller parameter sizes.
We conduct comprehensive evaluations on multiple datasets, including the widely used Spider benchmark.
arXiv Detail & Related papers (2024-02-26T07:00:58Z) - SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the framework for enhancing Text-to- filtering using large language models (LLMs)
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z) - Graphix-T5: Mixing Pre-Trained Transformers with Graph-Aware Layers for
Text-to-SQL Parsing [56.232873134174056]
One of the major challenges in text-to-text parsing is domain generalization, i.e., how to well generalize to unseen databases.
In this work, we explore ways to further augment the pre-trained text-to-text transformer model with specialized components for text-to-text parsing.
To this end, we propose a new architecture GRAPHIX-T5, augmented by some specially-designed graph-aware model with layers.
arXiv Detail & Related papers (2023-01-18T13:29:05Z) - N-Best Hypotheses Reranking for Text-To-SQL Systems [6.966624873109535]
Text-to- task maps natural language utterances to structured queries.
State-of-the-art (SOTA) systems rely on finetuning large, pre-trained language models.
Findings show significant potential improvements with reranking.
arXiv Detail & Related papers (2022-10-19T15:35:06Z) - SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers [61.48159785138462]
This paper aims to improve the performance of text-to-dependence by exploring the intrinsic uncertainties in the neural network based approaches (called SUN)
Extensive experiments on five benchmark datasets demonstrate that our method significantly outperforms competitors and achieves new state-of-the-art results.
arXiv Detail & Related papers (2022-09-14T06:27:51Z) - Proton: Probing Schema Linking Information from Pre-trained Language
Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on Poincar'e distance metric.
Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences.
Our framework sets new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2022-06-28T14:05:25Z) - GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-pairs over high-free tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.