Related papers: On Repairing Natural Language to SQL Queries

On Repairing Natural Language to SQL Queries

URL: http://arxiv.org/abs/2310.03866v1
Date: Thu, 5 Oct 2023 19:50:52 GMT
Title: On Repairing Natural Language to SQL Queries
Authors: Aidan Z.H. Yang, Ricardo Brancas, Pedro Esteves, Sofia Aparicio, Joao Pedro Nadkarni, Miguel Terra-Neves, Vasco Manquinho, Ruben Martins
Abstract summary: We analyze when text-to- tools fail to return the correct query. It is often the case that the returned query is close to a correct query. We propose to repair these failing queries using a mutation-based approach.
Score: 2.5442795971328307
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Data analysts use SQL queries to access and manipulate data on their databases. However, these queries are often challenging to write, and small mistakes can lead to unexpected data output. Recent work has explored several ways to automatically synthesize queries based on a user-provided specification. One promising technique called text-to-SQL consists of the user providing a natural language description of the intended behavior and the database's schema. Even though text-to-SQL tools are becoming more accurate, there are still many instances where they fail to produce the correct query. In this paper, we analyze when text-to-SQL tools fail to return the correct query and show that it is often the case that the returned query is close to a correct query. We propose to repair these failing queries using a mutation-based approach that is agnostic to the text-to-SQL tool being used. We evaluate our approach on two recent text-to-SQL tools, RAT-SQL and SmBoP, and show that our approach can repair a significant number of failing queries.

Related papers

SQLens: An End-to-End Framework for Error Detection and Correction in Text-to-SQL [20.93676525997898]
We propose an end-to-end framework for fine-grained detection and correction of semantic errors in large language models (LLMs) generated by text-to-the-box systems.<n>We show that our framework outperforms the best LLM-based self-evaluation method by 25.78% in F1 for error detection, and improves execution accuracy of out-of-the-box systems by up to 20%.
arXiv Detail & Related papers (2025-06-04T22:25:47Z)
E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL [1.187832944550453]
We introduce E- repository, a novel pipeline designed to address challenges through direct schema linking and candidate predicate augmentation. E- enhances the natural language query by incorporating relevant database items (i.e. tables, columns, and values) and conditions directly into the question, bridging the gap between the query and the database structure. We investigate the impact of schema filtering, a technique widely explored in previous work, and demonstrate its diminishing returns when applied alongside advanced large language models.
arXiv Detail & Related papers (2024-09-25T09:02:48Z)
DAC: Decomposed Automation Correction for Text-to-SQL [51.48239006107272]
We introduce De Automation Correction (DAC), which corrects text-to-composed by decomposing entity linking and skeleton parsing. We show that our method improves performance by $3.7%$ on average of Spider, Bird, and KaggleDBQA compared with the baseline method.
arXiv Detail & Related papers (2024-08-16T14:43:15Z)
SQLFixAgent: Towards Semantic-Accurate Text-to-SQL Parsing via Consistency-Enhanced Multi-Agent Collaboration [26.193588535592767]
We introduce a new consistency-enhanced multi-agent collaborative framework designed for detecting and repairing erroneous SQL. We evaluate our proposed framework on five Text-to-text benchmarks, specifically achieving an improvement of over 3% on the Bird benchmark. Our framework also has a higher token efficiency compared to other advanced methods, making it more competitive.
arXiv Detail & Related papers (2024-06-19T09:57:19Z)
SQLPrompt: In-Context Text-to-SQL with Minimal Labeled Data [54.69489315952524]
"Prompt" is designed to improve the few-shot prompting capabilities of Text-to-LLMs. "Prompt" outperforms previous approaches for in-context learning with few labeled data by a large margin. We show that emphPrompt outperforms previous approaches for in-context learning with few labeled data by a large margin.
arXiv Detail & Related papers (2023-11-06T05:24:06Z)
SQLformer: Deep Auto-Regressive Query Graph Generation for Text-to-SQL Translation [16.07396492960869]
We introduce a novel Transformer architecture specifically crafted to perform text-to-gressive translation tasks. Our model predicts queries as abstract syntax trees (ASTs) in an autore way, incorporating structural inductive bias in the executable and decoder layers.
arXiv Detail & Related papers (2023-10-27T00:13:59Z)
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the framework for enhancing Text-to- filtering using large language models (LLMs) With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses. With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-domain systems. It is composed of publicly available text-to-domain datasets and 29K databases. Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z)
Interactive Text-to-SQL Generation via Editable Step-by-Step Explanations [31.3376894001311]
We introduce a new interaction mechanism that allows users to directly edit a step-by-step explanation of a query to fix errors. Our experiments on multiple datasets, as well as a user with 24 participants, demonstrate that our approach can achieve better than multiple SOTA approaches.
arXiv Detail & Related papers (2023-05-12T10:45:29Z)
Know What I don't Know: Handling Ambiguous and Unanswerable Questions for Text-to-SQL [36.5089235153207]
Existing text-to-yourselfs generate a "plausible" query for an arbitrary user question. We propose a simple yet effective generation approach that automatically produces ambiguous and unanswerable examples. Experimental results show that our model achieves the best result on both real-world examples and generated examples.
arXiv Detail & Related papers (2022-12-17T15:32:00Z)
A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions [102.8606542189429]
The goal of text-to-corpora parsing is to convert a natural language (NL) question to its corresponding structured query language () based on the evidences provided by databases. Deep neural networks have significantly advanced this task by neural generation models, which automatically learn a mapping function from an input NL question to an output query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z)
Weakly Supervised Text-to-SQL Parsing through Question Decomposition [53.22128541030441]
We take advantage of the recently proposed question meaning representation called QDMR. Given questions, their QDMR structures (annotated by non-experts or automatically predicted) and the answers, we are able to automatically synthesizesql queries. Our results show that the weakly supervised models perform competitively with those trained on NL- benchmark data.
arXiv Detail & Related papers (2021-12-12T20:02:42Z)
Bertrand-DR: Improving Text-to-SQL using a Discriminative Re-ranker [1.049360126069332]
We propose a novel discnative re-ranker to improve the performance of generative text-to-rimi models. We analyze relative strengths of the text-to-rimi and re-ranker models for optimal performance. We demonstrate the effectiveness of the re-ranker by applying it to two state-of-the-art text-to-rimi models.
arXiv Detail & Related papers (2020-02-03T04:52:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.