Text-to-SQL Error Correction with Language Models of Code
- URL: http://arxiv.org/abs/2305.13073v2
- Date: Sun, 28 May 2023 15:32:26 GMT
- Title: Text-to-SQL Error Correction with Language Models of Code
- Authors: Ziru Chen, Shijie Chen, Michael White, Raymond Mooney, Ali Payani,
Jayanth Srinivasa, Yu Su, Huan Sun
- Abstract summary: In this paper, we investigate how to build automatic text-to-SQL error correction models.
Noticing that token-level edits are out of context and sometimes ambiguous, we propose building clause-level edit models instead.
- Score: 24.743066730684742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite recent progress in text-to-SQL parsing, current semantic parsers are
still not accurate enough for practical use. In this paper, we investigate how
to build automatic text-to-SQL error correction models. Noticing that
token-level edits are out of context and sometimes ambiguous, we propose
building clause-level edit models instead. Besides, while most language models
of code are not specifically pre-trained for SQL, they know common data
structures and their operations in programming languages such as Python. Thus,
we propose a novel representation for SQL queries and their edits that adheres
more closely to the pre-training corpora of language models of code. Our error
correction model improves the exact set match accuracy of different parsers by
2.4-6.5 and obtains up to 4.3 point absolute improvement over two strong
baselines. Our code and data are available at
https://github.com/OSU-NLP-Group/Auto-SQL-Correction.
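The abstract's core idea, representing a SQL query and its correction as code-like data structures so that a language model of code can operate on familiar objects, can be illustrated with a small sketch. The Python snippet below is an illustrative assumption only: the dictionary layout, the clause-level edit format, and the helper functions are hypothetical and do not reproduce the paper's actual serialization.

```python
# Hypothetical sketch: a SQL query serialized as a Python dict of clauses,
# with a correction expressed as a clause-level edit (not token-level).
# All names and structures here are illustrative, not the paper's format.

# Initial (possibly incorrect) parse produced by a text-to-SQL parser.
predicted_query = {
    "select": ["name"],
    "from": ["singer"],
    "where": ["age > 20"],
    "order_by": [],
}

# A clause-level edit replaces a whole clause, keeping the change
# self-contained and unambiguous compared with editing single tokens.
clause_edit = {"replace": {"where": ["age >= 20"]}}


def apply_edit(query: dict, edit: dict) -> dict:
    """Apply a clause-level replacement edit to the query representation."""
    corrected = dict(query)
    for clause, new_value in edit.get("replace", {}).items():
        corrected[clause] = new_value
    return corrected


def to_sql(query: dict) -> str:
    """Render the dict representation back into a SQL string."""
    parts = [
        "SELECT " + ", ".join(query["select"]),
        "FROM " + ", ".join(query["from"]),
    ]
    if query.get("where"):
        parts.append("WHERE " + " AND ".join(query["where"]))
    if query.get("order_by"):
        parts.append("ORDER BY " + ", ".join(query["order_by"]))
    return " ".join(parts)


corrected_query = apply_edit(predicted_query, clause_edit)
print(to_sql(corrected_query))
# -> SELECT name FROM singer WHERE age >= 20
```

In this reading, a clause-level edit swaps an entire WHERE clause rather than patching individual tokens, which keeps the correction in context and unambiguous, matching the motivation stated in the abstract.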
Related papers
- SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging [30.306023265985658]
We introduce a framework for generating high-quality synthetic training data for any SQL dialect.
We propose a novel Mixture-of-Experts (MoE) approach that leverages the shared knowledge across dialects.
arXiv Detail & Related papers (2024-08-22T20:50:48Z) - TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring [11.78795632771211]
We introduce a novel benchmark designed to evaluate text-to-SQL reliability as a model's ability to correctly handle any type of input question.
We evaluate existing methods using a novel penalty-based scoring metric with two modeling approaches.
arXiv Detail & Related papers (2024-03-23T16:12:52Z) - A Multilingual Translator to SQL with Database Schema Pruning to Improve
Self-Attention [0.0]
We present techniques that allow long text sequences to be handled by transformers with up to 512 input tokens.
In addition, we used a multilingual approach with the mT5-large model fine-tuned with a data-augmented Spider dataset in four languages simultaneously.
arXiv Detail & Related papers (2023-06-25T14:28:12Z) - UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z) - Error Detection for Text-to-SQL Semantic Parsing [18.068244400731366]
Modern text-to-SQL parsers are often over-confident, casting doubt on their trustworthiness when deployed for real use.
We propose a parser-independent error detection model for text-to-SQL semantic parsing.
arXiv Detail & Related papers (2023-05-23T04:44:22Z) - XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for
Cross-lingual Text-to-SQL Semantic Parsing [70.40401197026925]
In-context learning using large language models has recently shown surprising results for semantic parsing tasks.
This work introduces the XRICL framework, which learns to retrieve relevant English exemplars for a given query.
We also include global translation exemplars for a target language to facilitate the translation process for large language models.
arXiv Detail & Related papers (2022-10-25T01:33:49Z) - Towards Generalizable and Robust Text-to-SQL Parsing [77.18724939989647]
We propose a novel TKK framework consisting of Task decomposition, Knowledge acquisition, and Knowledge composition to learn text-to-SQL parsing in stages.
We show that our framework is effective in all scenarios and achieves state-of-the-art performance on the Spider, SParC, and CoSQL datasets.
arXiv Detail & Related papers (2022-10-23T09:21:27Z) - A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future
Directions [102.8606542189429]
The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) query based on the evidence provided by databases.
Deep neural networks have significantly advanced this task via neural generation models, which automatically learn a mapping function from an input NL question to an output SQL query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z) - Weakly Supervised Text-to-SQL Parsing through Question Decomposition [53.22128541030441]
We take advantage of the recently proposed question meaning representation called QDMR.
Given questions, their QDMR structures (annotated by non-experts or automatically predicted) and the answers, we are able to automatically synthesize SQL queries.
Our results show that the weakly supervised models perform competitively with those trained on NL-SQL benchmark data.
arXiv Detail & Related papers (2021-12-12T20:02:42Z) - Photon: A Robust Cross-Domain Text-to-SQL System [189.1405317853752]
We present Photon, a robust, modular, cross-domain NLIDB that can flag natural language input to which a SQL mapping cannot be immediately determined.
The proposed method effectively improves the robustness of the text-to-SQL system against untranslatable user input.
arXiv Detail & Related papers (2020-07-30T07:44:48Z)