Text-to-SQL Error Correction with Language Models of Code
- URL: http://arxiv.org/abs/2305.13073v2
- Date: Sun, 28 May 2023 15:32:26 GMT
- Title: Text-to-SQL Error Correction with Language Models of Code
- Authors: Ziru Chen, Shijie Chen, Michael White, Raymond Mooney, Ali Payani,
Jayanth Srinivasa, Yu Su, Huan Sun
- Abstract summary: In this paper, we investigate how to build automatic text-to-SQL error correction models.
Noticing that token-level edits are out of context and sometimes ambiguous, we propose building clause-level edit models instead.
- Score: 24.743066730684742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite recent progress in text-to-SQL parsing, current semantic parsers are
still not accurate enough for practical use. In this paper, we investigate how
to build automatic text-to-SQL error correction models. Noticing that
token-level edits are out of context and sometimes ambiguous, we propose
building clause-level edit models instead. Besides, while most language models
of code are not specifically pre-trained for SQL, they know common data
structures and their operations in programming languages such as Python. Thus,
we propose a novel representation for SQL queries and their edits that adheres
more closely to the pre-training corpora of language models of code. Our error
correction model improves the exact set match accuracy of different parsers by
2.4-6.5 and obtains up to 4.3 point absolute improvement over two strong
baselines. Our code and data are available at
https://github.com/OSU-NLP-Group/Auto-SQL-Correction.
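The abstract's core idea, representing a SQL query and its correction as code-like data structures so that a language model of code can operate on familiar objects, can be illustrated with a small sketch. The Python snippet below is an illustrative assumption only: the dictionary layout, the clause-level edit format, and the helper functions are hypothetical and do not reproduce the paper's actual serialization.

```python
# Hypothetical sketch: a SQL query serialized as a Python dict of clauses,
# with a correction expressed as a clause-level edit (not token-level).
# All names and structures here are illustrative, not the paper's format.

# Initial (possibly incorrect) parse produced by a text-to-SQL parser.
predicted_query = {
    "select": ["name"],
    "from": ["singer"],
    "where": ["age > 20"],
    "order_by": [],
}

# A clause-level edit replaces a whole clause, keeping the change
# self-contained and unambiguous compared with editing single tokens.
clause_edit = {"replace": {"where": ["age >= 20"]}}


def apply_edit(query: dict, edit: dict) -> dict:
    """Apply a clause-level replacement edit to the query representation."""
    corrected = dict(query)
    for clause, new_value in edit.get("replace", {}).items():
        corrected[clause] = new_value
    return corrected


def to_sql(query: dict) -> str:
    """Render the dict representation back into a SQL string."""
    parts = [
        "SELECT " + ", ".join(query["select"]),
        "FROM " + ", ".join(query["from"]),
    ]
    if query.get("where"):
        parts.append("WHERE " + " AND ".join(query["where"]))
    if query.get("order_by"):
        parts.append("ORDER BY " + ", ".join(query["order_by"]))
    return " ".join(parts)


corrected_query = apply_edit(predicted_query, clause_edit)
print(to_sql(corrected_query))
# -> SELECT name FROM singer WHERE age >= 20
```

In this reading, a clause-level edit swaps an entire WHERE clause rather than patching individual tokens, which keeps the correction in context and unambiguous, matching the motivation stated in the abstract.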
Related papers
- SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging [30.306023265985658]
We introduce a framework for generating high-quality synthetic training data for any SQL dialect.
We propose a novel Mixture-of-Experts (MoE) approach that leverages the shared knowledge across dialects.
arXiv Detail & Related papers (2024-08-22T20:50:48Z) - TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring [11.78795632771211]
We introduce a novel benchmark designed to evaluate text-to-SQL reliability as a model's ability to correctly handle any type of input question.
We evaluate existing methods using a novel penalty-based scoring metric with two modeling approaches.
arXiv Detail & Related papers (2024-03-23T16:12:52Z) - A Multilingual Translator to SQL with Database Schema Pruning to Improve
Self-Attention [0.0]
We present techniques that allow long text sequences to be handled by transformers with up to 512 input tokens.
In addition, we used a multilingual approach with the mT5-large model fine-tuned with a data-augmented Spider dataset in four languages simultaneously.
arXiv Detail & Related papers (2023-06-25T14:28:12Z) - UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z) - Error Detection for Text-to-SQL Semantic Parsing [18.068244400731366]
Modern text-to-SQL parsers are often over-confident, casting doubt on their trustworthiness when deployed for real use.
We propose a parser-independent error detection model for text-to-SQL semantic parsing.
arXiv Detail & Related papers (2023-05-23T04:44:22Z) - XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for
Cross-lingual Text-to-SQL Semantic Parsing [70.40401197026925]
In-context learning using large language models has recently shown surprising results for semantic parsing tasks.
This work introduces the XRICL framework, which learns to retrieve relevant English exemplars for a given query.
We also include global translation exemplars for a target language to facilitate the translation process for large language models.
arXiv Detail & Related papers (2022-10-25T01:33:49Z) - Towards Generalizable and Robust Text-to-SQL Parsing [77.18724939989647]
We propose a novel TKK framework consisting of Task decomposition, Knowledge acquisition, and Knowledge composition to learn text-to-SQL parsing in stages.
We show that our framework is effective in all scenarios and achieves state-of-the-art performance on the Spider, SParC, and CoSQL datasets.
arXiv Detail & Related papers (2022-10-23T09:21:27Z) - A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future
Directions [102.8606542189429]
The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) query based on the evidence provided by databases.
Deep neural networks have significantly advanced this task via neural generation models, which automatically learn a mapping function from an input NL question to an output SQL query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z) - Weakly Supervised Text-to-SQL Parsing through Question Decomposition [53.22128541030441]
We take advantage of the recently proposed question meaning representation called QDMR.
Given questions, their QDMR structures (annotated by non-experts or automatically predicted) and the answers, we are able to automatically synthesize SQL queries.
Our results show that the weakly supervised models perform competitively with those trained on NL-SQL benchmark data.
arXiv Detail & Related papers (2021-12-12T20:02:42Z) - Photon: A Robust Cross-Domain Text-to-SQL System [189.1405317853752]
We present Photon, a robust, modular, cross-domain NLIDB that can flag natural language input to which a SQL mapping cannot be immediately determined.
The proposed method effectively improves the robustness of the text-to-SQL system against untranslatable user input.
arXiv Detail & Related papers (2020-07-30T07:44:48Z)