Natural SQL: Making SQL Easier to Infer from Natural Language
Specifications
- URL: http://arxiv.org/abs/2109.05153v1
- Date: Sat, 11 Sep 2021 01:53:55 GMT
- Title: Natural SQL: Making SQL Easier to Infer from Natural Language
Specifications
- Authors: Yujian Gan and Xinyun Chen and Jinxia Xie and Matthew Purver and John
R. Woodward and John Drake and Qiaofu Zhang
- Abstract summary: We propose an SQL intermediate representation called Natural SQL (NatSQL).
On Spider, a challenging text-to-SQL benchmark, we demonstrate that NatSQL outperforms other IRs, and significantly improves the performance of several previous SOTA models.
For existing models that do not support executable SQL generation, NatSQL easily enables them to generate executable SQL queries, and achieves the new state-of-the-art execution accuracy.
- Score: 15.047104267689052
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Addressing the mismatch between natural language descriptions and the
corresponding SQL queries is a key challenge for text-to-SQL translation. To
bridge this gap, we propose an SQL intermediate representation (IR) called
Natural SQL (NatSQL). Specifically, NatSQL preserves the core functionalities
of SQL, while it simplifies the queries as follows: (1) dispensing with
operators and keywords such as GROUP BY, HAVING, FROM, JOIN ON, which are
usually hard to find counterparts for in the text descriptions; (2) removing
the need for nested subqueries and set operators; and (3) making schema linking
easier by reducing the required number of schema items. On Spider, a
challenging text-to-SQL benchmark that contains complex and nested SQL queries,
we demonstrate that NatSQL outperforms other IRs, and significantly improves
the performance of several previous SOTA models. Furthermore, for existing
models that do not support executable SQL generation, NatSQL easily enables
them to generate executable SQL queries, and achieves the new state-of-the-art
execution accuracy.
Related papers
- KeyInst: Keyword Instruction for Improving SQL Formulation in Text-to-SQL [0.5755004576310334]
KeyInst provides guidance on pivotal keywords likely to be part of the final query.
We develop StrucQL, a benchmark specifically designed for the evaluation of SQL formulation.
arXiv Detail & Related papers (2024-10-18T02:45:36Z) - SQLPrompt: In-Context Text-to-SQL with Minimal Labeled Data [54.69489315952524]
SQLPrompt is designed to improve the few-shot prompting capabilities of LLMs for Text-to-SQL.
SQLPrompt outperforms previous approaches for in-context learning with few labeled data by a large margin.
arXiv Detail & Related papers (2023-11-06T05:24:06Z) - SQLformer: Deep Auto-Regressive Query Graph Generation for Text-to-SQL Translation [16.07396492960869]
We introduce a novel Transformer architecture specifically crafted to perform text-to-SQL translation tasks.
Our model predicts queries as abstract syntax trees (ASTs) in an autoregressive way, incorporating structural inductive bias in the encoder and decoder layers.
arXiv Detail & Related papers (2023-10-27T00:13:59Z) - UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z) - Towards Generalizable and Robust Text-to-SQL Parsing [77.18724939989647]
We propose a novel TKK framework consisting of Task decomposition, Knowledge acquisition, and Knowledge composition to learn text-to-SQL parsing in stages.
We show that our framework is effective in all scenarios and achieves state-of-the-art performance on the Spider, SParC, and CoSQL datasets.
arXiv Detail & Related papers (2022-10-23T09:21:27Z) - A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future
Directions [102.8606542189429]
The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) based on the evidence provided by databases.
Deep neural networks have significantly advanced this task by neural generation models, which automatically learn a mapping function from an input NL question to an output query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z) - S$^2$SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder
for Text-to-SQL Parsers [66.78665327694625]
We propose S$^2$SQL, which injects Syntax into the question-Schema graph encoder for Text-to-SQL parsers.
We also employ the decoupling constraint to induce diverse edge embedding, which further improves the network's performance.
Experiments on the Spider and robustness setting Spider-Syn demonstrate that the proposed approach outperforms all existing methods when pre-training models are used.
arXiv Detail & Related papers (2022-03-14T09:49:15Z) - Weakly Supervised Text-to-SQL Parsing through Question Decomposition [53.22128541030441]
We take advantage of the recently proposed question meaning representation called QDMR.
Given questions, their QDMR structures (annotated by non-experts or automatically predicted) and the answers, we are able to automatically synthesize SQL queries.
Our results show that the weakly supervised models perform competitively with those trained on NL-SQL benchmark data.
arXiv Detail & Related papers (2021-12-12T20:02:42Z) - RYANSQL: Recursively Applying Sketch-based Slot Fillings for Complex
Text-to-SQL in Cross-Domain Databases [6.349764856675643]
We present a neural network approach called RYANSQL to solve complex Text-to-SQL tasks for cross-domain databases.
RYANSQL achieved 58.2% accuracy on the challenging Spider benchmark, which is a 3.2%p improvement over previous state-of-the-art approaches.
arXiv Detail & Related papers (2020-04-07T04:51:04Z) - Bertrand-DR: Improving Text-to-SQL using a Discriminative Re-ranker [1.049360126069332]
We propose a novel discriminative re-ranker to improve the performance of generative text-to-SQL models.
We analyze relative strengths of the text-to-SQL and re-ranker models for optimal performance.
We demonstrate the effectiveness of the re-ranker by applying it to two state-of-the-art text-to-SQL models.
arXiv Detail & Related papers (2020-02-03T04:52:47Z)