Data Augmentation with Hierarchical SQL-to-Question Generation for
Cross-domain Text-to-SQL Parsing
- URL: http://arxiv.org/abs/2103.02227v1
- Date: Wed, 3 Mar 2021 07:37:38 GMT
- Title: Data Augmentation with Hierarchical SQL-to-Question Generation for
Cross-domain Text-to-SQL Parsing
- Authors: Ao Zhang, Kun Wu, Lijie Wang, Zhenghua Li, Xinyan Xiao, Hua Wu, Min
Zhang, Haifeng Wang
- Abstract summary: This paper presents a simple yet effective data augmentation framework.
First, given a database, we automatically produce a large number of SQL queries based on an abstract syntax tree grammar \cite{yin2018tranx}.
Second, we propose a hierarchical SQL-to-question generation model to obtain high-quality natural language questions.
- Score: 40.65143087243074
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data augmentation has attracted a lot of research attention in the deep
learning era for its ability to alleviate data sparseness. The lack of data
for unseen evaluation databases is exactly the major challenge for cross-domain
text-to-SQL parsing. Previous works either require human intervention to
guarantee the quality of generated data \cite{yu2018syntaxsqlnet}, or fail to
handle complex SQL queries \cite{guo2018question}. This paper presents a simple
yet effective data augmentation framework. First, given a database, we
automatically produce a large number of SQL queries based on an abstract syntax
tree grammar \cite{yin2018tranx}. We require the generated queries to cover at
least 80\% of SQL patterns in the training data for better distribution
matching. Second, we propose a hierarchical SQL-to-question generation model to
obtain high-quality natural language questions, which is the major contribution
of this work. Experiments on three cross-domain datasets, i.e., WikiSQL and
Spider in English, and DuSQL in Chinese, show that our proposed data
augmentation framework can consistently improve performance over strong
baselines, and in particular the hierarchical generation model is the key for
the improvement.
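The 80% pattern-coverage requirement mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how a "SQL pattern" is defined or how coverage is counted, so the sketch assumes that a pattern is the query with every identifier masked as `ID` and every literal masked as `VALUE`, and that coverage is the fraction of distinct training patterns that also occur among the generated queries. The names `sql_pattern` and `pattern_coverage` are hypothetical and not taken from the paper.

```python
import re

# Keywords kept verbatim in a pattern (a small illustrative subset, not a full SQL grammar).
SQL_KEYWORDS = {
    "SELECT", "FROM", "WHERE", "GROUP", "BY", "ORDER", "HAVING", "LIMIT",
    "JOIN", "ON", "AS", "AND", "OR", "NOT", "IN", "LIKE", "BETWEEN",
    "DISTINCT", "COUNT", "SUM", "AVG", "MIN", "MAX", "ASC", "DESC",
    "UNION", "INTERSECT", "EXCEPT",
}

# Quoted strings, numbers, (qualified) identifiers, comparison operators, punctuation.
TOKEN_RE = re.compile(
    r"'[^']*'|\"[^\"]*\"|\d+(?:\.\d+)?|\w+(?:\.\w+)?|[<>!=]+|[(),*]"
)


def sql_pattern(query: str) -> str:
    """Mask identifiers and literals so only the query skeleton remains (assumed definition)."""
    masked = []
    for tok in TOKEN_RE.findall(query):
        if tok.upper() in SQL_KEYWORDS:
            masked.append(tok.upper())
        elif tok[0] in "'\"" or tok[0].isdigit():
            masked.append("VALUE")   # string or numeric literal
        elif tok[0].isalpha() or tok[0] == "_":
            masked.append("ID")      # table or column name
        else:
            masked.append(tok)       # operator or punctuation
    return " ".join(masked)


def pattern_coverage(train_queries, generated_queries) -> float:
    """Fraction of distinct training patterns that appear among the generated queries."""
    train = {sql_pattern(q) for q in train_queries}
    generated = {sql_pattern(q) for q in generated_queries}
    if not train:
        return 1.0
    return len(train & generated) / len(train)


if __name__ == "__main__":
    train = [
        "SELECT name FROM singer WHERE age > 30",
        "SELECT count(*) FROM concert",
        "SELECT name FROM singer ORDER BY age DESC LIMIT 1",
    ]
    generated = [
        "SELECT title FROM book WHERE price > 10",
        "SELECT count(*) FROM author",
    ]
    coverage = pattern_coverage(train, generated)
    # Under the assumed definition, generation would continue until coverage >= 0.8.
    print(f"coverage = {coverage:.2f}, meets 80% threshold: {coverage >= 0.8}")
```

In this toy run the generated queries cover two of the three training patterns, so more queries would be needed to reach the threshold. A frequency-weighted coverage measure would be an equally plausible reading of the abstract.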
Related papers
- SQLformer: Deep Auto-Regressive Query Graph Generation for Text-to-SQL Translation [16.07396492960869]
We introduce a novel Transformer architecture specifically crafted to perform text-to-SQL translation tasks.
Our model predicts queries as abstract syntax trees (ASTs) in an autoregressive way, incorporating structural inductive bias in the encoder and decoder layers.
arXiv Detail & Related papers (2023-10-27T00:13:59Z)
- UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z)
- Augmenting Multi-Turn Text-to-SQL Datasets with Self-Play [46.07002748587857]
We explore augmenting the training datasets using self-play, which leverages contextual information to synthesize new interactions.
We find that self-play improves the accuracy of a strong baseline on SParC and CoSQL, two widely used text-to-SQL datasets.
arXiv Detail & Related papers (2022-10-21T16:40:07Z)
- A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future
Directions [102.8606542189429]
The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) based on the evidence provided by databases.
Deep neural networks have significantly advanced this task by neural generation models, which automatically learn a mapping function from an input NL question to an output query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z)
- S$^2$SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder
for Text-to-SQL Parsers [66.78665327694625]
We propose S$^2$SQL, injecting Syntax to the question-Schema graph encoder for Text-to-SQL parsers.
We also employ the decoupling constraint to induce diverse edge embedding, which further improves the network's performance.
Experiments on the Spider and robustness setting Spider-Syn demonstrate that the proposed approach outperforms all existing methods when pre-training models are used.
arXiv Detail & Related papers (2022-03-14T09:49:15Z)
- Dual Reader-Parser on Hybrid Textual and Tabular Evidence for Open
Domain Question Answering [78.9863753810787]
A large amount of the world's knowledge is stored in structured databases.
Query languages can answer questions that require complex reasoning, as well as offering full explainability.
arXiv Detail & Related papers (2021-08-05T22:04:13Z)
- Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic
Parsing [110.97778888305506]
BRIDGE represents the question and DB schema in a tagged sequence where a subset of the fields are augmented with cell values mentioned in the question.
BRIDGE attained state-of-the-art performance on popular cross-DB text-to-SQL benchmarks.
Our analysis shows that BRIDGE effectively captures the desired cross-modal dependencies and has the potential to generalize to more text-DB related tasks.
arXiv Detail & Related papers (2020-12-23T12:33:52Z)
- Bertrand-DR: Improving Text-to-SQL using a Discriminative Re-ranker [1.049360126069332]
We propose a novel discriminative re-ranker to improve the performance of generative text-to-SQL models.
We analyze relative strengths of the text-to-SQL and re-ranker models for optimal performance.
We demonstrate the effectiveness of the re-ranker by applying it to two state-of-the-art text-to-SQL models.
arXiv Detail & Related papers (2020-02-03T04:52:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.