Towards Robustness of Text-to-SQL Models against Synonym Substitution
- URL: http://arxiv.org/abs/2106.01065v1
- Date: Wed, 2 Jun 2021 10:36:23 GMT
- Title: Towards Robustness of Text-to-SQL Models against Synonym Substitution
- Authors: Yujian Gan, Xinyun Chen, Qiuping Huang, Matthew Purver, John R. Woodward, Jinxia Xie, Pengsheng Huang
- Abstract summary: We introduce Spider-Syn, a dataset based on the Spider benchmark for text-to-SQL translation.
We observe that accuracy drops dramatically when the explicit correspondence between NL questions and table schemas is eliminated.
We present two categories of approaches to improve the model robustness.
- Score: 15.047104267689052
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, there has been significant progress in studying neural networks to
translate text descriptions into SQL queries. Despite achieving good
performance on some public benchmarks, existing text-to-SQL models typically
rely on the lexical matching between words in natural language (NL) questions
and tokens in table schemas, which may render the models vulnerable to attacks
that break the schema linking mechanism. In this work, we investigate the
robustness of text-to-SQL models to synonym substitution. In particular, we
introduce Spider-Syn, a human-curated dataset based on the Spider benchmark for
text-to-SQL translation. NL questions in Spider-Syn are modified from Spider,
by replacing their schema-related words with manually selected synonyms that
reflect real-world question paraphrases. We observe that the accuracy
dramatically drops by eliminating such explicit correspondence between NL
questions and table schemas, even if the synonyms are not adversarially
selected to conduct worst-case adversarial attacks. Finally, we present two
categories of approaches to improve the model robustness. The first category of
approaches utilizes additional synonym annotations for table schemas by
modifying the model input, while the second category is based on adversarial
training. We demonstrate that both categories of approaches significantly
outperform their counterparts without the defense, and the first category of
approaches is more effective.
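The failure mode described in the abstract, and the first category of defense, can be illustrated with a minimal sketch. It assumes exact word matching as a stand-in for lexical schema linking; it is not the authors' models or their implementation. The function `link_schema`, the toy columns, the two questions, and the synonym dictionary are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def link_schema(question, columns, synonyms=None):
    """Return the columns whose name, or an annotated synonym, occurs in the question.

    columns:  list of column names (hypothetical toy schema)
    synonyms: optional dict mapping a column name to alternative words,
              standing in for extra synonym annotations on the schema
    """
    tokens = set(tokenize(question))
    linked = []
    for column in columns:
        # Start from the column name itself, then add any annotated synonyms.
        candidates = {column.lower()}
        if synonyms:
            candidates.update(s.lower() for s in synonyms.get(column, []))
        if tokens & candidates:
            linked.append(column)
    return linked


# Hypothetical schema, questions, and annotations, for illustration only.
COLUMNS = ["country", "age"]
ORIGINAL = "List the country and age of each singer."
PARAPHRASE = "List the nation and age of each singer."
SYNONYMS = {"country": ["nation"]}

print(link_schema(ORIGINAL, COLUMNS))              # ['country', 'age']
print(link_schema(PARAPHRASE, COLUMNS))            # ['age']
print(link_schema(PARAPHRASE, COLUMNS, SYNONYMS))  # ['country', 'age']
```

Replacing "country" with "nation" breaks the exact match, so the column is no longer linked. Supplying the synonym alongside the schema restores the link, which is the intuition behind modifying the model input with synonym annotations.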
Related papers
- Towards Robustness of Text-to-SQL Models Against Natural and Realistic Adversarial Table Perturbation [38.00832631674398]
We introduce the Adversarial Table Perturbation (ATP) as a new attacking paradigm to measure the robustness of Text-to-SQL models.
We build a systematic adversarial training example generation framework for better contextualization of data.
Experiments show that our approach not only brings the best improvement against table-side perturbations but also substantially empowers models against NL-side perturbations.
arXiv Detail & Related papers (2022-12-20T04:38:23Z)
- Importance of Synthesizing High-quality Data for Text-to-SQL Parsing [71.02856634369174]
State-of-the-art text-to-SQL parsers did not further improve on popular benchmarks when trained with augmented synthetic data.
We propose a novel framework that incorporates key relationships from schema, imposes strong typing, and uses schema-weighted column sampling.
arXiv Detail & Related papers (2022-12-17T02:53:21Z)
- Improving Text-to-SQL Semantic Parsing with Fine-grained Query Understanding [84.04706075621013]
We present a general-purpose, modular neural semantic parsing framework based on token-level fine-grained query understanding.
Our framework consists of three modules: a named entity recognizer (NER), a neural entity linker (NEL), and a neural semantic parser (NSP).
arXiv Detail & Related papers (2022-09-28T21:00:30Z)
- SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers [61.48159785138462]
This paper aims to improve the performance of text-to-SQL parsing by exploring the intrinsic uncertainties in neural-network-based approaches (called SUN).
Extensive experiments on five benchmark datasets demonstrate that our method significantly outperforms competitors and achieves new state-of-the-art results.
arXiv Detail & Related papers (2022-09-14T06:27:51Z)
- S$^2$SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers [66.78665327694625]
We propose S$^2$SQL, injecting Syntax to the question-schema graph encoder for Text-to-SQL parsers.
We also employ the decoupling constraint to induce diverse edge embedding, which further improves the network's performance.
Experiments on the Spider and robustness setting Spider-Syn demonstrate that the proposed approach outperforms all existing methods when pre-training models are used.
arXiv Detail & Related papers (2022-03-14T09:49:15Z)
- ShadowGNN: Graph Projection Neural Network for Text-to-SQL Parser [36.12921337235763]
We propose a new architecture, ShadowGNN, which processes schemas at abstract and semantic levels.
On the challenging Text-to-SQL benchmark Spider, empirical results show that ShadowGNN outperforms state-of-the-art models.
arXiv Detail & Related papers (2021-04-10T05:48:28Z)
- MT-Teql: Evaluating and Augmenting Consistency of Text-to-SQL Models with Metamorphic Testing [11.566463879334862]
We propose MT-Teql, a Metamorphic Testing-based framework for evaluating and augmenting the consistency of text-to-SQL models.
Our framework exposes thousands of prediction errors from SOTA models and enriches existing datasets by an order of magnitude, eliminating over 40% of inconsistency errors without compromising standard accuracy.
arXiv Detail & Related papers (2020-12-21T07:43:31Z)
- Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training [86.91380874390778]
We present Generation-Augmented Pre-training (GAP), which jointly learns representations of natural language utterances and table schemas by leveraging generation models to generate pre-training data.
Based on experimental results, neural semantic parsers that leverage GAP MODEL obtain new state-of-the-art results on both SPIDER and CRITERIA-TO-SQL benchmarks.
arXiv Detail & Related papers (2020-12-18T15:53:50Z)
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z)