Related papers: MT-Teql: Evaluating and Augmenting Consistency of Text-to-SQL Models with Metamorphic Testing

URL: http://arxiv.org/abs/2012.11163v1
Date: Mon, 21 Dec 2020 07:43:31 GMT
Title: MT-Teql: Evaluating and Augmenting Consistency of Text-to-SQL Models with Metamorphic Testing
Authors: Pingchuan Ma and Shuai Wang
Abstract summary: We propose MT-Teql, a Metamorphic Testing-based framework for evaluating and augmenting the consistency of text-to-SQL models. Our framework exposes thousands of prediction errors from SOTA models and enriches existing datasets by an order of magnitude, eliminating over 40% of inconsistency errors without compromising standard accuracy.
Score: 11.566463879334862
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-SQL is the task of generating SQL queries from human utterances. However, due to the variation of natural language, two semantically equivalent utterances may differ at the lexical level. Likewise, user preferences (e.g., the choice of normal forms) can lead to dramatic changes in table structures when expressing conceptually identical schemas. Envisioning the general difficulty for text-to-SQL models to preserve prediction consistency against linguistic and schema variations, we propose MT-Teql, a Metamorphic Testing-based framework for systematically evaluating and augmenting the consistency of TExt-to-SQL models. Inspired by the principles of software metamorphic testing, MT-Teql delivers a model-agnostic framework which implements a comprehensive set of metamorphic relations (MRs) to conduct semantics-preserving transformations on utterances and schemas. Model inconsistency is exposed when the original and transformed inputs induce different SQL queries. In addition, we leverage the transformed inputs to retrain models and further boost their robustness. Our experiments show that our framework exposes thousands of prediction errors from SOTA models and enriches existing datasets by an order of magnitude, eliminating over 40% of inconsistency errors without compromising standard accuracy.
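
The consistency check described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the MT-Teql implementation: the two relations (`add_prefix`, `shuffle_columns`), the `normalize` helper, and the deliberately brittle `toy_model` are all hypothetical names invented here, and the paper's actual set of metamorphic relations is not listed on this page.

```python
from typing import Callable, Dict, List, Tuple

Schema = Dict[str, List[str]]          # table name -> column names
Model = Callable[[str, Schema], str]   # (utterance, schema) -> SQL string
Relation = Callable[[str, Schema], Tuple[str, Schema]]


def add_prefix(utterance: str, schema: Schema) -> Tuple[str, Schema]:
    """Hypothetical utterance MR: a polite prefix should not change the query."""
    return "Please tell me: " + utterance, schema


def shuffle_columns(utterance: str, schema: Schema) -> Tuple[str, Schema]:
    """Hypothetical schema MR: reversing column order keeps the schema equivalent."""
    return utterance, {t: list(reversed(cols)) for t, cols in schema.items()}


def normalize(sql: str) -> str:
    """Crude normalization (an assumption): lowercase and collapse whitespace."""
    return " ".join(sql.lower().split())


def find_inconsistencies(model: Model, utterance: str, schema: Schema,
                         relations: List[Relation]) -> List[str]:
    """Return the names of relations whose transformed input changes the SQL."""
    baseline = normalize(model(utterance, schema))
    failures = []
    for mr in relations:
        new_utterance, new_schema = mr(utterance, schema)
        if normalize(model(new_utterance, new_schema)) != baseline:
            failures.append(mr.__name__)
    return failures


def toy_model(utterance: str, schema: Schema) -> str:
    """Brittle stand-in for a real model: always selects the first listed column."""
    table, cols = next(iter(schema.items()))
    return f"SELECT {cols[0]} FROM {table}"


if __name__ == "__main__":
    schema = {"singer": ["name", "age"]}
    print(find_inconsistencies(toy_model, "What are the singer names?",
                               schema, [add_prefix, shuffle_columns]))
```

Because `toy_model` depends on column order, only the schema relation exposes an inconsistency here. A real harness would plug in an actual text-to-SQL model and would likely compare queries with something stronger than string normalization.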

Related papers

Disambiguate First Parse Later: Generating Interpretations for Ambiguity Resolution in Semantic Parsing [56.82807063333088]
We propose a modular approach that resolves ambiguity using natural language interpretations before mapping these to logical forms. Our approach improves interpretation coverage and generalizes across datasets with different annotation styles, database structures, and ambiguity types.
arXiv Detail & Related papers (2025-02-25T18:42:26Z)
Rationalization Models for Text-to-SQL [13.792561265515003]
We introduce a framework for generating Chain-of-Thought (CoT) rationales to enhance text-to-SQL model fine-tuning. The process begins with manually annotating a small set of examples, which are then used to prompt a large language model. A rationalization model is subsequently trained on the validated queries, enabling extensive synthetic CoT annotations.
arXiv Detail & Related papers (2025-02-10T18:38:57Z)
Exploring the Compositional Generalization in Context Dependent Text-to-SQL Parsing [14.644212594593919]
This work is the first exploration of compositional generalization in context-dependent Text-to-SQL scenarios. Experiments show that all current models struggle on our proposed benchmarks. We propose a method named p-align to improve the compositional generalization of Text-to-SQL models.
arXiv Detail & Related papers (2023-05-29T12:36:56Z)
Conversational Text-to-SQL: An Odyssey into State-of-the-Art and Challenges Ahead [6.966624873109535]
State-of-the-art (SOTA) systems use large, pre-trained and fine-tuned language models, such as the T5-family. With multi-tasking (MT) over coherent tasks with discrete prompts during training, we improve over specialized text-to-SQL models. We conduct studies to tease apart errors attributable to domain and compositional generalization.
arXiv Detail & Related papers (2023-02-21T23:15:33Z)
Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness [115.66421993459663]
Recent studies reveal that text-to-SQL models are vulnerable to task-specific perturbations. We propose a comprehensive robustness benchmark based on Spider to diagnose the model. We conduct a diagnostic study of the state-of-the-art models on the set.
arXiv Detail & Related papers (2023-01-21T03:57:18Z)
Improving Text-to-SQL Semantic Parsing with Fine-grained Query Understanding [84.04706075621013]
We present a general-purpose, modular neural semantic parsing framework based on token-level fine-grained query understanding. Our framework consists of three modules: named entity recognizer (NER), neural entity linker (NEL) and neural semantic parser (NSP).
arXiv Detail & Related papers (2022-09-28T21:00:30Z)
SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers [61.48159785138462]
This paper aims to improve the performance of text-to-SQL parsing by exploring the intrinsic uncertainties in the neural network based approaches (called SUN). Extensive experiments on five benchmark datasets demonstrate that our method significantly outperforms competitors and achieves new state-of-the-art results.
arXiv Detail & Related papers (2022-09-14T06:27:51Z)
Towards Robustness of Text-to-SQL Models against Synonym Substitution [15.047104267689052]
We introduce Spider-Syn, a dataset based on the Spider benchmark for text-to-SQL translation. We observe that accuracy drops dramatically when the explicit correspondence between NL questions and table schemas is eliminated. We present two categories of approaches to improve model robustness.
arXiv Detail & Related papers (2021-06-02T10:36:23Z)
Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training [86.91380874390778]
We present Generation-Augmented Pre-training (GAP), which jointly learns representations of natural language utterances and table schemas by leveraging generation models to generate pre-training data. Based on experimental results, neural semantic parsers that leverage GAP MODEL obtain new state-of-the-art results on both the SPIDER and CRITERIA-TO-SQL benchmarks.
arXiv Detail & Related papers (2020-12-18T15:53:50Z)
Explicitly Modeling Syntax in Language Models with Incremental Parsing and a Dynamic Oracle [88.65264818967489]
We propose a new syntax-aware language model: Syntactic Ordered Memory (SOM). The model explicitly models the structure with an incremental parser and maintains the conditional probability setting of a standard language model. Experiments show that SOM can achieve strong results in language modeling, incremental parsing and syntactic generalization tests.
arXiv Detail & Related papers (2020-10-21T17:39:15Z)
GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing. We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar. To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.