SUBS: Subtree Substitution for Compositional Semantic Parsing
- URL: http://arxiv.org/abs/2205.01538v1
- Date: Tue, 3 May 2022 14:47:35 GMT
- Title: SUBS: Subtree Substitution for Compositional Semantic Parsing
- Authors: Jingfeng Yang, Le Zhang, Diyi Yang
- Abstract summary: We propose subtree substitution for compositional data augmentation, treating subtrees with similar semantic functions as exchangeable.
Experiments showed that such augmented data led to significantly better performance on SCAN and GeoQuery, and set a new SOTA on the compositional split of GeoQuery.
- Score: 50.63574492655072
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although sequence-to-sequence models often achieve good performance in
semantic parsing for i.i.d. data, their compositional generalization remains
poor. Several data augmentation methods have been proposed to alleviate this
problem. However, prior work leveraged only superficial grammars or rules for
data augmentation, which resulted in limited improvement. We propose to use
subtree substitution for compositional data augmentation, where we treat
subtrees with similar semantic functions as exchangeable. Our experiments
showed that such augmented data led to significantly better performance on
SCAN and GeoQuery, and set a new SOTA on the compositional split of GeoQuery.
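The core idea — swapping subtrees that share a semantic-function label to synthesize new training pairs — can be sketched as below. This is a minimal illustration, not the paper's implementation: the `Tree` class, the label names, and the toy GeoQuery-style examples are all hypothetical.

```python
# Hedged sketch of subtree substitution for data augmentation.
# All names (Tree, collect_subtrees, substitute, the STATE label)
# are illustrative assumptions, not the paper's actual code.
from dataclasses import dataclass, field

@dataclass
class Tree:
    label: str                      # semantic-function label of this subtree
    token: str = ""                 # surface token for leaves
    children: list = field(default_factory=list)

    def yield_tokens(self):
        """Return the surface string of this subtree as a token list."""
        if not self.children:
            return [self.token]
        out = []
        for c in self.children:
            out.extend(c.yield_tokens())
        return out

def collect_subtrees(tree, pool):
    """Index every subtree by its semantic-function label."""
    pool.setdefault(tree.label, []).append(tree)
    for child in tree.children:
        collect_subtrees(child, pool)

def substitute(tree, target, replacement):
    """Return a copy of `tree` with `target` swapped for `replacement`."""
    if tree is target:
        return replacement
    return Tree(tree.label, tree.token,
                [substitute(c, target, replacement) for c in tree.children])

# Two toy GeoQuery-style examples: subtrees labeled STATE are assumed to
# have the same semantic function and are therefore exchangeable.
t1 = Tree("QUERY", children=[Tree("RIVER", "rivers"),
                             Tree("STATE", "texas")])
t2 = Tree("QUERY", children=[Tree("CITY", "cities"),
                             Tree("STATE", "ohio")])

pool = {}
collect_subtrees(t1, pool)
collect_subtrees(t2, pool)

# Swap exchangeable STATE subtrees to synthesize a new training example.
augmented = substitute(t1, pool["STATE"][0], pool["STATE"][1])
print(" ".join(augmented.yield_tokens()))  # rivers ohio
```

In a real system the exchangeability criterion would come from the semantic parse (e.g. subtrees mapping to the same logical-form type), and both the utterance and its program would be swapped in parallel.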
Related papers
- Holistic Exploration on Universal Decompositional Semantic Parsing:
Architecture, Data Augmentation, and LLM Paradigm [24.993992573870145]
We introduce a cascade model for UDS parsing that decomposes the complex parsing task into semantically appropriate subtasks.
Our approach outperforms the prior models, while significantly reducing inference time.
Different ways for data augmentation are explored, which further improve the UDS Parsing.
arXiv Detail & Related papers (2023-07-25T11:44:28Z)
- GDA: Generative Data Augmentation Techniques for Relation Extraction Tasks [81.51314139202152]
We propose a dedicated augmentation technique for relational texts, named GDA, which uses two complementary modules to preserve both semantic consistency and syntax structures.
Experimental results on three datasets in a low-resource setting showed that GDA brings a 2.0% F1 improvement over no augmentation.
arXiv Detail & Related papers (2023-05-26T06:21:01Z)
- Recursive Neural Networks with Bottlenecks Diagnose (Non-)Compositionality [65.60002535580298]
Quantifying compositionality of data is a challenging task, which has been investigated primarily for short utterances.
We show that comparing data's representations in models with and without a bottleneck can be used to produce a compositionality metric.
The procedure is applied to the evaluation of arithmetic expressions using synthetic data, and sentiment classification using natural language data.
arXiv Detail & Related papers (2023-01-31T15:46:39Z)
- Syntax-driven Data Augmentation for Named Entity Recognition [3.0603554929274908]
In low resource settings, data augmentation strategies are commonly leveraged to improve performance.
We compare simple masked language model replacement and an augmentation method using constituency tree mutations to improve named entity recognition.
arXiv Detail & Related papers (2022-08-15T01:24:55Z)
- TreeMix: Compositional Constituency-based Data Augmentation for Natural Language Understanding [56.794981024301094]
We propose a compositional data augmentation approach for natural language understanding called TreeMix.
Specifically, TreeMix leverages constituency parse trees to decompose sentences into constituent sub-structures and the Mixup data augmentation technique to recombine them into new sentences.
Compared with previous approaches, TreeMix introduces greater diversity into the generated samples and encourages models to learn the compositionality of NLP data.
arXiv Detail & Related papers (2022-05-12T15:25:12Z)
- Measuring and Improving Compositional Generalization in Text-to-SQL via Component Alignment [23.43452719573272]
We propose a clause-level method for generating compositional examples.
We construct two datasets, Spider-SS and Spider-CG, to test models' ability to generalize compositionally.
Experiments show that existing models suffer significant performance degradation when evaluated on Spider-CG.
We modify a number of state-of-the-art models to train on the segmented data of Spider-SS, and we show that this method improves the generalization performance.
arXiv Detail & Related papers (2022-05-04T13:29:17Z)
- Learning to Synthesize Data for Semantic Parsing [57.190817162674875]
We propose a generative model that models the composition of programs and maps a program to an utterance.
Due to the simplicity of the PCFG and the use of pre-trained BART, our generative model can be learned efficiently from existing data.
We evaluate our method in both in-domain and out-of-domain settings of text-to-SQL parsing on the standard GeoQuery and Spider benchmarks.
arXiv Detail & Related papers (2021-04-12T21:24:02Z)
- Substructure Substitution: Structured Data Augmentation for NLP [55.69800855705232]
SUB2 generates new examples by substituting substructures with others that share the same label.
For more general tasks, we present variations of SUB2 based on constituency parse trees.
In most cases, training on the dataset augmented by SUB2 yields better performance than training on the original training set.
arXiv Detail & Related papers (2021-01-02T09:54:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.