SUBS: Subtree Substitution for Compositional Semantic Parsing
- URL: http://arxiv.org/abs/2205.01538v1
- Date: Tue, 3 May 2022 14:47:35 GMT
- Title: SUBS: Subtree Substitution for Compositional Semantic Parsing
- Authors: Jingfeng Yang, Le Zhang, Diyi Yang
- Abstract summary: We propose subtree substitution for compositional data augmentation, treating subtrees with similar semantic functions as exchangeable.
Experiments showed that such augmented data led to significantly better performance on SCAN and GeoQuery, and set a new SOTA on the compositional split of GeoQuery.
- Score: 50.63574492655072
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although sequence-to-sequence models often achieve good performance in
semantic parsing for i.i.d. data, their compositional generalization remains
poor. Several data augmentation methods have been proposed to alleviate this
problem. However, prior work leveraged only superficial grammars or rules for
data augmentation, which resulted in limited improvement. We propose to use
subtree substitution for compositional data augmentation, where we treat
subtrees with similar semantic functions as exchangeable. Our experiments
showed that such augmented data led to significantly better performance on
SCAN and GeoQuery, and set a new SOTA on the compositional split of GeoQuery.
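The core idea — swapping subtrees that share a semantic-function label to synthesize new training pairs — can be sketched as below. This is a minimal illustration, not the paper's implementation: the `Tree` class, the label names, and the toy GeoQuery-style examples are all hypothetical.

```python
# Hedged sketch of subtree substitution for data augmentation.
# All names (Tree, collect_subtrees, substitute, the STATE label)
# are illustrative assumptions, not the paper's actual code.
from dataclasses import dataclass, field

@dataclass
class Tree:
    label: str                      # semantic-function label of this subtree
    token: str = ""                 # surface token for leaves
    children: list = field(default_factory=list)

    def yield_tokens(self):
        """Return the surface string of this subtree as a token list."""
        if not self.children:
            return [self.token]
        out = []
        for c in self.children:
            out.extend(c.yield_tokens())
        return out

def collect_subtrees(tree, pool):
    """Index every subtree by its semantic-function label."""
    pool.setdefault(tree.label, []).append(tree)
    for child in tree.children:
        collect_subtrees(child, pool)

def substitute(tree, target, replacement):
    """Return a copy of `tree` with `target` swapped for `replacement`."""
    if tree is target:
        return replacement
    return Tree(tree.label, tree.token,
                [substitute(c, target, replacement) for c in tree.children])

# Two toy GeoQuery-style examples: subtrees labeled STATE are assumed to
# have the same semantic function and are therefore exchangeable.
t1 = Tree("QUERY", children=[Tree("RIVER", "rivers"),
                             Tree("STATE", "texas")])
t2 = Tree("QUERY", children=[Tree("CITY", "cities"),
                             Tree("STATE", "ohio")])

pool = {}
collect_subtrees(t1, pool)
collect_subtrees(t2, pool)

# Swap exchangeable STATE subtrees to synthesize a new training example.
augmented = substitute(t1, pool["STATE"][0], pool["STATE"][1])
print(" ".join(augmented.yield_tokens()))  # rivers ohio
```

In a real system the exchangeability criterion would come from the semantic parse (e.g. subtrees mapping to the same logical-form type), and both the utterance and its program would be swapped in parallel.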
Related papers
- Holistic Exploration on Universal Decompositional Semantic Parsing:
Architecture, Data Augmentation, and LLM Paradigm [24.993992573870145]
We introduce a cascade model for UDS parsing that decomposes the complex parsing task into semantically appropriate subtasks.
Our approach outperforms the prior models, while significantly reducing inference time.
Different ways for data augmentation are explored, which further improve the UDS Parsing.
arXiv Detail & Related papers (2023-07-25T11:44:28Z)
- GDA: Generative Data Augmentation Techniques for Relation Extraction Tasks [81.51314139202152]
We propose a dedicated augmentation technique for relational texts, named GDA, which uses two complementary modules to preserve both semantic consistency and syntax structures.
Experimental results on three datasets in a low-resource setting showed that GDA brings a 2.0% F1 improvement over no augmentation.
arXiv Detail & Related papers (2023-05-26T06:21:01Z)
- Recursive Neural Networks with Bottlenecks Diagnose (Non-)Compositionality [65.60002535580298]
Quantifying compositionality of data is a challenging task, which has been investigated primarily for short utterances.
We show that comparing data's representations in models with and without a bottleneck can be used to produce a compositionality metric.
The procedure is applied to the evaluation of arithmetic expressions using synthetic data, and sentiment classification using natural language data.
arXiv Detail & Related papers (2023-01-31T15:46:39Z)
- Syntax-driven Data Augmentation for Named Entity Recognition [3.0603554929274908]
In low resource settings, data augmentation strategies are commonly leveraged to improve performance.
We compare simple masked language model replacement and an augmentation method using constituency tree mutations to improve named entity recognition.
arXiv Detail & Related papers (2022-08-15T01:24:55Z)
- TreeMix: Compositional Constituency-based Data Augmentation for Natural Language Understanding [56.794981024301094]
We propose a compositional data augmentation approach for natural language understanding called TreeMix.
Specifically, TreeMix leverages constituency parse trees to decompose sentences into constituent sub-structures and the Mixup data augmentation technique to recombine them into new sentences.
Compared with previous approaches, TreeMix introduces greater diversity into the generated samples and encourages models to learn the compositionality of NLP data.
arXiv Detail & Related papers (2022-05-12T15:25:12Z)
- Measuring and Improving Compositional Generalization in Text-to-SQL via Component Alignment [23.43452719573272]
We propose a clause-level method for generating compositional examples.
We construct two datasets, Spider-SS and Spider-CG, to test models' ability to generalize compositionally.
Experiments show that existing models suffer significant performance degradation when evaluated on Spider-CG.
We modify a number of state-of-the-art models to train on the segmented data of Spider-SS, and we show that this method improves the generalization performance.
arXiv Detail & Related papers (2022-05-04T13:29:17Z)
- Learning to Synthesize Data for Semantic Parsing [57.190817162674875]
We propose a generative model that models the composition of programs and maps a program to an utterance.
Due to the simplicity of the PCFG and the use of pre-trained BART, our generative model can be learned efficiently from existing data.
We evaluate our method in both in-domain and out-of-domain settings of text-to-SQL parsing on the standard GeoQuery and Spider benchmarks.
arXiv Detail & Related papers (2021-04-12T21:24:02Z)
- Substructure Substitution: Structured Data Augmentation for NLP [55.69800855705232]
SUB2 generates new examples by substituting substructures with others that share the same label.
For more general tasks, we present variations of SUB2 based on constituency parse trees.
In most cases, training on the dataset augmented by SUB2 yields better performance than training on the original training set.
arXiv Detail & Related papers (2021-01-02T09:54:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.