Learning to Substitute Spans towards Improving Compositional Generalization
- URL: http://arxiv.org/abs/2306.02840v1
- Date: Mon, 5 Jun 2023 12:44:18 GMT
- Title: Learning to Substitute Spans towards Improving Compositional Generalization
- Authors: Zhaoyi Li, Ying Wei and Defu Lian
- Abstract summary: We propose a novel compositional augmentation strategy dubbed Span Substitution (SpanSub).
We then introduce the Learning to Substitute Span (L2S2) framework, which empowers the learning of span substitution probabilities in SpanSub.
Our results on three standard compositional generalization benchmarks, including SCAN, COGS and GeoQuery, demonstrate the superiority of SpanSub, the learning framework L2S2, and their combination.
- Score: 26.878616721700485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the rising prevalence of neural sequence models, recent
empirical evidence suggests that they are deficient in compositional
generalization. One of the current de facto solutions to this problem is
compositional data augmentation, which aims to inject additional compositional
inductive bias. Nonetheless, the improvement offered by existing handcrafted
augmentation strategies is limited when successful systematic generalization of
neural sequence models requires multi-grained compositional bias (i.e., not
limited to lexical or structural biases alone) or differentiation of training
sequences with an imbalanced difficulty distribution. To address these two
challenges, we first propose a novel compositional augmentation strategy dubbed
Span Substitution (SpanSub), which enables multi-grained composition of
substantial substructures across the whole training set. On top of that, we
introduce the Learning to Substitute Span (L2S2) framework, which learns span
substitution probabilities in SpanSub in an end-to-end manner by maximizing the
loss of the neural sequence model, so as to up-weight challenging compositions
with elusive concepts and novel surroundings. Our empirical results on three
standard compositional generalization benchmarks, SCAN, COGS and GeoQuery (with
improvements of up to 66.5%, 10.3%, and 1.2%, respectively), demonstrate the
superiority of SpanSub, the learning framework L2S2, and their combination.
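To make the idea concrete, here is a minimal Python sketch of span-substitution augmentation with a loss-based filter. It is not the authors' implementation: span extraction and alignment are assumed to be given (e.g. by a grammar or an external aligner), the same-label exchangeability check is a simplification, and names such as `AlignedSpan`, `substitute_span`, `augment`, and `select_hard` are hypothetical. The final `select_hard` step only approximates L2S2, which learns substitution probabilities end-to-end rather than applying a hard top-k filter.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class AlignedSpan:
    src: Tuple[int, int]   # [start, end) token range on the input side
    tgt: Tuple[int, int]   # [start, end) token range on the output side
    label: str             # span type; only same-type spans are exchanged here


@dataclass
class Example:
    src: List[str]
    tgt: List[str]
    spans: List[AlignedSpan] = field(default_factory=list)


def substitute_span(host: Example, donor: Example,
                    rng: random.Random) -> Optional[Example]:
    """Swap one aligned span of `donor` into `host` to form a recombined example."""
    # Crude exchangeability check: only swap spans that carry the same label.
    pairs = [(h, d) for h in host.spans for d in donor.spans if h.label == d.label]
    if not pairs:
        return None
    h, d = rng.choice(pairs)
    new_src = host.src[:h.src[0]] + donor.src[d.src[0]:d.src[1]] + host.src[h.src[1]:]
    new_tgt = host.tgt[:h.tgt[0]] + donor.tgt[d.tgt[0]:d.tgt[1]] + host.tgt[h.tgt[1]:]
    return Example(new_src, new_tgt)


def augment(data: List[Example], n_new: int, seed: int = 0,
            max_tries: int = 10_000) -> List[Example]:
    """Uniform span-substitution augmentation over the training set."""
    rng = random.Random(seed)
    out: List[Example] = []
    for _ in range(max_tries):
        if len(out) >= n_new:
            break
        host, donor = rng.sample(data, 2)
        ex = substitute_span(host, donor, rng)
        if ex is not None:
            out.append(ex)
    return out


def select_hard(candidates: List[Example],
                loss_fn: Callable[[Example], float], k: int) -> List[Example]:
    """Keep the k candidates the current model finds hardest (highest loss).

    This hard top-k filter is only a stand-in for L2S2, which instead learns
    span-substitution probabilities end-to-end by maximizing model loss.
    """
    return sorted(candidates, key=loss_fn, reverse=True)[:k]
```

In SCAN-style terms, an aligned span might be "jump twice" on the command side paired with the corresponding repeated jump actions on the output side; substituting it into another command yields a combination the model has not seen verbatim during training.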
Related papers
- Learning to Substitute Components for Compositional Generalization [70.96410435337967]
We propose a novel compositional augmentation strategy called CompSub, which enables multi-grained composition of substantial substructures.
We also introduce the Learning Component Substitution (LCS) framework, which empowers the learning of component substitution probabilities in CompSub.
Our results demonstrate the superiority of CompSub, LCS, and LCS-ICL, with improvements of up to 66.5%, 10.3%, 1.4%, and 8.8%.
arXiv Detail & Related papers (2025-02-28T08:30:47Z)
- Structural generalization in COGS: Supertagging is (almost) all you need [12.991247861348048]
Several recent semantic parsing datasets have put forward important limitations of neural networks in cases where compositional generalization is required.
We extend a neural graph-based semantic parsing framework in several ways to alleviate this issue.
arXiv Detail & Related papers (2023-10-21T21:51:25Z)
- Real-World Compositional Generalization with Disentangled Sequence-to-Sequence Learning [81.24269148865555]
A recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization capability.
We introduce two key modifications to this model which encourage more disentangled representations and improve its compute and memory efficiency.
Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically.
arXiv Detail & Related papers (2022-12-12T15:40:30Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Compositional Generalisation with Structured Reordering and Fertility Layers [121.37328648951993]
Seq2seq models have been shown to struggle with compositional generalisation.
We present a flexible end-to-end differentiable neural model that composes two structural operations.
arXiv Detail & Related papers (2022-10-06T19:51:31Z)
- Incremental Few-Shot Learning via Implanting and Compressing [13.122771115838523]
Incremental Few-Shot Learning requires a model to continually learn novel classes from only a few examples.
We propose a two-step learning strategy referred to as Implanting and Compressing.
Specifically, in the Implanting step, we propose to mimic the data distribution of novel classes with the assistance of the data-abundant base set.
In the Compressing step, we adapt the feature extractor to precisely represent each novel class, enhancing intra-class compactness.
arXiv Detail & Related papers (2022-03-19T11:04:43Z)
- Improving Compositional Generalization with Latent Structure and Data Augmentation [39.24527889685699]
We present a more powerful data recombination method using a model called Compositional Structure Learner (CSL).
CSL is a generative model with a quasi-synchronous context-free grammar backbone.
This procedure effectively transfers most of CSL's compositional bias to T5 for diagnostic tasks.
arXiv Detail & Related papers (2021-12-14T18:03:28Z)
- Learning to Generalize Compositionally by Transferring Across Semantic Parsing Tasks [37.66114618645146]
We investigate learning representations that facilitate transfer learning from one compositional task to another.
We apply this method to semantic parsing, using three very different datasets.
Our method significantly improves compositional generalization over baselines on the test set of the target task.
arXiv Detail & Related papers (2021-11-09T09:10:21Z)
- Grounded Graph Decoding Improves Compositional Generalization in Question Answering [68.72605660152101]
Question answering models struggle to generalize to novel compositions of training patterns, such as longer sequences or more complex test structures.
We propose Grounded Graph Decoding, a method to improve compositional generalization of language representations by grounding structured predictions with an attention mechanism.
Our model significantly outperforms state-of-the-art baselines on the Compositional Freebase Questions (CFQ) dataset, a challenging benchmark for compositional generalization in question answering.
arXiv Detail & Related papers (2021-11-05T17:50:14Z)
- Sequence-Level Mixed Sample Data Augmentation [119.94667752029143]
This work proposes a simple data augmentation approach to encourage compositional behavior in neural models for sequence-to-sequence problems.
Our approach, SeqMix, creates new synthetic examples by softly combining input/output sequences from the training set (an illustrative sketch of such soft mixing is given after this list).
arXiv Detail & Related papers (2020-11-18T02:18:04Z)
- Dual-constrained Deep Semi-Supervised Coupled Factorization Network with Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z)
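The SeqMix entry above describes creating synthetic examples by softly combining input/output sequences. Below is an illustrative Python sketch of that soft-mixing idea, not the SeqMix implementation: mixing at the embedding level, padding both pairs to a shared length, and sharing one Beta-sampled coefficient across the source and target sides are simplifying assumptions, and the function name `soft_mix` is hypothetical.

```python
from typing import Optional
import numpy as np


def soft_mix(src_a: np.ndarray, src_b: np.ndarray,
             tgt_a: np.ndarray, tgt_b: np.ndarray,
             alpha: float = 0.1,
             rng: Optional[np.random.Generator] = None):
    """Convex combination of two (seq_len, embed_dim) embedding sequences.

    The same Beta(alpha, alpha)-sampled coefficient is applied to the source
    and target sides so the mixed input/output pair stays aligned; both pairs
    are assumed to be padded to identical shapes.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)                 # mixing coefficient in (0, 1)
    mixed_src = lam * src_a + (1.0 - lam) * src_b
    mixed_tgt = lam * tgt_a + (1.0 - lam) * tgt_b
    return mixed_src, mixed_tgt, lam
```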