Real-World Compositional Generalization with Disentangled
Sequence-to-Sequence Learning
- URL: http://arxiv.org/abs/2212.05982v1
- Date: Mon, 12 Dec 2022 15:40:30 GMT
- Title: Real-World Compositional Generalization with Disentangled
Sequence-to-Sequence Learning
- Authors: Hao Zheng and Mirella Lapata
- Abstract summary: A recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization capability.
We introduce two key modifications to this model which encourage more disentangled representations and improve its compute and memory efficiency.
Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically.
- Score: 81.24269148865555
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compositional generalization is a basic mechanism in human language learning,
which current neural networks struggle with. A recently proposed Disentangled
sequence-to-sequence model (Dangle) shows promising generalization capability
by learning specialized encodings for each decoding step. We introduce two key
modifications to this model which encourage more disentangled representations
and improve its compute and memory efficiency, allowing us to tackle
compositional generalization in a more realistic setting. Specifically, instead
of adaptively re-encoding source keys and values at each time step, we
disentangle their representations and only re-encode keys periodically, at some
interval. Our new architecture leads to better generalization performance
across existing tasks and datasets, and a new machine translation benchmark
which we create by detecting naturally occurring compositional patterns in
relation to a training set. We show this methodology better emulates real-world
requirements than artificial challenges.
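To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of periodic key re-encoding: source values are encoded once and reused at every decoding step, while source keys are refreshed only every `interval` steps, conditioned on the target prefix generated so far. The class and method names, the choice of Transformer layers, and the greedy loop are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class PeriodicKeyReencoder(nn.Module):
    """Toy decoder: source values are encoded once; source keys are
    refreshed only every `interval` decoding steps (illustrative sketch)."""

    def __init__(self, d_model: int = 32, vocab_size: int = 100, interval: int = 4):
        super().__init__()
        self.interval = interval
        self.embed = nn.Embedding(vocab_size, d_model)
        self.src_encoder = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        # Refreshes source *keys* conditioned on the target prefix so far.
        self.key_reencoder = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    @torch.no_grad()
    def greedy_decode(self, src_ids: torch.Tensor, max_len: int = 10, bos_id: int = 1):
        values = self.src_encoder(self.embed(src_ids))  # encoded once, reused every step
        keys = values                                    # initial keys = values
        ys = torch.full((src_ids.size(0), 1), bos_id, dtype=torch.long)
        for t in range(1, max_len + 1):
            if t % self.interval == 0:
                # Periodic re-encoding: keys get step-specific (more disentangled)
                # representations, but far less often than once per token.
                keys = self.key_reencoder(values, self.embed(ys))
            q = self.embed(ys[:, -1:])                   # query from the last token
            attn = torch.softmax(q @ keys.transpose(1, 2) / keys.size(-1) ** 0.5, dim=-1)
            ctx = attn @ values                          # attend with fixed values
            next_id = self.out(ctx[:, -1]).argmax(-1, keepdim=True)
            ys = torch.cat([ys, next_id], dim=1)
        return ys


# Example: decode a random "source sentence" of length 7 for a batch of 2.
model = PeriodicKeyReencoder()
print(model.greedy_decode(torch.randint(0, 100, (2, 7))))
```

Setting `interval=1` would recover per-step key re-encoding, while a larger interval trades some step-specificity for fewer encoder passes and lower memory use.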
Related papers
- Exact, Tractable Gauss-Newton Optimization in Deep Reversible Architectures Reveal Poor Generalization [52.16435732772263]
Second-order optimization has been shown to accelerate the training of deep neural networks in many applications.
However, generalization properties of second-order methods are still being debated.
We show for the first time that exact Gauss-Newton (GN) updates take on a tractable form in a class of deep architectures.
arXiv Detail & Related papers (2024-11-12T17:58:40Z)
- On the Regularization of Learnable Embeddings for Time Series Processing [18.069747511100132]
We investigate methods to regularize the learning of local learnable embeddings for time series processing.
We show that methods preventing the co-adaptation of local and global parameters are particularly effective in this context.
arXiv Detail & Related papers (2024-10-18T17:30:20Z)
- A Simple Recipe for Language-guided Domain Generalized Segmentation [45.93202559299953]
Generalization to new domains not seen during training is one of the long-standing challenges in deploying neural networks in real-world applications.
We introduce a simple framework for generalizing semantic segmentation networks by employing language as the source of randomization.
Our recipe comprises three key ingredients: (i) the preservation of the intrinsic CLIP robustness through minimal fine-tuning, (ii) language-driven local style augmentation, and (iii) randomization by locally mixing the source and augmented styles during training.
arXiv Detail & Related papers (2023-11-29T18:59:59Z)
- Compositional Program Generation for Few-Shot Systematic Generalization [59.57656559816271]
This paper presents a neuro-symbolic architecture called the Compositional Program Generator (CPG).
CPG has three key features: modularity, composition, and abstraction, in the form of grammar rules.
It achieves perfect generalization on both the SCAN and COGS benchmarks using just 14 examples for SCAN and 22 examples for COGS.
arXiv Detail & Related papers (2023-09-28T14:33:20Z)
- ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis [54.18659323181771]
We characterize several different forms of compositional generalization that are desirable in program synthesis.
We propose ExeDec, a novel decomposition-based strategy that predicts execution subgoals to solve problems step by step, informed by program execution at each step.
arXiv Detail & Related papers (2023-07-26T01:07:52Z)
- Compositional Generalization and Decomposition in Neural Program Synthesis [59.356261137313275]
In this paper, we focus on measuring the ability of learned program synthesizers to compositionally generalize.
We first characterize several different axes along which program synthesis methods should ideally generalize.
We introduce a benchmark suite of tasks to assess these abilities based on two popular existing datasets.
arXiv Detail & Related papers (2022-04-07T22:16:05Z)
- Recursive Decoding: A Situated Cognition Approach to Compositional Generation in Grounded Language Understanding [0.0]
We present Recursive Decoding (RD), a novel procedure for training and using seq2seq models.
Rather than generating an entire output sequence in one pass, models are trained to predict one token at a time.
RD yields dramatic improvement on two previously neglected generalization tasks in gSCAN.
arXiv Detail & Related papers (2022-01-27T19:13:42Z)
- Disentangled Sequence to Sequence Learning for Compositional Generalization [62.954842223732435]
We propose an extension to sequence-to-sequence models which allows us to learn disentangled representations by adaptively re-encoding the source input.
Experimental results on semantic parsing and machine translation empirically show that our proposal yields more disentangled representations and better generalization.
arXiv Detail & Related papers (2021-10-09T22:27:19Z)
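For contrast with the periodic variant sketched after the abstract, the following is a rough, assumed illustration of the Dangle-style scheme described in the entry above, in which the source (concatenated with the target prefix) is re-encoded at every decoding step; the helper name and toy modules are hypothetical, not the authors' code.

```python
import torch
import torch.nn as nn


def dangle_style_decode(encoder: nn.Module, embed: nn.Embedding, out: nn.Linear,
                        src_ids: torch.Tensor, max_len: int = 10, bos_id: int = 1):
    """Re-encode source + prefix at *every* step: one encoder pass per token."""
    ys = torch.full((src_ids.size(0), 1), bos_id, dtype=torch.long)
    for _ in range(max_len):
        h = encoder(embed(torch.cat([src_ids, ys], dim=1)))   # full re-encoding
        next_id = out(h[:, -1]).argmax(-1, keepdim=True)       # predict from last state
        ys = torch.cat([ys, next_id], dim=1)
    return ys


# Example with toy modules: O(max_len) encoder passes here, versus a single
# source pass plus occasional key refreshes in the periodic sketch above.
d_model, vocab = 32, 100
enc = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
print(dangle_style_decode(enc, nn.Embedding(vocab, d_model),
                          nn.Linear(d_model, vocab), torch.randint(0, vocab, (2, 7))))
```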