Inducing Transformer's Compositional Generalization Ability via
Auxiliary Sequence Prediction Tasks
- URL: http://arxiv.org/abs/2109.15256v1
- Date: Thu, 30 Sep 2021 16:41:19 GMT
- Title: Inducing Transformer's Compositional Generalization Ability via
Auxiliary Sequence Prediction Tasks
- Authors: Yichen Jiang, Mohit Bansal
- Abstract summary: Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions.
Existing neural models have been shown to lack this basic ability in learning symbolic structures.
We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
- Score: 86.10875837475783
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Systematic compositionality is an essential mechanism in human language,
allowing the recombination of known parts to create novel expressions. However,
existing neural models have been shown to lack this basic ability in learning
symbolic structures. Motivated by the failure of a Transformer model on the
SCAN compositionality challenge (Lake and Baroni, 2018), which requires parsing
a command into actions, we propose two auxiliary sequence prediction tasks that
track the progress of function and argument semantics, as additional training
supervision. These automatically-generated sequences are more representative of
the underlying compositional symbolic structures of the input data. During
inference, the model jointly predicts the next action and the next tokens in
the auxiliary sequences at each step. Experiments on the SCAN dataset show that
our method encourages the Transformer to understand compositional structures of
the command, improving its accuracy on multiple challenging splits from <= 10%
to 100%. With only 418 (5%) training instances, our approach still achieves
97.8% accuracy on the MCD1 split. Therefore, we argue that compositionality can
be induced in Transformers given minimal but proper guidance. We also show that
a better result is achieved using less contextualized vectors as the
attention's query, providing insights into architecture choices in achieving
systematic compositionality. Finally, we show positive generalization results
on the groundedSCAN task (Ruis et al., 2020). Our code is publicly available
at: https://github.com/jiangycTarheel/compositional-auxseq
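To make the joint-prediction setup concrete, below is a minimal PyTorch sketch assuming a shared decoder trunk with one output head per task (the action sequence plus the two auxiliary sequences) and a summed cross-entropy loss; all names, sizes, and the equal loss weighting are illustrative assumptions, not the authors' released implementation (see the repository linked above for that).

import torch
import torch.nn as nn

class AuxSeqDecoder(nn.Module):
    """Shared Transformer decoder with one prediction head per sequence task."""
    def __init__(self, d_model=128, n_actions=10, n_aux1=8, n_aux2=8):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.action_head = nn.Linear(d_model, n_actions)  # next action token
        self.aux1_head = nn.Linear(d_model, n_aux1)       # function-progress token
        self.aux2_head = nn.Linear(d_model, n_aux2)       # argument-progress token

    def forward(self, tgt_emb, memory):
        # tgt_emb: (batch, tgt_len, d_model) embedded decoder inputs;
        # memory: (batch, src_len, d_model) encoder outputs.
        # (Causal target mask omitted for brevity.)
        h = self.decoder(tgt_emb, memory)
        return self.action_head(h), self.aux1_head(h), self.aux2_head(h)

def joint_loss(logits, targets):
    # Sum of per-task cross-entropies; the auxiliary terms provide the extra
    # training supervision described in the abstract.
    ce = nn.CrossEntropyLoss()
    return sum(ce(l.reshape(-1, l.size(-1)), t.reshape(-1))
               for l, t in zip(logits, targets))

At inference time the same heads would be decoded jointly, predicting the next action and the next token of each auxiliary sequence at every step.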
Related papers
- Compositional Capabilities of Autoregressive Transformers: A Study on
Synthetic, Interpretable Tasks [23.516986266146855]
We train autoregressive Transformer models on a synthetic data-generating process.
We show that autoregressive Transformers can learn compositional structures from small amounts of training data.
arXiv Detail & Related papers (2023-11-21T21:16:54Z)
- Real-World Compositional Generalization with Disentangled
Sequence-to-Sequence Learning [81.24269148865555]
A recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization capability.
We introduce two key modifications to this model, which encourage more disentangled representations and improve its compute and memory efficiency.
Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically.
arXiv Detail & Related papers (2022-12-12T15:40:30Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce
Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Compositional Generalization and Decomposition in Neural Program
Synthesis [59.356261137313275]
In this paper, we focus on measuring the ability of learned program synthesizers to compositionally generalize.
We first characterize several axes along which program synthesis methods would ideally generalize.
We introduce a benchmark suite of tasks to assess these abilities based on two popular existing datasets.
arXiv Detail & Related papers (2022-04-07T22:16:05Z)
- Recursive Decoding: A Situated Cognition Approach to Compositional
Generation in Grounded Language Understanding [0.0]
We present Recursive Decoding, a novel procedure for training and using seq2seq models.
Rather than generating an entire output sequence in one pass, models are trained to predict one token at a time.
RD yields dramatic improvement on two previously neglected generalization tasks in gSCAN.
arXiv Detail & Related papers (2022-01-27T19:13:42Z)
- Iterative Decoding for Compositional Generalization in Transformers [5.269770493488338]
In sequence-to-sequence learning, transformers are often unable to predict correct outputs for even marginally longer examples.
This paper introduces iterative decoding, an alternative to seq2seq learning.
We show that transformers trained via iterative decoding outperform their seq2seq counterparts on the PCFG dataset.
arXiv Detail & Related papers (2021-10-08T14:52:25Z)
- Structured Reordering for Modeling Latent Alignments in Sequence
Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm that performs exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
- Sequence-Level Mixed Sample Data Augmentation [119.94667752029143]
This work proposes a simple data augmentation approach to encourage compositional behavior in neural models for sequence-to-sequence problems.
Our approach, SeqMix, creates new synthetic examples by softly combining input/output sequences from the training set (a rough sketch follows this entry).
arXiv Detail & Related papers (2020-11-18T02:18:04Z)
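As a loose illustration of the "softly combining" idea in the SeqMix entry above, the sketch below mixes the token embeddings of two training examples with a Beta-sampled coefficient; the mixing distribution, the equal-length assumption, and the helper name soft_mix are illustrative choices, not the paper's exact formulation.

import torch

def soft_mix(emb_a: torch.Tensor, emb_b: torch.Tensor, alpha: float = 0.5):
    # emb_a, emb_b: (seq_len, d_model) token embeddings of two same-length
    # training examples. Returns their convex combination and the mixing weight.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    mixed = lam * emb_a + (1.0 - lam) * emb_b  # soft combination of the inputs
    # The same lam would typically weight the two examples' target-side losses.
    return mixed, lam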
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.