Iterative Decoding for Compositional Generalization in Transformers
- URL: http://arxiv.org/abs/2110.04169v1
- Date: Fri, 8 Oct 2021 14:52:25 GMT
- Title: Iterative Decoding for Compositional Generalization in Transformers
- Authors: Luana Ruiz, Joshua Ainslie, Santiago Ontañón
- Abstract summary: In sequence-to-sequence learning, transformers are often unable to predict correct outputs for even marginally longer examples.
This paper introduces iterative decoding, an alternative to seq2seq learning.
We show that transformers trained via iterative decoding outperform their seq2seq counterparts on the PCFG dataset.
- Score: 5.269770493488338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models do well at generalizing to in-distribution data but
struggle to generalize compositionally, i.e., to combine a set of learned
primitives to solve more complex tasks. In particular, in sequence-to-sequence
(seq2seq) learning, transformers are often unable to predict correct outputs
for even marginally longer examples than those seen during training. This paper
introduces iterative decoding, an alternative to seq2seq learning that (i)
improves transformer compositional generalization and (ii) evidences that, in
general, seq2seq transformers do not learn iterations that are not unrolled.
Inspired by the idea of compositionality -- that complex tasks can be solved by
composing basic primitives -- training examples are broken down into a sequence
of intermediate steps that the transformer then learns iteratively. At
inference time, the intermediate outputs are fed back to the transformer as
intermediate inputs until an end-of-iteration token is predicted. Through
numerical experiments, we show that transformers trained via iterative decoding
outperform their seq2seq counterparts on the PCFG dataset, and solve the
problem of calculating Cartesian products between vectors longer than those
seen during training with 100% accuracy, a task at which seq2seq models have
been shown to fail. We also illustrate a limitation of iterative decoding,
specifically, that it can make sorting harder to learn on the CFQ dataset.
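The inference loop described in the abstract can be pictured with a short Python sketch. This is a minimal illustration only: `model.generate`, the `<EOI>` end-of-iteration marker, and the iteration cap are assumed names and safeguards, not the paper's actual interface.
```python
# Minimal sketch of the iterative-decoding inference loop, under assumptions
# noted above; the paper's tokenization and stopping convention may differ.

END_OF_ITERATION = "<EOI>"   # hypothetical end-of-iteration token
MAX_ITERATIONS = 32          # assumed safety cap; the paper stops on the token alone

def iterative_decode(model, source_tokens):
    """Repeatedly feed intermediate outputs back in until the model signals it is done."""
    current = list(source_tokens)
    for _ in range(MAX_ITERATIONS):
        # One ordinary seq2seq pass: the previous intermediate output is the new input.
        intermediate = model.generate(current)
        if END_OF_ITERATION in intermediate:
            # The model judged this step to be the final one; strip the marker.
            return [t for t in intermediate if t != END_OF_ITERATION]
        current = intermediate
    return current  # fallback if the cap is reached
```
In this scheme each pass is an ordinary seq2seq step; the compositional structure comes from chaining passes, with the model itself deciding when to stop.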
Related papers
- Algorithmic Capabilities of Random Transformers [49.73113518329544]
We investigate what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized.
We find that these random transformers can perform a wide range of meaningful algorithmic tasks.
Our results indicate that some algorithmic capabilities are present in transformers even before these models are trained.
arXiv Detail & Related papers (2024-10-06T06:04:23Z) - The Expressive Power of Transformers with Chain of Thought [29.839710738657203]
In practice, transformers can be improved by allowing them to use a "chain of thought" or "scratchpad" before producing an answer, raising the question of whether this intermediate generation extends their computational power.
We show that the answer is yes, but the amount of increase depends crucially on the amount of intermediate generation.
Our results also imply that linear steps keep transformer decoders within context-sensitive languages.
arXiv Detail & Related papers (2023-10-11T22:35:18Z) - Compositional Generalization without Trees using Multiset Tagging and
Latent Permutations [121.37328648951993]
We phrase semantic parsing as a two-step process: we first tag each input token with a multiset of output tokens.
Then we arrange the tokens into an output sequence using a new way of parameterizing and predicting permutations.
Our model outperforms pretrained seq2seq models and prior work on realistic semantic parsing tasks.
arXiv Detail & Related papers (2023-05-26T14:09:35Z) - Mutual Exclusivity Training and Primitive Augmentation to Induce
Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z) - Systematic Generalization and Emergent Structures in Transformers
Trained on Structured Tasks [6.525090891505941]
We show how a causal transformer can perform a set of algorithmic tasks, including copying, sorting, and hierarchical compositions.
We show that two-layer transformers learn generalizable solutions to multi-level problems and develop signs of systematic task decomposition.
These results provide key insights into how transformer models may be capable of decomposing complex decisions into reusable, multi-level policies.
arXiv Detail & Related papers (2022-10-02T00:46:36Z) - Recursive Decoding: A Situated Cognition Approach to Compositional
Generation in Grounded Language Understanding [0.0]
We present Recursive Decoding, a novel procedure for training and using seq2seq models.
Rather than generating an entire output sequence in one pass, models are trained to predict one token at a time.
RD yields dramatic improvement on two previously neglected generalization tasks in gSCAN.
arXiv Detail & Related papers (2022-01-27T19:13:42Z) - Grounded Graph Decoding Improves Compositional Generalization in
Question Answering [68.72605660152101]
Question answering models struggle to generalize to novel compositions of training patterns, such as longer sequences or more complex test structures.
We propose Grounded Graph Decoding, a method to improve compositional generalization of language representations by grounding structured predictions with an attention mechanism.
Our model significantly outperforms state-of-the-art baselines on the Compositional Freebase Questions (CFQ) dataset, a challenging benchmark for compositional generalization in question answering.
arXiv Detail & Related papers (2021-11-05T17:50:14Z) - Discovering Non-monotonic Autoregressive Orderings with Variational
Inference [67.27561153666211]
We develop an unsupervised parallelizable learner that discovers high-quality generation orders purely from training data.
We implement the encoder as a Transformer with non-causal attention that outputs permutations in one forward pass.
Empirical results in language modeling tasks demonstrate that our method is context-aware and discovers orderings that are competitive with or even better than fixed orders.
arXiv Detail & Related papers (2021-10-27T16:08:09Z) - Inducing Transformer's Compositional Generalization Ability via
Auxiliary Sequence Prediction Tasks [86.10875837475783]
Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions.
Existing neural models have been shown to lack this basic ability in learning symbolic structures.
We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
arXiv Detail & Related papers (2021-09-30T16:41:19Z)