On Provable Length and Compositional Generalization
- URL: http://arxiv.org/abs/2402.04875v3
- Date: Fri, 7 Jun 2024 20:25:05 GMT
- Title: On Provable Length and Compositional Generalization
- Authors: Kartik Ahuja, Amin Mansouri, et al.
- Abstract summary: We provide the first provable guarantees on length and compositional generalization for common sequence-to-sequence models.
We show that limited-capacity versions of different architectures achieve both length and compositional generalization.
- Score: 7.883808173871223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The out-of-distribution generalization capabilities of sequence-to-sequence models can be studied through the lens of two crucial forms of generalization: length generalization, the ability to generalize to sequences longer than those seen during training, and compositional generalization, the ability to generalize to token combinations not seen during training. In this work, we provide the first provable guarantees on length and compositional generalization for common sequence-to-sequence models -- deep sets, transformers, state space models, and recurrent neural nets -- trained to minimize the prediction error. Taking a first-principles perspective, we study the realizable case, i.e., the case in which the labeling function is realizable on the architecture. We show that limited-capacity versions of these different architectures achieve both length and compositional generalization. Across the different architectures, we also find that a linear relationship between the learned representation and the representation in the labeling function is necessary for length and compositional generalization.
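The two generalization regimes and the linearity finding can be made concrete with a small synthetic experiment. The sketch below is illustrative only, not the authors' code or proofs: it trains a toy deep-sets "student" to imitate a fixed deep-sets "teacher" (so the task is realizable), evaluates on sequences twice as long as any seen in training (length generalization) and on a token pair held out of training (compositional generalization), and then probes whether the learned per-token embeddings are linearly related to the teacher's. All names, sizes, and the specific train/test splits are assumptions chosen for illustration.

```python
# Minimal, self-contained sketch (PyTorch): NOT the paper's code or proofs.
# Assumptions for illustration: integer tokens in [0, V); the labeling
# function ("teacher") is itself a deep-sets network, so the student's task
# is realizable; training uses length-5 sequences that never contain tokens
# 3 and 7 together; evaluation uses length-10 sequences (length
# generalization) and sequences forced to contain the held-out pair
# (compositional generalization).
import torch
import torch.nn as nn

V, D = 20, 16  # vocabulary size, embedding width


class DeepSets(nn.Module):
    """psi(sum_t phi(x_t)): a permutation-invariant sequence-to-scalar model."""

    def __init__(self):
        super().__init__()
        self.phi = nn.Embedding(V, D)  # per-token representation
        self.psi = nn.Linear(D, 1)     # linear readout on the pooled sum

    def forward(self, x):              # x: (batch, length) integer tokens
        return self.psi(self.phi(x).sum(dim=1)).squeeze(-1)


def sample(n, length, banned_pair=None):
    """Sample random token sequences, optionally excluding a token pair."""
    x = torch.randint(0, V, (4 * n, length))  # oversample, then filter
    if banned_pair is not None:
        a, b = banned_pair
        keep = ~((x == a).any(dim=1) & (x == b).any(dim=1))
        x = x[keep]
    return x[:n]


torch.manual_seed(0)
teacher = DeepSets()   # plays the role of the (realizable) labeling function
student = DeepSets()   # trained to minimize prediction error w.r.t. the teacher
opt = torch.optim.Adam(student.parameters(), lr=1e-2)

for step in range(2000):  # train only on short sequences without the pair (3, 7)
    x = sample(256, length=5, banned_pair=(3, 7))
    loss = ((student(x) - teacher(x).detach()) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    x_len = sample(1024, length=10)        # longer than anything seen in training
    x_comp = sample(1024, length=5)
    x_comp[:, 0], x_comp[:, 1] = 3, 7      # force the unseen token combination
    print("length-generalization MSE:       ",
          ((student(x_len) - teacher(x_len)) ** 2).mean().item())
    print("compositional-generalization MSE:",
          ((student(x_comp) - teacher(x_comp)) ** 2).mean().item())

    # Probe the abstract's linearity claim: least-squares fit of the student's
    # embedding table onto the teacher's, reported as a relative residual.
    W_s, W_t = student.phi.weight, teacher.phi.weight   # both (V, D)
    A = torch.linalg.lstsq(W_s, W_t).solution           # W_s @ A ~= W_t
    rel_resid = ((W_s @ A - W_t) ** 2).mean() / (W_t ** 2).mean()
    print("relative residual of linear fit:", rel_resid.item())
```

The sum-pooling student keeps the experiment in the limited-capacity, realizable regime the abstract describes; under these assumptions, swapping in a transformer, state space model, or RNN student would exercise the same two evaluation splits without changing the evaluation code.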
Related papers
- A Formal Framework for Understanding Length Generalization in Transformers [14.15513446489798]
We introduce a rigorous theoretical framework to analyze length generalization in causal transformers.
We experimentally validate the theory as a predictor of success and failure of length generalization across a range of algorithmic and formal language tasks.
arXiv Detail & Related papers (2024-10-03T01:52:01Z)
- Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically [74.96551626420188]
Transformers trained on natural language data have been shown to learn its hierarchical structure and generalize to sentences with unseen syntactic structures.
We investigate sources of inductive bias in transformer models and their training that could cause such generalization behavior to emerge.
arXiv Detail & Related papers (2024-04-25T07:10:29Z)
- Real-World Compositional Generalization with Disentangled Sequence-to-Sequence Learning [81.24269148865555]
The recently proposed disentangled sequence-to-sequence model (Dangle) shows promising generalization capability.
We introduce two key modifications to this model which encourage more disentangled representations and improve its compute and memory efficiency.
Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically.
arXiv Detail & Related papers (2022-12-12T15:40:30Z)
- Compositional Generalisation with Structured Reordering and Fertility Layers [121.37328648951993]
Seq2seq models have been shown to struggle with compositional generalisation.
We present a flexible end-to-end differentiable neural model that composes two structural operations.
arXiv Detail & Related papers (2022-10-06T19:51:31Z)
- Compositional Generalization Requires Compositional Parsers [69.77216620997305]
We compare sequence-to-sequence models and models guided by compositional principles on the recent COGS corpus.
We show that structural generalization is a key measure of compositional generalization and requires models that are aware of complex structure.
arXiv Detail & Related papers (2022-02-24T07:36:35Z)
- Disentangled Sequence to Sequence Learning for Compositional Generalization [62.954842223732435]
We propose an extension to sequence-to-sequence models which allows us to learn disentangled representations by adaptively re-encoding the source input.
Experimental results on semantic parsing and machine translation empirically show that our proposal yields more disentangled representations and better generalization.
arXiv Detail & Related papers (2021-10-09T22:27:19Z)
- Improving Compositional Generalization in Classification Tasks via Structure Annotations [33.90268697120572]
Humans have a great ability to generalize compositionally, but state-of-the-art neural models struggle to do so.
First, we study ways to convert a natural language sequence-to-sequence dataset to a classification dataset that also requires compositional generalization.
Second, we show that providing structural hints (specifically, providing parse trees and entity links as attention masks for a Transformer model) helps compositional generalization.
arXiv Detail & Related papers (2021-06-19T06:07:27Z)
- Compositional Generalization via Semantic Tagging [81.24269148865555]
We propose a new decoding framework that preserves the expressivity and generality of sequence-to-sequence models.
We show that the proposed approach consistently improves compositional generalization across model architectures, domains, and semantic formalisms.
arXiv Detail & Related papers (2020-10-22T15:55:15Z)