On Provable Length and Compositional Generalization
- URL: http://arxiv.org/abs/2402.04875v5
- Date: Sat, 21 Dec 2024 12:01:19 GMT
- Title: On Provable Length and Compositional Generalization
- Authors: Kartik Ahuja, Amin Mansouri
- Abstract summary: Out-of-distribution generalization capabilities of sequence-to-sequence models can be studied through the lens of two crucial forms of generalization.
We provide the first provable guarantees on length and compositional generalization for common sequence-to-sequence models.
- Score: 7.883808173871223
- Abstract: Out-of-distribution generalization capabilities of sequence-to-sequence models can be studied through the lens of two crucial forms of generalization: length generalization, the ability to generalize to longer sequences than those seen during training, and compositional generalization, the ability to generalize to token combinations not seen during training. In this work, we provide the first provable guarantees on length and compositional generalization for common sequence-to-sequence models -- deep sets, transformers, state space models, and recurrent neural nets -- trained to minimize the prediction error. We show that limited-capacity versions of these different architectures achieve both length and compositional generalization provided the training distribution is sufficiently diverse. In the first part, we study structured limited-capacity variants of the different architectures and arrive at generalization guarantees under limited diversity requirements on the training distribution. In the second part, we study limited-capacity variants with fewer structural assumptions and arrive at generalization guarantees, but under stronger diversity requirements on the training distribution.
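To make the two notions concrete, the following is a minimal sketch of the evaluation setup the abstract describes, using a toy deep-sets model f(x_1, ..., x_T) = rho(sum_t phi(x_t)) with fixed token features and a learned linear readout. The task, features, and sequence lengths are illustrative assumptions, not the paper's actual construction or a substitute for its proofs.

```python
# A minimal, self-contained sketch (not the paper's construction): a toy
# deep-sets model with fixed features phi and a learned linear readout rho.
# The task and features are assumptions chosen so that sum pooling makes
# both length and compositional generalization achievable.
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    # Fixed token features; only the linear readout rho is learned.
    return np.stack([x, x**2], axis=-1)            # (T, 2)

def pool(seq):
    return phi(seq).sum(axis=0)                    # permutation-invariant sum

def target(X):
    return (X**2).sum(axis=1)                      # y = sum_t x_t^2

# Training distribution: short sequences (T=3) whose tokens are all equal,
# so combinations of *distinct* token values are never seen in training.
c = rng.uniform(-1, 1, size=2000)
X_train = np.repeat(c[:, None], 3, axis=1)
Z = np.stack([pool(x) for x in X_train])
w, *_ = np.linalg.lstsq(Z, target(X_train), rcond=None)  # fit readout rho

def mse(X):
    Zt = np.stack([pool(x) for x in X])
    return np.mean((Zt @ w - target(X)) ** 2)

# Length generalization: much longer sequences (T=30).
X_len = rng.uniform(-1, 1, size=(500, 30))
# Compositional generalization: unseen combinations of seen token values (T=3).
X_comp = rng.uniform(-1, 1, size=(500, 3))
print(f"length-gen MSE (T=30): {mse(X_len):.2e}")
print(f"compositional MSE (unseen combos, T=3): {mse(X_comp):.2e}")
```

Because sum pooling maps every sequence, regardless of length, to the same fixed-dimensional statistic, a readout fit on short, homogeneous sequences transfers to longer sequences and to unseen token combinations; this is the flavor of structural property that the paper's guarantees make precise for each architecture.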
Related papers
- GRAM: Generalization in Deep RL with a Robust Adaptation Module [29.303051759538416]
In this work, we present a framework for dynamics generalization in deep reinforcement learning.
We introduce a robust adaptation module that provides a mechanism for identifying and reacting to both in-distribution and out-of-distribution environment dynamics.
Our algorithm GRAM achieves strong generalization performance across in-distribution and out-of-distribution scenarios upon deployment.
arXiv Detail & Related papers (2024-12-05T16:39:01Z)
- A Formal Framework for Understanding Length Generalization in Transformers [14.15513446489798]
We introduce a rigorous theoretical framework to analyze length generalization in causal transformers.
We experimentally validate the theory as a predictor of success and failure of length generalization across a range of algorithmic and formal language tasks.
arXiv Detail & Related papers (2024-10-03T01:52:01Z)
- On the Generalization Ability of Unsupervised Pretraining [53.06175754026037]
Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization.
This paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase.
Our results contribute to a better understanding of the unsupervised pre-training and fine-tuning paradigm and can shed light on the design of more effective pre-training algorithms.
arXiv Detail & Related papers (2024-03-11T16:23:42Z)
- On the generalization capacity of neural networks during generic multimodal reasoning [20.1430673356983]
We evaluate and compare large language models' capacity for multimodal generalization.
For multimodal distractor and systematic generalization, either cross-modal attention or deeper attention layers are the key architectural features required to integrate multimodal inputs.
arXiv Detail & Related papers (2024-01-26T17:42:59Z)
- Real-World Compositional Generalization with Disentangled Sequence-to-Sequence Learning [81.24269148865555]
The recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization capability.
We introduce two key modifications to this model which encourage more disentangled representations and improve its compute and memory efficiency.
Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically.
arXiv Detail & Related papers (2022-12-12T15:40:30Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Compositional Generalisation with Structured Reordering and Fertility Layers [121.37328648951993]
Seq2seq models have been shown to struggle with compositional generalisation.
We present a flexible end-to-end differentiable neural model that composes two structural operations.
arXiv Detail & Related papers (2022-10-06T19:51:31Z)
- Compositional Generalization Requires Compositional Parsers [69.77216620997305]
We compare sequence-to-sequence models and models guided by compositional principles on the recent COGS corpus.
We show structural generalization is a key measure of compositional generalization and requires models that are aware of complex structure.
arXiv Detail & Related papers (2022-02-24T07:36:35Z)
- Disentangled Sequence to Sequence Learning for Compositional Generalization [62.954842223732435]
We propose an extension to sequence-to-sequence models which allows us to learn disentangled representations by adaptively re-encoding the source input.
Experimental results on semantic parsing and machine translation empirically show that our proposal yields more disentangled representations and better generalization.
arXiv Detail & Related papers (2021-10-09T22:27:19Z)
- Symbolic Brittleness in Sequence Models: on Systematic Generalization in Symbolic Mathematics [38.62999063710003]
We consider the problem of symbolic mathematical integration, as it requires generalizing systematically beyond the test set.
We develop a methodology for evaluating generalization that takes advantage of the problem domain's structure and access to a verifier.
We demonstrate challenges in achieving robustness, compositionality, and out-of-distribution generalization through both carefully constructed manual test suites and a genetic algorithm.
arXiv Detail & Related papers (2021-09-28T18:50:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.