When does compositional structure yield compositional generalization? A kernel theory
- URL: http://arxiv.org/abs/2405.16391v3
- Date: Tue, 08 Apr 2025 15:07:04 GMT
- Title: When does compositional structure yield compositional generalization? A kernel theory
- Authors: Samuel Lippl, Kim Stachenfeld,
- Abstract summary: We present a theory of compositional generalization in kernel models with fixed, compositionally structured representations.<n>We identify novel failure modes in compositional generalization that arise from biases in the training data.<n>This work examines how statistical structure in the training data can affect compositional generalization.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compositional generalization (the ability to respond correctly to novel combinations of familiar components) is thought to be a cornerstone of intelligent behavior. Compositionally structured (e.g. disentangled) representations support this ability; however, the conditions under which they are sufficient for the emergence of compositional generalization remain unclear. To address this gap, we present a theory of compositional generalization in kernel models with fixed, compositionally structured representations. This provides a tractable framework for characterizing the impact of training data statistics on generalization. We find that these models are limited to functions that assign values to each combination of components seen during training, and then sum up these values ("conjunction-wise additivity"). This imposes fundamental restrictions on the set of tasks compositionally structured kernel models can learn, in particular preventing them from transitively generalizing equivalence relations. Even for compositional tasks that they can learn in principle, we identify novel failure modes in compositional generalization (memorization leak and shortcut bias) that arise from biases in the training data. Finally, we empirically validate our theory, showing that it captures the behavior of deep neural networks (convolutional networks, residual networks, and Vision Transformers) trained on a set of compositional tasks with similarly structured data. Ultimately, this work examines how statistical structure in the training data can affect compositional generalization, with implications for how to identify and remedy failure modes in deep learning models.
Related papers
- What makes Models Compositional? A Theoretical View: With Supplement [60.284698521569936]
We propose a general neuro-symbolic definition of compositional functions and their compositional complexity.
We show how various existing general and special purpose sequence processing models fit this definition and use it to analyze their compositional complexity.
arXiv Detail & Related papers (2024-05-02T20:10:27Z) - Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation [59.138470433237615]
We introduce statistical metrics that quantify both the linguistic and visual skew of a dataset for relational learning.
We show that systematically controlled metrics are strongly predictive of generalization performance.
This work informs an important direction towards quality-enhancing the data diversity or balance to scaling up the absolute size.
arXiv Detail & Related papers (2024-03-25T03:18:39Z) - Towards Understanding the Relationship between In-context Learning and Compositional Generalization [7.843029855730508]
We train a causal Transformer in a setting that renders ordinary learning very difficult.
The model can solve the task, however, by utilizing earlier examples to generalize to later ones.
In evaluations on the datasets, SCAN, COGS, and GeoQuery, models trained in this manner indeed show improved compositional generalization.
arXiv Detail & Related papers (2024-03-18T14:45:52Z) - Provable Compositional Generalization for Object-Centric Learning [55.658215686626484]
Learning representations that generalize to novel compositions of known concepts is crucial for bridging the gap between human and machine perception.
We show that autoencoders that satisfy structural assumptions on the decoder and enforce encoder-decoder consistency will learn object-centric representations that provably generalize compositionally.
arXiv Detail & Related papers (2023-10-09T01:18:07Z) - Compositional Generalization in Unsupervised Compositional
Representation Learning: A Study on Disentanglement and Emergent Language [48.37815764394315]
We study three unsupervised representation learning algorithms on two datasets that allow directly testing compositional generalization.
We find that directly using the bottleneck representation with simple models and few labels may lead to worse generalization than using representations from layers before or after the learned representation itself.
Surprisingly, we find that increasing pressure to produce a disentangled representation produces representations with worse generalization, while representations from EL models show strong compositional generalization.
arXiv Detail & Related papers (2022-10-02T10:35:53Z) - On Neural Architecture Inductive Biases for Relational Tasks [76.18938462270503]
We introduce a simple architecture based on similarity-distribution scores which we name Compositional Network generalization (CoRelNet)
We find that simple architectural choices can outperform existing models in out-of-distribution generalizations.
arXiv Detail & Related papers (2022-06-09T16:24:01Z) - Compositional Generalization Requires Compositional Parsers [69.77216620997305]
We compare sequence-to-sequence models and models guided by compositional principles on the recent COGS corpus.
We show structural generalization is a key measure of compositional generalization and requires models that are aware of complex structure.
arXiv Detail & Related papers (2022-02-24T07:36:35Z) - Improving Compositional Generalization in Classification Tasks via
Structure Annotations [33.90268697120572]
Humans have a great ability to generalize compositionally, but state-of-the-art neural models struggle to do so.
First, we study ways to convert a natural language sequence-to-sequence dataset to a classification dataset that also requires compositional generalization.
Second, we show that providing structural hints (specifically, providing parse trees and entity links as attention masks for a Transformer model) helps compositional generalization.
arXiv Detail & Related papers (2021-06-19T06:07:27Z) - Meta-Learning to Compositionally Generalize [34.656819307701156]
We implement a meta-learning augmented version of supervised learning.
We construct pairs of tasks for meta-learning by sub-sampling existing training data.
Experimental results on the COGS and SCAN datasets show that our similarity-driven meta-learning can improve generalization performance.
arXiv Detail & Related papers (2021-06-08T11:21:48Z) - Compositional Processing Emerges in Neural Networks Solving Math
Problems [100.80518350845668]
Recent progress in artificial neural networks has shown that when large models are trained on enough linguistic data, grammatical structure emerges in their representations.
We extend this work to the domain of mathematical reasoning, where it is possible to formulate precise hypotheses about how meanings should be composed.
Our work shows that neural networks are not only able to infer something about the structured relationships implicit in their training data, but can also deploy this knowledge to guide the composition of individual meanings into composite wholes.
arXiv Detail & Related papers (2021-05-19T07:24:42Z) - Compositional Generalization by Learning Analytical Expressions [87.15737632096378]
A memory-augmented neural model is connected with analytical expressions to achieve compositional generalization.
Experiments on the well-known benchmark SCAN demonstrate that our model seizes a great ability of compositional generalization.
arXiv Detail & Related papers (2020-06-18T15:50:57Z) - Does syntax need to grow on trees? Sources of hierarchical inductive
bias in sequence-to-sequence networks [28.129220683169052]
In neural network models, inductive biases could in theory arise from any aspect of the model architecture.
We investigate which architectural factors affect the generalization behavior of neural sequence-to-sequence models trained on two syntactic tasks.
arXiv Detail & Related papers (2020-01-10T19:02:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.