When does compositional structure yield compositional generalization? A kernel theory
- URL: http://arxiv.org/abs/2405.16391v1
- Date: Sun, 26 May 2024 00:50:11 GMT
- Title: When does compositional structure yield compositional generalization? A kernel theory
- Authors: Samuel Lippl, Kim Stachenfeld
- Abstract summary: We present a theory of compositional generalization in kernel models with fixed, potentially nonlinear representations.
We show that these models are functionally limited to adding up values assigned to conjunctions/combinations of components that have been seen during training.
We validate our theory empirically, showing that it captures the behavior of deep neural networks trained on a set of compositional tasks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compositional generalization (the ability to respond correctly to novel combinations of familiar components) is thought to be a cornerstone of intelligent behavior. Compositionally structured (e.g. disentangled) representations are essential for this; however, the conditions under which they yield compositional generalization remain unclear. To address this gap, we present a general theory of compositional generalization in kernel models with fixed, potentially nonlinear representations (which also applies to neural networks in the "lazy regime"). We prove that these models are functionally limited to adding up values assigned to conjunctions/combinations of components that have been seen during training ("conjunction-wise additivity"), and identify novel compositionality failure modes that arise from the data and model structure, even for disentangled inputs. For models in the representation learning (or "rich") regime, we show that networks can generalize on an important non-additive task (associative inference), and give a mechanistic explanation for why. Finally, we validate our theory empirically, showing that it captures the behavior of deep neural networks trained on a set of compositional tasks. In sum, our theory characterizes the principles giving rise to compositional generalization in kernel models and shows how representation learning can overcome their limitations. We further provide a formally grounded, novel generalization class for compositional tasks that highlights fundamental differences in the required learning mechanisms (conjunction-wise additivity).
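To make the fixed-representation ("lazy regime") setting concrete, below is a minimal sketch, not the authors' code: a frozen random ReLU feature map with a ridge-regression readout, trained on a two-component task with disentangled (concatenated one-hot) inputs and evaluated on a held-out combination. The components, target values, and feature map are all illustrative assumptions. With a purely additive target, the novel combination is predicted well, consistent with the conjunction-wise additivity the theory describes; replacing the target with one containing an interaction between components would expose the corresponding failure mode.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-component task: each input is a (shape, color) pair, encoded as the
# concatenation of two one-hot vectors (a "disentangled" representation).
n_shapes, n_colors = 4, 4

def encode(shape, color):
    x = np.zeros(n_shapes + n_colors)
    x[shape] = 1.0
    x[n_shapes + color] = 1.0
    return x

# Hypothetical ground-truth value for each component; the target is their sum,
# so a purely component-wise additive rule generalizes perfectly.
shape_vals = rng.normal(size=n_shapes)
color_vals = rng.normal(size=n_colors)

def target(shape, color):
    return shape_vals[shape] + color_vals[color]

# Train on every combination except one held-out (novel) pair.
held_out = (3, 3)
train_pairs = [(s, c) for s in range(n_shapes) for c in range(n_colors)
               if (s, c) != held_out]
X_train = np.stack([encode(s, c) for s, c in train_pairs])
y_train = np.array([target(s, c) for s, c in train_pairs])

# Lazy-regime stand-in: a fixed (never trained) random ReLU feature map
# followed by a ridge-regression readout, i.e. a kernel model with a fixed,
# nonlinear representation.
n_features = 2000
W = rng.normal(size=(X_train.shape[1], n_features)) / np.sqrt(X_train.shape[1])

def phi(X):
    return np.maximum(X @ W, 0.0)

Phi = phi(X_train)
ridge = 1e-3
readout = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(n_features),
                          Phi.T @ y_train)

# Prediction on the novel combination vs. the additive ground truth.
x_test = encode(*held_out)
pred = phi(x_test[None, :]) @ readout
print(f"novel combination {held_out}: prediction {pred[0]:+.3f}, "
      f"additive target {target(*held_out):+.3f}")
```

Random ReLU features are used here only as a convenient stand-in for a fixed nonlinear representation; any frozen feature map followed by a linear readout fits the same analysis.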
Related papers
- What makes Models Compositional? A Theoretical View: With Supplement [60.284698521569936]
We propose a general neuro-symbolic definition of compositional functions and their compositional complexity.
We show how various existing general and special purpose sequence processing models fit this definition and use it to analyze their compositional complexity.
arXiv Detail & Related papers (2024-05-02T20:10:27Z)
- Towards Understanding the Relationship between In-context Learning and Compositional Generalization [7.843029855730508]
We train a causal Transformer in a setting that renders ordinary learning very difficult.
The model can, however, solve the task by using earlier examples to generalize to later ones.
In evaluations on the SCAN, COGS, and GeoQuery datasets, models trained in this manner indeed show improved compositional generalization.
arXiv Detail & Related papers (2024-03-18T14:45:52Z)
- Provable Compositional Generalization for Object-Centric Learning [57.42720932595342]
Learning representations that generalize to novel compositions of known concepts is crucial for bridging the gap between human and machine perception.
We show that autoencoders that satisfy structural assumptions on the decoder and enforce encoder-decoder consistency will learn object-centric representations that provably generalize compositionally.
arXiv Detail & Related papers (2023-10-09T01:18:07Z)
- Compositional Generalization in Unsupervised Compositional Representation Learning: A Study on Disentanglement and Emergent Language [48.37815764394315]
We study three unsupervised representation learning algorithms on two datasets that allow directly testing compositional generalization.
We find that directly using the bottleneck representation with simple models and few labels may lead to worse generalization than using representations from layers before or after the learned representation itself.
Surprisingly, we find that increasing the pressure to produce a disentangled representation yields representations with worse generalization, while representations from emergent language (EL) models show strong compositional generalization.
arXiv Detail & Related papers (2022-10-02T10:35:53Z)
- On Neural Architecture Inductive Biases for Relational Tasks [76.18938462270503]
We introduce a simple architecture based on similarity distribution scores, which we name the Compositional Relational Network (CoRelNet).
We find that simple architectural choices can outperform existing models in out-of-distribution generalization.
arXiv Detail & Related papers (2022-06-09T16:24:01Z)
- Learning Theory Can (Sometimes) Explain Generalisation in Graph Neural Networks [13.518582483147325]
We provide a rigorous analysis of the performance of neural networks in the context of transductive inference.
We show that transductive Rademacher complexity can explain the generalisation properties of graph convolutional networks for block models.
arXiv Detail & Related papers (2021-12-07T20:06:23Z)
- Compositional Processing Emerges in Neural Networks Solving Math Problems [100.80518350845668]
Recent progress in artificial neural networks has shown that when large models are trained on enough linguistic data, grammatical structure emerges in their representations.
We extend this work to the domain of mathematical reasoning, where it is possible to formulate precise hypotheses about how meanings should be composed.
Our work shows that neural networks are not only able to infer something about the structured relationships implicit in their training data, but can also deploy this knowledge to guide the composition of individual meanings into composite wholes.
arXiv Detail & Related papers (2021-05-19T07:24:42Z)
- Compositional Generalization by Learning Analytical Expressions [87.15737632096378]
A memory-augmented neural model is combined with analytical expressions to achieve compositional generalization.
Experiments on the well-known SCAN benchmark demonstrate that our model achieves strong compositional generalization.
arXiv Detail & Related papers (2020-06-18T15:50:57Z)
- Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks [28.129220683169052]
In neural network models, inductive biases could in theory arise from any aspect of the model architecture.
We investigate which architectural factors affect the generalization behavior of neural sequence-to-sequence models trained on two syntactic tasks.
arXiv Detail & Related papers (2020-01-10T19:02:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.