Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models
- URL: http://arxiv.org/abs/2602.24264v1
- Date: Fri, 27 Feb 2026 18:32:31 GMT
- Title: Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models
- Authors: Arnas Uselis, Andrea Dittadi, Seong Joon Oh
- Abstract summary: We formalize three desiderata for compositional generalization under standard training. We show that representations must decompose linearly into per-concept components. We derive dimension bounds linking the number of composable concepts to the embedding geometry.
- Score: 26.74984398469168
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compositional generalization, the ability to recognize familiar parts in novel contexts, is a defining property of intelligent systems. Although modern models are trained on massive datasets, they still cover only a tiny fraction of the combinatorial space of possible inputs, raising the question of what structure representations must have to support generalization to unseen combinations. We formalize three desiderata for compositional generalization under standard training (divisibility, transferability, stability) and show they impose necessary geometric constraints: representations must decompose linearly into per-concept components, and these components must be orthogonal across concepts. This provides theoretical grounding for the Linear Representation Hypothesis: the linear structure widely observed in neural representations is a necessary consequence of compositional generalization. We further derive dimension bounds linking the number of composable concepts to the embedding geometry. Empirically, we evaluate these predictions across modern vision models (CLIP, SigLIP, DINO) and find that representations exhibit partial linear factorization with low-rank, near-orthogonal per-concept factors, and that the degree of this structure correlates with compositional generalization on unseen combinations. As models continue to scale, these conditions predict the representational geometry they may converge to. Code is available at https://github.com/oshapio/necessary-compositionality.
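As a concrete illustration of the two geometric conditions, here is a minimal sketch (not the authors' released code; see the repository above for that) of how one might test them on embeddings indexed by a balanced grid over two hypothetical concepts, shape and color. Random vectors stand in for real model embeddings, so the printed diagnostics only become meaningful once `Z` is filled with actual CLIP/SigLIP/DINO features.

```python
# A minimal sketch (assumptions: a balanced shape x color grid; random
# vectors stand in for real embeddings) of the paper's two conditions:
#   (1) linearity:     z[s, c] ~ mu + u[s] + v[c]
#   (2) orthogonality: shape factors {u[s]} and color factors {v[c]}
#                      lie in (near-)orthogonal subspaces.
import numpy as np

rng = np.random.default_rng(0)
S, C, D = 5, 7, 512                       # shapes, colors, embedding dim
Z = rng.normal(size=(S, C, D))            # replace with real embeddings

# On a balanced grid, the least-squares additive fit is given by marginals.
mu = Z.mean(axis=(0, 1))                  # global mean
u = Z.mean(axis=1) - mu                   # per-shape components, shape (S, D)
v = Z.mean(axis=0) - mu                   # per-color components, shape (C, D)

# (1) Fraction of variance explained by the additive (linear) model.
Z_hat = mu + u[:, None, :] + v[None, :, :]
r2 = 1.0 - np.sum((Z - Z_hat) ** 2) / np.sum((Z - mu) ** 2)

# (2) Cross-concept cosine similarities; near zero means orthogonal factors.
cos = (u @ v.T) / np.outer(np.linalg.norm(u, axis=1), np.linalg.norm(v, axis=1))
print(f"additive R^2 = {r2:.3f}, max |cos(u_s, v_c)| = {np.abs(cos).max():.3f}")
```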
Related papers
- The Representational Geometry of Number [1.5994376682356057]
We show that number representations preserve a stable relational structure across tasks.
We find that task-specific representations are embedded in distinct subspaces, with low-level features like magnitude encoded along separable linear directions.
This suggests that understanding arises when task-specific transformations are applied to a shared underlying relational structure of conceptual representations.
arXiv Detail & Related papers (2026-02-06T16:35:22Z)
- Native Logical and Hierarchical Representations with Subspace Embeddings [25.274936769664098]
We introduce a novel paradigm: embedding concepts as linear subspaces.
It naturally supports set-theoretic operations like intersection (conjunction) and linear sum (disjunction).
Our method achieves state-of-the-art results in reconstruction and link prediction on WordNet.
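A minimal sketch of the subspace idea, under my own choice of operations (the paper's exact construction may differ): concepts are column spaces in R^d, conjunction is subspace intersection, and disjunction is the linear sum.

```python
# A toy version (my own construction; not the paper's implementation) of
# concepts as linear subspaces, with conjunction = intersection ("meet")
# and disjunction = linear sum ("join").
import numpy as np

def orth(A, tol=1e-10):
    """Orthonormal basis for the column space of A."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, s > tol]

def join(A, B):
    """Disjunction: the linear sum span(A) + span(B)."""
    return orth(np.hstack([A, B]))

def meet(A, B, tol=1e-10):
    """Conjunction: span(A) intersect span(B). Solutions of A x = B y form
    the null space of [A | -B]; applying A to the x-part spans the result."""
    M = np.hstack([A, -B])
    _, s, Vt = np.linalg.svd(M)
    null = Vt[int(np.sum(s > tol)):].T
    return orth(A @ null[: A.shape[1]])

E = np.eye(6)
animal = E[:, :3]            # hypothetical concept: span{e0, e1, e2}
pet = E[:, 1:4]              # hypothetical concept: span{e1, e2, e3}
print(meet(animal, pet).shape[1], join(animal, pet).shape[1])   # -> 2 4
```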
arXiv Detail & Related papers (2025-08-21T18:29:17Z)
- Generalized Linear Mode Connectivity for Transformers [87.32299363530996]
A striking phenomenon is linear mode connectivity (LMC), where independently trained models can be connected by low- or zero-loss paths.
Prior work has predominantly focused on neuron re-ordering through permutations, but such approaches are limited in scope.
We introduce a unified framework that captures four symmetry classes: permutations, semi-permutations, transformations, and general invertible maps.
This generalization enables, for the first time, the discovery of low- and zero-barrier linear paths between independently trained Vision Transformers and GPT-2 models.
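The barrier measurement itself is simple to sketch. The toy below (my construction: a two-layer numpy network on an XOR-like task, not the paper's Transformer setting) trains two models from different seeds and evaluates the loss along the straight line between their weights; the paper's contribution, aligning models under generalized symmetry classes before interpolating, is omitted here.

```python
# Measuring the loss barrier along theta(t) = (1-t)*theta_A + t*theta_B
# between two independently trained models (toy setting, my assumptions).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)   # XOR-like labels, non-convex fit

def train(seed, h=16, steps=3000, lr=0.2):
    r = np.random.default_rng(seed)
    W1, b1 = r.normal(size=(2, h)), np.zeros(h)
    W2, b2 = r.normal(size=(h, 1)) / np.sqrt(h), np.zeros(1)
    for _ in range(steps):                  # plain gradient descent on BCE
        a = np.tanh(X @ W1 + b1)
        p = 1 / (1 + np.exp(-((a @ W2).ravel() + b2)))
        dz = (p - y)[:, None] / len(y)      # d(loss)/d(logits)
        dW2, db2 = a.T @ dz, dz.sum(0)
        dh = (dz @ W2.T) * (1 - a ** 2)
        W2 -= lr * dW2; b2 -= lr * db2
        W1 -= lr * (X.T @ dh); b1 -= lr * dh.sum(0)
    return [W1, b1, W2, b2]

def loss(params):
    W1, b1, W2, b2 = params
    a = np.tanh(X @ W1 + b1)
    p = 1 / (1 + np.exp(-((a @ W2).ravel() + b2)))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

pA, pB = train(1), train(2)
path = [loss([(1 - t) * a + t * b for a, b in zip(pA, pB)])
        for t in np.linspace(0, 1, 21)]
print(f"barrier along naive linear path: {max(path) - max(loss(pA), loss(pB)):.3f}")
```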
arXiv Detail & Related papers (2025-06-28T01:46:36Z)
- The Origins of Representation Manifolds in Large Language Models [52.68554895844062]
We show that cosine similarity in representation space may encode the intrinsic geometry of a feature through shortest, on-manifold paths.
The critical assumptions and predictions of the theory are validated on text embeddings and token activations of large language models.
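A toy version of the claim (my construction, not the paper's theory): embed an angle-valued feature isometrically on a circle in R^D; cosine similarity between embeddings is then an exact monotone function of geodesic distance along the manifold.

```python
# An angle-valued feature on a circle embedded in R^D (my toy example):
# cosine similarity recovers geodesic (on-manifold) distance exactly.
import numpy as np

rng = np.random.default_rng(0)
D = 64
Q, _ = np.linalg.qr(rng.normal(size=(D, 2)))    # isometry R^2 -> R^D
embed = lambda t: Q @ np.array([np.cos(t), np.sin(t)])

thetas = np.linspace(0, 2 * np.pi, 100, endpoint=False)
Z = np.stack([embed(t) for t in thetas])        # unit-norm embeddings

cos_sim = Z @ Z[0]                              # cosine similarity to theta=0
geodesic = np.minimum(thetas, 2 * np.pi - thetas)
print(np.allclose(cos_sim, np.cos(geodesic)))   # True
```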
arXiv Detail & Related papers (2025-05-23T13:31:22Z)
- Directional Non-Commutative Monoidal Structures for Compositional Embeddings in Machine Learning [0.0]
We introduce a new structure for compositional embeddings built on directional non-commutative monoidal operators.
Our construction defines a distinct composition operator ∘_i for each axis i, ensuring associative combination along each axis without imposing global commutativity.
All axis-specific operators commute with one another, enforcing a global interchange law that enables consistent cross-axis compositions.
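One concrete instance of such a structure (a toy of my own; the abstract does not pin down the operators): take the embedding to be a matrix, let axis-1 composition act by left multiplication and axis-2 composition by right multiplication. Each axis is associative but non-commutative, while the two axes commute with each other, P @ (X @ Q) == (P @ X) @ Q, which is exactly an interchange law.

```python
# A toy instance (my assumptions) of directional monoidal composition:
# axis-1 composes by left multiplication, axis-2 by right multiplication.
import numpy as np

rng = np.random.default_rng(0)
d = 4
X = np.eye(d)                              # base embedding
P1, P2 = rng.normal(size=(2, d, d))        # axis-1 elements
Q1, Q2 = rng.normal(size=(2, d, d))        # axis-2 elements

compose_1 = lambda P, X: P @ X             # associative along axis 1
compose_2 = lambda X, Q: X @ Q             # associative along axis 2

print(np.allclose(P1 @ (P2 @ X), (P1 @ P2) @ X))     # True: associativity
print(np.allclose(P1 @ P2, P2 @ P1))                 # False: non-commutative
print(np.allclose(compose_2(compose_1(P1, X), Q1),
                  compose_1(P1, compose_2(X, Q1))))  # True: interchange law
```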
arXiv Detail & Related papers (2025-05-21T13:27:14Z)
- A Theoretical Analysis of Compositional Generalization in Neural Networks: A Necessary and Sufficient Condition [3.09765163299025]
This paper derives a necessary and sufficient condition for compositional generalization in neural networks.
Conceptually, it requires that (i) the computational graph matches the true compositional structure, and (ii) components encode just enough information during training.
arXiv Detail & Related papers (2025-05-05T13:13:46Z)
- Compositional Structures in Neural Embedding and Interaction Decompositions [101.40245125955306]
We describe a basic correspondence between linear algebraic structures within vector embeddings in artificial neural networks.
We introduce a characterization of compositional structures in terms of "interaction decompositions".
We establish necessary and sufficient conditions for the presence of such structures within the representations of a model.
arXiv Detail & Related papers (2024-07-12T02:39:50Z)
- When does compositional structure yield compositional generalization? A kernel theory [0.0]
We present a theory of compositional generalization in kernel models with fixed, compositionally structured representations.
We identify novel failure modes in compositional generalization that arise from biases in the training data.
This work examines how statistical structure in the training data can affect compositional generalization.
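A minimal sketch of the setting (my illustration; the paper's kernel analysis is more general): features concatenate one-hot codes for two concepts, a ridgeless linear model is fit on every combination except a held-out diagonal, and we test on the diagonal. An additive target transfers to the unseen pairs, while a target that binds the two concepts jointly does not.

```python
# Compositionally structured features with a kernel/linear model (my toy):
# additive targets generalize to unseen combinations; conjunctive ones fail.
import numpy as np

S = C = 4
def phi(s, c):
    v = np.zeros(S + C)
    v[s], v[S + c] = 1.0, 1.0              # concatenated one-hot features
    return v

pairs = [(s, c) for s in range(S) for c in range(C)]
train = [(s, c) for s, c in pairs if s != c]
test = [(s, c) for s, c in pairs if s == c]    # unseen combinations

def test_mse(target):
    Xtr = np.array([phi(s, c) for s, c in train])
    ytr = np.array([target(s, c) for s, c in train])
    w, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)   # min-norm exact fit
    Xte = np.array([phi(s, c) for s, c in test])
    yte = np.array([target(s, c) for s, c in test])
    return np.mean((Xte @ w - yte) ** 2)

print("additive target   :", test_mse(lambda s, c: s - 2 * c))      # ~ 0
print("conjunctive target:", test_mse(lambda s, c: float(s == c)))  # ~ 1
```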
arXiv Detail & Related papers (2024-05-26T00:50:11Z)
- What makes Models Compositional? A Theoretical View: With Supplement [60.284698521569936]
We propose a general neuro-symbolic definition of compositional functions and their compositional complexity.
We show how various existing general- and special-purpose sequence-processing models fit this definition, and we use it to analyze their compositional complexity.
arXiv Detail & Related papers (2024-05-02T20:10:27Z)
- On Provable Length and Compositional Generalization [7.883808173871223]
We provide the first provable guarantees on length and compositional generalization for common sequence-to-sequence models.
We show that limited-capacity versions of these different architectures achieve both length and compositional generalization.
arXiv Detail & Related papers (2024-02-07T14:16:28Z)
- Linear Spaces of Meanings: Compositional Structures in Vision-Language Models [110.00434385712786]
We investigate compositional structures in data embeddings from pre-trained vision-language models (VLMs).
We first present a framework for understanding compositional structures from a geometric perspective.
We then explain what these structures entail probabilistically in the case of VLM embeddings, providing intuitions for why they arise in practice.
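A quick empirical probe in this spirit (my protocol, assuming the `transformers` library and the public "openai/clip-vit-base-patch32" checkpoint; not the paper's experiments): compose color and object text embeddings by vector addition and check whether the sum retrieves the embedding of the combined phrase.

```python
# Additive composition of CLIP text embeddings (my probe, not the paper's).
import torch
from transformers import CLIPModel, CLIPTokenizer

name = "openai/clip-vit-base-patch32"
model, tok = CLIPModel.from_pretrained(name), CLIPTokenizer.from_pretrained(name)

@torch.no_grad()
def embed(texts):
    z = model.get_text_features(**tok(texts, padding=True, return_tensors="pt"))
    return torch.nn.functional.normalize(z, dim=-1)

colors, objects = ["red", "blue", "green"], ["car", "bird", "chair"]
phrases = [f"a {c} {o}" for c in colors for o in objects]
Z, zc, zo = embed(phrases), embed(colors), embed(objects)

hits = 0
for i in range(len(colors)):
    for j in range(len(objects)):
        q = zc[i] + zo[j]                    # additive composition
        hits += int((Z @ q).argmax().item() == i * len(objects) + j)
print(f"compositional retrieval accuracy: {hits}/{len(phrases)}")
```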
arXiv Detail & Related papers (2023-02-28T08:11:56Z)
- Frame Averaging for Equivariant Shape Space Learning [85.42901997467754]
A natural way to incorporate symmetries into shape space learning is to require that the mapping to the shape space (encoder) and the mapping from the shape space (decoder) are equivariant to the relevant symmetries.
We present a framework for incorporating equivariance into encoders and decoders via two contributions.
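A minimal sketch of frame averaging (my simplification to 2-D point clouds and global rotations; the paper treats richer symmetry groups and equivariant decoders as well): average an arbitrary encoder over a small, input-dependent frame of rotations built from the cloud's PCA axes, yielding exact rotation invariance.

```python
# Frame averaging for rotation invariance in 2-D (my toy assumptions).
import numpy as np

def frame(X):
    """Proper rotations aligning X's PCA axes, enumerating sign ambiguity."""
    _, V = np.linalg.eigh(np.cov(X.T))
    cands = [V * np.array([s1, s2]) for s1 in (1, -1) for s2 in (1, -1)]
    return [R for R in cands if np.linalg.det(R) > 0]

def frame_average(f, X):
    Xc = X - X.mean(0)                       # quotient out translation too
    return np.mean([f(Xc @ R) for R in frame(Xc)], axis=0)

f = lambda X: X.ravel()[:4]                  # stand-in, non-invariant encoder
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
t = 0.7                                      # an arbitrary test rotation
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
print(np.allclose(frame_average(f, X), frame_average(f, X @ R)))   # True
```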
arXiv Detail & Related papers (2021-12-03T06:41:19Z)