Towards a Learning Theory of Representation Alignment
- URL: http://arxiv.org/abs/2502.14047v1
- Date: Wed, 19 Feb 2025 19:09:14 GMT
- Title: Towards a Learning Theory of Representation Alignment
- Authors: Francesco Insulla, Shuo Huang, Lorenzo Rosasco
- Abstract summary: We propose a learning-theoretic perspective on representation alignment.
Our results can be seen as a first step toward casting representation alignment as a learning-theoretic problem.
- Score: 12.166663160280056
- Abstract: It has recently been argued that AI models' representations are becoming aligned as their scale and performance increase. Empirical analyses have been designed to support this idea and conjecture the possible alignment of different representations toward a shared statistical model of reality. In this paper, we propose a learning-theoretic perspective on representation alignment. First, we review and connect different notions of alignment based on metric, probabilistic, and spectral ideas. Then, we focus on stitching, a particular approach to understanding the interplay between different representations in the context of a task. Our main contribution here is relating properties of stitching to the kernel alignment of the underlying representation. Our results can be seen as a first step toward casting representation alignment as a learning-theoretic problem.
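The kernel alignment that the abstract relates to stitching can be illustrated with linear centered kernel alignment (CKA), a standard formulation of the idea; the sketch below assumes linear CKA is the intended notion and uses synthetic representation matrices:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment (CKA) between representation
    matrices X (n x d1) and Y (n x d2); rows are inputs. Returns a
    similarity score in [0, 1], invariant to orthogonal transforms
    and isotropic scaling of either representation."""
    X = X - X.mean(axis=0, keepdims=True)  # center features
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2   # HSIC-style cross term
    self_x = np.linalg.norm(X.T @ X, "fro")
    self_y = np.linalg.norm(Y.T @ Y, "fro")
    return cross / (self_x * self_y)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 16))
Q, _ = np.linalg.qr(rng.standard_normal((16, 16)))  # random rotation
Y_rand = rng.standard_normal((100, 16))             # unrelated representation
print(linear_cka(X, X @ Q))   # 1.0 up to rounding: geometry is preserved
print(linear_cka(X, Y_rand))  # low: independent representations
```

High alignment under an orthogonal transform illustrates why one representation can be "stitched" onto another with a simple (here, linear) map when their kernels agree.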
Related papers
- The "Law" of the Unconscious Contrastive Learner: Probabilistic Alignment of Unpaired Modalities [23.188014611990152]
This paper builds on work on the geometry and probabilistic interpretation of contrastive representations.
We show how these representations can answer many of the same inferences as probabilistic graphical models.
Our analysis suggests two new ways of using contrastive representations: in settings with pre-trained contrastive models, and for handling language ambiguity in reinforcement learning.
arXiv Detail & Related papers (2025-01-20T08:10:15Z)
- Beyond Scalars: Concept-Based Alignment Analysis in Vision Transformers [10.400355814467401]
Vision transformers (ViTs) can be trained using various learning paradigms, from fully supervised to self-supervised.
We propose a concept-based alignment analysis of representations from four different ViTs.
This analysis reveals that increased supervision correlates with a reduction in the semantic structure of the learned representations.
arXiv Detail & Related papers (2024-12-09T16:33:28Z)
- Take A Step Back: Rethinking the Two Stages in Visual Reasoning [57.16394309170051]
This paper revisits visual reasoning with a two-stage perspective.
It is more efficient to implement symbolization via separate encoders for different data domains while using a shared reasoner.
The proposed two-stage framework achieves impressive generalization ability on various visual reasoning tasks.
arXiv Detail & Related papers (2024-07-29T02:56:19Z)
- Learning Visual-Semantic Subspace Representations for Propositional Reasoning [49.17165360280794]
We propose a novel approach for learning visual representations that conform to a specified semantic structure.
Our approach is based on a new nuclear norm-based loss.
We show that its minimum encodes the spectral geometry of the semantics in a subspace lattice.
arXiv Detail & Related papers (2024-05-25T12:51:38Z)
- A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z)
- Understanding Probe Behaviors through Variational Bounds of Mutual Information [53.520525292756005]
We provide guidelines for linear probing by constructing a novel mathematical framework leveraging information theory.
First, we connect probing with the variational bounds of mutual information (MI) to relax the probe design, equating linear probing with fine-tuning.
We show that intermediate representations can have the largest MI estimates due to the trade-off between better separability and decreasing MI.
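The linear probing this entry analyzes amounts to fitting a linear classifier on frozen representations. A minimal sketch with synthetic representations and labels (the paper's MI bounds are not computed here):

```python
import numpy as np

def train_linear_probe(H, y, lr=0.5, epochs=300):
    """Fit a logistic-regression probe on frozen representations H
    (n x d) with binary labels y, using plain gradient descent."""
    n, d = H.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        z = np.clip(H @ w + b, -30.0, 30.0)  # logits, clipped for stability
        p = 1.0 / (1.0 + np.exp(-z))         # predicted P(y = 1)
        grad = p - y                         # gradient of log-loss w.r.t. logits
        w -= lr * H.T @ grad / n
        b -= lr * grad.mean()
    return w, b

# Synthetic frozen "representations": two linearly separable classes.
rng = np.random.default_rng(0)
H = np.vstack([rng.normal(-1.0, 1.0, (200, 8)), rng.normal(1.0, 1.0, (200, 8))])
y = np.repeat([0.0, 1.0], 200)
w, b = train_linear_probe(H, y)
acc = ((H @ w + b > 0) == (y > 0.5)).mean()
print(f"probe accuracy: {acc:.2f}")
```

Probe accuracy at each layer is then a cheap proxy for how linearly accessible the task information is in that layer's representation.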
arXiv Detail & Related papers (2023-12-15T18:38:18Z)
- Pointwise Representational Similarity [14.22332335495585]
Pointwise Normalized Kernel Alignment (PNKA) is a measure that quantifies how similarly an individual input is represented in two representation spaces.
We show how PNKA can be leveraged to develop a deeper understanding of (a) the input examples that are likely to be misclassified, (b) the concepts encoded by (individual) neurons in a layer, and (c) the effects of fairness interventions on learned representations.
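A per-input similarity score in the spirit of PNKA can be sketched by comparing how input i relates to all other inputs in each representation space; this is an illustrative simplification, not the authors' exact formulation:

```python
import numpy as np

def pointwise_alignment(X, Y, i):
    """Compare input i's cosine-similarity profile against all other
    inputs in two representation spaces X and Y (rows are inputs).
    Returns the correlation of the two neighbourhood profiles."""
    def sim_row(Z):
        Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
        return Zn @ Zn[i]  # cosine similarity of input i to every input
    a = np.delete(sim_row(X), i)  # drop the trivial self-similarity
    b = np.delete(sim_row(Y), i)
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))  # random rotation
Y_rand = rng.standard_normal((50, 8))
print(pointwise_alignment(X, X @ Q, 0))   # 1.0 up to rounding
print(pointwise_alignment(X, Y_rand, 0))  # near 0 for unrelated spaces
```

Inputs whose pointwise score is low are represented differently in the two spaces, which is how such a measure can flag likely misclassifications.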
arXiv Detail & Related papers (2023-05-30T09:40:08Z)
- Resolving label uncertainty with implicit posterior models [71.62113762278963]
We propose a method for jointly inferring labels across a collection of data samples.
By implicitly assuming the existence of a generative model for which a differentiable predictor is the posterior, we derive a training objective that allows learning under weak beliefs.
arXiv Detail & Related papers (2022-02-28T18:09:44Z)
- RELAX: Representation Learning Explainability [10.831313203043514]
We propose RELAX, which is the first approach for attribution-based explanations of representations.
RELAX explains representations by measuring similarities in the representation space between an input and masked-out versions of itself.
We provide theoretical interpretations of RELAX and conduct a novel analysis of feature extractors trained using supervised and unsupervised learning.
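The masking idea can be sketched as follows: importance accrues to features whose presence in a random mask keeps the masked input's representation similar to the original. The encoder below is a toy stand-in, and the estimator is an illustrative simplification, not the paper's exact formulation:

```python
import numpy as np

def relax_importance(x, encoder, n_masks=1000, p_keep=0.5, seed=0):
    """Attribution sketch: average representation-space similarity
    between x and randomly masked versions of x, credited to the
    features each mask keeps."""
    rng = np.random.default_rng(seed)
    h = encoder(x).astype(float)
    h = h / np.linalg.norm(h)
    scores = np.zeros(x.shape)
    counts = np.zeros(x.shape)
    for _ in range(n_masks):
        m = (rng.random(x.shape) < p_keep).astype(float)  # random keep-mask
        hm = encoder(x * m).astype(float)
        norm = np.linalg.norm(hm)
        sim = float(h @ hm) / norm if norm > 0 else 0.0  # cosine similarity
        scores += sim * m   # credit similarity to the kept features
        counts += m
    return scores / np.maximum(counts, 1.0)

# Toy encoder that only reads the first four features: those should
# receive higher importance than the ignored ones.
encoder = lambda v: v[:4]
imp = relax_importance(np.ones(8), encoder)
print(imp[:4].mean() > imp[4:].mean())  # True
```

Because it only needs forward passes through the encoder, this style of attribution works without labels, matching the representation-learning setting.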
arXiv Detail & Related papers (2021-12-19T14:51:31Z)
- Weakly-Supervised Disentanglement Without Compromises [53.55580957483103]
Intelligent agents should be able to learn useful representations by observing changes in their environment.
We model such observations as pairs of non-i.i.d. images sharing at least one of the underlying factors of variation.
We show that only knowing how many factors have changed, but not which ones, is sufficient to learn disentangled representations.
arXiv Detail & Related papers (2020-02-07T16:39:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.