The Platonic Representation Hypothesis
- URL: http://arxiv.org/abs/2405.07987v5
- Date: Thu, 25 Jul 2024 09:33:50 GMT
- Title: The Platonic Representation Hypothesis
- Authors: Minyoung Huh, Brian Cheung, Tongzhou Wang, Phillip Isola
- Abstract summary: We argue that representations in AI models, particularly deep networks, are converging.
As vision models and language models get larger, they measure distance between datapoints in a more and more alike way.
We hypothesize that this convergence is driving toward a shared statistical model of reality, akin to Plato's concept of an ideal reality.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the ways by which different neural networks represent data are becoming more aligned. Next, we demonstrate convergence across data modalities: as vision models and language models get larger, they measure distance between datapoints in a more and more alike way. We hypothesize that this convergence is driving toward a shared statistical model of reality, akin to Plato's concept of an ideal reality. We term such a representation the platonic representation and discuss several possible selective pressures toward it. Finally, we discuss the implications of these trends, their limitations, and counterexamples to our analysis.
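The cross-modal alignment the abstract describes, models that "measure distance between datapoints in a more and more alike way", can be made concrete with a mutual k-nearest-neighbor alignment score. The sketch below is illustrative, not the paper's exact implementation; the function name and the rotation-based sanity check are assumptions of this example.

```python
import numpy as np

def mutual_knn_alignment(feats_a, feats_b, k=5):
    """Fraction of k-nearest neighbours shared between two feature spaces.

    feats_a, feats_b: (n_samples, dim) representations of the SAME
    n datapoints produced by two different models.
    """
    def knn_indices(feats):
        # Pairwise Euclidean distances; mask the diagonal to exclude self.
        d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        return np.argsort(d, axis=1)[:, :k]

    nn_a = knn_indices(feats_a)
    nn_b = knn_indices(feats_b)
    overlaps = [len(set(nn_a[i]) & set(nn_b[i])) / k
                for i in range(len(feats_a))]
    return float(np.mean(overlaps))

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 16))
# A rotated copy of x preserves all pairwise distances, so the two
# "models" agree perfectly on neighborhood structure.
q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
print(mutual_knn_alignment(x, x @ q, k=5))
```

Two models can score highly even when their features live in unrelated coordinate systems, which is exactly the sense of convergence the hypothesis is about.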
Related papers
- Revisiting the Platonic Representation Hypothesis: An Aristotelian View [3.647057737530591]
We show that the existing metrics used to measure representational similarity are confounded by network scale.
We introduce a permutation-based null-calibration framework that transforms any representational similarity metric into a calibrated score with statistical guarantees.
We propose the Aristotelian Representation Hypothesis: representations in neural networks are converging to shared local neighborhood relationships.
arXiv Detail & Related papers (2026-02-16T06:01:23Z) - When Does Closeness in Distribution Imply Representational Similarity? An Identifiability Perspective [9.871955852117912]
We prove that a small Kullback-Leibler divergence between the model distributions does not guarantee that the corresponding representations are similar.
We then define a distributional distance for which closeness implies representational similarity.
In synthetic experiments, we find that wider networks learn distributions which are closer with respect to our distance and have more similar representations.
arXiv Detail & Related papers (2025-06-04T09:44:22Z) - Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models [65.82564074712836]
We introduce DIFfusionHOI, a new HOI detector shedding light on text-to-image diffusion models.
We first devise an inversion-based strategy to learn the expression of relation patterns between humans and objects in embedding space.
These learned relation embeddings then serve as textual prompts to steer diffusion models to generate images that depict specific interactions.
arXiv Detail & Related papers (2024-10-26T12:00:33Z) - A Generalized Model for Multidimensional Intransitivity [26.127246746317958]
We propose a probabilistic model that jointly learns each player's d-dimensional representation (d>1) and a dataset-specific metric space.
We show that our proposed method outperforms several competing methods in terms of prediction accuracy.
arXiv Detail & Related papers (2024-09-28T11:48:34Z) - On the Origins of Linear Representations in Large Language Models [51.88404605700344]
We introduce a simple latent variable model to formalize the concept dynamics of the next token prediction.
Experiments show that linear representations emerge when learning from data matching the latent variable model.
We additionally confirm some predictions of the theory using the LLaMA-2 large language model.
arXiv Detail & Related papers (2024-03-06T17:17:36Z) - Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
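The self-consuming loop and the stabilizing effect of mixed data can be illustrated with a toy kernel density estimation experiment; the bandwidth, generation count, and mixing scheme below are assumptions of this sketch, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(3)

def kde_sample(data, n, bandwidth):
    # Sampling from a Gaussian KDE: pick a training point uniformly,
    # then add Gaussian noise with the kernel bandwidth.
    picks = rng.choice(data, size=n)
    return picks + rng.normal(scale=bandwidth, size=n)

real = rng.normal(size=5000)   # ground-truth data, std = 1

# Fully self-consuming: each generation trains only on the previous
# generation's samples, so the kernel noise inflates variance each step.
synthetic = real.copy()
for _ in range(5):
    synthetic = kde_sample(synthetic, 5000, bandwidth=0.3)
print(synthetic.std())         # drifts above 1.0

# Mixing fresh real data into every generation curbs the drift.
mixed = real.copy()
for _ in range(5):
    mixed = kde_sample(np.concatenate([mixed, real]), 5000, bandwidth=0.3)
print(mixed.std())
```

Even this minimal model reproduces the qualitative finding: training purely on model outputs compounds estimation error, while anchoring each generation to real data bounds it.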
arXiv Detail & Related papers (2024-02-19T02:08:09Z) - Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning [80.44084021062105]
We propose a novel latent partial causal model for multimodal data, featuring two latent coupled variables, connected by an undirected edge, to represent the transfer of knowledge across modalities.
Under specific statistical assumptions, we establish an identifiability result, demonstrating that representations learned by multimodal contrastive learning correspond to the latent coupled variables up to a trivial transformation.
Experiments show that a pre-trained CLIP model embodies disentangled representations, enabling few-shot learning and improving domain generalization across diverse real-world datasets.
arXiv Detail & Related papers (2024-02-09T07:18:06Z) - Experimental Observations of the Topology of Convolutional Neural
Network Activations [2.4235626091331737]
Topological data analysis provides compact, noise-robust representations of complex structures.
Deep neural networks (DNNs) learn millions of parameters associated with a series of transformations defined by the model architecture.
In this paper, we apply cutting-edge techniques from TDA to gain insight into the interpretability of convolutional neural networks used for image classification.
arXiv Detail & Related papers (2022-12-01T02:05:44Z) - Geometric and Topological Inference for Deep Representations of Complex
Networks [13.173307471333619]
We present a class of statistics that emphasize the topology as well as the geometry of representations.
We evaluate these statistics in terms of the sensitivity and specificity that they afford when used for model selection.
These new methods enable brain and computer scientists to visualize the dynamic representational transformations learned by brains and models.
arXiv Detail & Related papers (2022-03-10T17:14:14Z) - Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive
Representation Learning [35.25854322376364]
We show that different data modalities are embedded at arm's length in their shared representation in multi-modal models such as CLIP.
Contrastive learning keeps the different modalities separated by a certain distance, which is influenced by the temperature parameter in the loss function.
Our experiments further demonstrate that varying the modality gap distance can significantly improve the model's downstream zero-shot classification performance and fairness.
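The modality gap can be measured as the distance between the centroids of each modality's normalized embeddings, and varied by shifting one modality along the gap vector. The toy embeddings below are stand-ins for CLIP-style features; the cluster construction and shift factor are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Toy stand-ins for CLIP-style embeddings: each modality clusters in a
# different region of the unit sphere.
image_emb = normalize(rng.normal(size=(256, 64)) + np.eye(64)[0] * 3)
text_emb = normalize(rng.normal(size=(256, 64)) - np.eye(64)[0] * 3)

# The modality gap: distance between the two modality centroids.
gap_vector = image_emb.mean(axis=0) - text_emb.mean(axis=0)
print(np.linalg.norm(gap_vector))

# Shifting one modality along the gap vector shrinks the gap, the knob
# varied in experiments of this kind to study downstream accuracy.
shifted_text = normalize(text_emb + 0.5 * gap_vector)
new_gap = image_emb.mean(axis=0) - shifted_text.mean(axis=0)
print(np.linalg.norm(new_gap))
```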
arXiv Detail & Related papers (2022-03-03T22:53:54Z) - Representation Topology Divergence: A Method for Comparing Neural
Network Representations [10.74105109486386]
We introduce the Representation Topology Divergence (RTD), measuring the dissimilarity in multi-scale topology between two point clouds of equal size.
Experiments show that the proposed RTD agrees with the intuitive assessment of data representation similarity and is sensitive to its topological structure.
arXiv Detail & Related papers (2021-12-31T21:08:56Z) - Perceptual Score: What Data Modalities Does Your Model Perceive? [73.75255606437808]
We introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features.
We find that recent, more accurate multi-modal models for visual question-answering tend to perceive the visual data less than their predecessors.
Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions.
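The idea of scoring how much a model relies on a modality can be sketched as the accuracy drop when that modality's inputs are permuted across samples. The toy model, data, and function names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "multi-modal" task: the label depends only on the visual feature,
# so permuting the text modality across samples should cost nothing.
n = 500
visual = rng.normal(size=n)
text = rng.normal(size=n)
labels = (visual > 0).astype(int)

def model(visual, text):
    # Hypothetical model that (here) ignores the text modality entirely.
    return (visual > 0).astype(int)

def perceptual_score(modality, other, labels, predict, n_perm=20):
    """Mean accuracy drop when `modality` is permuted across samples."""
    base_acc = (predict(modality, other) == labels).mean()
    drops = []
    for _ in range(n_perm):
        perm = rng.permutation(len(modality))
        drops.append(base_acc
                     - (predict(modality[perm], other) == labels).mean())
    return float(np.mean(drops))

# Large drop: the model relies on the visual modality.
print(perceptual_score(visual, text, labels, model))
# No drop: the model never perceives the text modality.
print(perceptual_score(text, visual, labels, lambda t, v: model(v, t)))
```

Decomposing the score over data subsets, as the paper suggests, would amount to averaging these drops within each subset separately.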
arXiv Detail & Related papers (2021-10-27T12:19:56Z) - Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.