When Does Closeness in Distribution Imply Representational Similarity? An Identifiability Perspective
- URL: http://arxiv.org/abs/2506.03784v1
- Date: Wed, 04 Jun 2025 09:44:22 GMT
- Title: When Does Closeness in Distribution Imply Representational Similarity? An Identifiability Perspective
- Authors: Beatrix M. G. Nielsen, Emanuele Marconato, Andrea Dittadi, Luigi Gresele
- Abstract summary: We prove that a small Kullback-Leibler divergence between the model distributions does not guarantee that the corresponding representations are similar. We then define a distributional distance for which closeness implies representational similarity. In synthetic experiments, we find that wider networks learn distributions which are closer with respect to our distance and have more similar representations.
- Score: 9.871955852117912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When and why representations learned by different deep neural networks are similar is an active research topic. We choose to address these questions from the perspective of identifiability theory, which suggests that a measure of representational similarity should be invariant to transformations that leave the model distribution unchanged. Focusing on a model family which includes several popular pre-training approaches, e.g., autoregressive language models, we explore when models which generate distributions that are close have similar representations. We prove that a small Kullback-Leibler divergence between the model distributions does not guarantee that the corresponding representations are similar. This has the important corollary that models arbitrarily close to maximizing the likelihood can still learn dissimilar representations, a phenomenon mirrored in our empirical observations on models trained on CIFAR-10. We then define a distributional distance for which closeness implies representational similarity, and in synthetic experiments, we find that wider networks learn distributions which are closer with respect to our distance and have more similar representations. Our results establish a link between closeness in distribution and representational similarity.
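To make the abstract's central claim concrete, the sketch below builds two toy models that induce exactly the same data distribution (so the Kullback-Leibler divergence between them is zero) while their internal representations differ by an invertible, non-orthogonal linear mixing; a standard similarity index such as linear CKA then comes out well below 1. This is only a minimal numerical sketch under assumed toy settings, not the paper's construction or proofs; the CKA implementation and all names are illustrative.

```python
# Minimal sketch (illustrative, not the paper's method): two toy "models" whose
# output distributions coincide while their internal representations differ.
import numpy as np

rng = np.random.default_rng(0)

def linear_cka(X, Y):
    """Linear CKA between two representation matrices of shape (samples, features)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

# Representations of "model A", and a generic invertible mixing giving "model B".
z_a = rng.normal(size=(2000, 8))
A = rng.normal(size=(8, 8))
z_b = z_a @ A

# Each model decodes its representation; model B's decoder absorbs the inverse
# mixing, so both models produce identical outputs (zero KL between them).
decoder = rng.normal(size=(8, 16))
x_a = z_a @ decoder
x_b = z_b @ (np.linalg.inv(A) @ decoder)

print("max output difference:", np.abs(x_a - x_b).max())        # ~0: same distribution
print("linear CKA of representations:", linear_cka(z_a, z_b))   # typically well below 1
```

Because the second model's decoder absorbs the inverse mixing, the two models are distributionally indistinguishable, yet their representation spaces are related only by a non-orthogonal map, which linear CKA penalizes; this mirrors, in miniature, why a small KL divergence alone cannot certify representational similarity.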
Related papers
- Connecting Neural Models Latent Geometries with Relative Geodesic Representations [21.71782603770616]
We show that when a latent structure is shared between distinct latent spaces, relative distances between representations can be preserved, up to distortions. We assume that distinct neural models parametrize approximately the same underlying manifold, and introduce a representation based on the pullback metric. We validate our method on model stitching and retrieval tasks, covering autoencoders and vision foundation discriminative models.
arXiv Detail & Related papers (2025-06-02T12:34:55Z) - Linear Representation Transferability Hypothesis: Leveraging Small Models to Steer Large Models [6.390475802910619]
We show that representations learned across models trained on the same data can be expressed as linear combinations of a universal set of basis features. These basis features underlie the learning task itself and remain consistent across models, regardless of scale.
arXiv Detail & Related papers (2025-05-31T17:45:18Z) - A solvable generative model with a linear, one-step denoiser [0.0]
We develop an analytically tractable single-step diffusion model based on a linear denoiser. We show that the monotonic fall phase of the Kullback-Leibler divergence begins when the training dataset size reaches the dimension of the data points.
arXiv Detail & Related papers (2024-11-26T19:00:01Z) - Conjuring Semantic Similarity [59.18714889874088]
The semantic similarity between two textual expressions measures the distance between their latent 'meaning'. We propose a novel approach whereby the semantic similarity among textual expressions is based not on other expressions they can be rephrased as, but rather on the imagery they evoke. Our method contributes a novel perspective on semantic similarity that not only aligns with human-annotated scores, but also opens up new avenues for the evaluation of text-conditioned generative models.
arXiv Detail & Related papers (2024-10-21T18:51:34Z) - InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion [53.90516061351706]
We present InterHandGen, a novel framework that learns the generative prior of two-hand interaction.
For sampling, we combine anti-penetration and classifier-free guidance to enable plausible generation.
Our method significantly outperforms baseline generative models in terms of plausibility and diversity.
arXiv Detail & Related papers (2024-03-26T06:35:55Z) - Counting Like Human: Anthropoid Crowd Counting on Modeling the
Similarity of Objects [92.80955339180119]
Mainstream crowd counting methods regress a density map and integrate it to obtain counting results.
Inspired by how humans count, we propose a rational and anthropoid crowd counting framework.
arXiv Detail & Related papers (2022-12-02T07:00:53Z) - Neural Representations Reveal Distinct Modes of Class Fitting in
Residual Convolutional Networks [5.1271832547387115]
We leverage probabilistic models of neural representations to investigate how residual networks fit classes.
We find that classes in the investigated models are not fitted in a uniform way.
We show that the uncovered structure in neural representations correlates with the robustness of training examples and with adversarial memorization.
arXiv Detail & Related papers (2022-12-01T18:55:58Z) - Accuracy on the Line: On the Strong Correlation Between
Out-of-Distribution and In-Distribution Generalization [89.73665256847858]
We show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts.
Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet.
We also investigate cases where the correlation is weaker, such as some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS.
arXiv Detail & Related papers (2021-07-09T19:48:23Z) - Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that, despite its simplicity, DoC consistently outperforms other quantifications of distributional difference (a minimal DoC sketch is given after this list).
arXiv Detail & Related papers (2021-07-07T15:50:18Z) - Why do classifier accuracies show linear trends under distribution
shift? [58.40438263312526]
It has been observed that accuracies of models on one data distribution are approximately linear functions of the accuracies on another distribution.
We assume the probability that two models agree in their predictions is higher than what we can infer from their accuracy levels alone.
We show that a linear trend must occur when evaluating models on two distributions unless the size of the distribution shift is large.
arXiv Detail & Related papers (2020-12-31T07:24:30Z)
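As referenced in the entry on Predicting with Confidence on Unseen Distributions, the difference-of-confidences (DoC) idea can be illustrated with a short sketch. This is a hedged, minimal reconstruction under assumed inputs (softmax outputs for a labeled source set and an unlabeled target set), not the authors' released implementation; all function and variable names are hypothetical.

```python
# Minimal sketch of difference-of-confidences (DoC) style accuracy estimation.
import numpy as np

def average_confidence(probs):
    """Mean of the max softmax probability, i.e. the model's average confidence."""
    return probs.max(axis=1).mean()

def estimate_target_accuracy(probs_src, labels_src, probs_tgt):
    """Estimate accuracy on an unseen target set from the drop in confidence.

    DoC = (confidence on source) - (confidence on target); the estimated target
    accuracy is the measured source accuracy minus that confidence drop.
    """
    acc_src = (probs_src.argmax(axis=1) == labels_src).mean()
    doc = average_confidence(probs_src) - average_confidence(probs_tgt)
    return acc_src - doc

# Toy usage with random numbers standing in for real model outputs.
rng = np.random.default_rng(0)
probs_src = rng.dirichlet(np.ones(10) * 0.5, size=1000)   # peaked = more confident
probs_tgt = rng.dirichlet(np.ones(10) * 2.0, size=1000)   # flatter = less confident
labels_src = rng.integers(0, 10, size=1000)
print("estimated target accuracy:", estimate_target_accuracy(probs_src, labels_src, probs_tgt))
```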