Logit Distance Bounds Representational Similarity
- URL: http://arxiv.org/abs/2602.15438v2
- Date: Wed, 18 Feb 2026 23:15:54 GMT
- Title: Logit Distance Bounds Representational Similarity
- Authors: Beatrix M. G. Nielsen, Emanuele Marconato, Luigi Gresele, Andrea Dittadi, Simon Buchholz,
- Abstract summary: We study a distributional distance based on logit differences and show that closeness in this distance does yield linear similarity guarantees. We further show that, when model probabilities are bounded away from zero, KL divergence upper-bounds logit distance; yet the resulting bound fails to provide nontrivial control in practice.
- Score: 18.79873056204737
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: For a broad family of discriminative models that includes autoregressive language models, identifiability results imply that if two models induce the same conditional distributions, then their internal representations agree up to an invertible linear transformation. We ask whether an analogous conclusion holds approximately when the distributions are close instead of equal. Building on the observation of Nielsen et al. (2025) that closeness in KL divergence need not imply high linear representational similarity, we study a distributional distance based on logit differences and show that closeness in this distance does yield linear similarity guarantees. Specifically, we define a representational dissimilarity measure based on the models' identifiability class and prove that it is bounded by the logit distance. We further show that, when model probabilities are bounded away from zero, KL divergence upper-bounds logit distance; yet the resulting bound fails to provide nontrivial control in practice. As a consequence, KL-based distillation can match a teacher's predictions while failing to preserve linear representational properties, such as linear-probe recoverability of human-interpretable concepts. In distillation experiments on synthetic and image datasets, logit-distance distillation yields students with higher linear representational similarity and better preservation of the teacher's linearly recoverable concepts.
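To make the distinction concrete, below is a minimal sketch of the two distillation objectives the abstract contrasts. The paper's exact logit distance is not reproduced here; the version shown (Euclidean distance between mean-centered logits, using the fact that softmax is invariant to constant shifts) is one plausible instantiation, and the function names and temperature handling are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def kl_distillation_loss(student_logits: torch.Tensor,
                         teacher_logits: torch.Tensor,
                         temperature: float = 1.0) -> torch.Tensor:
    """Standard KL-based distillation: match the teacher's output distribution.
    Per the abstract, making this small need not preserve the teacher's
    linear representational properties."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * t ** 2

def logit_distance_loss(student_logits: torch.Tensor,
                        teacher_logits: torch.Tensor) -> torch.Tensor:
    """A plausible logit-difference distance: L2 between logits after
    subtracting each example's mean logit (the conditional distribution
    only identifies logits up to a per-example constant shift)."""
    s_centered = student_logits - student_logits.mean(dim=-1, keepdim=True)
    t_centered = teacher_logits - teacher_logits.mean(dim=-1, keepdim=True)
    return (s_centered - t_centered).pow(2).sum(dim=-1).mean()
```

Per the abstract's bound, driving a logit-difference distance of this kind to zero also controls the representational dissimilarity measure, whereas a small KL loss provides no such guarantee.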
Related papers
- Nonparametric Data Attribution for Diffusion Models [57.820618036556084]
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs. We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images.
arXiv Detail & Related papers (2025-10-16T03:37:16Z)
- When Does Closeness in Distribution Imply Representational Similarity? An Identifiability Perspective [9.578534178372829]
We prove that a small Kullback--Leibler divergence between the model distributions does not guarantee that the corresponding representations are similar. We then define a distributional distance for which closeness implies representational similarity. In synthetic experiments, we find that wider networks learn distributions which are closer with respect to our distance and have more similar representations.
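The "linear representational similarity" at issue can be quantified in several ways; linear centered kernel alignment (CKA) is one common choice. The sketch below illustrates such a measure and is not necessarily the dissimilarity measure defined in either paper.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between representation matrices X (n, d1) and Y (n, d2),
    one row per input. Equals 1 when the representations agree up to an
    orthogonal transformation and isotropic scaling; values near 0 indicate
    little linear relationship."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    return float(cross / (np.linalg.norm(X.T @ X, ord="fro")
                          * np.linalg.norm(Y.T @ Y, ord="fro")))
```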
arXiv Detail & Related papers (2025-06-04T09:44:22Z)
- All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling [7.334847424898197]
We analyze identifiability as a possible explanation for the ubiquity of linear properties across language models. We show that, under suitable conditions, these linear properties hold either in all distribution-equivalent next-token predictors or in none of them.
arXiv Detail & Related papers (2024-10-30T23:19:29Z)
- Understanding Probe Behaviors through Variational Bounds of Mutual Information [53.520525292756005]
We provide guidelines for linear probing by constructing a novel mathematical framework leveraging information theory.
First, we connect probing with the variational bounds of mutual information (MI) to relax the probe design, equating linear probing with fine-tuning.
We show that intermediate representations can have the largest MI estimate because of the tradeoff between better separability and decreasing MI.
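For context, a linear probe here is simply a linear classifier trained on frozen representations from a chosen layer. A minimal scikit-learn sketch, with the setup and names as illustrative assumptions rather than the paper's exact protocol:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_probe_accuracy(train_feats: np.ndarray, train_labels: np.ndarray,
                          test_feats: np.ndarray, test_labels: np.ndarray) -> float:
    """Fit a linear probe on frozen features and report held-out accuracy;
    higher accuracy means the target concept is more linearly recoverable
    from that layer's representations."""
    probe = LogisticRegression(max_iter=1000)
    probe.fit(train_feats, train_labels)
    return probe.score(test_feats, test_labels)
```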
arXiv Detail & Related papers (2023-12-15T18:38:18Z)
- Sparsified Simultaneous Confidence Intervals for High-Dimensional Linear Models [4.675899216825188]
We propose a notion of simultaneous confidence intervals called sparsified simultaneous confidence intervals. Our intervals are sparse in the sense that some of the intervals' upper and lower bounds are shrunken to zero. The proposed method can be coupled with various selection procedures, making it ideal for comparing their uncertainty.
arXiv Detail & Related papers (2023-07-14T18:37:57Z)
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
- Beyond Instance Discrimination: Relation-aware Contrastive Self-supervised Learning [75.46664770669949]
We present relation-aware contrastive self-supervised learning (ReCo) to integrate instance relations.
ReCo consistently achieves remarkable performance improvements.
arXiv Detail & Related papers (2022-11-02T03:25:28Z)
- BELIEF in Dependence: Leveraging Atomic Linearity in Data Bits for Rethinking Generalized Linear Models [5.726186905478233]
We develop a framework called binary expansion linear effect (BELIEF) for understanding arbitrary relationships with a binary outcome. Models from the BELIEF framework are easily interpretable because they describe the association of binary variables in the language of linear models.
arXiv Detail & Related papers (2022-10-19T19:28:09Z)
- Why do classifier accuracies show linear trends under distribution shift? [58.40438263312526]
Accuracies of models on one data distribution are approximately linear functions of their accuracies on another distribution.
We assume that two models agree in their predictions more often than could be inferred from their accuracy levels alone.
We show that a linear trend must occur when evaluating models on two distributions unless the size of the distribution shift is large.
arXiv Detail & Related papers (2020-12-31T07:24:30Z)
- Learning Disentangled Representations with Latent Variation Predictability [102.4163768995288]
This paper defines the variation predictability of latent disentangled representations.
Within an adversarial generation process, we encourage variation predictability by maximizing the mutual information between latent variations and corresponding image pairs.
We develop an evaluation metric that does not rely on the ground-truth generative factors to measure the disentanglement of latent representations.
arXiv Detail & Related papers (2020-07-25T08:54:26Z)