Barycentric alignment for instance-level comparison of neural representations
- URL: http://arxiv.org/abs/2602.09225v1
- Date: Mon, 09 Feb 2026 21:49:44 GMT
- Title: Barycentric alignment for instance-level comparison of neural representations
- Authors: Shreya Saha, Zoe Wanying He, Meenakshi Khosla
- Abstract summary: We introduce a barycentric alignment framework that quotients out nuisance symmetries to construct a universal embedding space across many models. We identify systematic input properties that predict representational convergence versus divergence across vision and language model families. We also apply the same barycentric alignment framework to purely unimodal vision and language models and find that post-hoc alignment into a shared space yields image-text similarity scores that closely track human cross-modal judgments.
- Score: 2.1920579994942164
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Comparing representations across neural networks is challenging because representations admit symmetries, such as arbitrary reordering of units or rotations of activation space, that obscure underlying equivalence between models. We introduce a barycentric alignment framework that quotients out these nuisance symmetries to construct a universal embedding space across many models. Unlike existing similarity measures, which summarize relationships over entire stimulus sets, this framework enables similarity to be defined at the level of individual stimuli, revealing inputs that elicit convergent versus divergent representations across models. Using this instance-level notion of similarity, we identify systematic input properties that predict representational convergence versus divergence across vision and language model families. We also construct universal embedding spaces for brain representations across individuals and cortical regions, enabling instance-level comparison of representational agreement across stages of the human visual hierarchy. Finally, we apply the same barycentric alignment framework to purely unimodal vision and language models and find that post-hoc alignment into a shared space yields image-text similarity scores that closely track human cross-modal judgments and approach the performance of contrastively trained vision-language models. Strikingly, this suggests that independently learned representations already share sufficient geometric structure for human-aligned cross-modal comparison. Together, these results show that resolving representational similarity at the level of individual stimuli reveals phenomena that cannot be detected by set-level comparison metrics.
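The paper's exact construction is not given here, but the core idea described in the abstract, quotienting out rotations by aligning every model's activations to a common barycenter and then scoring agreement per stimulus, can be sketched with a classical alternating orthogonal-Procrustes scheme. Everything below (`barycentric_align`, `instance_agreement`, the mean-based barycenter update) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def procrustes_rotation(X, B):
    """Orthogonal matrix Q minimizing ||X @ Q - B||_F (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(X.T @ B)
    return U @ Vt

def barycentric_align(reps, n_iter=20):
    """Align each (stimuli x dim) representation into a shared space by
    alternating between fitting rotations and averaging into a barycenter."""
    B = reps[0].copy()                      # initialize barycenter at one model
    for _ in range(n_iter):
        aligned = [X @ procrustes_rotation(X, B) for X in reps]
        B = np.mean(aligned, axis=0)        # update barycenter
    return aligned, B

def instance_agreement(aligned):
    """Per-stimulus convergence score: negative mean distance of each
    aligned representation from the across-model mean (higher = convergent)."""
    A = np.stack(aligned)                   # (models, stimuli, dim)
    mean = A.mean(axis=0, keepdims=True)
    return -np.linalg.norm(A - mean, axis=2).mean(axis=0)
```

With per-stimulus scores in hand, sorting stimuli by `instance_agreement` is one way to surface the convergent-versus-divergent inputs the abstract refers to.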
Related papers
- Revisiting the Platonic Representation Hypothesis: An Aristotelian View [3.647057737530591]
We show that the existing metrics used to measure representational similarity are confounded by network scale. We introduce a permutation-based null-calibration framework that transforms any representational similarity metric into a calibrated score with statistical guarantees. We propose the Aristotelian Representation Hypothesis: representations in neural networks are converging to shared local neighborhood relationships.
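A permutation null of the kind this entry describes is straightforward to sketch: shuffle the stimulus correspondence between two representations to break alignment, recompute the metric under that null, and z-score the observed value. The metric choice (linear CKA) and the z-score calibration below are assumptions for illustration; the paper's calibrated score may differ:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA similarity between two (stimuli x dim) representations."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, 'fro') ** 2
    den = np.linalg.norm(X.T @ X, 'fro') * np.linalg.norm(Y.T @ Y, 'fro')
    return num / den

def calibrated_score(X, Y, metric=linear_cka, n_perm=200, seed=0):
    """Calibrate a similarity metric against a permutation null:
    shuffling Y's stimulus order destroys correspondence, so the
    null distribution reflects chance-level similarity."""
    rng = np.random.default_rng(seed)
    observed = metric(X, Y)
    null = np.array([metric(X, Y[rng.permutation(len(Y))])
                     for _ in range(n_perm)])
    return (observed - null.mean()) / (null.std() + 1e-12)
```

Permuting stimuli (rather than units) is the key design choice: CKA is already invariant to unit reordering, so only the stimulus correspondence carries the signal being tested.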
arXiv Detail & Related papers (2026-02-16T06:01:23Z)
- Representations in vision and language converge in a shared, multidimensional space of perceived similarities [0.0]
We show that visual and linguistic similarity judgements converge at the behavioural level. We also predict a remarkably similar network of fMRI brain responses evoked by viewing natural scene images. These findings demonstrate that human visual and linguistic similarity judgements are grounded in a shared, modality-agnostic representational structure.
arXiv Detail & Related papers (2025-07-29T14:42:31Z)
- Evaluating Representational Similarity Measures from the Lens of Functional Correspondence [3.1883014716361635]
Neuroscience and artificial intelligence (AI) both face the challenge of interpreting high-dimensional neural data. Despite the widespread use of representational comparisons, a critical question remains: which metrics are most suitable for these comparisons?
arXiv Detail & Related papers (2024-11-21T23:53:58Z)
- Objective drives the consistency of representational similarity across datasets [19.99817888941361]
We propose a systematic way to measure how representational similarity between models varies with the set of stimuli used to construct the representations. Self-supervised vision models learn representations whose relative pairwise similarities generalize better from one dataset to another. Our work provides a framework for analyzing similarities of model representations across datasets and linking those similarities to differences in task behavior.
arXiv Detail & Related papers (2024-11-08T13:35:45Z)
- Conjuring Semantic Similarity [59.18714889874088]
The semantic similarity between two textual expressions measures the distance between their latent 'meaning'. We propose a novel approach whereby the semantic similarity among textual expressions is based not on other expressions they can be rephrased as, but rather on the imagery they evoke. Our method contributes a novel perspective on semantic similarity that not only aligns with human-annotated scores, but also opens up new avenues for the evaluation of text-conditioned generative models.
arXiv Detail & Related papers (2024-10-21T18:51:34Z)
- Bayesian Unsupervised Disentanglement of Anatomy and Geometry for Deep Groupwise Image Registration [59.062085785106234]
This article presents a general Bayesian learning framework for multi-modal groupwise image registration. We propose a novel hierarchical variational auto-encoding architecture to realise the inference procedure of the latent variables. Experiments were conducted to validate the proposed framework, including four different datasets from cardiac, brain, and abdominal medical images.
arXiv Detail & Related papers (2024-01-04T08:46:39Z)
- Counting Like Human: Anthropoid Crowd Counting on Modeling the Similarity of Objects [92.80955339180119]
Mainstream crowd counting methods regress a density map and integrate it to obtain counting results.
Inspired by this, we propose a rational and anthropoid crowd counting framework.
arXiv Detail & Related papers (2022-12-02T07:00:53Z)
- Attributable Visual Similarity Learning [90.69718495533144]
This paper proposes an attributable visual similarity learning (AVSL) framework for a more accurate and explainable similarity measure between images.
Motivated by human semantic similarity cognition, we propose a generalized similarity learning paradigm to represent the similarity between two images with a graph.
Experiments on the CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate significant improvements over existing deep similarity learning methods.
arXiv Detail & Related papers (2022-03-28T17:35:31Z)
- Image Synthesis via Semantic Composition [74.68191130898805]
We present a novel approach to synthesize realistic images based on their semantic layouts.
It hypothesizes that objects with similar appearance share similar representations.
Our method establishes dependencies between regions according to their appearance correlation, yielding both spatially variant and associated representations.
arXiv Detail & Related papers (2021-09-15T02:26:07Z)
- Cross-Modal Discrete Representation Learning [73.68393416984618]
We present a self-supervised learning framework that learns a representation that captures finer levels of granularity across different modalities.
Our framework relies on a discretized embedding space created via vector quantization that is shared across different modalities.
arXiv Detail & Related papers (2021-06-10T00:23:33Z)
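The shared discrete embedding space in the last entry above can be illustrated with a minimal vector-quantization step: embeddings from any modality are snapped to their nearest codebook vector, so two modalities become comparable through shared code indices. The codebook and the encoder outputs below are toy assumptions, not the paper's trained model:

```python
import numpy as np

def quantize(z, codebook):
    """Map each embedding row to its nearest codebook entry (squared L2)."""
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (n, K) distances
    idx = d.argmin(axis=1)
    return codebook[idx], idx

# Two modalities sharing one codebook: matching inputs land on the same code.
codebook = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
img_emb = np.array([[0.3, -0.2], [9.5, 10.5]])   # hypothetical image-encoder output
txt_emb = np.array([[-0.1, 0.4], [10.2, 9.1]])   # hypothetical text-encoder output
_, img_codes = quantize(img_emb, codebook)
_, txt_codes = quantize(txt_emb, codebook)
```

Because both modalities index into the same codebook, equality of code indices gives a crude cross-modal correspondence check without any joint training of the toy encoders.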
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.