Quantifying task-relevant representational similarity using decision variable correlation
- URL: http://arxiv.org/abs/2506.02164v1
- Date: Mon, 02 Jun 2025 18:45:05 GMT
- Title: Quantifying task-relevant representational similarity using decision variable correlation
- Authors: Qian Yu, Wilson S. Geisler, Xue-Xin Wei
- Abstract summary: We propose a new approach to characterize the similarity of the decision strategies of two observers (models or brains) using decision variable correlation (DVC).
We evaluate this method using monkey V4/IT recordings and models trained on image classification tasks.
We find that model--model similarity is comparable to monkey--monkey similarity, whereas model--monkey similarity is consistently lower and, surprisingly, decreases with increasing ImageNet-1k performance.
- Score: 6.323629460880451
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Previous studies have compared the brain and deep neural networks trained on image classification. Intriguingly, while some suggest that their representations are highly similar, others have argued the opposite. Here, we propose a new approach to characterize the similarity of the decision strategies of two observers (models or brains) using decision variable correlation (DVC). DVC quantifies the correlation between decoded decisions on individual samples in a classification task and thus can capture task-relevant information rather than general representational alignment. We evaluate this method using monkey V4/IT recordings and models trained on image classification tasks. We find that model--model similarity is comparable to monkey--monkey similarity, whereas model--monkey similarity is consistently lower and, surprisingly, decreases with increasing ImageNet-1k performance. While adversarial training enhances robustness, it does not improve model--monkey similarity in task-relevant dimensions; however, it markedly increases model--model similarity. Similarly, pre-training on larger datasets does not improve model--monkey similarity. These results suggest a fundamental divergence between the task-relevant representations in monkey V4/IT and those learned by models trained on image classification tasks.
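The DVC idea can be illustrated with a minimal numpy sketch. This is an assumption-laden toy, not the paper's actual estimation procedure: it fits simple least-squares linear decoders for each observer, projects held samples onto the decoders to obtain per-sample decision variables, removes class means so the correlation reflects shared trial-by-trial variability rather than the class signal, and returns the Pearson correlation. The function name and the synthetic "observers" below are hypothetical.

```python
import numpy as np

def decision_variable_correlation(feats_a, feats_b, y):
    """Toy DVC: correlate per-sample linear decision variables of two
    observers after subtracting class means (task-relevant variability)."""
    t = np.where(y == 1, 1.0, -1.0)                 # signed class labels
    w_a, *_ = np.linalg.lstsq(feats_a, t, rcond=None)  # linear decoder A
    w_b, *_ = np.linalg.lstsq(feats_b, t, rcond=None)  # linear decoder B
    d_a, d_b = feats_a @ w_a, feats_b @ w_b         # decision variables
    for c in (0, 1):                                 # remove class means
        d_a[y == c] -= d_a[y == c].mean()
        d_b[y == c] -= d_b[y == c].mean()
    return np.corrcoef(d_a, d_b)[0, 1]

# Toy usage: two "observers" sharing part of their trial-by-trial noise,
# so their decision variables should correlate positively within class.
rng = np.random.default_rng(0)
n, d = 400, 20
y = rng.integers(0, 2, n)
signal = np.where(y == 1, 1.0, -1.0)
shared = 0.8 * rng.standard_normal(n)               # noise common to both
va, vb = rng.standard_normal(d), rng.standard_normal(d)
feats_a = (np.outer(signal + shared + 0.5 * rng.standard_normal(n), va)
           + 0.1 * rng.standard_normal((n, d)))
feats_b = (np.outer(signal + shared + 0.5 * rng.standard_normal(n), vb)
           + 0.1 * rng.standard_normal((n, d)))
dvc = decision_variable_correlation(feats_a, feats_b, y)
```

In this construction the shared noise component drives a clearly positive DVC, while two observers with fully independent noise would score near zero regardless of how accurately each one classifies; this is what makes the measure task-relevant rather than a generic representational-alignment score.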
Related papers
- Similarity of Processing Steps in Vision Model Representations [5.816800536560847]
We study the processes that lead to convergent representations in different models.
We find that layers at similar positions in different models have the most similar representations, though strong differences remain.
CNN and transformer-based models also behave differently, with transformer models applying smoother changes to representations from one layer to the next.
arXiv Detail & Related papers (2026-01-29T12:24:28Z)
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- LeOCLR: Leveraging Original Images for Contrastive Learning of Visual Representations [4.680881326162484]
Contrastive instance discrimination methods outperform supervised learning in downstream tasks such as image classification and object detection.
A common augmentation technique in contrastive learning is random cropping followed by resizing.
We introduce LeOCLR, a framework that employs a novel instance discrimination approach and an adapted loss function.
arXiv Detail & Related papers (2024-03-11T15:33:32Z)
- Image Similarity using An Ensemble of Context-Sensitive Models [2.9490616593440317]
We present a more intuitive approach to build and compare image similarity models based on labelled data.
We address the challenges of sparse sampling in the image space (R, A, B) and biases in the models trained with context-based data.
Our testing results show that the ensemble model constructed performs 5% better than the best individual context-sensitive models.
arXiv Detail & Related papers (2024-01-15T20:23:05Z)
- Equivariant Similarity for Vision-Language Foundation Models [134.77524524140168]
This study focuses on the multimodal similarity function that is not only the major training objective but also the core delivery to support downstream tasks.
We propose EqSim, a regularization loss that can be efficiently calculated from any two matched training pairs.
Compared to the existing evaluation sets, EqBen is the first to focus on "visual-minimal change".
arXiv Detail & Related papers (2023-03-25T13:22:56Z)
- Counting Like Human: Anthropoid Crowd Counting on Modeling the Similarity of Objects [92.80955339180119]
Mainstream crowd counting methods regress a density map and integrate it to obtain counting results.
Inspired by this, we propose a rational and anthropoid crowd counting framework.
arXiv Detail & Related papers (2022-12-02T07:00:53Z)
- IMACS: Image Model Attribution Comparison Summaries [16.80986701058596]
We introduce IMACS, a method that combines gradient-based model attributions with aggregation and visualization techniques.
IMACS extracts salient input features from an evaluation dataset, clusters them based on similarity, then visualizes differences in model attributions for similar input features.
We show how our technique can uncover behavioral differences caused by domain shift between two models trained on satellite images.
arXiv Detail & Related papers (2022-01-26T21:35:14Z)
- Instance-Level Relative Saliency Ranking with Graph Reasoning [126.09138829920627]
We present a novel unified model to segment salient instances and infer relative saliency rank order.
A novel loss function is also proposed to effectively train the saliency ranking branch.
Experimental results demonstrate that our proposed model is more effective than previous methods.
arXiv Detail & Related papers (2021-07-08T13:10:42Z)
- The Effects of Image Distribution and Task on Adversarial Robustness [4.597864989500202]
We propose an adaptation to the area under the curve (AUC) metric to measure the adversarial robustness of a model.
We used this adversarial robustness metric on models of an MNIST, CIFAR-10, and a Fusion dataset.
arXiv Detail & Related papers (2021-02-21T07:15:50Z)
- Towards Visually Explaining Similarity Models [29.704524987493766]
We present a method to generate gradient-based visual attention for image similarity predictors.
By relying solely on the learned feature embedding, we show that our approach can be applied to any kind of CNN-based similarity architecture.
We show that our resulting attention maps serve more than just interpretability; they can be infused into the model learning process itself with new trainable constraints.
arXiv Detail & Related papers (2020-08-13T17:47:41Z)
- Few-shot Visual Reasoning with Meta-analogical Contrastive Learning [141.2562447971]
We propose to solve a few-shot (or low-shot) visual reasoning problem, by resorting to analogical reasoning.
We extract structural relationships between elements in both domains, and enforce them to be as similar as possible with analogical learning.
We validate our method on RAVEN dataset, on which it outperforms state-of-the-art method, with larger gains when the training data is scarce.
arXiv Detail & Related papers (2020-07-23T14:00:34Z)
- Multiscale Deep Equilibrium Models [162.15362280927476]
We propose a new class of implicit networks, the multiscale deep equilibrium model (MDEQ).
An MDEQ directly solves for and backpropagates through the equilibrium points of multiple feature resolutions simultaneously.
We illustrate the effectiveness of this approach on two large-scale vision tasks: ImageNet classification and semantic segmentation on high-resolution images from the Cityscapes dataset.
arXiv Detail & Related papers (2020-06-15T18:07:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.