Related papers: Beyond Accuracy: Uncovering the Role of Similarity Perception and its Alignment with Semantics in Supervised Learning

URL: http://arxiv.org/abs/2505.21338v1
Date: Tue, 27 May 2025 15:32:10 GMT
Title: Beyond Accuracy: Uncovering the Role of Similarity Perception and its Alignment with Semantics in Supervised Learning
Authors: Katarzyna Filus, Mateusz Żarski
Abstract summary: We introduce Deep Similarity Inspector (DSI) -- a systematic framework to inspect how deep vision networks develop their similarity perception. Our experiments show that both Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) develop a rich similarity perception during training in 3 phases.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Similarity manifests in various forms, including semantic similarity, which is particularly important because it approximates human object categorization based on, e.g., shared functionalities and evolutionary traits. It also offers practical advantages in computational modeling via lexical structures such as WordNet, which provide constant and interpretable similarity. In the domain of deep vision, there is still not enough focus on the phenomena surrounding the emergence of similarity perception. We introduce Deep Similarity Inspector (DSI) -- a systematic framework to inspect how deep vision networks develop their similarity perception and its alignment with semantic similarity. Our experiments show that both Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) develop a rich similarity perception during training in 3 phases (initial similarity surge, refinement, stabilization), with clear differences between CNNs and ViTs. Besides the gradual elimination of mistakes, a mistake refinement phenomenon can be observed.

Related papers

Representations in vision and language converge in a shared, multidimensional space of perceived similarities [0.0]
We show that visual and linguistic similarity judgements converge at the behavioural level. We also predict a remarkably similar network of fMRI brain responses evoked by viewing the natural scene images. These findings demonstrate that human visual and linguistic similarity judgements are grounded in a shared, modality-agnostic representational structure.
arXiv Detail & Related papers (2025-07-29T14:42:31Z)
Convergent transformations of visual representation in brains and models [0.0]
A fundamental question in cognitive neuroscience is what shapes visual perception: the external world's structure or the brain's internal architecture. We show a convergent computational solution for visual encoding in both human and artificial vision, driven by the structure of the external world.
arXiv Detail & Related papers (2025-07-18T14:13:54Z)
The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions [51.68215326304272]
We show that even small perturbations reliably cause otherwise identical training trajectories to diverge, an effect that diminishes rapidly over training time. Our findings provide insights into neural network training stability, with practical implications for fine-tuning, model merging, and diversity of model ensembles.
arXiv Detail & Related papers (2025-06-16T08:35:16Z)
Decoupling Semantic Similarity from Spatial Alignment for Neural Networks [4.801683210246596]
We argue that the spatial location of semantic objects influences neither human perception nor deep learning classifiers. This should be reflected in the definition of similarity between image responses for computer vision systems. We measure semantic similarity between input responses by formulating it as a set-matching problem.
arXiv Detail & Related papers (2024-10-30T15:17:58Z)
When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability. We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks. Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales [54.78115855552886]
We show how to construct over-complete invariants with a Convolutional Neural Network (CNN)-like hierarchical architecture. With the over-completeness, discriminative features w.r.t. the task can be adaptively formed in a Neural Architecture Search (NAS)-like manner. For robust and interpretable vision tasks at larger scales, hierarchical invariant representation can be considered an effective alternative to traditional CNNs and invariants.
arXiv Detail & Related papers (2024-02-23T16:50:07Z)
Going Beyond Neural Network Feature Similarity: The Network Feature Complexity and Its Interpretation Using Category Theory [64.06519549649495]
We provide the definition of what we call functionally equivalent features. These features produce equivalent output under certain transformations. We propose an efficient algorithm named Iterative Feature Merging.
arXiv Detail & Related papers (2023-10-10T16:27:12Z)
Similarity of Neural Architectures using Adversarial Attack Transferability [47.66096554602005]
We design a quantitative and scalable similarity measure between neural architectures. We conduct a large-scale analysis on 69 state-of-the-art ImageNet classifiers. Our results provide insights into why developing diverse neural architectures with distinct components is necessary.
arXiv Detail & Related papers (2022-10-20T16:56:47Z)
Attributable Visual Similarity Learning [90.69718495533144]
This paper proposes an attributable visual similarity learning (AVSL) framework for a more accurate and explainable similarity measure between images. Motivated by the human semantic similarity cognition, we propose a generalized similarity learning paradigm to represent the similarity between two images with a graph. Experiments on the CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate significant improvements over existing deep similarity learning methods.
arXiv Detail & Related papers (2022-03-28T17:35:31Z)
Weak Augmentation Guided Relational Self-Supervised Learning [80.0680103295137]
We introduce a novel relational self-supervised learning (ReSSL) framework that learns representations by modeling the relationship between different instances. Our proposed method employs a sharpened distribution of pairwise similarities among different instances as a relation metric. Experimental results show that our proposed ReSSL substantially outperforms the state-of-the-art methods across different network architectures.
arXiv Detail & Related papers (2022-03-16T16:14:19Z)
Deconfounded Representation Similarity for Comparison of Neural Networks [16.23053104309891]
Similarity metrics are confounded by the population structure of data items in the input space. We show that deconfounding the similarity metrics increases the resolution of detecting semantically similar neural networks.
arXiv Detail & Related papers (2022-01-31T21:25:02Z)
Comparing Deep Neural Nets with UMAP Tour [12.910602784766562]
UMAP Tour is built to visually inspect and compare internal behavior of real-world neural network models. We find concepts learned in state-of-the-art models and dissimilarities between them, such as GoogLeNet and ResNet.
arXiv Detail & Related papers (2021-10-18T15:59:13Z)
Contrastive Similarity Matching for Supervised Learning [13.750624267664156]
We propose a biologically-plausible solution to the credit assignment problem motivated by observations in the ventral visual pathway and trained deep neural networks. In both, representations of objects in the same category become progressively more similar, while objects belonging to different categories become less similar. We formulate this idea using a contrastive similarity matching objective function and derive from it deep neural networks with feedforward, lateral, and feedback connections.
arXiv Detail & Related papers (2020-02-24T17:10:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.