A Data-driven Typology of Vision Models from Integrated Representational Metrics
- URL: http://arxiv.org/abs/2509.21628v1
- Date: Thu, 25 Sep 2025 21:46:09 GMT
- Title: A Data-driven Typology of Vision Models from Integrated Representational Metrics
- Authors: Jialin Wu, Shreya Saha, Yiqing Bo, Meenakshi Khosla
- Abstract summary: Large vision models differ widely in architecture and training paradigm, yet we lack principled methods to determine which aspects of their representations are shared across families. We leverage a suite of representational similarity metrics, each capturing a different facet (geometry, unit tuning, or linear decodability), and assess family separability. We adapt Similarity Network Fusion (SNF), a method inspired by multi-omics integration, to integrate these complementary facets.
- Score: 8.045700364123645
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large vision models differ widely in architecture and training paradigm, yet we lack principled methods to determine which aspects of their representations are shared across families and which reflect distinctive computational strategies. We leverage a suite of representational similarity metrics, each capturing a different facet (geometry, unit tuning, or linear decodability), and assess family separability using multiple complementary measures. Metrics preserving geometry or tuning (e.g., RSA, Soft Matching) yield strong family discrimination, whereas flexible mappings such as Linear Predictivity show weaker separation. These findings indicate that geometry and tuning carry family-specific signatures, while linearly decodable information is more broadly shared. To integrate these complementary facets, we adapt Similarity Network Fusion (SNF), a method inspired by multi-omics integration. SNF achieves substantially sharper family separation than any individual metric and produces robust composite signatures. Clustering of the fused similarity matrix recovers both expected and surprising patterns: supervised ResNets and ViTs form distinct clusters, yet all self-supervised models group together across architectural boundaries. Hybrid architectures (ConvNeXt, Swin) cluster with masked autoencoders, suggesting convergence between architectural modernization and reconstruction-based training. This biology-inspired framework provides a principled typology of vision models, showing that emergent computational strategies, shaped jointly by architecture and training objective, define representational structure beyond surface design categories.
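The fusion step described in the abstract can be illustrated with a minimal sketch of Similarity Network Fusion in NumPy. This is a simplified version of the generic SNF procedure, not the authors' implementation: the function names (`full_kernel`, `local_kernel`, `snf`), the neighbourhood size `k`, the iteration count, and the two synthetic "metric" matrices are all assumptions made for illustration, and the toy data are not results from the paper.

```python
import numpy as np

def full_kernel(W):
    """Row-normalise off-diagonal affinities to sum to 1/2 and put 1/2 on the diagonal."""
    W = W.astype(float).copy()
    np.fill_diagonal(W, 0.0)
    P = W / (2.0 * W.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P

def local_kernel(W, k):
    """Keep each row's k strongest neighbours (excluding self) and row-normalise."""
    W = W.astype(float).copy()
    np.fill_diagonal(W, 0.0)
    idx = np.argsort(-W, axis=1)[:, :k]          # indices of the k largest entries per row
    rows = np.arange(W.shape[0])[:, None]
    S = np.zeros_like(W)
    S[rows, idx] = W[rows, idx]
    return S / S.sum(axis=1, keepdims=True)

def snf(affinities, k=2, n_iter=20):
    """Fuse several (n x n) affinity matrices by iterative cross-diffusion."""
    P = [full_kernel(W) for W in affinities]
    S = [local_kernel(W, k) for W in affinities]
    for _ in range(n_iter):
        P_next = []
        for v in range(len(P)):
            # Diffuse the average of the *other* views through this view's local graph.
            others = np.mean([P[u] for u in range(len(P)) if u != v], axis=0)
            P_new = S[v] @ others @ S[v].T
            P_new = (P_new + P_new.T) / 2.0
            P_next.append(full_kernel(P_new))    # renormalise after each step
        P = P_next
    fused = np.mean(P, axis=0)
    return (fused + fused.T) / 2.0

# Synthetic example (assumption): 6 hypothetical models in two families of 3,
# scored by two hypothetical similarity metrics with noisy block structure.
rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 1, 1, 1])
same = labels[:, None] == labels[None, :]

def toy_affinity():
    noise = rng.uniform(0.0, 0.1, size=(6, 6))
    return np.where(same, 0.8, 0.2) + (noise + noise.T) / 2.0

fused = snf([toy_affinity(), toy_affinity()], k=2, n_iter=20)

off_diag = ~np.eye(6, dtype=bool)
within_mean = fused[same & off_diag].mean()
between_mean = fused[~same].mean()
print("within-family mean:", within_mean)
print("between-family mean:", between_mean)
```

In this sketch each metric contributes one affinity matrix over the same set of models, and the fused matrix could then be passed to any standard clustering routine. The choice of `k` controls how local the diffusion is; the value used here is arbitrary.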
Related papers
- Bridging Structure and Appearance: Topological Features for Robust Self-Supervised Segmentation [8.584363058858935]
Self-supervised semantic segmentation methods often fail when faced with appearance ambiguities. We argue that this is due to an over-reliance on unstable, appearance-based features such as shadows, glare, and local textures. We propose GASeg, a novel framework that bridges appearance and geometry by leveraging stable topological information.
arXiv Detail & Related papers (2025-12-30T05:34:28Z) - Multi-label Classification with Panoptic Context Aggregation Networks [61.82285737410154]
This paper introduces the Deep Panoptic Context Aggregation Network (PanCAN), a novel approach that hierarchically integrates multi-order geometric contexts. PanCAN learns multi-order neighborhood relationships at each scale by combining random walks with an attention mechanism. Experiments on NUS-WIDE, PASCAL VOC 2007, and MS-COCO benchmarks demonstrate that PanCAN consistently achieves competitive results.
arXiv Detail & Related papers (2025-12-29T14:16:21Z) - Integrated representational signatures strengthen specificity in brains and models [8.045700364123645]
Similarity Network Fusion (SNF) is a framework originally developed for multi-omics data integration. SNF produces substantially sharper regional and model family-level separation than any single metric. Clustering cortical regions using SNF-derived similarity scores reveals a clearer hierarchical organization.
arXiv Detail & Related papers (2025-10-21T04:37:27Z) - Geometric Embedding Alignment via Curvature Matching in Transfer Learning [4.739852004969771]
We introduce a novel approach to integrate multiple models into a unified transfer learning framework. By aligning the Ricci curvature of the latent space of individual models, we construct an interrelated architecture. This framework enables the effective aggregation of knowledge from diverse sources, thereby improving performance on target tasks.
arXiv Detail & Related papers (2025-06-16T00:54:22Z) - Exploring Synergistic Ensemble Learning: Uniting CNNs, MLP-Mixers, and Vision Transformers to Enhance Image Classification [2.907712261410302]
We build upon and improve previous work exploring the complementarity between different architectures. We preserve the integrity of each architecture and combine them using ensemble techniques. A direct outcome of this work is the creation of an ensemble of classification networks that surpasses the accuracy of the previous state-of-the-art single classification network on ImageNet.
arXiv Detail & Related papers (2025-04-12T04:32:52Z) - Mesh Mamba: A Unified State Space Model for Saliency Prediction in Non-Textured and Textured Meshes [50.23625950905638]
Mesh saliency enhances the adaptability of 3D vision by identifying and emphasizing regions that naturally attract visual attention. We introduce mesh Mamba, a unified saliency prediction model based on a state space model (SSM). Mesh Mamba effectively analyzes the geometric structure of the mesh while seamlessly incorporating texture features into the topological framework.
arXiv Detail & Related papers (2025-04-02T08:22:25Z) - Bayesian Unsupervised Disentanglement of Anatomy and Geometry for Deep Groupwise Image Registration [59.062085785106234]
This article presents a general Bayesian learning framework for multi-modal groupwise image registration. We propose a novel hierarchical variational auto-encoding architecture to realise the inference procedure of the latent variables. Experiments were conducted to validate the proposed framework, including four different datasets from cardiac, brain, and abdominal medical images.
arXiv Detail & Related papers (2024-01-04T08:46:39Z) - Enhancing Representations through Heterogeneous Self-Supervised Learning [61.40674648939691]
We propose Heterogeneous Self-Supervised Learning (HSSL), which enforces a base model to learn from an auxiliary head whose architecture is heterogeneous from the base model.
HSSL endows the base model with new characteristics through representation learning, without structural changes.
HSSL is compatible with various self-supervised methods and achieves superior performance on various downstream tasks.
arXiv Detail & Related papers (2023-10-08T10:44:05Z) - On the Symmetries of Deep Learning Models and their Internal Representations [1.418465438044804]
We seek to connect the symmetries arising from the architecture of a family of models with the symmetries of that family's internal representation of data.
Our work suggests that the symmetries of a network are propagated into the symmetries in that network's representation of data.
arXiv Detail & Related papers (2022-05-27T22:29:08Z) - Deep Relational Metric Learning [84.95793654872399]
This paper presents a deep relational metric learning framework for image clustering and retrieval.
We learn an ensemble of features that characterizes an image from different aspects to model both interclass and intraclass distributions.
Experiments on the widely-used CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate that our framework improves existing deep metric learning methods and achieves very competitive results.
arXiv Detail & Related papers (2021-08-23T09:31:18Z) - Hermitian Symmetric Spaces for Graph Embeddings [0.0]
We learn continuous representations of graphs in spaces of symmetric matrices over C.
These spaces offer a rich geometry that simultaneously admits hyperbolic and Euclidean subspaces.
The proposed models are able to automatically adapt to very dissimilar arrangements without any a priori estimates of graph features.
arXiv Detail & Related papers (2021-05-11T18:14:52Z) - Tensor Graph Convolutional Networks for Multi-relational and Robust Learning [74.05478502080658]
This paper introduces a tensor-graph convolutional network (TGCN) for scalable semi-supervised learning (SSL) from data associated with a collection of graphs that are represented by a tensor.
The proposed architecture achieves markedly improved performance relative to standard GCNs, copes with state-of-the-art adversarial attacks, and leads to remarkable SSL performance over protein-to-protein interaction networks.
arXiv Detail & Related papers (2020-03-15T02:33:21Z)