Representational Alignment Across Model Layers and Brain Regions with Hierarchical Optimal Transport
- URL: http://arxiv.org/abs/2510.01706v1
- Date: Thu, 02 Oct 2025 06:25:06 GMT
- Title: Representational Alignment Across Model Layers and Brain Regions with Hierarchical Optimal Transport
- Authors: Shaan Shah, Meenakshi Khosla
- Abstract summary: We propose a unified framework that infers soft, globally consistent layer-to-layer couplings and neuron-level transport plans. We evaluate HOT on vision models, large language models, and human visual cortex recordings.
- Score: 2.4636535146231613
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Standard representational similarity methods align each layer of a network to its best match in another independently, producing asymmetric results, lacking a global alignment score, and struggling with networks of different depths. These limitations arise from ignoring global activation structure and restricting mappings to rigid one-to-one layer correspondences. We propose Hierarchical Optimal Transport (HOT), a unified framework that jointly infers soft, globally consistent layer-to-layer couplings and neuron-level transport plans. HOT allows source neurons to distribute mass across multiple target layers while minimizing total transport cost under marginal constraints. This yields both a single alignment score for the entire network comparison and a soft transport plan that naturally handles depth mismatches through mass distribution. We evaluate HOT on vision models, large language models, and human visual cortex recordings. Across all domains, HOT matches or surpasses standard pairwise matching in alignment quality. Moreover, it reveals smooth, fine-grained hierarchical correspondences: early layers map to early layers, deeper layers maintain relative positions, and depth mismatches are resolved by distributing representations across multiple layers. These structured patterns emerge naturally from global optimization without being imposed, yet are absent in greedy layer-wise methods. HOT thus enables richer, more interpretable comparisons between representations, particularly when networks differ in architecture or depth.
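The two-level idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a correlation-distance cost between neurons, uniform marginals, and entropic (Sinkhorn) regularization at both levels, and it solves the neuron level and the layer level one after the other rather than jointly. The names `sinkhorn`, `corr`, `hierarchical_ot`, and `make_net` are hypothetical helpers introduced only for this example.

```python
import math
import random


def sinkhorn(cost, a, b, reg=0.1, n_iter=300):
    """Entropic OT: plan whose column sums equal b and row sums approach a."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def corr(x, y):
    """Pearson correlation of two response vectors over the same stimuli."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy + 1e-12)


def uniform(n):
    return [1.0 / n] * n


def transport_cost(plan, cost):
    """Total cost <plan, cost> of moving mass according to plan."""
    return sum(p * c for pr, cr in zip(plan, cost) for p, c in zip(pr, cr))


def hierarchical_ot(net_a, net_b, reg=0.1):
    """Each network is a list of layers; each layer is a list of neurons;
    each neuron is a list of responses to a shared stimulus set."""
    layer_cost = []
    for la in net_a:
        row = []
        for lb in net_b:
            # Inner level: neuron-to-neuron plan for this layer pair
            # (assumed cost: 1 - Pearson correlation).
            M = [[1.0 - corr(x, y) for y in lb] for x in la]
            P = sinkhorn(M, uniform(len(la)), uniform(len(lb)), reg)
            row.append(transport_cost(P, M))
        layer_cost.append(row)
    # Outer level: soft layer-to-layer coupling over the inner costs.
    Q = sinkhorn(layer_cost, uniform(len(net_a)), uniform(len(net_b)), reg)
    return Q, transport_cost(Q, layer_cost)


def make_net(n_layers, n_neurons, n_stimuli):
    """Toy random activations, used only to exercise the code."""
    return [[[random.gauss(0.0, 1.0) for _ in range(n_stimuli)]
             for _ in range(n_neurons)] for _ in range(n_layers)]


random.seed(0)
net_a = make_net(n_layers=3, n_neurons=6, n_stimuli=20)
net_b = make_net(n_layers=4, n_neurons=5, n_stimuli=20)
coupling, score = hierarchical_ot(net_a, net_b)
for row in coupling:
    print(["%.3f" % q for q in row])
print("global alignment cost: %.3f" % score)
```

Because the outer coupling is a soft plan, a 3-layer network can spread each layer's mass over several layers of a 4-layer network, which is how a depth mismatch is absorbed in this sketch. A single scalar cost summarizes the whole comparison; lower values mean closer alignment under the assumed cost.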
Related papers
- ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information Interaction [55.21514454560188]
Unified multimodal models significantly improve visual generation by combining vision-language models (VLMs) with diffusion models. Existing methods struggle to fully balance sufficient interaction and flexible implementation due to vast representation differences. We propose ParaUni, which extracts features from various layers of the VLM in a Parallel way for comprehensive information interaction.
arXiv Detail & Related papers (2025-12-05T04:41:57Z) - Perfect Clustering in Very Sparse Diverse Multiplex Networks [4.070200285321219]
The paper studies the DIverse MultiPLEx Signed Generalized Random Dot Product Graph (DIMPLE-SGRDPG) network model (Pensky (2024)). All layers can be partitioned into groups such that the layers in the same group are embedded in the same ambient subspace. The key task in this model is to recover the groups of layers with unique subspace structures.
arXiv Detail & Related papers (2025-07-25T16:43:42Z) - Towards Optimal Customized Architecture for Heterogeneous Federated
Learning with Contrastive Cloud-Edge Model Decoupling [20.593232086762665]
Federated learning, as a promising distributed learning paradigm, enables collaborative training of a global model across multiple network edge clients without the need for central data collecting.
We propose a novel federated learning framework called FedCMD, a model decoupling approach tailored to cloud-edge supported federated learning.
Our motivation is that, after closely investigating how performance varies with the choice of neural network layers used as the personalized head, we found that rigidly assigning the last layer as the personalized head, as current studies do, is not always optimal.
arXiv Detail & Related papers (2024-03-04T05:10:28Z) - Hierarchical Multi-Marginal Optimal Transport for Network Alignment [52.206006379563306]
Multi-network alignment is an essential prerequisite for joint learning on multiple networks.
We propose a hierarchical multi-marginal optimal transport framework named HOT for multi-network alignment.
Our proposed HOT achieves significant improvements over the state-of-the-art in both effectiveness and scalability.
arXiv Detail & Related papers (2023-10-06T02:35:35Z) - Federated Deep Equilibrium Learning: Harnessing Compact Global Representations to Enhance Personalization [23.340237814344377]
Federated Learning (FL) has emerged as a groundbreaking distributed learning paradigm enabling clients to train a global model collaboratively without exchanging data.
We introduce FeDEQ, a novel FL framework that incorporates deep equilibrium learning and consensus optimization to harness compact global data representations for efficient personalization.
We show that FeDEQ matches the performance of state-of-the-art personalized FL methods, while significantly reducing communication size by up to 4 times and memory footprint by 1.5 times during training.
arXiv Detail & Related papers (2023-09-27T13:48:12Z) - WLD-Reg: A Data-dependent Within-layer Diversity Regularizer [98.78384185493624]
Neural networks are composed of multiple layers arranged in a hierarchical structure jointly trained with a gradient-based optimization.
We propose to complement this traditional 'between-layer' feedback with additional 'within-layer' feedback to encourage the diversity of the activations within the same layer.
We present an extensive empirical study confirming that the proposed approach enhances the performance of several state-of-the-art neural network models in multiple tasks.
arXiv Detail & Related papers (2023-01-03T20:57:22Z) - On skip connections and normalisation layers in deep optimisation [32.51139594406463]
We introduce a general theoretical framework for the study of optimisation of deep neural networks.
Our framework determines the curvature and regularity properties of multilayer loss landscapes.
We identify a novel causal mechanism by which skip connections accelerate training.
arXiv Detail & Related papers (2022-10-10T06:22:46Z) - Global and Individualized Community Detection in Inhomogeneous
Multilayer Networks [14.191073951237772]
In network applications, it has become increasingly common to obtain datasets in the form of multiple networks observed on the same set of subjects.
Such datasets can be modeled by multilayer networks where each layer is a separate network itself while different layers are associated and share some common information.
The present paper studies community detection in a stylized yet informative inhomogeneous multilayer network model.
arXiv Detail & Related papers (2020-12-02T02:42:52Z) - Dual-constrained Deep Semi-Supervised Coupled Factorization Network with
Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z) - Dynamic Hierarchical Mimicking Towards Consistent Optimization
Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z) - Convolutional Networks with Dense Connectivity [59.30634544498946]
We introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion.
For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers.
We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks.
arXiv Detail & Related papers (2020-01-08T06:54:53Z)