Attention-Based Variational Framework for Joint and Individual Components Learning with Applications in Brain Network Analysis
- URL: http://arxiv.org/abs/2601.17073v1
- Date: Fri, 23 Jan 2026 00:28:43 GMT
- Title: Attention-Based Variational Framework for Joint and Individual Components Learning with Applications in Brain Network Analysis
- Authors: Yifei Zhang, Meimei Liu, Zhengwu Zhang,
- Abstract summary: Cross-Modal Joint-Individual Variational Network (CM-JIVNet) designed to learn factorized latent representations from paired SC-FC datasets.<n>Our model utilizes a multi-head attention fusion module to capture non-linear cross-modal dependencies while isolating independent, modality-specific signals.<n>By effectively disentangling joint and individual feature spaces, CM-JIVNet provides a robust, interpretable, and scalable solution for large-scale multimodal brain analysis.
- Score: 9.090595907330018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Brain organization is increasingly characterized through multiple imaging modalities, most notably structural connectivity (SC) and functional connectivity (FC). Integrating these inherently distinct yet complementary data sources is essential for uncovering the cross-modal patterns that drive behavioral phenotypes. However, effective integration is hindered by the high dimensionality and non-linearity of connectome data, complex non-linear SC-FC coupling, and the challenge of disentangling shared information from modality-specific variations. To address these issues, we propose the Cross-Modal Joint-Individual Variational Network (CM-JIVNet), a unified probabilistic framework designed to learn factorized latent representations from paired SC-FC datasets. Our model utilizes a multi-head attention fusion module to capture non-linear cross-modal dependencies while isolating independent, modality-specific signals. Validated on Human Connectome Project Young Adult (HCP-YA) data, CM-JIVNet demonstrates superior performance in cross-modal reconstruction and behavioral trait prediction. By effectively disentangling joint and individual feature spaces, CM-JIVNet provides a robust, interpretable, and scalable solution for large-scale multimodal brain analysis.
Related papers
- Modality-Specific Enhancement and Complementary Fusion for Semi-Supervised Multi-Modal Brain Tumor Segmentation [6.302779966909783]
We propose a novel semi-supervised multi-modal framework for medical image segmentation.<n>We introduce a Modality-specific Enhancing Module (MEM) to strengthen semantic unique cues to each modality.<n>We also introduce a learnable Complementary Information Fusion (CIF) module to adaptively exchange complementary knowledge between modalities.
arXiv Detail & Related papers (2025-12-10T16:15:17Z) - Dual-level Modality Debiasing Learning for Unsupervised Visible-Infrared Person Re-Identification [59.59359638389348]
We propose a Dual-level Modality Debiasing Learning framework that implements debiasing at both the model and optimization levels.<n>Experiments on benchmark datasets demonstrate that DMDL could enable modality-invariant feature learning and a more generalized model.
arXiv Detail & Related papers (2025-12-03T12:43:16Z) - Cross-Modal Alignment via Variational Copula Modelling [54.25504956780864]
It is essential to develop multimodal learning methods to aggregate various information from multiple modalities.<n>Existing methods mainly rely on concatenation or the Kronecker product, oversimplifying the interaction structure between modalities.<n>We propose a novel copula-driven multimodal learning framework, which focuses on learning the joint distribution of various modalities.
arXiv Detail & Related papers (2025-11-05T05:28:28Z) - scMRDR: A scalable and flexible framework for unpaired single-cell multi-omics data integration [53.683726781791385]
We introduce a scalable and flexible generative framework called single-cell Multi-omics Regularized Disentangled Representations (scMRDR) for unpaired multi-omics integration.<n>Our method achieves excellent performance on benchmark datasets in terms of batch correction, modality alignment, and biological signal preservation.
arXiv Detail & Related papers (2025-10-28T21:28:39Z) - NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching [64.10695425442164]
We introduce NExT-OMNI, an open-source omnimodal foundation model that achieves unified modeling through discrete flow paradigms.<n>Trained on large-scale interleaved text, image, video, and audio data, NExT-OMNI delivers competitive performance on multimodal generation and understanding benchmarks.<n>To advance further research, we release training details, data protocols, and open-source both the code and model checkpoints.
arXiv Detail & Related papers (2025-10-15T16:25:18Z) - MoCA: Multi-modal Cross-masked Autoencoder for Digital Health Measurements [2.8493802389913694]
We propose the Multi-modal Cross-masked Autoencoder (MoCA), a self-supervised learning framework that combines transformer architecture with masked autoencoder (MAE) methodology.<n>MoCA demonstrates strong performance boosts across reconstruction and downstream classification tasks on diverse benchmark datasets.<n>Our approach offers a novel solution for leveraging unlabeled multi-modal wearable data while handling missing modalities, with broad applications across digital health domains.
arXiv Detail & Related papers (2025-06-02T21:07:25Z) - Selective Complementary Feature Fusion and Modal Feature Compression Interaction for Brain Tumor Segmentation [14.457627015612827]
We propose a complementary feature compression interaction network (CFCI-Net), which realizes the complementary fusion and compression interaction of multi-modal feature information.<n>CFCI-Net achieves superior results compared to state-of-the-art models.
arXiv Detail & Related papers (2025-03-20T13:52:51Z) - Measuring Cross-Modal Interactions in Multimodal Models [9.862551438475666]
Existing AI methods fail to capture cross-modal interactions crucial for understanding the combined impact of multiple data sources.<n>This paper introduces InterSHAP, a cross-modal interaction score that addresses the limitations of existing approaches.<n>We show that InterSHAP accurately measures the presence of cross-modal interactions, can handle multiple modalities, and provides detailed explanations at a local level for individual samples.
arXiv Detail & Related papers (2024-12-20T12:11:20Z) - Online Multi-modal Root Cause Analysis [61.94987309148539]
Root Cause Analysis (RCA) is essential for pinpointing the root causes of failures in microservice systems.
Existing online RCA methods handle only single-modal data overlooking, complex interactions in multi-modal systems.
We introduce OCEAN, a novel online multi-modal causal structure learning method for root cause localization.
arXiv Detail & Related papers (2024-10-13T21:47:36Z) - HEALNet: Multimodal Fusion for Heterogeneous Biomedical Data [10.774128925670183]
This paper presents the Hybrid Early-fusion Attention Learning Network (HEALNet), a flexible multimodal fusion architecture.
We conduct multimodal survival analysis on Whole Slide Images and Multi-omic data on four cancer datasets from The Cancer Genome Atlas (TCGA)
HEALNet achieves state-of-the-art performance compared to other end-to-end trained fusion models.
arXiv Detail & Related papers (2023-11-15T17:06:26Z) - Learning Multiscale Consistency for Self-supervised Electron Microscopy
Instance Segmentation [48.267001230607306]
We propose a pretraining framework that enhances multiscale consistency in EM volumes.
Our approach leverages a Siamese network architecture, integrating strong and weak data augmentations.
It effectively captures voxel and feature consistency, showing promise for learning transferable representations for EM analysis.
arXiv Detail & Related papers (2023-08-19T05:49:13Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal
Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
Model takes two bimodal pairs as input due to known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.