Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement
- URL: http://arxiv.org/abs/2511.13755v1
- Date: Fri, 14 Nov 2025 04:44:34 GMT
- Title: Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement
- Authors: Zhe Yang, Wenrui Li, Hongtao Chen, Penghong Wang, Ruiqin Xiong, Xiaopeng Fan
- Abstract summary: Long-term dominance of the dominant modality weakens representation-output coupling. Previous methods often directly and uniformly adjust the gradients of the advantaged modality. We propose Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement.
- Score: 49.596978957463385
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal learning aims to improve performance by leveraging data from multiple sources. During joint multimodal training, modality bias often lets the advantaged modality dominate backpropagation, leading to imbalanced optimization. Existing methods still face two problems. First, the long-term dominance of the advantaged modality weakens representation-output coupling in the late stages of training, resulting in the accumulation of redundant information. Second, previous methods often adjust the gradients of the advantaged modality directly and uniformly, ignoring the semantics and directionality between modalities. To address these limitations, we propose Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement (RedReg), inspired by the information bottleneck principle. Specifically, we construct a redundancy phase monitor that uses a joint criterion of effective-gain growth rate and redundancy to trigger intervention only when redundancy is high. Furthermore, we design a co-information gating mechanism that estimates the contribution of the currently dominant modality from cross-modal semantics; when the task relies primarily on a single modality, the suppression term is automatically disabled to preserve modality-specific information. Finally, we project the gradient of the dominant modality onto the orthogonal complement of the joint multimodal gradient subspace and suppress it according to the measured redundancy. Experiments show that our method outperforms current mainstream methods in most scenarios, and ablation studies verify its effectiveness. The code is available at https://github.com/xia-zhe/RedReg.git
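Of the three mechanisms in the abstract, the gradient-projection step is concrete enough to sketch. The snippet below is a minimal NumPy illustration of one plausible reading, not the authors' released implementation (see the repository linked above for that): the redundancy score `rho`, the `gate_open` flag, and the linear interpolation between the original and projected gradients are assumptions made for illustration.

```python
# Hedged sketch of redundancy-regulated gradient projection -- an
# illustrative reading of the RedReg abstract, not the official code.
import numpy as np

def regulate_dominant_gradient(g_dom, g_joint, rho, gate_open=True, eps=1e-12):
    """Suppress the redundant component of the dominant modality's gradient.

    g_dom:     gradient of the dominant modality (1-D array)
    g_joint:   joint multimodal gradient spanning the shared direction
    rho:       redundancy estimate in [0, 1] (stand-in for the phase monitor)
    gate_open: stand-in for the co-information gate; False means the task
               relies mainly on one modality, so suppression is disabled
    """
    if not gate_open:
        return g_dom  # preserve modality-specific information untouched
    u = g_joint / (np.linalg.norm(g_joint) + eps)  # unit joint direction
    g_par = np.dot(g_dom, u) * u                   # component along the joint direction
    g_orth = g_dom - g_par                         # orthogonal complement
    # Attenuate the redundant (parallel) component in proportion to rho:
    # rho = 1 keeps only the orthogonal projection, rho = 0 leaves g_dom intact.
    return g_orth + (1.0 - rho) * g_par
```

At `rho = 1` the dominant gradient is fully projected onto the orthogonal complement of the joint gradient direction, while at `rho = 0` it passes through unchanged, which matches the monitor's intent of intervening only when redundancy is high.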
Related papers
- CaReFlow: Cyclic Adaptive Rectified Flow for Multimodal Fusion [6.3310165899037045]
The modality gap significantly restricts the effectiveness of multimodal fusion. Previous methods often use techniques such as diffusion models and adversarial learning to reduce the modality gap.
arXiv Detail & Related papers (2026-02-22T12:12:05Z) - Dual-Stream Cross-Modal Representation Learning via Residual Semantic Decorrelation [5.272868130772015]
Cross-modal representations often suffer from modality dominance, redundant information coupling, and spurious cross-modal correlations. We propose a Dual-Stream Residual Semantic Decorrelation Network (DSRSD-Net) to disentangle modality-specific and modality-shared information.
arXiv Detail & Related papers (2025-12-08T14:01:16Z) - Modality-Balanced Collaborative Distillation for Multi-Modal Domain Generalization [72.83292830785336]
Weight Averaging (WA) has emerged as a powerful technique for enhancing generalization by promoting convergence to a flat loss landscape. We propose MBCD, a unified collaborative distillation framework that retains WA's flatness-inducing advantages while overcoming its shortcomings in multi-modal contexts.
arXiv Detail & Related papers (2025-11-25T12:38:28Z) - Calibrated Multimodal Representation Learning with Missing Modalities [100.55774771852468]
Multimodal representation learning harmonizes distinct modalities by aligning them into a unified latent space. Recent research generalizes traditional cross-modal alignment to produce enhanced multimodal synergy but requires all modalities to be present for a common instance. We provide theoretical insights into this issue from an anchor-shift perspective. We propose CalMRL for multimodal representation learning to calibrate incomplete alignments caused by missing modalities.
arXiv Detail & Related papers (2025-11-15T05:01:43Z) - Principled Multimodal Representation Learning [99.53621521696051]
Multimodal representation learning seeks to create a unified representation space by integrating diverse data modalities. Recent advances have investigated the simultaneous alignment of multiple modalities, yet several challenges remain. We propose Principled Multimodal Representation Learning (PMRL), a novel framework that achieves simultaneous alignment of multiple modalities.
arXiv Detail & Related papers (2025-07-23T09:12:25Z) - Rethinking Explainability in the Era of Multimodal AI [9.57008593971486]
Multimodal AI systems have become ubiquitous and achieved remarkable performance across high-stakes applications. Most existing explainability techniques remain unimodal, generating modality-specific feature attributions, concepts, or circuit traces in isolation. This paper argues that such unimodal explanations systematically misrepresent and fail to capture the cross-modal influence that drives multimodal model decisions.
arXiv Detail & Related papers (2025-06-16T03:08:29Z) - RESTORE: Towards Feature Shift for Vision-Language Prompt Learning [33.13407089704543]
We show that prompt tuning along only one branch of CLIP is the reason the misalignment occurs.
Without proper regularization across the learnable parameters in different modalities, prompt learning violates the original pre-training constraints.
We propose RESTORE, a multi-modal prompt learning method that exerts explicit constraints on cross-modal consistency.
arXiv Detail & Related papers (2024-03-10T08:52:48Z) - Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning [53.68371566336254]
We argue that the key to better performance lies in meaningful latent modality structures instead of perfect modality alignment.
Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization.
arXiv Detail & Related papers (2023-03-10T14:38:49Z) - Revisiting Modality Imbalance In Multimodal Pedestrian Detection [6.7841188753203046]
We introduce a novel training setup with a regularizer in the multimodal architecture to resolve the disparity between the modalities.
Specifically, our regularizer term makes the feature fusion method more robust by treating both feature extractors as equally important during training.
arXiv Detail & Related papers (2023-02-24T11:56:57Z) - Learning Discriminative Shrinkage Deep Networks for Image Deconvolution [122.79108159874426]
We propose an effective non-blind deconvolution approach that learns discriminative shrinkage functions to implicitly model the data and regularization terms.
Experimental results show that the proposed method performs favorably against the state-of-the-art ones in terms of efficiency and accuracy.
arXiv Detail & Related papers (2021-11-27T12:12:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.