Modality-Balanced Collaborative Distillation for Multi-Modal Domain Generalization
- URL: http://arxiv.org/abs/2511.20258v1
- Date: Tue, 25 Nov 2025 12:38:28 GMT
- Title: Modality-Balanced Collaborative Distillation for Multi-Modal Domain Generalization
- Authors: Xiaohan Wang, Zhangtao Cheng, Ting Zhong, Leiting Chen, Fan Zhou
- Abstract summary: Weight Averaging (WA) has emerged as a powerful technique for enhancing generalization by promoting convergence to a flat loss landscape. We propose MBCD, a unified collaborative distillation framework that retains WA's flatness-inducing advantages while overcoming its shortcomings in multi-modal contexts.
- Score: 72.83292830785336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Weight Averaging (WA) has emerged as a powerful technique for enhancing generalization by promoting convergence to a flat loss landscape, which correlates with stronger out-of-distribution performance. However, applying WA directly to multi-modal domain generalization (MMDG) is challenging: differences in optimization speed across modalities lead WA to overfit to faster-converging ones in early stages, suppressing the contribution of slower yet complementary modalities, thereby hindering effective modality fusion and skewing the loss surface toward sharper, less generalizable minima. To address this issue, we propose MBCD, a unified collaborative distillation framework that retains WA's flatness-inducing advantages while overcoming its shortcomings in multi-modal contexts. MBCD begins with adaptive modality dropout in the student model to curb early-stage bias toward dominant modalities. A gradient consistency constraint then aligns learning signals between uni-modal branches and the fused representation, encouraging coordinated and smoother optimization. Finally, a WA-based teacher conducts cross-modal distillation by transferring fused knowledge to each uni-modal branch, which strengthens cross-modal interactions and steers convergence toward flatter solutions. Extensive experiments on MMDG benchmarks show that MBCD consistently outperforms existing methods, achieving superior accuracy and robustness across diverse unseen domains.
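The Weight Averaging idea the abstract builds on can be sketched as a running average of model weights accumulated over training steps. This is a minimal illustration of the general WA technique, not the paper's MBCD implementation; the names `wa_update` and the dict-of-floats weight representation are illustrative stand-ins for real tensors.

```python
def wa_update(avg_weights, new_weights, step):
    """Incremental weight average: avg_t = (avg_{t-1} * t + w_t) / (t + 1).

    After t+1 updates, avg_weights equals the plain mean of all weights
    seen so far, which is the quantity a WA teacher would use.
    """
    return {
        name: (avg * step + new_weights[name]) / (step + 1)
        for name, avg in avg_weights.items()
    }

# Toy usage: three "checkpoints" of a one-parameter model.
avg = {"layer.w": 0.0}
for step, w in enumerate([{"layer.w": 1.0}, {"layer.w": 3.0}, {"layer.w": 5.0}]):
    avg = wa_update(avg, w, step)
# avg["layer.w"] is now the mean of 1.0, 3.0, 5.0, i.e. 3.0
```

In practice the same update is applied per-tensor over a deep network's state dict; the averaged copy serves as the teacher while the raw student keeps training.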
Related papers
- Plug, Play, and Fortify: A Low-Cost Module for Robust Multimodal Image Understanding Models [6.350443894942629]
Multimodal Weight Allocation Module (MWAM) is a plug-and-play component that dynamically re-balances the contribution of each branch during training. MWAM delivers consistent performance gains across a wide range of tasks and modality combinations.
arXiv Detail & Related papers (2026-02-26T05:51:41Z) - Calibrated Multimodal Representation Learning with Missing Modalities [100.55774771852468]
Multimodal representation learning harmonizes distinct modalities by aligning them into a unified latent space. Recent research generalizes traditional cross-modal alignment to produce enhanced multimodal synergy but requires all modalities to be present for a common instance. We provide theoretical insights into this issue from an anchor shift perspective. We propose CalMRL for multimodal representation learning to calibrate incomplete alignments caused by missing modalities.
arXiv Detail & Related papers (2025-11-15T05:01:43Z) - Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement [49.596978957463385]
Long-term dominance of the dominant modality weakens representation-output coupling. Previous methods often directly and uniformly adjust the gradients of the advantaged modality. We propose Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement.
arXiv Detail & Related papers (2025-11-14T04:44:34Z) - Improving Multimodal Sentiment Analysis via Modality Optimization and Dynamic Primary Modality Selection [54.10252086842123]
Multimodal Sentiment Analysis (MSA) aims to predict sentiment from language, acoustic, and visual data in videos. This paper proposes a modality optimization and dynamic primary modality selection framework (MODS). Experiments on four benchmark datasets demonstrate that MODS outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-11-09T11:13:32Z) - Robust Multimodal Semantic Segmentation with Balanced Modality Contributions [13.322334965026684]
We propose EQUISeg, a framework that balances modality contributions through equal encoding of modalities. We show that EQUISeg achieves significant performance gains and effectively alleviates the adverse effects of modality imbalance in segmentation tasks.
arXiv Detail & Related papers (2025-09-29T09:19:10Z) - AIM: Adaptive Intra-Network Modulation for Balanced Multimodal Learning [55.56234913868664]
We propose Adaptive Intra-Network Modulation (AIM) to improve balanced modality learning. AIM accounts for differences in optimization state across parameters and depths within the network during modulation. We show that AIM outperforms state-of-the-art imbalanced modality learning methods across multiple benchmarks.
arXiv Detail & Related papers (2025-08-27T10:53:36Z) - Learning to Fuse: Modality-Aware Adaptive Scheduling for Robust Multimodal Foundation Models [0.0]
Modality-Aware Adaptive Fusion Scheduling (MA-AFS) learns to dynamically modulate the contribution of each modality on a per-instance basis. Our work highlights the importance of adaptive fusion and opens a promising direction toward reliable and uncertainty-aware multimodal learning.
arXiv Detail & Related papers (2025-06-15T05:57:45Z) - Beyond Modality Collapse: Representations Blending for Multimodal Dataset Distillation [41.00894254469267]
We introduce RepBlend, a novel MDD framework that weakens overdominant cross-modal supervision via representation blending. Experiments on Flickr-30K and MS-COCO show that RepBlend consistently outperforms prior state-of-the-art MDD methods.
arXiv Detail & Related papers (2025-05-16T03:00:56Z) - On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities.
The widely-used joint training strategy leads to imbalanced and under-optimized uni-modal representations.
We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality.
arXiv Detail & Related papers (2024-10-15T13:15:50Z)
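Several entries above (OGM, AIM, MWAM) share a common mechanism: attenuating the learning signal of whichever modality is currently dominant. A rough, hypothetical sketch of such a per-modality gradient coefficient, assuming each modality exposes a scalar performance proxy (e.g. uni-modal confidence), could look like this; `modulation_coeffs` and `alpha` are illustrative names, not taken from any of the cited papers.

```python
def modulation_coeffs(modality_scores, alpha=1.0):
    """Return a gradient-scaling coefficient in (0, 1] per modality.

    Modalities scoring above the cross-modality mean (i.e. dominant ones)
    are attenuated; the rest keep their full gradient. alpha controls
    how aggressively dominance is penalized.
    """
    mean = sum(modality_scores.values()) / len(modality_scores)
    coeffs = {}
    for name, score in modality_scores.items():
        excess = max(0.0, score / mean - 1.0)  # how far above average
        coeffs[name] = min(1.0, 1.0 / (1.0 + alpha * excess))
    return coeffs

# Toy usage: audio branch is far ahead of video, so it gets scaled down.
coeffs = modulation_coeffs({"audio": 0.9, "video": 0.3})
```

In a training loop, each modality's gradients would be multiplied by its coefficient before the optimizer step, slowing the dominant branch so the weaker one can catch up.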
This list is automatically generated from the titles and abstracts of the papers in this site.