Balanced Multimodal Learning: An Unidirectional Dynamic Interaction Perspective
- URL: http://arxiv.org/abs/2509.02281v3
- Date: Fri, 05 Sep 2025 05:05:05 GMT
- Title: Balanced Multimodal Learning: An Unidirectional Dynamic Interaction Perspective
- Authors: Shijie Wang, Li Zhang, Xinyan Liang, Yuhua Qian, Shen Hu
- Abstract summary: Multimodal learning typically relies on a multimodal joint loss to integrate different modalities and enhance model performance. Existing strategies such as dynamic loss weighting, auxiliary objectives, and gradient modulation mitigate modality imbalance based on the joint loss. We introduce Unidirectional Dynamic Interaction (UDI), a novel strategy that abandons the conventional joint loss in favor of a proactive, sequential training scheme.
- Score: 41.26526184466649
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal learning typically relies on a multimodal joint loss to integrate different modalities and enhance model performance. However, this joint learning strategy can induce modality imbalance, where strong modalities overwhelm weaker ones, limiting the exploitation of modality-specific information and of inter-modality interactions. Existing strategies such as dynamic loss weighting, auxiliary objectives, and gradient modulation mitigate modality imbalance based on the joint loss. These methods remain fundamentally reactive: they detect and correct imbalance after it arises, while leaving the competitive nature of the joint loss untouched. This limitation motivates a strategy for imbalanced multimodal learning that does not rely on the joint loss, enabling more effective interactions between modalities and better use of information from individual modalities and their interactions. In this paper, we introduce Unidirectional Dynamic Interaction (UDI), a novel strategy that abandons the conventional joint loss in favor of a proactive, sequential training scheme. UDI first trains the anchor modality to convergence, then uses its learned representations to guide the other modality via an unsupervised loss. Furthermore, the dynamic adjustment of modality interactions allows the model to adapt to the task at hand, ensuring that each modality contributes optimally. By decoupling modality optimization and enabling directed information flow, UDI prevents domination by any single modality and fosters effective cross-modal feature learning. Our experimental results demonstrate that UDI outperforms existing methods in handling modality imbalance, leading to performance improvements in multimodal learning tasks.
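The abstract describes a two-stage, anchor-first scheme. The sketch below is one plausible reading of it in PyTorch; the encoder and head modules, the choice of cosine similarity as the unsupervised guidance loss, and the epoch counts are all illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def train_udi(anchor_enc, weak_enc, head_a, head_w, loader,
              anchor_epochs=10, guided_epochs=10, align_weight=1.0):
    """Two-stage, anchor-first training (illustrative reading of UDI)."""
    # Stage 1: train the anchor modality alone with a supervised loss.
    opt_a = torch.optim.Adam(
        list(anchor_enc.parameters()) + list(head_a.parameters()), lr=1e-3)
    for _ in range(anchor_epochs):
        for x_anchor, x_other, y in loader:
            opt_a.zero_grad()
            loss = F.cross_entropy(head_a(anchor_enc(x_anchor)), y)
            loss.backward()
            opt_a.step()

    # Stage 2: freeze the anchor; its representations guide the other
    # modality through an unsupervised alignment term (cosine similarity
    # here, purely to illustrate "guidance via an unsupervised loss").
    for p in anchor_enc.parameters():
        p.requires_grad_(False)
    opt_w = torch.optim.Adam(
        list(weak_enc.parameters()) + list(head_w.parameters()), lr=1e-3)
    for _ in range(guided_epochs):
        for x_anchor, x_other, y in loader:
            opt_w.zero_grad()
            z_anchor = anchor_enc(x_anchor)   # fixed guidance signal
            z_other = weak_enc(x_other)
            align = 1.0 - F.cosine_similarity(z_other, z_anchor, dim=-1).mean()
            task = F.cross_entropy(head_w(z_other), y)
            (task + align_weight * align).backward()
            opt_w.step()
```

Freezing the anchor makes the information flow strictly unidirectional: the guidance term can pull the weaker modality toward the anchor's representation space, but never the reverse.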
Related papers
- Multimodal Classification via Total Correlation Maximization [11.720319082362629]
Multimodal learning integrates data from diverse sensors to harness information from different modalities. Recent studies reveal that joint learning often overfits certain modalities while neglecting others, leading to performance inferior to that of unimodal learning. We propose a method for multimodal classification by maximizing the total correlation between multimodal features and labels.
arXiv Detail & Related papers (2026-02-13T15:21:45Z)
- Balanced Multimodal Learning via Mutual Information [1.9336815376402718]
We propose a novel unified framework designed to address modality imbalance by utilizing mutual information to quantify interactions between modalities. Our approach adopts a balanced multimodal learning strategy comprising two key stages: cross-modal knowledge distillation (KD) and a multitask-like training paradigm.
arXiv Detail & Related papers (2025-11-02T15:58:05Z)
- MILES: Modality-Informed Learning Rate Scheduler for Balancing Multimodal Learning [47.487732221767196]
We present the Modality-Informed Learning ratE Scheduler (MILES) for training multimodal joint fusion models. MILES balances modality-wise conditional utilization rates during training to effectively balance multimodal learning. Our results show that MILES outperforms all baselines across all tasks and fusion methods considered in our study.
arXiv Detail & Related papers (2025-10-20T10:34:59Z)
- AIM: Adaptive Intra-Network Modulation for Balanced Multimodal Learning [55.56234913868664]
We propose Adaptive Intra-Network Modulation (AIM) to improve balanced modality learning. AIM accounts for differences in optimization state across parameters and depths within the network during modulation. We show that AIM outperforms state-of-the-art imbalanced modality learning methods across multiple benchmarks.
arXiv Detail & Related papers (2025-08-27T10:53:36Z)
- Asymmetric Reinforcing against Multi-modal Representation Bias [59.685072206359855]
We propose an Asymmetric Reinforcing method against Multimodal representation bias (ARM). ARM dynamically reinforces the weak modalities while maintaining the ability to represent dominant modalities through conditional mutual information. This significantly improves the performance of multimodal learning, making notable progress in mitigating imbalanced multimodal learning.
arXiv Detail & Related papers (2025-01-02T13:00:06Z)
- Multimodal Fusion Balancing Through Game-Theoretic Regularization [22.959030061257533]
We show that current balancing methods struggle to train multimodal models that surpass even simple baselines, such as ensembles. This raises the question: how can we ensure that all modalities in multimodal training are sufficiently trained, and that learning from new modalities consistently improves performance? This paper proposes the Multimodal Competition Regularizer (MCR), a new loss component inspired by mutual information (MI) decomposition.
arXiv Detail & Related papers (2024-11-11T19:53:05Z)
- On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities.
The widely-used joint training strategy leads to imbalanced and under-optimized uni-modal representations.
We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality (a generic gradient-modulation sketch appears after this list).
arXiv Detail & Related papers (2024-10-15T13:15:50Z)
- Multimodal Classification via Modal-Aware Interactive Enhancement [6.621745547882088]
We propose a novel multimodal learning method, called modal-aware interactive enhancement (MIE).
Specifically, we first utilize an optimization strategy based on sharpness-aware minimization (SAM) to smooth the learning objective during the forward phase.
Then, with the help of the geometry property of SAM, we propose a gradient modification strategy to impose the influence between different modalities during the backward phase.
arXiv Detail & Related papers (2024-07-05T15:32:07Z)
- Multimodal Representation Learning by Alternating Unimodal Adaptation [73.15829571740866]
We propose MLA (Multimodal Learning with Alternating Unimodal Adaptation) to overcome challenges where some modalities appear more dominant than others during multimodal learning.
MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process (see the alternating-training sketch after this list).
It captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities.
Experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities.
arXiv Detail & Related papers (2023-11-17T18:57:40Z)
- Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
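For the on-the-fly modulation entry above, the following is a generic sketch of confidence-based gradient modulation for a two-branch late-fusion model. The dominance proxy (mean confidence on the true class), the tanh damping rule, and all module names are illustrative assumptions, not the exact OPM/OGM formulation from that paper.

```python
import torch
import torch.nn.functional as F

def modulated_step(enc_a, enc_b, head_a, head_b, batch, optimizer, alpha=1.0):
    """One training step with confidence-based per-branch gradient damping."""
    x_a, x_b, y = batch
    logits_a = head_a(enc_a(x_a))                    # per-modality logits
    logits_b = head_b(enc_b(x_b))
    loss = F.cross_entropy(logits_a + logits_b, y)   # late fusion by summation
    optimizer.zero_grad()
    loss.backward()

    with torch.no_grad():
        # Mean confidence on the true class as a per-branch dominance proxy.
        conf_a = F.softmax(logits_a, dim=-1).gather(1, y[:, None]).mean()
        conf_b = F.softmax(logits_b, dim=-1).gather(1, y[:, None]).mean()
        ratio = conf_a / (conf_b + 1e-8)
        # Damp the gradients of whichever branch currently dominates.
        coef_a = 1.0 - alpha * torch.tanh(ratio - 1.0) if ratio > 1.0 else 1.0
        coef_b = 1.0 - alpha * torch.tanh(1.0 / ratio - 1.0) if ratio < 1.0 else 1.0
        for p in enc_a.parameters():
            if p.grad is not None:
                p.grad.mul_(coef_a)
        for p in enc_b.parameters():
            if p.grad is not None:
                p.grad.mul_(coef_b)
    optimizer.step()
```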
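For the alternating unimodal adaptation entry, here is a minimal sketch of round-robin unimodal training through a shared head, the core idea that blurb describes. The gradient-conflict handling and test-time fusion of the actual MLA method are omitted, and all names and the round-robin schedule are hypothetical simplifications.

```python
import itertools
import torch
import torch.nn.functional as F

def train_alternating(encoders, shared_head, loaders, steps, lr=1e-3):
    """encoders: dict {modality: nn.Module}; loaders: dict of DataLoaders
    yielding (x, y) for that modality alone."""
    params = list(shared_head.parameters())
    for enc in encoders.values():
        params += list(enc.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    iters = {m: itertools.cycle(dl) for m, dl in loaders.items()}
    order = itertools.cycle(encoders.keys())   # round-robin over modalities
    for _ in range(steps):
        m = next(order)
        x, y = next(iters[m])
        # Only this modality's encoder and the shared head receive gradients,
        # so modalities never compete inside a single joint loss; the shared
        # head is what accumulates cross-modal interactions over time.
        loss = F.cross_entropy(shared_head(encoders[m](x)), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```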