AIM: Adaptive Intra-Network Modulation for Balanced Multimodal Learning
- URL: http://arxiv.org/abs/2508.19769v2
- Date: Sat, 06 Sep 2025 03:08:42 GMT
- Title: AIM: Adaptive Intra-Network Modulation for Balanced Multimodal Learning
- Authors: Shu Shen, C. L. Philip Chen, Tong Zhang
- Abstract summary: We propose Adaptive Intra-Network Modulation (AIM) to improve balanced modality learning. AIM accounts for differences in optimization state across parameters and depths within the network during modulation. We show that AIM outperforms state-of-the-art imbalanced modality learning methods across multiple benchmarks.
- Score: 55.56234913868664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal learning has significantly enhanced machine learning performance but still faces numerous challenges and limitations. Imbalanced multimodal learning is one of the problems extensively studied in recent works and is typically mitigated by modulating the learning of each modality. However, we find that these methods typically hinder the dominant modality's learning in order to promote weaker modalities, which degrades overall multimodal performance. We analyze the cause of this issue and highlight a commonly overlooked problem: optimization bias within networks. To address this, we propose Adaptive Intra-Network Modulation (AIM) to improve balanced modality learning. AIM accounts for differences in optimization state across parameters and depths within the network during modulation, achieving, for the first time, balanced multimodal learning that hinders neither the dominant nor the weak modalities. Specifically, AIM decouples the dominant modality's under-optimized parameters into Auxiliary Blocks and encourages reliance on these performance-degraded blocks for joint training with weaker modalities. This approach effectively prevents suppression of weaker modalities while enabling targeted optimization of under-optimized parameters to improve the dominant modality. Additionally, AIM assesses the level of modality imbalance across network depths and adaptively adjusts the modulation strength at each depth. Experimental results demonstrate that AIM outperforms state-of-the-art imbalanced modality learning methods across multiple benchmarks and exhibits strong generalizability across different backbones, fusion strategies, and optimizers.
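The abstract describes the mechanism only at a high level. The sketch below gives one plausible PyTorch reading of the two ingredients: an Auxiliary Block that degrades part of the dominant encoder so joint training must lean on the weaker modality, and a per-depth imbalance score that sets the modulation strength. The names (AuxiliaryBlock, depth_modulation_strength), the dropout-based degradation, and the norm-ratio score are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AuxiliaryBlock(nn.Module):
    """Wraps one block of the dominant encoder; the auxiliary path degrades
    the block's output so joint training must lean on the weaker modality.
    Dropout is only one plausible reading of 'performance-degraded'."""
    def __init__(self, block: nn.Module, p_degrade: float = 0.5):
        super().__init__()
        self.block = block
        self.degrade = nn.Dropout(p_degrade)

    def forward(self, x: torch.Tensor, use_auxiliary: bool) -> torch.Tensor:
        h = self.block(x)
        return self.degrade(h) if use_auxiliary else h

def depth_modulation_strength(feat_dom: torch.Tensor,
                              feat_weak: torch.Tensor,
                              alpha: float = 1.0) -> float:
    """Toy per-depth imbalance score: the more the dominant features outweigh
    the weak ones at this depth, the stronger the modulation applied there
    (a stand-in for AIM's actual depth-wise assessment)."""
    ratio = feat_dom.norm().item() / (feat_weak.norm().item() + 1e-8)
    return min(1.0, alpha * max(0.0, ratio - 1.0))
```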
Related papers
- Modality-Balanced Collaborative Distillation for Multi-Modal Domain Generalization [72.83292830785336]
Weight Averaging (WA) has emerged as a powerful technique for enhancing generalization by promoting convergence to a flat loss landscape. We propose MBCD, a unified collaborative distillation framework that retains WA's flatness-inducing advantages while overcoming its shortcomings in multi-modal contexts.
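As context for the WA ingredient named above, a minimal SWA-style running-average step is sketched below; MBCD's collaborative-distillation side is not shown, and the function name is an assumption.

```python
import copy
import torch

@torch.no_grad()
def update_weight_average(avg_model, model, n_averaged: int) -> int:
    """One running-average step: avg <- avg + (w - avg) / (n + 1)."""
    for p_avg, p in zip(avg_model.parameters(), model.parameters()):
        p_avg.add_(p - p_avg, alpha=1.0 / (n_averaged + 1))
    return n_averaged + 1

# Typical use: avg_model = copy.deepcopy(model) once at the start, then call
# the update after each epoch; the averaged weights tend toward flatter minima.
```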
arXiv Detail & Related papers (2025-11-25T12:38:28Z) - Improving Multimodal Sentiment Analysis via Modality Optimization and Dynamic Primary Modality Selection [54.10252086842123]
Multimodal Sentiment Analysis (MSA) aims to predict sentiment from language, acoustic, and visual data in videos. This paper proposes a modality optimization and dynamic primary modality selection framework (MODS). Experiments on four benchmark datasets demonstrate that MODS outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-11-09T11:13:32Z) - Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning [19.588844325339355]
Modality-Aware Sharpness-Aware Minimization (M-SAM) is a model-agnostic framework that applies to many modalities and supports early and late fusion scenarios. Extensive experiments on four diverse datasets show that M-SAM outperforms the latest state-of-the-art optimization and gradient manipulation methods.
arXiv Detail & Related papers (2025-10-28T19:44:20Z) - Learning to Fuse: Modality-Aware Adaptive Scheduling for Robust Multimodal Foundation Models [0.0]
Modality-Aware Adaptive Fusion Scheduling (MA-AFS) learns to dynamically modulate the contribution of each modality on a per-instance basis. Our work highlights the importance of adaptive fusion and opens a promising direction toward reliable and uncertainty-aware multimodal learning.
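One common way to realize per-instance modulation of this kind is a small gating network that maps a sample's modality features to fusion weights, as in the hedged sketch below; the architecture and names are assumptions, and MA-AFS's actual scheduler may differ.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Per-instance fusion: a gating network maps each sample's concatenated
    modality features to softmax weights over the modalities."""
    def __init__(self, dims: list, fused_dim: int):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d, fused_dim) for d in dims])
        self.gate = nn.Linear(sum(dims), len(dims))

    def forward(self, feats):  # feats: list of (B, d_i) tensors, one per modality
        w = torch.softmax(self.gate(torch.cat(feats, dim=-1)), dim=-1)    # (B, M)
        z = torch.stack([p(f) for p, f in zip(self.proj, feats)], dim=1)  # (B, M, D)
        return (w.unsqueeze(-1) * z).sum(dim=1)                           # (B, D)
```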
arXiv Detail & Related papers (2025-06-15T05:57:45Z) - Modality Equilibrium Matters: Minor-Modality-Aware Adaptive Alternating for Cross-Modal Memory Enhancement [13.424541949553964]
We propose a Shapley-guided alternating training framework that adaptively prioritizes minor modalities to balance, and thereby enhance, multimodal fusion. We evaluate both balance and accuracy on four multimodal benchmark datasets, where our method achieves state-of-the-art (SOTA) results.
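The Shapley guidance above can be made concrete with a small exact computation over modality subsets. In the sketch below, value() is an assumed callable returning, e.g., validation accuracy when only the given subset of modalities is used (chance level for the empty set); exact enumeration is feasible because the number of modalities is small.

```python
from itertools import permutations

def modality_shapley(modalities, value):
    """Exact Shapley value of each modality, averaged over all orderings;
    tractable for the small modality counts (2-4) typical of these benchmarks."""
    shap = {m: 0.0 for m in modalities}
    perms = list(permutations(modalities))
    for order in perms:
        prefix = []
        for m in order:
            before = value(frozenset(prefix))             # performance without m
            prefix.append(m)
            shap[m] += value(frozenset(prefix)) - before  # marginal gain of m
    return {m: s / len(perms) for m, s in shap.items()}

# The minor modality is then the argmin of these values and would get
# priority in the alternating schedule.
```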
arXiv Detail & Related papers (2025-05-26T02:02:57Z) - PATS: Process-Level Adaptive Thinking Mode Switching [53.53401063490537]
Current large language models (LLMs) typically adopt a fixed reasoning strategy, either simple or complex, for all questions regardless of their difficulty. This neglect of variation in task and reasoning-process complexity leads to an imbalance between performance and efficiency. Existing methods attempt training-free fast-slow thinking system switching to handle problems of varying difficulty, but are limited by coarse-grained, solution-level strategy adjustments. We propose a novel reasoning paradigm: Process-Level Adaptive Thinking Mode Switching (PATS), which enables LLMs to dynamically adjust their reasoning strategy based on the difficulty of each step, optimizing the balance between performance and efficiency.
arXiv Detail & Related papers (2025-05-25T17:58:50Z) - Asymmetric Reinforcing against Multi-modal Representation Bias [59.685072206359855]
We propose an Asymmetric Reinforcing method against Multimodal representation bias (ARM). ARM dynamically reinforces the weak modalities while maintaining the ability to represent dominant modalities through conditional mutual information. Our method significantly improves multimodal learning performance, making notable progress in mitigating imbalanced multimodal learning.
arXiv Detail & Related papers (2025-01-02T13:00:06Z) - On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities.
The widely used joint training strategy leads to imbalanced and under-optimized uni-modal representations.
We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality.
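Since this entry is the OGM paper, the coefficient below follows the published 1 - tanh(alpha * rho) rule for the currently dominant modality; how the uni-modal scores are monitored is simplified here and should be treated as an assumption.

```python
import math

def ogm_coefficient(score_self: float, score_other: float,
                    alpha: float = 0.8) -> float:
    """Modulation coefficient for one modality: rho is its performance ratio
    against the other modality; only the currently dominant side (rho > 1)
    has its gradients shrunk."""
    rho = score_self / (score_other + 1e-8)
    return 1.0 - math.tanh(alpha * rho) if rho > 1.0 else 1.0

def modulate_gradients(params, k: float) -> None:
    """Scale a modality encoder's gradients in place; call this between
    loss.backward() and optimizer.step()."""
    for p in params:
        if p.grad is not None:
            p.grad.mul_(k)
```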
arXiv Detail & Related papers (2024-10-15T13:15:50Z) - Multimodal Classification via Modal-Aware Interactive Enhancement [6.621745547882088]
We propose a novel multimodal learning method, called modal-aware interactive enhancement (MIE).
Specifically, we first utilize an optimization strategy based on sharpness-aware minimization (SAM) to smooth the learning objective during the forward phase.
Then, with the help of the geometric properties of SAM, we propose a gradient-modification strategy that lets different modalities influence one another during the backward phase.
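The forward-phase smoothing above is standard SAM; a minimal two-phase SAM step is sketched below. compute_loss is an assumed closure that runs a forward pass and returns the scalar loss, and MIE's cross-modal gradient modification is not shown.

```python
import torch

def sam_step(params, compute_loss, optimizer, rho: float = 0.05) -> None:
    """One SAM update: ascend to the worst-case neighbour within radius rho,
    take the gradient there, then update the original weights."""
    compute_loss().backward()
    active = [p for p in params if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in active]))
    eps = []
    with torch.no_grad():
        for p in active:
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)                 # phase 1: perturb toward sharpness
            eps.append((p, e))
    optimizer.zero_grad()
    compute_loss().backward()         # phase 2: gradient at the perturbed point
    with torch.no_grad():
        for p, e in eps:
            p.sub_(e)                 # restore the original weights
    optimizer.step()
    optimizer.zero_grad()
```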
arXiv Detail & Related papers (2024-07-05T15:32:07Z) - Learning to Rebalance Multi-Modal Optimization by Adaptively Masking Subnetworks [13.065212096469537]
We propose a novel importance-sampling-based, element-wise joint optimization method, called Adaptively Mask Subnetworks Considering Modal Significance (AMSS).
Specifically, we incorporate mutual information rates to determine modal significance and employ non-uniform adaptive sampling to select foreground subnetworks from each modality for parameter updates.
Building upon theoretical insights, we further enhance the multi-modal mask subnetwork strategy using unbiased estimation, referred to as AMSS+.
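A minimal sketch of the masking-with-unbiased-estimation idea follows. A uniform keep probability per modality (higher for the weaker one) is a simplification of the paper's non-uniform, significance-driven sampling.

```python
import torch

@torch.no_grad()
def mask_gradients(params, keep_prob: float) -> None:
    """Bernoulli-mask each parameter's gradient element-wise and rescale the
    survivors by 1/keep_prob so the expected update is unchanged (the
    unbiased-estimation idea behind AMSS+). Call before optimizer.step()."""
    for p in params:
        if p.grad is None:
            continue
        mask = torch.bernoulli(torch.full_like(p.grad, keep_prob))
        p.grad.mul_(mask).div_(keep_prob)
```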
arXiv Detail & Related papers (2024-04-12T09:22:24Z) - Multimodal Representation Learning by Alternating Unimodal Adaptation [73.15829571740866]
We propose MLA (Multimodal Learning with Alternating Unimodal Adaptation) to overcome challenges where some modalities appear more dominant than others during multimodal learning.
MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process.
It captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities.
Experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities.
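A hedged sketch of the alternating scheme described above: each step optimizes one modality's encoder together with the shared head, so encoders never compete inside the same backward pass. The round-robin order and all names are assumptions.

```python
import itertools

def alternating_unimodal_train(encoders, shared_head, loaders, optimizer,
                               loss_fn, num_steps: int) -> None:
    """Round-robin over modalities: each step is a purely uni-modal forward
    through that modality's encoder and the shared head."""
    order = itertools.cycle(encoders.keys())
    iters = {m: iter(loaders[m]) for m in loaders}
    for _ in range(num_steps):
        m = next(order)
        try:
            x, y = next(iters[m])
        except StopIteration:          # restart an exhausted loader
            iters[m] = iter(loaders[m])
            x, y = next(iters[m])
        optimizer.zero_grad()
        loss_fn(shared_head(encoders[m](x)), y).backward()
        optimizer.step()               # only encoders[m] and the head receive gradients
```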
arXiv Detail & Related papers (2023-11-17T18:57:40Z) - Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models [51.5543321122664]
This paper investigates how to better leverage large-scale pre-trained uni-modal models to enhance discriminative multi-modal learning.
We introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA).
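The low-rank adaptation building block that MMLoRA's name points to is standard LoRA; a minimal frozen-base variant is sketched below. How the paper wires adapters across modalities is not shown.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pre-trained linear layer plus a trainable rank-r update:
    y = W x + (alpha / r) * B A x, with B initialized to zero so training
    starts exactly at the pre-trained model."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)    # keep the pre-trained weights frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())
```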
arXiv Detail & Related papers (2023-10-08T15:01:54Z) - Beyond Losses Reweighting: Empowering Multi-Task Learning via the Generalization Perspective [61.10883077161432]
Multi-task learning (MTL) trains deep neural networks to optimize several objectives simultaneously using a shared backbone. We introduce a novel MTL framework that leverages weight perturbation to regulate gradient norms, thus improving generalization. Our method significantly outperforms existing gradient-based MTL techniques in terms of task performance and overall model robustness.
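One hedged reading of "weight perturbation to regulate gradient norms" is sketched below: per-task gradients of the shared backbone are evaluated at a randomly perturbed point, rescaled to a common norm, and summed. The perturbation form and the norm-equalization rule are assumptions, not the paper's exact method.

```python
import torch

def perturbed_balanced_step(shared_params, compute_losses, optimizer,
                            rho: float = 0.01) -> None:
    """Per-task gradients at a perturbed point of the shared backbone,
    normalized to unit norm and summed before the update."""
    noise = [rho * torch.randn_like(p) for p in shared_params]
    with torch.no_grad():
        for p, n in zip(shared_params, noise):
            p.add_(n)                  # perturb the shared weights
    losses = compute_losses()          # forward pass(es) under perturbation
    task_grads = [torch.autograd.grad(l, shared_params, retain_graph=True)
                  for l in losses]
    with torch.no_grad():
        for p, n in zip(shared_params, noise):
            p.sub_(n)                  # restore the original weights
        norms = [torch.sqrt(sum(g.pow(2).sum() for g in gs)) for gs in task_grads]
        for i, p in enumerate(shared_params):
            p.grad = sum(gs[i] / (nm + 1e-12) for gs, nm in zip(task_grads, norms))
    optimizer.step()
    optimizer.zero_grad()
```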
arXiv Detail & Related papers (2022-11-24T17:19:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.