DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning
- URL: http://arxiv.org/abs/2503.06456v2
- Date: Thu, 13 Mar 2025 18:39:49 GMT
- Title: DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning
- Authors: Chengxuan Qian, Kai Han, Jingchao Wang, Zhenlong Yuan, Chongwen Lyu, Jun Chen, Zhe Liu
- Abstract summary: DynCIM is a novel dynamic curriculum learning framework designed to quantify the inherent imbalances from both sample and modality perspectives. DynCIM employs a sample-level curriculum to dynamically assess each sample's difficulty according to prediction deviation, consistency, and stability. A modality-level curriculum measures modality contributions from both global and local perspectives.
- Score: 15.524342129628957
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal learning integrates complementary information from diverse modalities to enhance the decision-making process. However, the potential of multimodal collaboration remains under-exploited due to disparities in data quality and modality representation capabilities. To address this, we introduce DynCIM, a novel dynamic curriculum learning framework designed to quantify the inherent imbalances from both sample and modality perspectives. DynCIM employs a sample-level curriculum to dynamically assess each sample's difficulty according to prediction deviation, consistency, and stability, while a modality-level curriculum measures modality contributions from both global and local perspectives. Furthermore, a gating-based dynamic fusion mechanism is introduced to adaptively adjust modality contributions, minimizing redundancy and optimizing fusion effectiveness. Extensive experiments on six multimodal benchmarking datasets, spanning both bimodal and trimodal scenarios, demonstrate that DynCIM consistently outperforms state-of-the-art methods. Our approach effectively mitigates modality and sample imbalances while enhancing adaptability and robustness in multimodal learning tasks. Our code is available at https://github.com/Raymond-Qiancx/DynCIM.
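The abstract describes the mechanisms only at a high level. As a rough illustration of the gating-based dynamic fusion idea (per-modality gates predicted from the unimodal embeddings and used to reweight them before a shared classifier), here is a minimal PyTorch sketch. The class name, shapes, and gate design are assumptions made for illustration only, not the authors' released code; see the linked repository for the actual implementation.

```python
# Minimal illustrative sketch (not the authors' implementation) of a
# gating-based dynamic fusion layer: per-modality gates are predicted from
# the concatenated unimodal embeddings and used to reweight each modality
# before a shared classifier. Names, shapes, and the gate design are
# assumptions made purely for illustration.
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    def __init__(self, modal_dims, hidden_dim=256, num_classes=10):
        super().__init__()
        # Project each modality to a common hidden width.
        self.proj = nn.ModuleList([nn.Linear(d, hidden_dim) for d in modal_dims])
        # One scalar gate per modality, predicted from all modalities jointly.
        self.gate = nn.Sequential(
            nn.Linear(hidden_dim * len(modal_dims), len(modal_dims)),
            nn.Sigmoid(),
        )
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, feats):
        # feats: list of per-modality tensors, each of shape (batch, d_m).
        h = [p(f) for p, f in zip(self.proj, feats)]              # each (batch, hidden_dim)
        gates = self.gate(torch.cat(h, dim=-1))                   # (batch, num_modalities)
        fused = sum(g.unsqueeze(-1) * x                           # reweight and sum modalities
                    for g, x in zip(gates.unbind(dim=-1), h))
        return self.classifier(fused), gates


if __name__ == "__main__":
    model = GatedFusion(modal_dims=[512, 128], num_classes=5)     # e.g. image + audio features
    logits, gates = model([torch.randn(4, 512), torch.randn(4, 128)])
    print(logits.shape, gates.shape)                               # (4, 5) and (4, 2)
```

In the paper's full pipeline the fusion weights would additionally interact with the sample- and modality-level curricula (difficulty and contribution scores); the sketch omits that and shows only the gating step.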
Related papers
- Harmony: A Unified Framework for Modality Incremental Learning [81.13765007314781]
This paper investigates the feasibility of developing a unified model capable of incremental learning across continuously evolving modal sequences.
We propose a novel framework named Harmony, designed to achieve modal alignment and knowledge retention.
Our approach introduces adaptive compatible feature modulation and cumulative modal bridging.
arXiv Detail & Related papers (2025-04-17T06:35:01Z)
- Rebalanced Multimodal Learning with Data-aware Unimodal Sampling [39.77348232514481]
We propose a novel MML approach called Data-aware Unimodal Sampling.
Based on the learning status, we propose a reinforcement learning (RL)-based data-aware unimodal sampling approach.
Our method can be seamlessly incorporated into almost all existing multimodal learning approaches as a plugin.
arXiv Detail & Related papers (2025-03-05T08:19:31Z)
- PAL: Prompting Analytic Learning with Missing Modality for Multi-Modal Class-Incremental Learning [42.00851701431368]
Multi-modal class-incremental learning (MMCIL) seeks to leverage multi-modal data, such as audio-visual and image-text pairs.
A critical challenge remains: the issue of missing modalities during incremental learning phases.
We propose PAL, a novel exemplar-free framework tailored to MMCIL under missing-modality scenarios.
arXiv Detail & Related papers (2025-01-16T08:04:04Z)
- Asymmetric Reinforcing against Multi-modal Representation Bias [59.685072206359855]
We propose an Asymmetric Reinforcing method against Multimodal representation bias (ARM).
Our ARM dynamically reinforces the weak modalities while maintaining the ability to represent dominant modalities through conditional mutual information.
We have significantly improved the performance of multimodal learning, making notable progress in mitigating imbalanced multimodal learning.
arXiv Detail & Related papers (2025-01-02T13:00:06Z)
- LLMs Can Evolve Continually on Modality for X-Modal Reasoning [62.2874638875554]
Existing methods rely heavily on modal-specific pretraining and joint-modal tuning, leading to significant computational burdens when expanding to new modalities.
We propose PathWeave, a flexible and scalable framework with modal-Path sWitching and ExpAnsion abilities.
PathWeave performs comparably to state-of-the-art MLLMs while concurrently reducing parameter training burdens by 98.73%.
arXiv Detail & Related papers (2024-10-26T13:19:57Z)
- On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities.
The widely-used joint training strategy leads to imbalanced and under-optimized uni-modal representations.
We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality.
arXiv Detail & Related papers (2024-10-15T13:15:50Z)
- Multimodal Classification via Modal-Aware Interactive Enhancement [6.621745547882088]
We propose a novel multimodal learning method called modal-aware interactive enhancement (MIE).
Specifically, we first utilize an optimization strategy based on sharpness aware minimization (SAM) to smooth the learning objective during the forward phase.
Then, with the help of the geometry property of SAM, we propose a gradient modification strategy to impose the influence between different modalities during the backward phase.
arXiv Detail & Related papers (2024-07-05T15:32:07Z)
- Multimodal Representation Learning by Alternating Unimodal Adaptation [73.15829571740866]
We propose MLA (Multimodal Learning with Alternating Unimodal Adaptation) to overcome challenges where some modalities appear more dominant than others during multimodal learning.
MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process.
It captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities.
Experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities.
arXiv Detail & Related papers (2023-11-17T18:57:40Z)
- Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
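As a side note on the early-fusion idea mentioned in the UmURL entry above: a single-stream early-fusion encoder can be sketched by projecting each modality to a shared width, concatenating along the token axis, and encoding the result with one shared Transformer. The sketch below is a hypothetical illustration under those assumptions, not UmURL's actual architecture.

```python
# Hypothetical illustration of single-stream early fusion (not UmURL's code):
# project each modality to a common width, concatenate along the token axis,
# and encode everything with one shared Transformer encoder.
import torch
import torch.nn as nn


class EarlyFusionEncoder(nn.Module):
    def __init__(self, modal_dims, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d, d_model) for d in modal_dims])
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, seqs):
        # seqs: list of per-modality sequences, each of shape (batch, T_m, d_m).
        tokens = torch.cat([p(s) for p, s in zip(self.proj, seqs)], dim=1)  # (batch, sum T_m, d_model)
        return self.encoder(tokens).mean(dim=1)                             # pooled joint embedding


if __name__ == "__main__":
    enc = EarlyFusionEncoder(modal_dims=[150, 64])     # e.g. joint coordinates + motion features
    out = enc([torch.randn(2, 50, 150), torch.randn(2, 50, 64)])
    print(out.shape)                                    # torch.Size([2, 256])
```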