Hardness-Aware Dynamic Curriculum Learning for Robust Multimodal Emotion Recognition with Missing Modalities
- URL: http://arxiv.org/abs/2508.06800v2
- Date: Thu, 14 Aug 2025 16:06:55 GMT
- Title: Hardness-Aware Dynamic Curriculum Learning for Robust Multimodal Emotion Recognition with Missing Modalities
- Authors: Rui Liu, Haolin Zuo, Zheng Lian, Hongyu Yuan, Qi Fan,
- Abstract summary: We propose a novel Hardness-Aware Dynamic Curriculum Learning framework, termed HARDY-MER. Our framework operates in two key stages: first, it estimates the hardness level of each sample, and second, it strategically emphasizes hard samples during training. Experiments on benchmark datasets demonstrate that HARDY-MER consistently outperforms existing methods in missing-modality scenarios.
- Score: 15.783261732000883
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Missing modalities have recently emerged as a critical research direction in multimodal emotion recognition (MER). Conventional approaches typically address this issue through missing modality reconstruction. However, these methods fail to account for variations in reconstruction difficulty across different samples, consequently limiting the model's ability to handle hard samples effectively. To overcome this limitation, we propose a novel Hardness-Aware Dynamic Curriculum Learning framework, termed HARDY-MER. Our framework operates in two key stages: first, it estimates the hardness level of each sample, and second, it strategically emphasizes hard samples during training to enhance model performance on these challenging instances. Specifically, we first introduce a Multi-view Hardness Evaluation mechanism that quantifies reconstruction difficulty by considering both Direct Hardness (modality reconstruction errors) and Indirect Hardness (cross-modal mutual information). Meanwhile, we introduce a Retrieval-based Dynamic Curriculum Learning strategy that dynamically adjusts the training curriculum by retrieving samples with similar semantic information and balancing the learning focus between easy and hard instances. Extensive experiments on benchmark datasets demonstrate that HARDY-MER consistently outperforms existing methods in missing-modality scenarios. Our code will be made publicly available at https://github.com/HARDY-MER/HARDY-MER.
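The two-stage idea in the abstract (score each sample's hardness, then emphasize hard samples) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything named here is hypothetical: the function names, the `alpha` and `emphasis` parameters, the use of cosine disagreement as a simple stand-in for the cross-modal mutual information term, and the softmax-style weighting used in place of the retrieval-based curriculum. The sketch also assumes that the modality embeddings share one dimensionality.

```python
import math


def min_max(values):
    """Rescale a list of numbers to [0, 1]; a constant list maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def direct_hardness(original, reconstructed):
    """Per-sample mean squared reconstruction error of the missing modality."""
    return [
        sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        for x, x_hat in zip(original, reconstructed)
    ]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def indirect_hardness(missing, available):
    """ASSUMPTION: cosine disagreement is only a proxy for low cross-modal
    mutual information; the paper's estimator is not reproduced here."""
    return [1.0 - cosine(m, a) for m, a in zip(missing, available)]


def hardness_scores(original, reconstructed, available, alpha=0.5):
    """Blend the two normalized views; alpha is a hypothetical mixing weight."""
    direct = min_max(direct_hardness(original, reconstructed))
    indirect = min_max(indirect_hardness(original, available))
    return [alpha * d + (1 - alpha) * i for d, i in zip(direct, indirect)]


def sample_weights(scores, emphasis=1.0):
    """Softmax-style weights: larger emphasis shifts focus to hard samples."""
    raw = [math.exp(emphasis * s) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]


# Toy embeddings for three samples (made-up values, for illustration only).
original = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
reconstructed = [[1.0, 0.0], [0.5, 0.5], [0.0, 0.0]]
available = [[1.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]

scores = hardness_scores(original, reconstructed, available)
weights = sample_weights(scores, emphasis=2.0)
```

In this toy setup the third sample is both the worst reconstructed and the least aligned with the available modality, so it receives the largest weight. Setting `emphasis=0.0` recovers uniform weighting, which is one simple way to picture balancing the focus between easy and hard instances.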
Related papers
- Toward Understanding Unlearning Difficulty: A Mechanistic Perspective and Circuit-Guided Difficulty Metric [36.2724900971511]
Circuit-guided Unlearning Difficulty (CUD) is a metric that assigns each sample a continuous difficulty score using circuit-level signals. We identify key circuit-level patterns that reveal a mechanistic signature of difficulty.
arXiv Detail & Related papers (2026-01-14T16:55:58Z)
- Tailored Teaching with Balanced Difficulty: Elevating Reasoning in Multimodal Chain-of-Thought via Prompt Curriculum [39.57901536686932]
Multimodal Chain-of-Thought (MCoT) prompting is often limited by the use of randomly or manually selected examples. We propose a novel framework inspired by the pedagogical principle of "tailored teaching with balanced difficulty." Our approach integrates two complementary signals: model-perceived difficulty, quantified through prediction disagreement in an active learning setup, and intrinsic sample complexity, which measures the inherent difficulty of each question-image pair independently of any model.
arXiv Detail & Related papers (2025-08-26T04:32:15Z)
- VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning [69.44871115752055]
We propose an advanced multimodal reasoning model trained via a novel Progressive Curriculum Reinforcement Learning (PCuRL) framework. PCuRL systematically guides the model through tasks of gradually increasing difficulty, substantially improving its reasoning abilities across diverse multimodal contexts. The framework introduces two key innovations: (1) an online difficulty soft weighting mechanism, dynamically adjusting training difficulty across successive RL training stages; and (2) a dynamic length reward mechanism, which encourages the model to adaptively regulate its reasoning path length according to task complexity.
arXiv Detail & Related papers (2025-07-30T12:23:21Z)
- Try Harder: Hard Sample Generation and Learning for Clothes-Changing Person Re-ID [4.256800812615341]
Hard samples pose a significant challenge in person re-identification (ReID) tasks. Their inherent ambiguity or similarity, coupled with the lack of explicit definitions, makes them a fundamental bottleneck. We propose a novel multimodal-guided Hard Sample Generation and Learning framework.
arXiv Detail & Related papers (2025-07-15T09:14:01Z)
- Progressive Mastery: Customized Curriculum Learning with Guided Prompting for Mathematical Reasoning [43.12759195699103]
Large Language Models (LLMs) have achieved remarkable performance across various reasoning tasks, yet post-training is constrained by inefficient sample utilization and inflexible handling of sample difficulty. We propose Customized Curriculum Learning (CCL), a novel framework with two key innovations. First, we introduce a model-adaptive difficulty definition that customizes curriculum datasets based on each model's individual capabilities rather than using predefined difficulty metrics. Second, we develop "Guided Prompting," which dynamically reduces sample difficulty through strategic hints, enabling effective utilization of challenging samples that would otherwise degrade performance.
arXiv Detail & Related papers (2025-06-04T15:31:46Z)
- Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval [30.98084422803278]
We introduce UNITE, a universal framework that tackles challenges through data curation and modality-aware training configurations. Our work provides the first comprehensive analysis of how modality-specific data properties influence downstream task performance. Our framework achieves state-of-the-art results on multiple multimodal retrieval benchmarks, outperforming existing methods by notable margins.
arXiv Detail & Related papers (2025-05-26T08:09:44Z)
- Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning [69.64809103333839]
We investigate how explicitly modeling prior information about a problem's difficulty shapes the effectiveness of reinforcement-learning-based fine-tuning for multimodal reasoning. Our approach demonstrates significant performance across various multi-modal mathematical reasoning benchmarks with only 2K+0.6K two-stage training data.
arXiv Detail & Related papers (2025-05-19T15:43:10Z)
- PAL: Prompting Analytic Learning with Missing Modality for Multi-Modal Class-Incremental Learning [42.00851701431368]
Multi-modal class-incremental learning (MMCIL) seeks to leverage multi-modal data, such as audio-visual and image-text pairs. A critical challenge remains: the issue of missing modalities during incremental learning phases. We propose PAL, a novel exemplar-free framework tailored to MMCIL under missing-modality scenarios.
arXiv Detail & Related papers (2025-01-16T08:04:04Z)
- Towards Modality Generalization: A Benchmark and Prospective Analysis [68.20973671493203]
This paper introduces Modality Generalization (MG), which focuses on enabling models to generalize to unseen modalities. We propose a comprehensive benchmark featuring multi-modal algorithms and adapt existing methods that focus on generalization. Our work provides a foundation for advancing robust and adaptable multi-modal models, enabling them to handle unseen modalities in realistic scenarios.
arXiv Detail & Related papers (2024-12-24T08:38:35Z)
- Cross-Modal Few-Shot Learning: a Generative Transfer Learning Framework [58.362064122489166]
This paper introduces the Cross-modal Few-Shot Learning task, which aims to recognize instances across multiple modalities while relying on scarce labeled data. We propose a Generative Transfer Learning framework by simulating how humans abstract and generalize concepts. We show that the GTL achieves state-of-the-art performance across seven multi-modal datasets across RGB-Sketch, RGB-Infrared, and RGB-Depth.
arXiv Detail & Related papers (2024-10-14T16:09:38Z)
- Dynamic Contrastive Distillation for Image-Text Retrieval [90.05345397400144]
We present a novel plug-in dynamic contrastive distillation (DCD) framework to compress image-text retrieval models.
We successfully apply our proposed DCD strategy to two state-of-the-art vision-language pretrained models, i.e. ViLT and METER.
Experiments on MS-COCO and Flickr30K benchmarks show the effectiveness and efficiency of our DCD framework.
arXiv Detail & Related papers (2022-07-04T14:08:59Z)
- Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification [208.1227090864602]
Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem.
Existing VI-ReID methods tend to learn global representations, which have limited discriminability and weak robustness to noisy images.
We propose a novel dynamic dual-attentive aggregation (DDAG) learning method by mining both intra-modality part-level and cross-modality graph-level contextual cues for VI-ReID.
arXiv Detail & Related papers (2020-07-18T03:08:13Z)
- CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition [79.92240030758575]
We propose a novel Adaptive Curriculum Learning loss (CurricularFace) that embeds the idea of curriculum learning into the loss function.
Our CurricularFace adaptively adjusts the relative importance of easy and hard samples during different training stages.
arXiv Detail & Related papers (2020-04-01T08:43:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.