Cross-modal Prompting for Balanced Incomplete Multi-modal Emotion Recognition
- URL: http://arxiv.org/abs/2512.11239v1
- Date: Fri, 12 Dec 2025 02:38:03 GMT
- Title: Cross-modal Prompting for Balanced Incomplete Multi-modal Emotion Recognition
- Authors: Wen-Jue He, Xiaofeng Zhu, Zheng Zhang,
- Abstract summary: We devise a novel Cross-modal Prompting (ComP) method, which emphasizes coherent information by enhancing modality-specific features.<n>ComP method improves the overall recognition accuracy by boosting each modality's performance.<n> experiments on 4 datasets with 7 SOTA methods under different missing rates validate the effectiveness of our proposed method.
- Score: 14.469741471849249
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Incomplete multi-modal emotion recognition (IMER) aims at understanding human intentions and sentiments by comprehensively exploring the partially observed multi-source data. Although the multi-modal data is expected to provide more abundant information, the performance gap and modality under-optimization problem hinder effective multi-modal learning in practice, and are exacerbated in the confrontation of the missing data. To address this issue, we devise a novel Cross-modal Prompting (ComP) method, which emphasizes coherent information by enhancing modality-specific features and improves the overall recognition accuracy by boosting each modality's performance. Specifically, a progressive prompt generation module with a dynamic gradient modulator is proposed to produce concise and consistent modality semantic cues. Meanwhile, cross-modal knowledge propagation selectively amplifies the consistent information in modality features with the delivered prompts to enhance the discrimination of the modality-specific output. Additionally, a coordinator is designed to dynamically re-weight the modality outputs as a complement to the balance strategy to improve the model's efficacy. Extensive experiments on 4 datasets with 7 SOTA methods under different missing rates validate the effectiveness of our proposed method.
Related papers
- ADMC: Attention-based Diffusion Model for Missing Modalities Feature Completion [25.1725138364452]
We introduce an Attention-based Diffusion model for Missing Modalities feature Completion (ADMC)<n>Our framework independently trains feature extraction networks for each modality, preserving their unique characteristics and avoiding over-coupling.<n>Our approach achieves state-of-the-art results on the IEMOCAP and MIntRec benchmarks, demonstrating its effectiveness in both missing and complete modality scenarios.
arXiv Detail & Related papers (2025-07-08T03:08:52Z) - Harmony: A Unified Framework for Modality Incremental Learning [81.13765007314781]
This paper investigates the feasibility of developing a unified model capable of incremental learning across continuously evolving modal sequences.<n>We propose a novel framework named Harmony, designed to achieve modal alignment and knowledge retention.<n>Our approach introduces the adaptive compatible feature modulation and cumulative modal bridging.
arXiv Detail & Related papers (2025-04-17T06:35:01Z) - GAMED: Knowledge Adaptive Multi-Experts Decoupling for Multimodal Fake News Detection [18.157900272828602]
Multimodal fake news detection often involves modelling heterogeneous data sources, such as vision and language.<n>This paper develops a significantly novel approach, GAMED, for multimodal modelling.<n>It focuses on generating distinctive and discriminative features through modal decoupling to enhance cross-modal synergies.
arXiv Detail & Related papers (2024-12-11T19:12:22Z) - On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities.
The widely-used joint training strategy leads to imbalanced and under-optimized uni-modal representations.
We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality.
arXiv Detail & Related papers (2024-10-15T13:15:50Z) - Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD)
It aims to detect salient objects from arbitrary modalities, eg RGB images, RGB-D images, and RGB-D-T images.
A novel modality-adaptive Transformer (MAT) will be proposed to investigate two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z) - Unified Multi-modal Unsupervised Representation Learning for
Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation
Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - One-stage Modality Distillation for Incomplete Multimodal Learning [6.93254775445168]
This paper presents a one-stage modality distillation framework that unifies the privileged knowledge transfer and modality information fusion.<n>The proposed framework can overcome the problem of incomplete modality input in various scenes and achieve state-of-the-art performance.
arXiv Detail & Related papers (2023-09-15T07:12:27Z) - Exploiting modality-invariant feature for robust multimodal emotion
recognition with missing modalities [76.08541852988536]
We propose to use invariant features for a missing modality imagination network (IF-MMIN)
We show that the proposed model outperforms all baselines and invariantly improves the overall emotion recognition performance under uncertain missing-modality conditions.
arXiv Detail & Related papers (2022-10-27T12:16:25Z) - Robust Latent Representations via Cross-Modal Translation and Alignment [36.67937514793215]
Most multi-modal machine learning methods require that all the modalities used for training are also available for testing.
To address this limitation, we aim to improve the testing performance of uni-modal systems using multiple modalities during training only.
The proposed multi-modal training framework uses cross-modal translation and correlation-based latent space alignment.
arXiv Detail & Related papers (2020-11-03T11:18:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.