Related papers: Cross-modal Prompting for Balanced Incomplete Multi-modal Emotion Recognition

Cross-modal Prompting for Balanced Incomplete Multi-modal Emotion Recognition

URL: http://arxiv.org/abs/2512.11239v1
Date: Fri, 12 Dec 2025 02:38:03 GMT
Title: Cross-modal Prompting for Balanced Incomplete Multi-modal Emotion Recognition
Authors: Wen-Jue He, Xiaofeng Zhu, Zheng Zhang,
Abstract summary: We devise a novel Cross-modal Prompting (ComP) method, which emphasizes coherent information by enhancing modality-specific features.<n>ComP method improves the overall recognition accuracy by boosting each modality's performance.<n> experiments on 4 datasets with 7 SOTA methods under different missing rates validate the effectiveness of our proposed method.
Score: 14.469741471849249
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Incomplete multi-modal emotion recognition (IMER) aims at understanding human intentions and sentiments by comprehensively exploring the partially observed multi-source data. Although the multi-modal data is expected to provide more abundant information, the performance gap and modality under-optimization problem hinder effective multi-modal learning in practice, and are exacerbated in the confrontation of the missing data. To address this issue, we devise a novel Cross-modal Prompting (ComP) method, which emphasizes coherent information by enhancing modality-specific features and improves the overall recognition accuracy by boosting each modality's performance. Specifically, a progressive prompt generation module with a dynamic gradient modulator is proposed to produce concise and consistent modality semantic cues. Meanwhile, cross-modal knowledge propagation selectively amplifies the consistent information in modality features with the delivered prompts to enhance the discrimination of the modality-specific output. Additionally, a coordinator is designed to dynamically re-weight the modality outputs as a complement to the balance strategy to improve the model's efficacy. Extensive experiments on 4 datasets with 7 SOTA methods under different missing rates validate the effectiveness of our proposed method.

Related papers

ADMC: Attention-based Diffusion Model for Missing Modalities Feature Completion [25.1725138364452]
We introduce an Attention-based Diffusion model for Missing Modalities feature Completion (ADMC)<n>Our framework independently trains feature extraction networks for each modality, preserving their unique characteristics and avoiding over-coupling.<n>Our approach achieves state-of-the-art results on the IEMOCAP and MIntRec benchmarks, demonstrating its effectiveness in both missing and complete modality scenarios.
arXiv Detail & Related papers (2025-07-08T03:08:52Z)
Harmony: A Unified Framework for Modality Incremental Learning [81.13765007314781]
This paper investigates the feasibility of developing a unified model capable of incremental learning across continuously evolving modal sequences.<n>We propose a novel framework named Harmony, designed to achieve modal alignment and knowledge retention.<n>Our approach introduces the adaptive compatible feature modulation and cumulative modal bridging.
arXiv Detail & Related papers (2025-04-17T06:35:01Z)
GAMED: Knowledge Adaptive Multi-Experts Decoupling for Multimodal Fake News Detection [18.157900272828602]
Multimodal fake news detection often involves modelling heterogeneous data sources, such as vision and language.<n>This paper develops a significantly novel approach, GAMED, for multimodal modelling.<n>It focuses on generating distinctive and discriminative features through modal decoupling to enhance cross-modal synergies.
arXiv Detail & Related papers (2024-12-11T19:12:22Z)
On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities. The widely-used joint training strategy leads to imbalanced and under-optimized uni-modal representations. We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality.
arXiv Detail & Related papers (2024-10-15T13:15:50Z)
Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD) It aims to detect salient objects from arbitrary modalities, eg RGB images, RGB-D images, and RGB-D-T images. A novel modality-adaptive Transformer (MAT) will be proposed to investigate two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z)
Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding. We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL. UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks. Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment. We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
One-stage Modality Distillation for Incomplete Multimodal Learning [6.93254775445168]
This paper presents a one-stage modality distillation framework that unifies the privileged knowledge transfer and modality information fusion.<n>The proposed framework can overcome the problem of incomplete modality input in various scenes and achieve state-of-the-art performance.
arXiv Detail & Related papers (2023-09-15T07:12:27Z)
Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities [76.08541852988536]
We propose to use invariant features for a missing modality imagination network (IF-MMIN) We show that the proposed model outperforms all baselines and invariantly improves the overall emotion recognition performance under uncertain missing-modality conditions.
arXiv Detail & Related papers (2022-10-27T12:16:25Z)
Robust Latent Representations via Cross-Modal Translation and Alignment [36.67937514793215]
Most multi-modal machine learning methods require that all the modalities used for training are also available for testing. To address this limitation, we aim to improve the testing performance of uni-modal systems using multiple modalities during training only. The proposed multi-modal training framework uses cross-modal translation and correlation-based latent space alignment.
arXiv Detail & Related papers (2020-11-03T11:18:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.