Related papers: Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition

Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition

URL: http://arxiv.org/abs/2407.05374v1
Date: Sun, 7 Jul 2024 13:55:56 GMT
Title: Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition
Authors: Zirun Guo, Tao Jin, Zhou Zhao,
Abstract summary: We propose a novel multimodal Transformer framework using prompt learning to address the issue of missing modalities. Our method introduces three types of prompts: generative prompts, missing-signal prompts, and missing-type prompts. Through prompt learning, we achieve a substantial reduction in the number of trainable parameters.
Score: 52.522244807811894
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The development of multimodal models has significantly advanced multimodal sentiment analysis and emotion recognition. However, in real-world applications, the presence of various missing modality cases often leads to a degradation in the model's performance. In this work, we propose a novel multimodal Transformer framework using prompt learning to address the issue of missing modalities. Our method introduces three types of prompts: generative prompts, missing-signal prompts, and missing-type prompts. These prompts enable the generation of missing modality features and facilitate the learning of intra- and inter-modality information. Through prompt learning, we achieve a substantial reduction in the number of trainable parameters. Our proposed method outperforms other methods significantly across all evaluation metrics. Extensive experiments and ablation studies are conducted to demonstrate the effectiveness and robustness of our method, showcasing its ability to effectively handle missing modalities.

Related papers

From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation [59.27094165576015]
We propose a novel learning paradigm (UniMod) that transitions from sparse decision-making to dense reasoning traces.<n>By constructing structured trajectories encompassing evidence grounding, modality assessment, risk mapping, policy decision, and response generation, we reformulate monolithic decision tasks into a multi-dimensional boundary learning process.<n>We introduce specialized optimization strategies to decouple task-specific parameters and rebalance training dynamics, effectively resolving interference between diverse objectives in multi-task learning.
arXiv Detail & Related papers (2026-01-28T09:29:40Z)
Quantifying Cross-Modality Memorization in Vision-Language Models [86.82366725590508]
We study the unique characteristics of cross-modality memorization and conduct a systematic study centered on vision-language models.<n>Our results reveal that facts learned in one modality transfer to the other, but a significant gap exists between recalling information in the source and target modalities.
arXiv Detail & Related papers (2025-06-05T16:10:47Z)
Decoupled Multimodal Prototypes for Visual Recognition with Missing Modalities [3.88369051454137]
Multimodal learning enhances deep learning models by enabling them to perceive and understand information from multiple data modalities.<n>Most existing approaches assume the availability of all modalities, an assumption that often fails in real-world applications.<n>Recent works have introduced learnable missing-case-aware prompts to mitigate performance degradation caused by missing modalities.<n>We propose a novel decoupled prototype-based output head, which leverages missing-case-aware class-wise prototypes tailored for each individual modality.
arXiv Detail & Related papers (2025-05-13T06:53:37Z)
Efficient Prompting for Continual Adaptation to Missing Modalities [7.782217188939437]
We formulate the dynamic missing modality problem as a continual learning task. We introduce three types of prompts: modality-specific, task-aware, and task-specific prompts. These prompts enable the model to learn intra-modality, inter-modality, intra-task, and inter-task features.
arXiv Detail & Related papers (2025-03-01T15:09:37Z)
EPE-P: Evidence-based Parameter-efficient Prompting for Multimodal Learning with Missing Modalities [20.991711160707755]
Missing modalities are a common challenge in real-world multimodal learning scenarios, occurring during both training and testing. Existing methods for managing missing modalities often require the design of separate prompts for each modality or missing case. We propose Evidence-based. Efficient Prompting (EPE-P), a novel and parameter-efficient method for pretrained multimodal networks.
arXiv Detail & Related papers (2024-12-23T16:01:12Z)
Leveraging Retrieval Augment Approach for Multimodal Emotion Recognition Under Missing Modalities [16.77191718894291]
We propose a novel framework of Retrieval Augment for Missing Modality Multimodal Emotion Recognition (RAMER) Our framework is superior to existing state-of-the-art approaches in missing modality MER tasks.
arXiv Detail & Related papers (2024-09-19T02:31:12Z)
MuAP: Multi-step Adaptive Prompt Learning for Vision-Language Model with Missing Modality [11.03329286331929]
We present the first comprehensive investigation into prompt learning behavior when modalities are incomplete. We propose a novel Multi-step Adaptive Prompt Learning framework, aiming to generate multimodal prompts and perform multi-step prompt tuning.
arXiv Detail & Related papers (2024-09-07T03:33:46Z)
Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach [29.428067329993173]
We propose a modality invariant multimodal learning method, which is less susceptible to the impact of missing modalities. It consists of a single-branch network sharing weights across multiple modalities to learn inter-modality representations to maximize performance. Our proposed method achieves superior performance when all modalities are present as well as in the case of missing modalities during training or testing compared to the existing state-of-the-art methods.
arXiv Detail & Related papers (2024-08-14T10:32:16Z)
Can Text-to-image Model Assist Multi-modal Learning for Visual Recognition with Visual Modality Missing? [37.73329106465031]
We propose a text-to-image framework GTI-MM to enhance the data efficiency and model robustness against missing visual modality. Our findings reveal that synthetic images benefit training data efficiency with visual data missing in training and improve model robustness with visual data missing involving training and testing.
arXiv Detail & Related papers (2024-02-14T09:21:00Z)
Exploring Missing Modality in Multimodal Egocentric Datasets [89.76463983679058]
We introduce a novel concept -Missing Modality Token (MMT)-to maintain performance even when modalities are absent. Our method mitigates the performance loss, reducing it from its original $sim 30%$ drop to only $sim 10%$ when half of the test set is modal-incomplete.
arXiv Detail & Related papers (2024-01-21T11:55:42Z)
Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding. We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL. UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning [49.92517970237088]
We tackle the problem of training a robot to understand multimodal prompts. This type of task poses a major challenge to robots' capability to understand the interconnection and complementarity between vision and language signals. We introduce an effective framework that learns a policy to perform robot manipulation with multimodal prompts.
arXiv Detail & Related papers (2023-10-14T22:24:58Z)
Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences. We pose the problem of unseen modality interaction and introduce a first solution. It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
Multimodal Prompting with Missing Modalities for Visual Recognition [40.961534960897595]
We tackle two challenges in multimodal learning for visual recognition: 1) when missing-modality occurs during training or testing in real-world situations; and 2) when computation resources are not available to finetune on heavy transformer models. Specifically, our modality-missing-aware prompts can be plugged into multimodal transformers to handle general missing-modality cases, while only requiring less than 1% learnable parameters compared to training the entire model.
arXiv Detail & Related papers (2023-03-06T18:54:46Z)
MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model textbfMEmoBERT for multimodal emotion recognition. Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction. Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.