Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition
- URL: http://arxiv.org/abs/2407.05374v1
- Date: Sun, 7 Jul 2024 13:55:56 GMT
- Title: Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition
- Authors: Zirun Guo, Tao Jin, Zhou Zhao
- Abstract summary: We propose a novel multimodal Transformer framework using prompt learning to address the issue of missing modalities.
Our method introduces three types of prompts: generative prompts, missing-signal prompts, and missing-type prompts.
Through prompt learning, we achieve a substantial reduction in the number of trainable parameters.
- Score: 52.522244807811894
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The development of multimodal models has significantly advanced multimodal sentiment analysis and emotion recognition. However, in real-world applications, the presence of various missing modality cases often leads to a degradation in the model's performance. In this work, we propose a novel multimodal Transformer framework using prompt learning to address the issue of missing modalities. Our method introduces three types of prompts: generative prompts, missing-signal prompts, and missing-type prompts. These prompts enable the generation of missing modality features and facilitate the learning of intra- and inter-modality information. Through prompt learning, we achieve a substantial reduction in the number of trainable parameters. Our proposed method outperforms other methods significantly across all evaluation metrics. Extensive experiments and ablation studies are conducted to demonstrate the effectiveness and robustness of our method, showcasing its ability to effectively handle missing modalities.
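The abstract names three prompt types but does not say how they are wired into the Transformer. The toy sketch below, written in dependency-free Python, shows one plausible way to organize and select them for each missing-modality pattern. The modality set (`text`, `audio`, `vision`), the prompt size `DIM`, the prompt ordering, and the "generative prompt plus mean of present features" generator are all hypothetical choices for illustration. This is not the authors' implementation, and the frozen backbone is omitted entirely.

```python
import random
from itertools import product
from typing import Dict, List, Optional

Vector = List[float]

# Assumed modality set and toy prompt size; both are illustrative, not from the paper.
MODALITIES = ("text", "audio", "vision")
DIM = 4


def random_vector(rng: random.Random, dim: int = DIM) -> Vector:
    # Small random initialization for a learnable prompt vector.
    return [rng.uniform(-0.1, 0.1) for _ in range(dim)]


def build_prompt_banks(seed: int = 0):
    rng = random.Random(seed)
    # Missing-signal prompts: one vector per (modality, is_missing) pair.
    signal = {(m, miss): random_vector(rng) for m in MODALITIES for miss in (False, True)}
    # Missing-type prompts: one vector per missing pattern that keeps at least one modality.
    patterns = [p for p in product((False, True), repeat=len(MODALITIES)) if not all(p)]
    missing_type = {p: random_vector(rng) for p in patterns}
    # Generative prompts: one vector per modality, used to synthesize its missing features.
    generative = {m: random_vector(rng) for m in MODALITIES}
    return signal, missing_type, generative


def trainable_parameter_count(banks) -> int:
    # Only prompt vectors are trainable here; the (omitted) backbone is assumed frozen.
    return sum(len(v) for bank in banks for v in bank.values())


def mean_vector(vectors: List[Vector]) -> Vector:
    # Element-wise mean of a list of equal-length vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def assemble(features: Dict[str, Optional[List[Vector]]], banks):
    """Return (missing pattern, per-modality sequences with prompts prepended)."""
    signal, missing_type, generative = banks
    pattern = tuple(features.get(m) is None for m in MODALITIES)
    if all(pattern):
        raise ValueError("at least one modality must be present")
    # Hypothetical generator: generative prompt plus the mean of all present feature vectors.
    present = [v for m, miss in zip(MODALITIES, pattern) if not miss for v in features[m]]
    context = mean_vector(present)
    sequences = {}
    for m, miss in zip(MODALITIES, pattern):
        if miss:
            body = [[g + c for g, c in zip(generative[m], context)]]
        else:
            body = features[m]
        # Shared missing-type prompt, then this modality's missing-signal prompt, then features.
        sequences[m] = [missing_type[pattern], signal[(m, miss)]] + body
    return pattern, sequences
```

For example, calling `assemble` with `audio` set to `None` selects the missing-type prompt for pattern `(False, True, False)` and fills the audio slot with a generated vector. In this toy setup only 16 prompt vectors of size 4 are trainable, which mirrors in miniature the abstract's point that prompt learning keeps the trainable-parameter budget small.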
Related papers
- Leveraging Retrieval Augment Approach for Multimodal Emotion Recognition Under Missing Modalities [16.77191718894291]
We propose a novel framework of Retrieval Augment for Missing Modality Multimodal Emotion Recognition (RAMER).
Our framework is superior to existing state-of-the-art approaches in missing modality MER tasks.
Date: 2024-09-19T02:31:12Z
- MuAP: Multi-step Adaptive Prompt Learning for Vision-Language Model with Missing Modality [11.03329286331929]
We present the first comprehensive investigation into prompt learning behavior when modalities are incomplete.
We propose a novel Multi-step Adaptive Prompt Learning framework, aiming to generate multimodal prompts and perform multi-step prompt tuning.
Date: 2024-09-07T03:33:46Z
- Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach [29.428067329993173]
We propose a modality invariant multimodal learning method, which is less susceptible to the impact of missing modalities.
It uses a single-branch network that shares weights across multiple modalities to learn inter-modality representations and maximize performance.
Compared to existing state-of-the-art methods, our proposed method achieves superior performance both when all modalities are present and when modalities are missing during training or testing.
Date: 2024-08-14T10:32:16Z
- Can Text-to-image Model Assist Multi-modal Learning for Visual Recognition with Visual Modality Missing? [37.73329106465031]
We propose a text-to-image framework GTI-MM to enhance the data efficiency and model robustness against missing visual modality.
Our findings reveal that synthetic images improve training data efficiency when visual data is missing during training, and improve model robustness when visual data is missing during both training and testing.
Date: 2024-02-14T09:21:00Z
- Exploring Missing Modality in Multimodal Egocentric Datasets [89.76463983679058]
We introduce a novel concept, the Missing Modality Token (MMT), to maintain performance even when modalities are absent.
Our method mitigates the performance loss, reducing it from its original $\sim 30\%$ drop to only $\sim 10\%$ when half of the test set is modal-incomplete.
Date: 2024-01-21T11:55:42Z
- Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
Date: 2023-11-06T13:56:57Z
- Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning [49.92517970237088]
We tackle the problem of training a robot to understand multimodal prompts.
This type of task poses a major challenge to robots' capability to understand the interconnection and complementarity between vision and language signals.
We introduce an effective framework that learns a policy to perform robot manipulation with multimodal prompts.
Date: 2023-10-14T22:24:58Z
- Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
Date: 2023-06-22T10:53:10Z
- Multimodal Prompting with Missing Modalities for Visual Recognition [40.961534960897595]
We tackle two challenges in multimodal learning for visual recognition: 1) modalities that are missing during training or testing in real-world situations; and 2) insufficient computation resources to fine-tune heavy transformer models.
Specifically, our modality-missing-aware prompts can be plugged into multimodal transformers to handle general missing-modality cases, while requiring fewer than 1% of the learnable parameters needed to train the entire model.
Date: 2023-03-06T18:54:46Z
- MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction.
Our proposed MEmoBERT significantly enhances emotion recognition performance.
Date: 2021-10-27T09:57:00Z
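The MEmoBERT entry describes reformulating emotion classification as masked text prediction. A minimal text-only sketch of that general cloze-plus-verbalizer idea follows. The template string, the label words in `VERBALIZER`, and `toy_scorer` are hypothetical: the summary gives neither MEmoBERT's template nor its label words, and a real system would score the `[MASK]` slot with a pretrained multimodal masked language model rather than the string-matching stand-in used here.

```python
from typing import Callable, Dict

# Hypothetical cloze template; MEmoBERT's actual template is not given in the summary.
TEMPLATE = "{utterance} I am [MASK]."

# Hypothetical verbalizer: one label word per emotion class.
VERBALIZER: Dict[str, str] = {
    "happy": "happy",
    "sad": "sad",
    "angry": "angry",
    "neutral": "calm",
}


def build_prompt(utterance: str) -> str:
    # Wrap the input utterance in the cloze template.
    return TEMPLATE.format(utterance=utterance)


def classify(utterance: str, mask_scorer: Callable[[str, str], float]) -> str:
    """Pick the class whose label word scores highest at the [MASK] position.

    mask_scorer(prompt, word) stands in for a masked language model's score
    (e.g. log-probability) for `word` filling [MASK] in `prompt`.
    """
    prompt = build_prompt(utterance)
    return max(VERBALIZER, key=lambda label: mask_scorer(prompt, VERBALIZER[label]))


def toy_scorer(prompt: str, word: str) -> float:
    # Toy stand-in for an MLM: counts occurrences of the label word in the prompt.
    return float(prompt.lower().count(word))
```

With this toy scorer, `classify("I feel so sad today.", toy_scorer)` returns `"sad"`. Because the downstream task is phrased as the same masked-prediction problem used in pre-training, swapping in a real masked language model for `mask_scorer` changes nothing else in the pipeline.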