Towards Robust Multimodal Prompting With Missing Modalities
- URL: http://arxiv.org/abs/2312.15890v2
- Date: Wed, 27 Dec 2023 03:41:58 GMT
- Title: Towards Robust Multimodal Prompting With Missing Modalities
- Authors: Jaehyuk Jang, Yooseung Wang, Changick Kim
- Abstract summary: Multimodal prompting introduces learnable missing-aware prompts for all missing-modality cases.
It lacks robustness in scenarios where missing-modality settings differ between training and inference.
We propose a simple yet effective prompt design to address these challenges.
- Score: 22.176372579439356
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, multimodal prompting, which introduces learnable missing-aware
prompts for all missing modality cases, has exhibited impressive performance.
However, it encounters two critical issues: 1) The number of prompts grows
exponentially as the number of modalities increases; and 2) It lacks robustness
in scenarios with different missing modality settings between training and
inference. In this paper, we propose a simple yet effective prompt design to
address these challenges. Instead of using missing-aware prompts, we utilize
prompts as modality-specific tokens, enabling them to capture the unique
characteristics of each modality. Furthermore, our prompt design leverages
orthogonality between prompts as a key element to learn distinct information
across different modalities and promote diversity in the learned
representations. Extensive experiments demonstrate that our prompt design
enhances both performance and robustness while reducing the number of prompts.
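As a hedged illustration of the orthogonality idea above, here is a minimal PyTorch sketch, not the authors' released code: one learnable prompt block per modality, plus a penalty that pushes the pairwise cosine similarity between modality prompts toward zero. All names, shapes, and the weighting term are assumptions.

```python
import torch
import torch.nn.functional as F

# Assumed setup: one learnable prompt block per modality (M, L, D).
num_modalities, prompt_len, dim = 2, 16, 768
prompts = torch.nn.Parameter(torch.randn(num_modalities, prompt_len, dim))

def orthogonality_loss(prompts: torch.Tensor) -> torch.Tensor:
    """Penalize overlap between per-modality prompts by driving the
    off-diagonal entries of their cosine-similarity Gram matrix to zero."""
    flat = F.normalize(prompts.reshape(prompts.size(0), -1), dim=-1)  # (M, L*D)
    gram = flat @ flat.t()                                            # cosine sims
    off_diag = gram - torch.eye(gram.size(0))                         # diag is 1
    return off_diag.pow(2).sum()

# During training, this penalty would be added to the task loss, e.g.:
# loss = task_loss + lambda_orth * orthogonality_loss(prompts)
```

Under this design, the regularizer encourages each modality's prompt to occupy a distinct direction in embedding space, which is one plausible reading of "learn distinct information across different modalities".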
Related papers
- Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition [52.522244807811894]
We propose a novel multimodal Transformer framework using prompt learning to address the issue of missing modalities.
Our method introduces three types of prompts: generative prompts, missing-signal prompts, and missing-type prompts.
Through prompt learning, we achieve a substantial reduction in the number of trainable parameters.
arXiv Detail & Related papers (2024-07-07T13:55:56Z)
- A Preliminary Empirical Study on Prompt-based Unsupervised Keyphrase Extraction [30.624421412309786]
We study the effectiveness of different prompts on the keyphrase extraction task to verify the impact of cherry-picked prompts on extraction performance.
Designing complex prompts achieves better performance than designing simple prompts on long documents.
arXiv Detail & Related papers (2024-05-26T13:37:57Z)
- Tuning Multi-mode Token-level Prompt Alignment across Modalities [48.39511580746271]
We propose a multi-mode token-level tuning framework to learn and align a set of prompt tokens across modalities.
Specifically, we rely on two essential factors: 1) multi-mode prompts discovery, which guarantees diverse semantic representations, and 2) token-level alignment, which helps explore fine-grained similarity.
Experiments on popular image recognition benchmarks show the superior generalization and few-shot abilities of our approach.
arXiv Detail & Related papers (2023-09-25T03:20:09Z)
- InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding [51.48361798508375]
We develop an information-theoretic framework that formulates soft prompt tuning as maximizing mutual information between prompts and other model parameters.
We show that InfoPrompt can significantly accelerate the convergence of the prompt tuning and outperform traditional prompt tuning methods.
arXiv Detail & Related papers (2023-06-08T04:31:48Z)
- Multi-Prompt with Depth Partitioned Cross-Modal Learning [25.239388488952375]
Partitioned Multi-modal Prompt (PMPO) is a multi-modal prompting technique that extends the soft prompt from a single learnable prompt to multiple prompts.
Our method divides the visual encoder depths and connects learnable prompts to the separated visual depths, enabling different prompts to capture contextual information at different hierarchical depths.
We evaluate the effectiveness of our approach on three challenging tasks: new class generalization, cross-dataset evaluation, and domain generalization.
arXiv Detail & Related papers (2023-05-10T14:54:29Z)
- Multimodal Prompting with Missing Modalities for Visual Recognition [40.961534960897595]
We tackle two challenges in multimodal learning for visual recognition: 1) modalities missing during training or testing in real-world situations; and 2) computational resources insufficient for finetuning heavy transformer models.
Specifically, our modality-missing-aware prompts can be plugged into multimodal transformers to handle general missing-modality cases, while requiring less than 1% of the learnable parameters of the full model.
arXiv Detail & Related papers (2023-03-06T18:54:46Z)
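The entry above and issue 1) in the main abstract both hinge on prompt counting: missing-aware designs keep one prompt per missing-modality pattern. A small counting sketch under an assumed M = 3 (modality names invented for illustration, not from either paper):

```python
from itertools import product

modalities = ["image", "text", "audio"]  # M = 3, chosen for illustration

# Missing-aware prompting: one prompt per present/absent pattern with at
# least one modality present, i.e. 2**M - 1 prompts in general.
patterns = [p for p in product([True, False], repeat=len(modalities)) if any(p)]
print(len(patterns))    # 7

# Modality-specific prompting (the surveyed paper's design) needs only M.
print(len(modalities))  # 3
```

This is the exponential-versus-linear gap that motivates replacing missing-aware prompts with modality-specific tokens.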
- Demystifying Prompts in Language Models via Perplexity Estimation [100.43627541756524]
The performance of a prompt is coupled with the extent to which the model is familiar with the language it contains.
We show that the lower the perplexity of a prompt, the better it performs the task.
arXiv Detail & Related papers (2022-12-08T02:21:47Z)
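A minimal sketch of how that perplexity heuristic could be applied, assuming a GPT-2 scorer via Hugging Face Transformers; the candidate prompt strings are invented for illustration:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def prompt_perplexity(prompt: str) -> float:
    """Perplexity of a prompt under the language model itself."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean per-token cross-entropy
    return torch.exp(loss).item()

candidates = ["Translate English to French:", "English-to-French rendering task:"]
best = min(candidates, key=prompt_perplexity)  # prefer the lowest perplexity
print(best)
```

Ranking candidates this way requires no labeled data, which is what makes the perplexity signal attractive for prompt selection.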
- MetaPrompting: Learning to Learn Better Prompts [52.914694884515534]
We propose a new soft prompting method called MetaPrompting, which adopts the well-recognized model-agnostic meta-learning algorithm.
Extensive experiments show MetaPrompting brings significant improvement on four different datasets.
arXiv Detail & Related papers (2022-09-23T09:01:05Z)
- Instance-aware Prompt Learning for Language Understanding and Generation [49.22899822734549]
We propose an instance-aware prompt learning method that learns a different prompt for each instance.
Our method achieves the state-of-the-art on the SuperGLUE few-shot learning benchmark.
arXiv Detail & Related papers (2022-01-18T17:03:25Z)
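A minimal sketch of the instance-aware idea from the entry above, assuming a small hypothetical generator network; none of the names, shapes, or architecture comes from the paper:

```python
import torch
import torch.nn as nn

dim, prompt_len = 768, 8

# Hypothetical generator: maps an instance embedding to its own soft prompt,
# instead of sharing one learned prompt across all instances.
prompt_generator = nn.Sequential(
    nn.Linear(dim, dim),
    nn.Tanh(),
    nn.Linear(dim, prompt_len * dim),
)

instance_emb = torch.randn(4, dim)           # a batch of 4 instance embeddings
prompts = prompt_generator(instance_emb)     # (4, prompt_len * dim)
prompts = prompts.view(-1, prompt_len, dim)  # one soft prompt per instance
# Each per-instance prompt would be prepended to that instance's input tokens.
```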
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.