Conditional Prompt Tuning for Multimodal Fusion
- URL: http://arxiv.org/abs/2312.03734v1
- Date: Tue, 28 Nov 2023 11:05:20 GMT
- Title: Conditional Prompt Tuning for Multimodal Fusion
- Authors: Ruixiang Jiang, Lingbo Liu, Changwen Chen
- Abstract summary: We show that the representation of one modality can effectively guide the prompting of another modality for parameter-efficient multimodal fusion.
This is achieved by disentangling the vanilla prompt vectors into three types of specialized prompts that adaptively capture global-level and instance-level features.
Our method can effectively transfer the pretrained knowledge in unimodal encoders for downstream multimodal tasks.
- Score: 33.11221356852871
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We show that the representation of one modality can effectively guide the
prompting of another modality for parameter-efficient multimodal fusion.
Specifically, we first encode one modality and use its representation as a
prior to conditionally prompt all frozen layers of the other modality. This is
achieved by disentangling the vanilla prompt vectors into three types of
specialized prompts that adaptively capture global-level and instance-level
features. To better produce the instance-wise prompt, we introduce the mixture
of prompt experts (MoPE) to dynamically route each instance to the most
suitable prompt experts for encoding. We further study a regularization term to
avoid degenerated prompt expert routing. Thanks to our design, our method can
effectively transfer the pretrained knowledge in unimodal encoders for
downstream multimodal tasks. Compared with vanilla prompting, we show that our
MoPE-based conditional prompting is more expressive and thus scales better with
training data and the total number of prompts. We also demonstrate that our
prompt tuning is architecture-agnostic, thereby offering high modularity.
Extensive experiments over three multimodal datasets demonstrate
state-of-the-art results, matching or surpassing the performance achieved
through fine-tuning, while requiring only 0.7% of the trainable parameters.
Code will be released: https://github.com/songrise/ConditionalPrompt.
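As a rough illustration of the approach described in the abstract, below is a minimal PyTorch-style sketch of MoPE-style conditional prompting: a static (global) prompt, an instance-wise dynamic prompt routed through a mixture of prompt experts, and a mapped prompt projected from the other modality's representation. All module names and dimensions are hypothetical; the paper's actual prompt decomposition and routing details may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoPEConditionalPrompt(nn.Module):
    """Generate per-instance prompts for a frozen encoder layer,
    conditioned on the representation of the other modality."""

    def __init__(self, cond_dim=512, prompt_dim=768, prompt_len=4, num_experts=4):
        super().__init__()
        # Global (instance-agnostic) prompt shared across all inputs.
        self.static_prompt = nn.Parameter(torch.randn(prompt_len, prompt_dim) * 0.02)
        # A pool of prompt "experts"; the router mixes them per instance.
        self.experts = nn.Parameter(torch.randn(num_experts, prompt_len, prompt_dim) * 0.02)
        # Router maps the conditioning feature to expert weights.
        self.router = nn.Linear(cond_dim, num_experts)
        # Mapped prompt: project the conditioning feature into prompt space.
        self.mapper = nn.Linear(cond_dim, prompt_dim)

    def forward(self, cond_feat):
        # cond_feat: [B, cond_dim], e.g. pooled output of the other (frozen) encoder.
        gate = F.softmax(self.router(cond_feat), dim=-1)           # [B, E]
        # Instance-wise dynamic prompt = gated mixture of experts.
        dynamic = torch.einsum('be,eld->bld', gate, self.experts)  # [B, L, D]
        static = self.static_prompt.unsqueeze(0).expand(cond_feat.size(0), -1, -1)
        mapped = self.mapper(cond_feat).unsqueeze(1)                # [B, 1, D]
        # Concatenate the three specialized prompts along the token axis.
        return torch.cat([static, dynamic, mapped], dim=1), gate    # [B, 2L+1, D]

# Usage: prepend the returned prompt tokens to the frozen encoder's input
# sequence at each layer; only the prompt modules are trained.
```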
Related papers
- MuAP: Multi-step Adaptive Prompt Learning for Vision-Language Model with Missing Modality [11.03329286331929]
We present the first comprehensive investigation into prompt learning behavior when modalities are incomplete.
We propose a novel Multi-step Adaptive Prompt Learning framework, aiming to generate multimodal prompts and perform multi-step prompt tuning.
arXiv Detail & Related papers (2024-09-07T03:33:46Z) - MoPE: Parameter-Efficient and Scalable Multimodal Fusion via Mixture of Prompt Experts [29.46189153751869]
We introduce the mixture of prompt experts (MoPE) technique to enhance the expressiveness of prompt tuning.
Our method achieves state-of-the-art results for prompt fusion, matching or even surpassing the performance of fine-tuning.
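The main abstract above also mentions a regularization term that prevents degenerate expert routing. One common choice in mixture-of-experts models is an importance-balancing penalty; the sketch below is illustrative and not necessarily the regularizer used in the paper.

```python
import torch

def routing_balance_loss(gate):
    """Penalize degenerate routing where a few experts receive all traffic.
    gate: [B, E] softmax routing weights (e.g. the `gate` returned above).
    Uses the squared coefficient of variation of per-expert importance, as in
    classic sparse-MoE training; the paper's actual regularizer may differ."""
    importance = gate.sum(dim=0)  # total routing weight per expert
    cv_sq = importance.var(unbiased=False) / (importance.mean() ** 2 + 1e-8)
    return cv_sq
```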
arXiv Detail & Related papers (2024-03-14T17:47:10Z) - DialCLIP: Empowering CLIP as Multi-Modal Dialog Retriever [83.33209603041013]
We propose a parameter-efficient prompt-tuning method named DialCLIP for multi-modal dialog retrieval.
Our approach introduces a multi-modal context generator to learn context features which are distilled into prompts within the pre-trained vision-language model CLIP.
To facilitate various types of retrieval, we also design multiple experts to learn mappings from CLIP outputs to multi-modal representation space.
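A hedged sketch of the described pieces, with hypothetical names and dimensions: a dialog-context generator whose features become prompt tokens for the frozen CLIP encoder, and per-retrieval-type expert heads that map the frozen CLIP output into the target space.

```python
import torch
import torch.nn as nn

class ContextPromptGenerator(nn.Module):
    """Illustrative sketch only; not DialCLIP's exact architecture."""

    def __init__(self, ctx_dim=512, prompt_len=8, prompt_dim=512, num_experts=3):
        super().__init__()
        self.to_prompts = nn.Linear(ctx_dim, prompt_len * prompt_dim)
        # One lightweight expert head per retrieval type (e.g. text, image, mixed).
        self.experts = nn.ModuleList(
            [nn.Linear(prompt_dim, prompt_dim) for _ in range(num_experts)]
        )
        self.prompt_len, self.prompt_dim = prompt_len, prompt_dim

    def forward(self, ctx_feat, clip_out, expert_id):
        # ctx_feat: [B, ctx_dim] pooled dialog-context feature.
        prompts = self.to_prompts(ctx_feat).view(-1, self.prompt_len, self.prompt_dim)
        # Map the frozen CLIP output into the chosen retrieval space.
        mapped = self.experts[expert_id](clip_out)  # [B, prompt_dim]
        return prompts, mapped
```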
arXiv Detail & Related papers (2024-01-02T07:40:12Z) - COMMA: Co-Articulated Multi-Modal Learning [39.778958624066185]
We propose Co-Articulated Multi-Modal Learning (COMMA) to handle the limitations of previous methods.
Our method generates the prompts of each branch by considering the prompts of both branches, enhancing representation alignment between them.
We evaluate our method across three representative tasks of generalization to novel classes, new target datasets and unseen domain shifts.
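An illustrative sketch of the co-articulation idea, assuming prompts of equal length in both branches; the layer names are hypothetical, not COMMA's exact design.

```python
import torch
import torch.nn as nn

class CoArticulatedPrompt(nn.Module):
    """Compute a prompt for one branch from the prompts of both the vision and
    language branches, so the two branches stay aligned (sketch only)."""

    def __init__(self, dim=512):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, text_prompts, vis_prompts):
        # text_prompts, vis_prompts: [B, L, D] prompts of the two branches.
        joint = torch.cat([text_prompts, vis_prompts], dim=-1)  # [B, L, 2D]
        return self.fuse(joint)                                 # [B, L, D]
```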
arXiv Detail & Related papers (2023-12-30T15:47:36Z) - Bi-directional Adapter for Multi-modal Tracking [67.01179868400229]
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective light feature adapter to transfer modality-specific information from one modality to another.
Our model achieves superior tracking performance in comparison with both the full fine-tuning methods and the prompt learning-based methods.
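A minimal sketch of a light bottleneck adapter used in both directions; the dimensions and the residual placement are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class LightAdapter(nn.Module):
    """Bottleneck adapter sketch: down-project, nonlinearity, up-project.
    The output is added as a residual to the other modality's features."""

    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, src_feat):
        return self.up(torch.relu(self.down(src_feat)))

# Bi-directional use: rgb -> auxiliary and auxiliary -> rgb each get their own
# adapter, and the transferred features are added to the frozen backbone tokens.
rgb_to_aux, aux_to_rgb = LightAdapter(), LightAdapter()
```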
arXiv Detail & Related papers (2023-12-17T05:27:31Z) - Self-regulating Prompts: Foundational Model Adaptation without Forgetting [112.66832145320434]
We introduce a self-regularization framework for prompting called PromptSRC.
PromptSRC guides the prompts to optimize for both task-specific and task-agnostic general representations.
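A hedged sketch of the self-regularization idea: the usual task loss plus a consistency term that keeps prompted features close to the frozen pre-trained features, so general knowledge is retained. The exact losses and weights used by PromptSRC may differ.

```python
import torch
import torch.nn.functional as F

def self_regularized_loss(logits, labels, prompted_feat, frozen_feat, lam=1.0):
    """Task-specific loss plus a retention term toward the frozen features
    (illustrative only)."""
    task = F.cross_entropy(logits, labels)
    retain = F.l1_loss(prompted_feat, frozen_feat.detach())
    return task + lam * retain
```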
arXiv Detail & Related papers (2023-07-13T17:59:35Z) - Diversity-Aware Meta Visual Prompting [111.75306320834629]
We present Diversity-Aware Meta Visual Prompting (DAM-VP), an efficient prompting method for transferring pre-trained models to downstream tasks with a frozen backbone.
We cluster the downstream dataset into small subsets in a diversity-adaptive way, with each subset having its own prompt optimized separately.
All the prompts are optimized with a meta-prompt, which is learned across several datasets.
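A minimal sketch of the per-cluster prompting described above, assuming the cluster assignment is computed offline on frozen features; names and shapes are hypothetical.

```python
import torch
import torch.nn as nn

class DiversityAwarePrompts(nn.Module):
    """One prompt per data cluster, each initialized from a shared meta-prompt
    learned across datasets (sketch only)."""

    def __init__(self, meta_prompt, num_clusters):
        super().__init__()
        # meta_prompt: [L, D] tensor learned across several datasets.
        self.prompts = nn.Parameter(
            meta_prompt.detach().clone().unsqueeze(0).repeat(num_clusters, 1, 1)
        )

    def forward(self, x_tokens, cluster_id):
        # x_tokens: [B, N, D]; cluster_id: [B] index of each image's subset.
        return torch.cat([self.prompts[cluster_id], x_tokens], dim=1)
```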
arXiv Detail & Related papers (2023-03-14T17:59:59Z) - Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning [43.639430661322585]
We propose multitask prompt tuning (MPT).
MPT learns a single transferable prompt by distilling knowledge from multiple task-specific source prompts.
We then learn multiplicative low rank updates to this shared prompt to efficiently adapt it to each downstream target task.
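A small sketch of the target-task adaptation step: the shared prompt is modulated elementwise by a multiplicative low-rank (here rank-1) update. The distillation of the shared prompt from source-task prompts is assumed to have happened beforehand.

```python
import torch
import torch.nn as nn

class MultitaskPrompt(nn.Module):
    """Shared prompt adapted per target task via a multiplicative rank-1 update
    (illustrative sketch)."""

    def __init__(self, shared_prompt):
        super().__init__()
        L, D = shared_prompt.shape
        self.shared = nn.Parameter(shared_prompt.detach().clone())
        self.u = nn.Parameter(torch.ones(L, 1))  # task-specific low-rank factors
        self.v = nn.Parameter(torch.ones(1, D))

    def forward(self):
        # Hadamard product with the rank-1 matrix u @ v adapts the shared prompt.
        return self.shared * (self.u @ self.v)   # [L, D]
```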
arXiv Detail & Related papers (2023-03-06T03:25:59Z) - Prompt-Matched Semantic Segmentation [96.99924127527002]
The objective of this work is to explore how to effectively adapt pre-trained foundation models to various downstream tasks of image semantic segmentation.
We propose a novel Inter-Stage Prompt-Matched Framework, which maintains the original structure of the foundation model while generating visual prompts adaptively for task-oriented tuning.
A lightweight module termed Semantic-aware Prompt Matcher is then introduced to hierarchically interpolate between two stages to learn reasonable prompts for each specific task.
arXiv Detail & Related papers (2022-08-22T09:12:53Z) - IDPG: An Instance-Dependent Prompt Generation Method [58.45110542003139]
Prompt tuning is a new, efficient NLP transfer learning paradigm that adds a task-specific prompt to each input instance during the model training stage.
We propose a conditional prompt generation method to generate prompts for each input instance.
arXiv Detail & Related papers (2022-04-09T15:45:27Z)
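A minimal sketch of instance-dependent prompt generation: a lightweight bottleneck network maps each instance's representation to its own prompt tokens. The dimensions and pooling are assumptions, not IDPG's exact configuration.

```python
import torch
import torch.nn as nn

class InstancePromptGenerator(nn.Module):
    """Generate a prompt per input instance from its pooled representation
    (illustrative sketch)."""

    def __init__(self, in_dim=768, bottleneck=256, prompt_len=5, prompt_dim=768):
        super().__init__()
        self.gen = nn.Sequential(
            nn.Linear(in_dim, bottleneck),
            nn.ReLU(),
            nn.Linear(bottleneck, prompt_len * prompt_dim),
        )
        self.prompt_len, self.prompt_dim = prompt_len, prompt_dim

    def forward(self, sent_feat):
        # sent_feat: [B, in_dim] pooled representation of the input instance.
        return self.gen(sent_feat).view(-1, self.prompt_len, self.prompt_dim)
```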
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.