Prompt Tuning for Generative Multimodal Pretrained Models
- URL: http://arxiv.org/abs/2208.02532v1
- Date: Thu, 4 Aug 2022 08:56:38 GMT
- Title: Prompt Tuning for Generative Multimodal Pretrained Models
- Authors: Hao Yang, Junyang Lin, An Yang, Peng Wang, Chang Zhou, Hongxia Yang
- Abstract summary: We implement prompt tuning on the unified sequence-to-sequence pretrained model adaptive to both understanding and generation tasks.
Experimental results demonstrate that the light-weight prompt tuning can achieve comparable performance with finetuning.
In comparison with finetuned models, the prompt-tuned models demonstrate improved robustness against adversarial attacks.
- Score: 75.44457974275154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt tuning has become a new paradigm for model tuning and it has
demonstrated success in natural language pretraining and even vision
pretraining. In this work, we explore the transfer of prompt tuning to
multimodal pretraining, with a focus on generative multimodal pretrained
models, instead of contrastive ones. Specifically, we implement prompt tuning
on the unified sequence-to-sequence pretrained model adaptive to both
understanding and generation tasks. Experimental results demonstrate that the
light-weight prompt tuning can achieve comparable performance with finetuning
and surpass other light-weight tuning methods. Besides, in comparison with
finetuned models, the prompt-tuned models demonstrate improved robustness
against adversarial attacks. We further figure out that experimental factors,
including the prompt length, prompt depth, and reparameterization, have a
substantial impact on model performance, and thus we empirically provide a
recommendation for the setups of prompt tuning. Despite the observed
advantages, we still find some limitations in prompt tuning, and we
correspondingly point out directions for future studies. Code is
available at https://github.com/OFA-Sys/OFA
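The core mechanism studied in the abstract (a learnable soft prompt prepended to the frozen model's input, optionally passed through a reparameterization network) can be sketched as follows. This is an illustrative NumPy sketch under assumed names and shapes, not OFA's actual implementation; only the prompt and the small MLP would be trained, while the pretrained weights stay frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 16        # embedding dimension of the (frozen) pretrained model
prompt_length = 4   # a key hyperparameter studied in the paper

# Trainable soft prompt: prompt_length learnable vectors.
prompt = rng.normal(scale=0.02, size=(prompt_length, d_model))

# Optional reparameterization: a small two-layer MLP applied to the
# prompt before prepending (the paper reports this choice matters).
W1 = rng.normal(scale=0.02, size=(d_model, 32))
W2 = rng.normal(scale=0.02, size=(32, d_model))

def reparameterize(p):
    # Two-layer MLP with ReLU over each prompt vector.
    return np.maximum(p @ W1, 0.0) @ W2

def build_input(token_embeddings):
    """Prepend the (reparameterized) soft prompt to the token embeddings."""
    return np.concatenate([reparameterize(prompt), token_embeddings], axis=0)

tokens = rng.normal(size=(10, d_model))  # embeddings of a 10-token input
model_input = build_input(tokens)
print(model_input.shape)                 # (14, 16): prompt_length + 10 tokens
```

Prompt depth, the third factor the paper studies, would correspond to injecting such prompt vectors at several transformer layers rather than only at the input.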
Related papers
- Prompt Tuning with Diffusion for Few-Shot Pre-trained Policy Generalization [55.14484317645865]
We develop a conditional diffusion model to produce exceptional quality prompts for offline reinforcement learning tasks.
We show that the Prompt diffuser is a robust and effective tool for the prompt-tuning process, demonstrating strong performance in the meta-RL tasks.
arXiv Detail & Related papers (2024-11-02T07:38:02Z)
- Hard Prompts Made Interpretable: Sparse Entropy Regularization for Prompt Tuning with RL [29.01858866450715]
We present RLPrompt, which aims to find optimal prompt tokens leveraging soft Q-learning.
While the results show promise, we have observed that the prompts frequently appear unnatural, which impedes their interpretability.
We address this limitation by using sparse Tsallis entropy regularization, a principled approach to filtering out unlikely tokens from consideration.
arXiv Detail & Related papers (2024-07-20T03:10:19Z)
- Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer [44.10678347943115]
Class-incremental learning (CIL) aims to enable models to continuously learn new classes while overcoming catastrophic forgetting.
In this paper, we revisit different parameter-efficient tuning (PET) methods within the context of continual learning.
We observe that adapter tuning demonstrates superiority over prompt-based methods, even without parameter expansion in each learning session.
arXiv Detail & Related papers (2024-03-29T05:23:12Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- MuDPT: Multi-modal Deep-symphysis Prompt Tuning for Large Pre-trained Vision-Language Models [12.397136690734865]
We propose a novel approach called Multi-modal Deep-symphysis Prompt Tuning, dubbed MuDPT.
MuDPT extends independent multi-modal prompt tuning by learning a model-agnostic transformative network to allow deep hierarchical bi-directional prompt fusion.
Compared with state-of-the-art methods, MuDPT achieves better recognition and generalization ability by a clear margin.
arXiv Detail & Related papers (2023-06-20T09:15:52Z)
- Visual Tuning [143.43997336384126]
Fine-tuning visual models has been widely shown to deliver promising performance on many downstream visual tasks.
Recent advances can achieve performance superior to full tuning of all the pre-trained parameters.
This survey characterizes a large and thoughtful selection of recent works, providing a systematic and comprehensive overview of existing work and models.
arXiv Detail & Related papers (2023-05-10T11:26:36Z)
- Unified Vision and Language Prompt Learning [86.1530128487077]
We present a systematic study on two representative prompt tuning methods, namely text prompt tuning and visual prompt tuning.
A major finding is that text prompt tuning fails on data with high intra-class visual variances while visual prompt tuning cannot handle low inter-class variances.
To combine the best from both worlds, we propose a simple approach called Unified Prompt Tuning (UPT), which essentially learns a tiny neural network to jointly optimize prompts across different modalities.
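The UPT idea summarized above (one tiny network producing the prompts for both modalities) can be sketched roughly as follows. All names and shapes here are illustrative assumptions, not the authors' implementation: a single shared set of learnable prompt parameters is projected into a text-side and a visual-side prompt.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions: text encoder width, vision encoder width,
# shared prompt width, and prompt length.
d_text, d_vis, d_shared, length = 512, 768, 128, 4

# The only "prompt" parameters: one shared set of vectors plus two
# small per-modality projection heads (the "tiny neural network").
shared = rng.normal(scale=0.02, size=(length, d_shared))
W_text = rng.normal(scale=0.02, size=(d_shared, d_text))
W_vis = rng.normal(scale=0.02, size=(d_shared, d_vis))

def unified_prompts():
    """Project the shared prompt into each modality's embedding space."""
    return shared @ W_text, shared @ W_vis

text_prompt, visual_prompt = unified_prompts()
print(text_prompt.shape, visual_prompt.shape)  # (4, 512) (4, 768)
```

Because both prompts derive from the same shared parameters, gradients from either modality update a common representation, which is one way to read the "jointly optimize prompts across different modalities" claim.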
arXiv Detail & Related papers (2022-10-13T17:50:24Z)
- XPrompt: Exploring the Extreme of Prompt Tuning [31.242680485717447]
We propose a novel Prompt tuning model with an eXtremely small scale (XPrompt) under the regime of lottery tickets hypothesis.
XPrompt eliminates the negative prompt tokens at different levels through a hierarchical structured pruning, yielding a more parameter-efficient prompt yet with a competitive performance.
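The token-level part of the pruning idea described above can be illustrated with a minimal sketch: score each soft-prompt token and keep only the most important ones. This is a stand-in (using L2 norm as an assumed importance score), not XPrompt's actual hierarchical structured pruning with lottery-ticket-style rewinding.

```python
import numpy as np

rng = np.random.default_rng(1)
prompt = rng.normal(size=(8, 16))          # 8 soft-prompt tokens

# Assumed importance score: L2 norm per token (XPrompt learns its scores).
scores = np.linalg.norm(prompt, axis=1)

# Keep the top half of the tokens; the rest are pruned away entirely,
# shrinking the number of trainable prompt parameters.
keep = scores >= np.quantile(scores, 0.5)
pruned = prompt[keep]
print(pruned.shape)                        # (4, 16)
```

XPrompt additionally prunes at a finer granularity (pieces within a token), which is what makes the pruning "hierarchical".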
arXiv Detail & Related papers (2022-10-10T06:57:19Z)
- Pro-tuning: Unified Prompt Tuning for Vision Tasks [133.12978197265596]
Fine-tuning is the de facto approach to leveraging pre-trained vision models for downstream tasks.
In this work, we propose parameter-efficient Prompt tuning (Pro-tuning) to adapt frozen vision models to various downstream vision tasks.
arXiv Detail & Related papers (2022-07-28T21:09:31Z)
- The Power of Scale for Parameter-Efficient Prompt Tuning [4.481348281462904]
"Prompt tuning" is a simple mechanism for learning "soft prompts" that condition frozen language models to perform specific downstream tasks.
Our end-to-end learned approach outperforms GPT-3's "few-shot" learning by a large margin.
arXiv Detail & Related papers (2021-04-18T03:19:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.