PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation
- URL: http://arxiv.org/abs/2208.10160v2
- Date: Tue, 2 Apr 2024 07:00:39 GMT
- Title: PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation
- Authors: Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao
- Abstract summary: We propose a new metric to accurately predict the prompt transferability (regarding (i)), and a novel PoT approach (namely PANDA) that leverages knowledge distillation to alleviate knowledge forgetting (regarding (ii)). Experiments show that: 1) our proposed metric works well to predict the prompt transferability; 2) PANDA consistently outperforms the vanilla PoT approach by a 2.3% average score (up to 24.1%) across all tasks and model sizes; 3) with PANDA, prompt-tuning can achieve competitive and even better performance than model-tuning at various PLM scales.
- Score: 89.0074567748505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt Transfer (PoT) is a recently proposed approach to improve prompt-tuning by initializing the target prompt with an existing prompt trained on similar source tasks. However, such a vanilla PoT approach usually achieves sub-optimal performance, as (i) PoT is sensitive to the similarity of the source-target pair and (ii) directly fine-tuning the prompt initialized with the source prompt on the target task might lead to forgetting of the useful general knowledge learned from the source task. To tackle these issues, we propose a new metric to accurately predict the prompt transferability (regarding (i)), and a novel PoT approach (namely PANDA) that leverages the knowledge distillation technique to effectively alleviate knowledge forgetting (regarding (ii)). Extensive and systematic experiments on 189 combinations of 21 source and 9 target datasets across 5 scales of PLMs demonstrate that: 1) our proposed metric works well to predict the prompt transferability; 2) our PANDA consistently outperforms the vanilla PoT approach by a 2.3% average score (up to 24.1%) across all tasks and model sizes; 3) with our PANDA approach, prompt-tuning can achieve competitive and even better performance than model-tuning at various PLM scales. We have publicly released our code at https://github.com/WHU-ZQH/PANDA.
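The abstract gives enough of the recipe to sketch: initialize the target prompt from the source prompt, then add a distillation term that keeps the tuned prompt's predictions close to those of the frozen source prompt. Below is a minimal PyTorch sketch of this idea; the toy stand-in for the frozen PLM, the loss weight `alpha`, and all shapes are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
prompt_len, hidden_dim, num_classes = 20, 64, 3

# Stand-in for a frozen PLM: a fixed linear head over the mean-pooled
# soft prompt concatenated with the input features.
frozen_head = torch.nn.Linear(2 * hidden_dim, num_classes)
for p in frozen_head.parameters():
    p.requires_grad_(False)

def plm_logits(prompt, x):
    pooled = prompt.mean(dim=0).expand(x.size(0), -1)
    return frozen_head(torch.cat([pooled, x], dim=-1))

# Teacher: the frozen source prompt. Student: the target prompt,
# initialized from the source prompt as in vanilla PoT.
source_prompt = torch.randn(prompt_len, hidden_dim)
target_prompt = source_prompt.clone().requires_grad_(True)
optimizer = torch.optim.AdamW([target_prompt], lr=1e-2)

alpha = 0.5  # illustrative trade-off between task and distillation losses

def panda_step(x, labels):
    student_logits = plm_logits(target_prompt, x)
    with torch.no_grad():
        teacher_logits = plm_logits(source_prompt, x)
    task_loss = F.cross_entropy(student_logits, labels)
    # Distillation keeps the target prompt's predictions close to the
    # source prompt's, alleviating forgetting of source-task knowledge.
    kd_loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                       F.softmax(teacher_logits, dim=-1),
                       reduction="batchmean")
    loss = task_loss + alpha * kd_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

x = torch.randn(8, hidden_dim)
labels = torch.randint(0, num_classes, (8,))
print(panda_step(x, labels))
```

Setting `alpha = 0` recovers vanilla PoT, so the weight directly controls how strongly the source-task behavior is preserved during target-task tuning.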
Related papers
- Bayesian Multi-Task Transfer Learning for Soft Prompt Tuning [44.43258626098661]
We argue that when we extract knowledge from source tasks by training source prompts, we need to consider the correlation among source tasks for better transfer to target tasks.
We propose a Bayesian approach where we work with the posterior distribution of prompts across source tasks.
We show extensive experimental results on the standard benchmark NLP tasks, where our Bayesian multi-task transfer learning approach outperforms the state-of-the-art methods in many settings.
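As a rough illustration of working with a distribution over source prompts rather than any single one, the toy sketch below fits a diagonal Gaussian to a collection of source prompts and draws the target initialization from it; the paper's actual posterior construction is more sophisticated, so treat every detail here as an assumption.

```python
import torch

torch.manual_seed(0)
num_source_tasks, prompt_len, hidden_dim = 21, 20, 64

# Stand-ins for prompts trained independently on the source tasks.
source_prompts = torch.randn(num_source_tasks, prompt_len, hidden_dim)

# Fit a diagonal Gaussian across source tasks, so shared structure among
# the source prompts (rather than a single prompt) drives the transfer.
flat = source_prompts.view(num_source_tasks, -1)
mean, std = flat.mean(dim=0), flat.std(dim=0)

# Initialize the target prompt by sampling from this distribution; using
# `mean` alone gives a deterministic alternative.
sample = mean + std * torch.randn_like(std)
target_init = torch.nn.Parameter(sample.view(prompt_len, hidden_dim))
```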
arXiv Detail & Related papers (2024-02-13T16:57:02Z)
- Revisiting the Power of Prompt for Visual Tuning [50.11465784194896]
This study explores how the correlation between prompts and patch tokens evolves over the course of training.
Inspired by the observation that the prompt tokens tend to share high mutual information with patch tokens, we propose initializing prompts with downstream token prototypes.
Our method significantly advances adaptation for self-supervised pretraining, achieving task performance gains of at least 10% to 30%.
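One plausible reading of "initializing prompts with downstream token prototypes" is to cluster patch-token embeddings from the downstream data and use the cluster centers as the initial prompts; the plain k-means procedure below is an illustrative assumption, not the paper's exact construction.

```python
import torch

torch.manual_seed(0)
num_images, num_patches, embed_dim, num_prompts = 100, 196, 64, 8

# Patch-token embeddings extracted from downstream images with a frozen
# pretrained backbone (random stand-ins here), flattened into one pool.
patch_tokens = torch.randn(num_images, num_patches, embed_dim).view(-1, embed_dim)

# Plain k-means to obtain `num_prompts` token prototypes.
centers = patch_tokens[torch.randperm(len(patch_tokens))[:num_prompts]].clone()
for _ in range(10):
    assign = torch.cdist(patch_tokens, centers).argmin(dim=1)
    for k in range(num_prompts):
        members = patch_tokens[assign == k]
        if len(members) > 0:
            centers[k] = members.mean(dim=0)

# Prompts start as prototypes that already share high mutual information
# with the patch tokens, rather than as a random initialization.
prompt_init = torch.nn.Parameter(centers.clone())
print(prompt_init.shape)  # (8, 64)
```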
arXiv Detail & Related papers (2024-02-04T07:49:02Z)
- Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning [73.75282761503581]
We propose DiffTPT, which leverages pre-trained diffusion models to generate diverse and informative new data.
Our experiments on test datasets with distribution shifts and unseen categories demonstrate that DiffTPT improves the zero-shot accuracy by an average of 5.13%.
arXiv Detail & Related papers (2023-08-11T09:36:31Z)
- Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement [24.108008515395458]
We propose APE, an Adaptive Prior rEfinement method for CLIP's pre-trained knowledge, which achieves superior accuracy with high computational efficiency.
For average accuracy over 11 benchmarks, APE and APE-T both attain state-of-the-art results, respectively outperforming the second-best method by +1.59% and +1.99% under 16 shots, with 30x fewer learnable parameters.
arXiv Detail & Related papers (2023-04-03T17:58:54Z)
- Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning [43.639430661322585]
We propose multitask prompt tuning (MPT), which learns a single transferable prompt by distilling knowledge from multiple task-specific source prompts.
We then learn multiplicative low rank updates to this shared prompt to efficiently adapt it to each downstream target task.
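The two-step recipe above translates naturally into code: one shared prompt plus a per-task multiplicative update. The sketch below assumes a rank-1 Hadamard-style update, which is one simple instance of a "multiplicative low rank update"; the distillation step and the method's exact parameterization are omitted.

```python
import torch

torch.manual_seed(0)
prompt_len, hidden_dim, num_tasks = 20, 64, 4

# A single shared prompt (in MPT this is distilled from multiple
# task-specific source prompts; here it is randomly initialized).
shared_prompt = torch.nn.Parameter(torch.randn(prompt_len, hidden_dim))

# Per-task rank-1 factors; u @ v^T is a low-rank matrix applied to the
# shared prompt elementwise, so each task adapts cheaply.
u = torch.nn.Parameter(torch.ones(num_tasks, prompt_len, 1))
v = torch.nn.Parameter(torch.ones(num_tasks, 1, hidden_dim))

def task_prompt(task_id: int) -> torch.Tensor:
    """Adapt the shared prompt to one target task via a multiplicative
    low-rank update (Hadamard product with a rank-1 matrix)."""
    return shared_prompt * (u[task_id] @ v[task_id])

print(task_prompt(0).shape)  # torch.Size([20, 64])
```

A rank-1 update adds only prompt_len + hidden_dim parameters per task, which is where the parameter efficiency comes from.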
arXiv Detail & Related papers (2023-03-06T03:25:59Z)
- Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [107.05966685291067]
We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample.
TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average.
In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
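The summary says what TPT learns (a prompt from a single test sample) but not the objective; a common choice for this setting, assumed in the sketch below, is minimizing the entropy of the prediction averaged over augmented views of that one sample. The frozen scoring head and all shapes are toy stand-ins.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
embed_dim, num_classes, num_views = 64, 10, 8

# Stand-in for a frozen vision-language model scoring a prompted input.
frozen_head = torch.nn.Linear(2 * embed_dim, num_classes)
for p in frozen_head.parameters():
    p.requires_grad_(False)

prompt = torch.zeros(embed_dim, requires_grad=True)
optimizer = torch.optim.AdamW([prompt], lr=5e-3)

# Augmented views of one test image (random stand-ins here).
views = torch.randn(num_views, embed_dim)

for _ in range(10):  # a few adaptation steps on this single test sample
    logits = frozen_head(torch.cat([prompt.expand(num_views, -1), views], dim=-1))
    marginal = F.softmax(logits, dim=-1).mean(dim=0)  # average over views
    # Lower entropy means the views agree on a confident prediction.
    entropy = -(marginal * marginal.clamp_min(1e-8).log()).sum()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
```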
arXiv Detail & Related papers (2022-09-15T17:55:11Z)
- Instance-wise Prompt Tuning for Pretrained Language Models [72.74916121511662]
Instance-wise Prompt Tuning (IPT) is the first prompt learning paradigm that injects knowledge from the input data instances into the prompts.
IPT significantly outperforms task-based prompt learning methods, and achieves performance comparable to conventional fine-tuning with only 0.5%-1.5% of the tuned parameters.
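One straightforward way to realize instance-dependent prompts, used purely as an illustration here, is a small generator network that maps each instance's representation to its own soft prompt; the paper's actual mechanism may differ.

```python
import torch

torch.manual_seed(0)
hidden_dim, prompt_len, batch = 64, 8, 4

# A lightweight generator maps each input instance's representation to its
# own soft prompt, instead of sharing one prompt across the whole task.
prompt_generator = torch.nn.Sequential(
    torch.nn.Linear(hidden_dim, hidden_dim),
    torch.nn.Tanh(),
    torch.nn.Linear(hidden_dim, prompt_len * hidden_dim),
)

# Stand-ins for pooled PLM embeddings of a batch of instances.
instance_repr = torch.randn(batch, hidden_dim)
instance_prompts = prompt_generator(instance_repr).view(batch, prompt_len, hidden_dim)
print(instance_prompts.shape)  # torch.Size([4, 8, 64])
```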
arXiv Detail & Related papers (2022-06-04T10:08:50Z)
- DP-KB: Data Programming with Knowledge Bases Improves Transformer Fine Tuning for Answer Sentence Selection [96.84143731242119]
Transformers demonstrate impressive performance on many knowledge-intensive (KI) tasks.
However, their ability to serve as implicit knowledge bases (KBs) remains limited.
We implement an efficient, data-programming technique that enriches training data with KB-derived context.
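A toy illustration of "enriching training data with KB-derived context": look up facts about entities mentioned in an example and append them to the model input. The miniature KB, entity list, and separator format are all made up for the example; real data programming also involves labeling functions, omitted here.

```python
# Hypothetical miniature knowledge base keyed by entity name.
kb = {"Mount Everest": "Mount Everest is Earth's highest mountain above sea level."}

def enrich(question: str, candidate: str, entities: list[str]) -> str:
    """Append KB-derived context to a (question, candidate answer) pair
    before it is fed to the transformer for answer sentence selection."""
    context = " ".join(kb[e] for e in entities if e in kb)
    return f"{question} [SEP] {candidate} [SEP] {context}"

print(enrich("How tall is Mount Everest?",
             "The summit stands at 8,849 m.",
             ["Mount Everest"]))
```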
arXiv Detail & Related papers (2022-03-17T20:23:52Z)