Don't Stop Pretraining? Make Prompt-based Fine-tuning Powerful Learner
- URL: http://arxiv.org/abs/2305.01711v4
- Date: Fri, 6 Oct 2023 17:47:07 GMT
- Title: Don't Stop Pretraining? Make Prompt-based Fine-tuning Powerful Learner
- Authors: Zhengxiang Shi, Aldo Lipani
- Abstract summary: We revisit the notion in NLP that continued pre-training improves the performance of fine-tuning (FT) on downstream tasks.
We propose Prompt-based Continued Pre-training (PCP), which combines the idea of instruction tuning with conventional continued pre-training.
Our empirical evaluations on 21 benchmarks demonstrate that PCP consistently improves the performance of state-of-the-art prompt-based FT approaches.
- Score: 14.975436239088312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models (LMs) trained on vast quantities of unlabelled data have
greatly advanced the field of natural language processing (NLP). In this study,
we revisit the widely accepted notion in NLP that continually pre-training LMs
on task-related texts improves the performance of fine-tuning (FT) on
downstream tasks. Through experiments on eight single-sentence tasks and eight
sentence-pair tasks in both semi-supervised and fully-supervised settings, we
find that conventional continued pre-training does not consistently provide
benefits and can even be detrimental for sentence-pair tasks or when
prompt-based FT is used. To tackle these issues, we propose Prompt-based
Continued Pre-training (PCP), which combines the idea of instruction tuning
with conventional continued pre-training. Our approach aims to improve the
performance of prompt-based FT by presenting both task-related texts and prompt
templates to LMs through unsupervised pre-training objectives before
fine-tuning for the target task. Our empirical evaluations on 21 benchmarks
demonstrate that the PCP consistently improves the performance of
state-of-the-art prompt-based FT approaches (up to 20.1% absolute) in both
semi-supervised and fully-supervised settings, even with only hundreds of
unlabelled examples. Additionally, prompt-based FT with the PCP outperforms
state-of-the-art semi-supervised approaches with greater simplicity,
eliminating the need for an iterative process and extra data augmentation. Our
further analysis explores the performance lower bound of the PCP and reveals
that the advantages of PCP persist across different sizes of models and
datasets.
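Below is a minimal sketch of how the PCP recipe described in the abstract could be set up, assuming a RoBERTa-style masked LM and the HuggingFace Transformers/Datasets libraries; the template string, example texts, output directory, and hyperparameters are illustrative placeholders rather than the paper's exact configuration. The idea is simply to wrap unlabelled task-related texts in the same prompt template later used for prompt-based FT and to continue pre-training with the standard MLM objective.

```python
# Sketch of PCP-style continued pre-training (assumptions: roberta-base,
# HuggingFace Transformers/Datasets; template and data are placeholders).
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "roberta-base"
# Prompt template matching the one later used for prompt-based fine-tuning
# (hypothetical single-sentence sentiment template).
TEMPLATE = "{text} It was <mask>."

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)

# Unlabelled task-related texts (placeholder examples).
unlabelled_texts = [
    "A gripping and well-acted thriller.",
    "The plot never comes together.",
]
dataset = Dataset.from_dict(
    {"text": [TEMPLATE.format(text=t) for t in unlabelled_texts]}
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Standard MLM objective over template-wrapped texts: random tokens are masked,
# so the model sees both the task-related text and the prompt format before FT.
# Note: the paper's exact masking and template handling may differ.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="pcp_ckpt",
        num_train_epochs=1,
        per_device_train_batch_size=8,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```

The checkpoint saved under `pcp_ckpt` would then replace the off-the-shelf checkpoint as the initialisation for prompt-based fine-tuning on the labelled target data.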
Related papers
- Aligning Instruction Tuning with Pre-training [81.4748965653345]
We propose Aligning Instruction Tuning with Pre-training (AITP) to align instruction tuning with pre-training distributions.
We show consistent performance improvements with AITP on three fully open large language models (LLMs) across eight benchmarks.
arXiv Detail & Related papers (2025-01-16T08:27:40Z)
- TapWeight: Reweighting Pretraining Objectives for Task-Adaptive Pretraining [34.93043212352875]
TapWeight is a task-adaptive pretraining framework which automatically determines the optimal importance of each pretraining objective.
We apply TapWeight to both molecular property prediction and natural language understanding tasks, where it significantly surpasses baseline methods.
arXiv Detail & Related papers (2024-10-13T20:56:13Z)
- Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z)
- Revisiting the Power of Prompt for Visual Tuning [50.11465784194896]
This study explores how the correlation between prompts and patch tokens evolves as training progresses.
Inspired by the observation that the prompt tokens tend to share high mutual information with patch tokens, we propose initializing prompts with downstream token prototypes.
Our method significantly advances the adaptation for self-supervised pretraining, achieving impressive task performance gains of at least 10% to 30%.
arXiv Detail & Related papers (2024-02-04T07:49:02Z)
- Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- AdaPrompt: Adaptive Model Training for Prompt-based NLP [77.12071707955889]
We propose AdaPrompt, which adaptively retrieves external data for continual pretraining of PLMs.
Experimental results on five NLP benchmarks show that AdaPrompt can improve over standard PLMs in few-shot settings.
In zero-shot settings, our method outperforms standard prompt-based methods by up to 26.35% relative error reduction.
arXiv Detail & Related papers (2022-02-10T04:04:57Z)
- On Losses for Modern Language Models [18.56205816291398]
We show that next sentence prediction (NSP) is detrimental to training due to its context splitting and shallow semantic signal.
Using multiple tasks in a multi-task pre-training framework provides better results than using any single auxiliary task.
arXiv Detail & Related papers (2020-10-04T21:44:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.