NoisyTune: A Little Noise Can Help You Finetune Pretrained Language
Models Better
- URL: http://arxiv.org/abs/2202.12024v1
- Date: Thu, 24 Feb 2022 11:08:02 GMT
- Title: NoisyTune: A Little Noise Can Help You Finetune Pretrained Language
Models Better
- Authors: Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang, Xing Xie
- Abstract summary: Finetuning pretrained language models (PLMs) is critical for their success in downstream tasks.
PLMs risk overfitting to pretraining signals, and there are gaps between downstream tasks and the pretraining tasks.
NoisyTune helps finetune PLMs better on downstream tasks by adding a small amount of noise to the parameters of PLMs before finetuning.
- Score: 98.5705258907774
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effectively finetuning pretrained language models (PLMs) is critical for
their success in downstream tasks. However, PLMs risk overfitting to pretraining
signals, and there are gaps between downstream tasks and the pretraining tasks.
Vanilla finetuning methods can struggle to overcome this barrier between
pretraining and downstream tasks, which leads to suboptimal performance. In this
paper, we propose a very simple yet effective method named NoisyTune, which helps
finetune PLMs better on downstream tasks by adding some noise to the parameters
of PLMs before finetuning. More specifically, we propose a matrix-wise perturbing
method that adds uniform noise scaled by the standard deviation of each parameter
matrix, which accounts for the varied characteristics of the different types of
parameters in PLMs. Extensive experiments on the GLUE English benchmark and the
XTREME multilingual benchmark show that NoisyTune consistently improves the
performance of different PLMs on many downstream tasks.
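To make the matrix-wise scheme concrete, below is a minimal PyTorch sketch of the perturbation described in the abstract. The function name noisy_tune and the noise_lambda hyperparameter are our own labels for illustration; as we read the formulation, each parameter matrix receives uniform noise in U(-lambda/2, +lambda/2) scaled by that matrix's standard deviation, applied once before finetuning begins.

```python
import torch

def noisy_tune(model: torch.nn.Module, noise_lambda: float = 0.15) -> torch.nn.Module:
    """One-time, matrix-wise perturbation applied before finetuning:
    each parameter matrix receives uniform noise scaled by its own
    standard deviation, so differently-distributed parameter types
    are perturbed proportionally."""
    with torch.no_grad():
        for param in model.parameters():
            if param.numel() <= 1:
                continue  # std() is undefined for scalar parameters
            # Uniform noise in (-noise_lambda/2, +noise_lambda/2),
            # scaled by this matrix's standard deviation.
            noise = (torch.rand_like(param) - 0.5) * noise_lambda * param.std()
            param.add_(noise)
    return model
```

With a Hugging Face checkpoint this would look like model = noisy_tune(AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")); training then proceeds exactly as in vanilla finetuning, since the noise is a one-shot preprocessing step rather than a per-step regularizer.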
Related papers
- Towards Robust Low-Resource Fine-Tuning with Multi-View Compressed Representations [51.75960511842552]
Fine-tuning of pretrained language models (PLMs) is prone to overfitting in low-resource scenarios.
We present a novel method that operates on the hidden representations of a PLM to reduce overfitting.
arXiv Detail & Related papers (2022-11-16T09:39:29Z)
- ADEPT: A DEbiasing PrompT Framework [49.582497203415855]
Finetuning is a viable approach for debiasing contextualized word embeddings.
Discrete prompts with semantic meanings have been shown to be effective in debiasing tasks.
We propose ADEPT, a method to debias PLMs using prompt tuning while maintaining the delicate balance between removing biases and preserving representation ability.
arXiv Detail & Related papers (2022-11-10T08:41:40Z)
- PATS: Sensitivity-aware Noisy Learning for Pretrained Language Models [29.140036130469042]
We present PATS (Perturbation According To Sensitivity), a noisy training mechanism that considers each parameter's importance in the downstream task (a hedged sketch of this idea appears after this list).
Experiments conducted on different tasks of the GLUE benchmark show PATS can consistently empower the fine-tuning of PLMs of different sizes.
arXiv Detail & Related papers (2022-10-22T10:05:14Z)
- Pruning Pre-trained Language Models Without Fine-Tuning [42.54071630668426]
We argue that fine-tuning is redundant for first-order pruning, since first-order pruning alone is sufficient to adapt PLMs to downstream tasks.
Under this motivation, we propose Static Model Pruning (SMP), which uses only first-order pruning to adapt PLMs to downstream tasks while achieving the target sparsity level.
arXiv Detail & Related papers (2022-10-12T13:58:38Z)
- Prompt Tuning for Discriminative Pre-trained Language Models [96.04765512463415]
Recent works have shown promising results of prompt tuning in stimulating pre-trained language models (PLMs) for natural language processing (NLP) tasks.
It is still unknown whether and how discriminative PLMs, e.g., ELECTRA, can be effectively prompt-tuned.
We present DPT, the first prompt tuning framework for discriminative PLMs, which reformulates NLP tasks into a discriminative language modeling problem.
arXiv Detail & Related papers (2022-05-23T10:11:50Z)
- Task-guided Disentangled Tuning for Pretrained Language Models [16.429787408467703]
We propose Task-guided Disentangled Tuning (TDT) for pretrained language models (PLMs).
TDT enhances the generalization of representations by disentangling task-relevant signals from entangled representations.
Experimental results on the GLUE and CLUE benchmarks show that TDT consistently outperforms vanilla fine-tuning across different PLMs.
arXiv Detail & Related papers (2022-03-22T03:11:39Z)
- Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models [48.0311578882384]
Finetuning language models (LMs) with training examples and task descriptions has been seen as critical to recent successes in few-shot learning.
We show that finetuning LMs in the few-shot setting can considerably reduce the need for prompt engineering.
arXiv Detail & Related papers (2021-06-24T23:38:10Z)
- CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of Pre-trained Language Models [59.49705076369856]
We introduce a novel framework to improve the fine-tuning phase of pre-trained language models (PLMs).
We retrieve positive and negative instances from large-scale unlabeled corpora according to their domain-level and class-level semantic relatedness to a task.
We then perform contrastive semi-supervised learning on both the retrieved unlabeled instances and the original labeled instances to help PLMs capture crucial task-related semantic features.
arXiv Detail & Related papers (2021-02-07T09:27:26Z)
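Since PATS above is the most directly comparable noise-based method, here is a hypothetical sketch of the sensitivity-aware idea, as a contrast with NoisyTune's one-shot perturbation. The per-entry sensitivity measure |theta * grad| and the inverse weighting below are our illustrative assumptions, not the paper's exact schedule.

```python
import torch

def sensitivity_aware_noise(model: torch.nn.Module, base_scale: float = 1e-4) -> None:
    """Illustrative sketch: call after loss.backward() each step so that
    parameters with low first-order sensitivity are perturbed more than
    highly sensitive ones (assumed weighting, not the paper's exact rule)."""
    eps = 1e-12
    with torch.no_grad():
        for param in model.parameters():
            if param.grad is None:
                continue
            # First-order sensitivity per entry: |theta * dL/dtheta|.
            sensitivity = (param * param.grad).abs()
            # Normalize within the tensor and invert, so the least
            # sensitive entries receive the largest Gaussian noise.
            weight = 1.0 - sensitivity / (sensitivity.max() + eps)
            param.add_(torch.randn_like(param) * base_scale * weight)
```

Unlike NoisyTune's single pre-finetuning perturbation, this would run once per training step, so the noise adapts as the downstream task reshapes the parameters.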