PATS: Sensitivity-aware Noisy Learning for Pretrained Language Models
- URL: http://arxiv.org/abs/2210.12403v2
- Date: Tue, 25 Oct 2022 03:20:09 GMT
- Title: PATS: Sensitivity-aware Noisy Learning for Pretrained Language Models
- Authors: Yupeng Zhang, Hongzhi Zhang, Sirui Wang, Wei Wu and Zhoujun Li
- Abstract summary: We present PATS (Perturbation According To Sensitivity), a noisy training mechanism which considers each parameter's importance in the downstream task.
Experiments conducted on different tasks of the GLUE benchmark show PATS can consistently empower the fine-tuning of different sizes of PLMs.
- Score: 29.140036130469042
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A wide range of NLP tasks benefit from the fine-tuning of pretrained language
models (PLMs). However, a number of redundant parameters which contribute less
to the downstream task are observed in a directly fine-tuned model. We consider
that the gap between pretraining and downstream tasks hinders the training of
these redundant parameters and results in suboptimal performance of the overall
model. In this paper, we present PATS (Perturbation According To Sensitivity),
a noisy training mechanism that takes each parameter's importance in the
downstream task into account to help fine-tune PLMs. The main idea of PATS is to add larger
more parameters' contributions to downstream tasks without affecting the
sensitive ones much. Extensive experiments conducted on different tasks of the
GLUE benchmark show PATS can consistently empower the fine-tuning of different
sizes of PLMs, and the parameters of well-performing models consistently show
more concentrated sensitivity distributions, which provides experimental
evidence for the effectiveness of our method.
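The abstract describes the mechanism only at a high level. The following PyTorch sketch illustrates the core idea of adding larger noise to less sensitive parameters; the |parameter × gradient| sensitivity proxy, the per-tensor normalization, and the inverse noise scaling are illustrative assumptions rather than the paper's exact formulation.

```python
import torch

def pats_perturb(model, base_std=1e-4, eps=1e-12):
    """Sensitivity-aware perturbation sketch: larger Gaussian noise is added to
    parameters with lower sensitivity, smaller noise to more sensitive ones.
    Assumes gradients are already populated (call after loss.backward()).
    The |theta * grad| sensitivity proxy and the inverse scaling rule below are
    illustrative assumptions, not necessarily the paper's exact scheme."""
    with torch.no_grad():
        for param in model.parameters():
            if param.grad is None:
                continue
            # First-order proxy for sensitivity: estimated loss change if the
            # parameter were zeroed out.
            sensitivity = (param * param.grad).abs()
            # Normalize within each tensor so scales are comparable across layers.
            sensitivity = sensitivity / (sensitivity.mean() + eps)
            # Inverse relation: low sensitivity -> larger noise, high -> smaller.
            noise_scale = base_std / (1.0 + sensitivity)
            param.add_(torch.randn_like(param) * noise_scale)
```

In a fine-tuning loop, a perturbation like this would typically be applied after loss.backward() and before optimizer.step(); note that the abstract does not specify exactly where PATS injects the noise.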
Related papers
- SAFE: Slow and Fast Parameter-Efficient Tuning for Continual Learning with Pre-Trained Models [26.484208658326857]
Continual learning aims to incrementally acquire new concepts in data streams while resisting forgetting previous knowledge.
With the rise of powerful pre-trained models (PTMs), there is a growing interest in training incremental learning systems.
arXiv Detail & Related papers (2024-11-04T15:34:30Z)
- SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models.
We propose a novel model fine-tuning method to make full use of these ineffective parameters.
Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z)
- MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning [71.50432879573614]
Low-rank adaptation (LoRA) is based on the idea that the adaptation process is intrinsically low-dimensional.
We present MELoRA, a mini-ensemble of low-rank adapters that uses fewer trainable parameters while maintaining a higher rank.
Our experimental results show that, compared to LoRA, MELoRA achieves better performance with 8 times fewer trainable parameters on natural language understanding tasks and 36 times fewer trainable parameters on instruction following tasks.
arXiv Detail & Related papers (2024-02-27T07:14:12Z)
- Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z)
- PETA: Parameter-Efficient Trojan Attacks [10.327226660571121]
We present PETA, a novel trojan attack that compromises the weights of PLMs.
We demonstrate PETA's effectiveness in terms of both attack success rate and clean accuracy, even when the attacker does not have full knowledge of the victim user's training process.
arXiv Detail & Related papers (2023-10-01T12:07:44Z)
- Sensi-BERT: Towards Sensitivity Driven Fine-Tuning for Parameter-Efficient BERT [6.029590006321152]
We present Sensi-BERT, a sensitivity-driven, efficient fine-tuning approach for BERT models on downstream tasks.
Our experiments show the efficacy of Sensi-BERT across different downstream tasks including MNLI, QQP, QNLI, SST-2 and SQuAD.
arXiv Detail & Related papers (2023-07-14T17:24:15Z)
- Exploring the Impact of Model Scaling on Parameter-Efficient Tuning [100.61202305296275]
Parameter-efficient tuning (PET) methods can effectively drive extremely large pre-trained language models (PLMs) by training only a small subset of parameters.
In small PLMs, there are usually noticeable performance differences among PET methods.
We introduce a more flexible PET method called Arbitrary PET (APET).
arXiv Detail & Related papers (2023-06-04T10:10:54Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm where a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- NoisyTune: A Little Noise Can Help You Finetune Pretrained Language Models Better [98.5705258907774]
Finetuning pretrained language models (PLMs) is critical for their success in downstream tasks.
PLMs may overfit pretraining signals, and there are gaps between the downstream tasks and the pretraining tasks.
NoisyTune can help better finetune PLMs in downstream tasks by adding some noise to the parameters of PLMs before finetuning.
arXiv Detail & Related papers (2022-02-24T11:08:02Z)
- No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models [132.90062129639705]
We propose a novel training strategy that encourages all parameters to be trained sufficiently.
A parameter with low sensitivity is redundant, and we improve its fitting by increasing its learning rate.
In contrast, a parameter with high sensitivity is well-trained and we regularize it by decreasing its learning rate to prevent further overfitting.
arXiv Detail & Related papers (2022-02-06T00:22:28Z)
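For contrast with PATS's noise injection, below is a minimal sketch of the sensitivity-guided learning-rate idea summarized in the last entry: parameters with low sensitivity take a relatively larger step, while highly sensitive ones take a smaller step. The plain SGD-style update, the |parameter × gradient| sensitivity proxy, and the 1 / (1 + s) scaling are assumptions for illustration, not the paper's exact algorithm.

```python
import torch

def sensitivity_scaled_sgd_step(model, base_lr=2e-5, eps=1e-12):
    """Illustrative SGD-style step with a per-parameter learning-rate scale:
    low-sensitivity parameters receive a relatively larger step than highly
    sensitive ones. Call after loss.backward(). The sensitivity proxy and the
    1 / (1 + s) scaling are assumed forms used only for illustration."""
    with torch.no_grad():
        for param in model.parameters():
            if param.grad is None:
                continue
            # Assumed sensitivity proxy: first-order estimate of the loss
            # change if the parameter were removed.
            sensitivity = (param * param.grad).abs()
            sensitivity = sensitivity / (sensitivity.mean() + eps)
            # Low sensitivity -> scale near 1 (train harder);
            # high sensitivity -> scale well below 1 (regularize).
            lr_scale = 1.0 / (1.0 + sensitivity)
            param.add_(-base_lr * lr_scale * param.grad)
```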
This list is automatically generated from the titles and abstracts of the papers in this site.