Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization
- URL: http://arxiv.org/abs/2206.05658v2
- Date: Thu, 9 Nov 2023 01:19:02 GMT
- Title: Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization
- Authors: Hang Hua, Xingjian Li, Dejing Dou, Cheng-Zhong Xu, Jiebo Luo
- Abstract summary: We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
- Score: 94.4409074435894
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advent of large-scale pre-trained language models has contributed greatly
to the recent progress in natural language processing. Many state-of-the-art
language models are first trained on a large text corpus and then fine-tuned on
downstream tasks. Despite its recent success and wide adoption, fine-tuning a
pre-trained language model often suffers from overfitting, which leads to poor
generalizability due to the extremely high complexity of the model and the
limited training samples from downstream tasks. To address this problem, we
propose a novel and effective fine-tuning framework, named Layerwise Noise
Stability Regularization (LNSR). Specifically, we inject standard Gaussian
noise or in-manifold noise and regularize the hidden representations of the
fine-tuned model. We first provide theoretical analyses
to support the efficacy of our method. We then demonstrate the advantages of
the proposed method over other state-of-the-art algorithms including L2-SP,
Mixout and SMART. While these previous works only verify the effectiveness of
their methods on relatively simple text classification tasks, we also verify
the effectiveness of our method on question answering tasks, where the target
problem is much more difficult and more training examples are available.
Furthermore, extensive experimental results indicate that the proposed
algorithm can not only enhance the in-domain performance of the language models
but also improve the domain generalization performance on out-of-domain data.
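To make the core idea concrete, the following is a minimal PyTorch sketch of a layerwise noise stability regularizer, assuming a Hugging Face-style encoder classifier (e.g. one loaded with AutoModelForSequenceClassification) that exposes per-layer hidden states. The noise scale, regularization weight, and the choice to perturb the input embeddings are illustrative assumptions rather than the paper's exact formulation, and the in-manifold noise variant is omitted.

```python
import torch
import torch.nn.functional as F

def lnsr_step(model, input_ids, attention_mask, labels,
              noise_std=0.01, reg_weight=1.0):
    """Task loss plus a penalty on how far the hidden representations drift
    when Gaussian noise is injected into the input embeddings (sketch)."""
    emb = model.get_input_embeddings()(input_ids)

    # Clean forward pass: task loss and reference hidden states.
    clean = model(inputs_embeds=emb, attention_mask=attention_mask,
                  labels=labels, output_hidden_states=True)

    # Noisy forward pass: the same input with additive Gaussian noise.
    noisy_emb = emb + noise_std * torch.randn_like(emb)
    noisy = model(inputs_embeds=noisy_emb, attention_mask=attention_mask,
                  output_hidden_states=True)

    # Layerwise stability term: keep every layer's representation close to
    # its noise-free counterpart (clean targets are detached by choice).
    stability = sum(
        F.mse_loss(h_noisy, h_clean.detach())
        for h_noisy, h_clean in zip(noisy.hidden_states[1:],
                                    clean.hidden_states[1:])
    )
    return clean.loss + reg_weight * stability
```

The returned scalar can be backpropagated like an ordinary fine-tuning loss; the exact layer selection, noise distribution, and hyperparameters should be taken from the original paper.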
Related papers
- Advancing the Robustness of Large Language Models through Self-Denoised Smoothing [50.54276872204319]
Large language models (LLMs) have achieved significant success, but their vulnerability to adversarial perturbations has raised considerable concerns.
We propose to leverage the multitasking nature of LLMs to first denoise the noisy inputs and then to make predictions based on these denoised versions.
Unlike previous denoised smoothing techniques in computer vision, which would require training a separate denoising model to enhance the robustness of LLMs, our method offers significantly better efficiency and flexibility; a rough sketch of this denoise-then-predict procedure appears after this list.
arXiv Detail & Related papers (2024-04-18T15:47:00Z)
- Language Rectified Flow: Advancing Diffusion Language Generation with Probabilistic Flows [53.31856123113228]
This paper proposes Language Rectified Flow.
Our method is based on a reformulation of standard probabilistic flow models.
Experiments and ablation studies demonstrate that our method can be general, effective, and beneficial for many NLP tasks.
arXiv Detail & Related papers (2024-03-25T17:58:22Z)
- Improving Temporal Generalization of Pre-trained Language Models with Lexical Semantic Change [28.106524698188675]
Recent research has revealed that neural language models at scale suffer from poor temporal generalization capability.
We propose a simple yet effective lexical-level masking strategy to post-train a converged language model.
arXiv Detail & Related papers (2022-10-31T08:12:41Z)
- An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks [112.1942546460814]
We report the first exploration of the prompt tuning paradigm for speech processing tasks based on the Generative Spoken Language Model (GSLM).
Experiment results show that the prompt tuning technique achieves competitive performance in speech classification tasks with fewer trainable parameters than fine-tuning specialized downstream models.
arXiv Detail & Related papers (2022-03-31T03:26:55Z)
- Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners [23.150999852147283]
This study proposes a novel, pluggable, and efficient approach named DifferentiAble pRompT (DART).
It can convert small language models into better few-shot learners without any prompt engineering.
A comprehensive evaluation on standard NLP tasks demonstrates that the proposed approach achieves better few-shot performance.
arXiv Detail & Related papers (2021-08-30T12:29:25Z)
- NoiER: An Approach for Training more Reliable Fine-Tuned Downstream Task Models [54.184609286094044]
We propose noise entropy regularisation (NoiER) as an efficient learning paradigm that improves the reliability of fine-tuned downstream models without auxiliary models or additional data.
The proposed approach improved traditional OOD detection evaluation metrics by 55% on average compared to the original fine-tuned models.
arXiv Detail & Related papers (2021-08-29T06:58:28Z)
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)
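As noted in the self-denoised smoothing entry above, a rough sketch of its denoise-then-predict loop is given below, under stated assumptions: llm and classify stand in for whatever text-generation and classification calls are available, and the masking rate, prompt wording, and majority vote are illustrative placeholders rather than the paper's procedure.

```python
import random
from collections import Counter

def self_denoised_predict(llm, classify, text, num_samples=5, mask_rate=0.3):
    """Perturb the input, let the LLM itself reconstruct it, then aggregate
    predictions over the reconstructed copies (sketch)."""
    votes = []
    for _ in range(num_samples):
        words = text.split()
        # Randomly blank out words so adversarial tokens are likely removed.
        masked = ["___" if random.random() < mask_rate else w for w in words]
        prompt = ("Fill in the blanks so the sentence reads naturally:\n"
                  + " ".join(masked))
        denoised = llm(prompt)            # the same LLM acts as the denoiser
        votes.append(classify(denoised))  # ...and then as the predictor
    return Counter(votes).most_common(1)[0][0]
```

Aggregating over several denoised copies is what makes the final prediction stable under small perturbations of the original text.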
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and accepts no responsibility for any consequences arising from its use.