Better Fine-Tuning by Reducing Representational Collapse
- URL: http://arxiv.org/abs/2008.03156v1
- Date: Thu, 6 Aug 2020 02:13:16 GMT
- Title: Better Fine-Tuning by Reducing Representational Collapse
- Authors: Armen Aghajanyan, Akshat Shrivastava, Anchit Gupta, Naman Goyal, Luke
Zettlemoyer, Sonal Gupta
- Abstract summary: Existing approaches for fine-tuning pre-trained language models have been shown to be unstable.
We present a method rooted in trust region theory that replaces previously used adversarial objectives with parametric noise.
We show it is less prone to representation collapse; the pre-trained models maintain more generalizable representations every time they are fine-tuned.
- Score: 77.44854918334232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although widely adopted, existing approaches for fine-tuning pre-trained
language models have been shown to be unstable across hyper-parameter settings,
motivating recent work on trust region methods. In this paper, we present a
simplified and efficient method rooted in trust region theory that replaces
previously used adversarial objectives with parametric noise (sampling from
either a normal or uniform distribution), thereby discouraging representation
change during fine-tuning when possible without hurting performance. We also
introduce a new analysis to motivate the use of trust region methods more
generally, by studying representational collapse: the degradation of
generalizable representations from pre-trained models as they are fine-tuned
for a specific end task. Extensive experiments show that our fine-tuning method
matches or exceeds the performance of previous trust region methods on a range
of understanding and generation tasks (including DailyMail/CNN, Gigaword,
Reddit TIFU, and the GLUE benchmark), while also being much faster. We also
show that it is less prone to representation collapse; the pre-trained models
maintain more generalizable representations every time they are fine-tuned.
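To make the objective concrete, here is a minimal sketch of this style of parametric-noise regularization in PyTorch. It assumes a HuggingFace-style classifier that accepts `inputs_embeds` and returns `.logits`; the function name and the `noise_std` and `reg_weight` parameters are illustrative defaults, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def parametric_noise_loss(model, inputs_embeds, labels,
                          noise_std=1e-5, reg_weight=1.0):
    """Task loss plus a symmetric KL term between predictions on clean
    and noise-perturbed input embeddings (normal noise here; the paper
    also considers sampling from a uniform distribution)."""
    logits_clean = model(inputs_embeds=inputs_embeds).logits
    task_loss = F.cross_entropy(logits_clean, labels)

    # Perturb the input embeddings with parametric (Gaussian) noise.
    noise = torch.randn_like(inputs_embeds) * noise_std
    logits_noisy = model(inputs_embeds=inputs_embeds + noise).logits

    # The symmetric KL term discourages the output distribution, and hence
    # the underlying representations, from drifting under small perturbations.
    log_p = F.log_softmax(logits_clean, dim=-1)
    log_q = F.log_softmax(logits_noisy, dim=-1)
    sym_kl = (F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")
              + F.kl_div(log_q, log_p, log_target=True, reduction="batchmean"))

    return task_loss + reg_weight * sym_kl
```

The regularization weight trades off raw task accuracy against stability of the pre-trained representations; the abstract's claim is that this noise-based term matches the previous adversarial trust-region objectives at lower cost.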
Related papers
- Fine-Tuning Pre-trained Language Models for Robust Causal Representation Learning [26.29386609645171]
Fine-tuning of pre-trained language models (PLMs) has been shown to be effective across various domains.
We show that a robust representation can be derived through a so-called causal front-door adjustment, based on a decomposition assumption.
Our work thus sheds light on the domain generalization problem by introducing links between fine-tuning and causal mechanisms into representation learning.
arXiv Detail & Related papers (2024-10-18T11:06:23Z)
- Semi-Supervised Crowd Counting with Contextual Modeling: Facilitating Holistic Understanding of Crowd Scenes [19.987151025364067]
This paper presents a new semi-supervised method for training a reliable crowd counting model.
We foster the model's intrinsic 'subitizing' capability, which allows it to accurately estimate the count within image regions.
Our method achieves the state-of-the-art performance, surpassing previous approaches by a large margin on challenging benchmarks.
arXiv Detail & Related papers (2023-10-16T12:42:43Z)
- Prompting Diffusion Representations for Cross-Domain Semantic Segmentation [101.04326113360342]
Diffusion pre-training achieves extraordinary domain generalization results for semantic segmentation.
We introduce a scene prompt and a prompt randomization strategy to help further disentangle the domain-invariant information when training the segmentation head.
arXiv Detail & Related papers (2023-07-05T09:28:25Z)
- Debiased Fine-Tuning for Vision-language Models by Prompt Regularization [50.41984119504716]
We present a new paradigm for fine-tuning large-scale vision pre-trained models on downstream tasks, dubbed Prompt Regularization (ProReg).
ProReg uses predictions obtained by prompting the pre-trained model to regularize fine-tuning.
We show the consistently strong performance of ProReg compared with conventional fine-tuning, zero-shot prompting, prompt tuning, and other state-of-the-art methods.
arXiv Detail & Related papers (2023-01-29T11:53:55Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- Transfer Learning Gaussian Anomaly Detection by Fine-Tuning Representations [3.5031508291335625]
Catastrophic forgetting prevents the successful fine-tuning of pre-trained representations on new datasets.
We propose a new method to fine-tune learned representations for AD in a transfer learning setting.
We additionally propose using augmentations commonly employed for vicinal risk minimization within a validation scheme to detect the onset of catastrophic forgetting.
arXiv Detail & Related papers (2021-08-09T15:29:04Z)
- Squared $\ell_2$ Norm as Consistency Loss for Leveraging Augmented Data to Learn Robust and Invariant Representations [76.85274970052762]
Regularizing the distance between embeddings/representations of original samples and their augmented counterparts is a popular technique for improving the robustness of neural networks.
In this paper, we explore these various regularization choices, seeking to provide a general understanding of how we should regularize the embeddings.
We show that the generic approach we identify (squared $\ell_2$-regularized augmentation) outperforms several recent methods, each specially designed for a single task (a short sketch of this loss appears after this list).
arXiv Detail & Related papers (2020-11-25T22:40:09Z)
- FAR: A General Framework for Attributional Robustness [42.49606659285249]
We define a novel framework for attributional robustness (FAR) for training models with robust attributions.
We show that FAR is a generalized, less constrained formulation of currently existing training methods.
We then propose two new instantiations of this framework, AAT and AdvAAT, that directly optimize for both robust attributions and predictions.
arXiv Detail & Related papers (2020-10-14T20:33:00Z)
- Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA).
We introduce the LIRR algorithm for jointly Learning Invariant Representations and Risks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z)
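The squared $\ell_2$ consistency loss referenced above is simple enough to sketch directly. This is a minimal illustration under assumed interfaces, not the authors' implementation: `encoder`, `classifier`, and `reg_weight` are hypothetical stand-ins for a feature extractor, a prediction head, and the regularization strength.

```python
import torch.nn.functional as F

def squared_l2_consistency_loss(encoder, classifier, x, x_aug, labels,
                                reg_weight=1.0):
    """Task loss plus the squared L2 distance between embeddings of each
    original sample and its augmented counterpart (hypothetical modules)."""
    emb = encoder(x)          # embeddings of the original batch
    emb_aug = encoder(x_aug)  # embeddings of the augmented batch

    task_loss = F.cross_entropy(classifier(emb), labels)
    # Squared L2 norm between clean and augmented embeddings.
    consistency = (emb - emb_aug).pow(2).sum(dim=-1).mean()
    return task_loss + reg_weight * consistency
```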
This list is automatically generated from the titles and abstracts of the papers in this site.