Adversarial Training for Large Neural Language Models
- URL: http://arxiv.org/abs/2004.08994v2
- Date: Wed, 29 Apr 2020 21:16:31 GMT
- Title: Adversarial Training for Large Neural Language Models
- Authors: Xiaodong Liu, Hao Cheng, Pengcheng He, Weizhu Chen, Yu Wang, Hoifung
Poon and Jianfeng Gao
- Abstract summary: We show that adversarial pre-training can improve both generalization and robustness.
ALUM regularizes the training objective by applying perturbations in the embedding space that maximize the adversarial loss.
ALUM can be further combined with task-specific fine-tuning to attain additional gains.
- Score: 107.84290922621163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization and robustness are both key desiderata for designing machine
learning methods. Adversarial training can enhance robustness, but past work
often finds it hurts generalization. In natural language processing (NLP),
pre-training large neural language models such as BERT has demonstrated
impressive gains in generalization for a variety of tasks, with further
improvement from adversarial fine-tuning. However, these models are still
vulnerable to adversarial attacks. In this paper, we show that adversarial
pre-training can improve both generalization and robustness. We propose a
general algorithm ALUM (Adversarial training for large neural LangUage Models),
which regularizes the training objective by applying perturbations in the
embedding space that maximize the adversarial loss. We present the first
comprehensive study of adversarial training in all stages, including
pre-training from scratch, continual pre-training on a well-trained model, and
task-specific fine-tuning. ALUM obtains substantial gains over BERT on a wide
range of NLP tasks, in both regular and adversarial scenarios. Even for models
that have been well trained on extremely large text corpora, such as RoBERTa,
ALUM can still produce significant gains from continual pre-training, whereas
conventional non-adversarial methods cannot. ALUM can be further combined with
task-specific fine-tuning to attain additional gains. The ALUM code is publicly
available at https://github.com/namisan/mt-dnn.
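The core of the ALUM objective, an inner maximization that finds a small embedding-space perturbation increasing the divergence between clean and perturbed predictions, can be illustrated with a minimal numpy sketch. This is a toy linear model with numerical gradients, not the authors' implementation (which trains large transformers); all names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy stand-in for a model head: a fixed linear map from a
# 5-dimensional "embedding" to 3 class probabilities.
W = rng.normal(size=(3, 5))

def f(emb):
    return softmax(W @ emb)

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def adversarial_perturbation(emb, eps=0.1, steps=3, lr=0.5):
    """Inner maximization: gradient-ascend KL(f(emb) || f(emb + delta))
    over delta, keeping delta inside an L2 ball of radius eps."""
    p = f(emb)                                  # clean prediction (held fixed)
    delta = 1e-3 * rng.normal(size=emb.shape)   # small random start
    h = 1e-5
    for _ in range(steps):
        # Forward-difference numerical gradient of the adversarial loss.
        base = kl(p, f(emb + delta))
        grad = np.zeros_like(delta)
        for i in range(delta.size):
            d = delta.copy()
            d[i] += h
            grad[i] = (kl(p, f(emb + d)) - base) / h
        delta = delta + lr * grad               # ascent step
        norm = np.linalg.norm(delta)
        if norm > eps:                          # project back onto the eps-ball
            delta = delta * (eps / norm)
    return delta

emb = rng.normal(size=5)
delta = adversarial_perturbation(emb)
adv_loss = kl(f(emb), f(emb + delta))   # added to the training loss
                                        # as a regularizer in the outer step
```

In the full method, this adversarial divergence term is added to the standard training loss at each step, so the outer minimization learns predictions that are smooth within the perturbation ball.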
Related papers
- Instruction Pre-Training: Language Models are Supervised Multitask Learners [115.95022434390181]
In this paper, we propose a framework that augments massive raw corpora with instruction-response pairs to pre-train language models (LMs).
In our experiments, we synthesize 200M instruction-response pairs covering 40+ task categories to verify the effectiveness of Instruction Pre-Training.
arXiv Detail & Related papers (2024-06-20T16:55:33Z)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
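The logit arithmetic behind LM up-scaling can be sketched in a few lines: the large base model's logits are combined with the behavioral delta learned by fine-tuning a small model. This is a hedged sketch of the idea with made-up numbers, not the authors' code.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def eft_up_scale(logits_large_base, logits_small_ft, logits_small_base):
    """Combine a large base model's logits with the fine-tuning delta
    exhibited by a small model (small_ft - small_base)."""
    return logits_large_base + (logits_small_ft - logits_small_base)

# Made-up next-token logits over a 4-token vocabulary.
large_base = np.array([2.0, 1.0, 0.5, 0.1])   # big pretrained model
small_ft   = np.array([1.5, 2.5, 0.2, 0.0])   # small model after fine-tuning
small_base = np.array([1.4, 1.0, 0.4, 0.1])   # small model before fine-tuning

probs = softmax(eft_up_scale(large_base, small_ft, small_base))
# The fine-tuning delta (+1.5 on token 1) shifts the ensemble's preference
# away from the large base model's original top choice.
```

The design choice is that fine-tuning is treated as an additive adjustment in log-probability space, so it can be transplanted from a cheap small model onto an expensive large one at sampling time.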
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Impact of Adversarial Training on Robustness and Generalizability of Language Models [33.790145748360686]
This work provides an in-depth comparison of different approaches for adversarial training in language models.
Our findings suggest that better robustness can be achieved by pre-training data augmentation or by training with input space perturbation.
A linguistic correlation analysis of neurons of the learned models reveals that the improved generalization is due to 'more specialized' neurons.
arXiv Detail & Related papers (2022-11-10T12:36:50Z)
- How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness? [121.57551065856164]
We propose Robust Informative Fine-Tuning (RIFT) as a novel adversarial fine-tuning method from an information-theoretical perspective.
RIFT encourages an objective model to retain the features learned from the pre-trained model throughout the entire fine-tuning process.
Experimental results show that RIFT consistently outperforms state-of-the-art methods on two popular NLP tasks.
arXiv Detail & Related papers (2021-12-22T05:04:41Z)
- Robust Transfer Learning with Pretrained Language Models through Adapters [40.45102278979193]
Transfer learning with large pretrained language models like BERT has become a dominant approach for most NLP tasks, but can suffer from instability and loss of robustness.
We propose a simple yet effective adapter-based approach to mitigate these issues.
Our experiments demonstrate that such a training scheme leads to improved stability and adversarial robustness in transfer learning to various downstream tasks.
arXiv Detail & Related papers (2021-08-05T02:30:13Z)
- Self-Progressing Robust Training [146.8337017922058]
Current robust training methods such as adversarial training explicitly use an "attack" to generate adversarial examples.
We propose a new framework called SPROUT, self-progressing robust training.
Our results shed new light on scalable, effective and attack-independent robust training methods.
arXiv Detail & Related papers (2020-12-22T00:45:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.