DSRM: Boost Textual Adversarial Training with Distribution Shift Risk
Minimization
- URL: http://arxiv.org/abs/2306.15164v1
- Date: Tue, 27 Jun 2023 02:46:08 GMT
- Title: DSRM: Boost Textual Adversarial Training with Distribution Shift Risk
Minimization
- Authors: Songyang Gao, Shihan Dou, Yan Liu, Xiao Wang, Qi Zhang, Zhongyu Wei,
Jin Ma, Ying Shan
- Abstract summary: Adversarial training is one of the best-performing methods in improving the robustness of deep language models.
We introduce a novel, effective procedure that instead performs adversarial training with only clean data.
Our approach requires zero adversarial samples for training and reduces time consumption by up to 70% compared to current best-performing adversarial training methods.
- Score: 36.10642858867033
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Adversarial training is one of the best-performing methods in improving the
robustness of deep language models. However, robust models come at the cost of
high time consumption, as they require multi-step gradient ascents or word
substitutions to obtain adversarial samples. In addition, these generated
samples are deficient in grammatical quality and semantic consistency, which
impairs the effectiveness of adversarial training. To address these problems,
we introduce a novel, effective procedure that instead performs adversarial training with
only clean data. Our procedure, distribution shift risk minimization (DSRM),
estimates the adversarial loss by perturbing the input data's probability
distribution rather than their embeddings. This formulation results in a robust
model that minimizes the expected global loss under adversarial attacks. Our
approach requires zero adversarial samples for training and reduces time
consumption by up to 70% compared to current best-performing adversarial
training methods. Experiments demonstrate that DSRM considerably improves
BERT's resistance to textual adversarial attacks and achieves state-of-the-art
robust accuracy on various benchmarks.
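For intuition, the inner "attack" in DSRM can be thought of as an adversarial reweighting of the clean training distribution. The following is a minimal sketch of that idea, not the authors' implementation; the function name, the temperature heuristic, and the closed-form exponential tilt (in place of a constrained search over distributions) are all illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def distribution_shift_loss(logits, labels, temperature=0.1):
    """Sketch: estimate an adversarial loss on a clean batch by shifting
    probability mass toward high-loss examples (a KL-constrained worst-case
    reweighting), rather than perturbing embeddings or generating
    adversarial samples. `temperature` plays the role of the shift budget."""
    per_example = F.cross_entropy(logits, labels, reduction="none")
    # The closed-form worst case for a KL-regularized reweighting is an
    # exponential tilt, i.e., a softmax over the per-example losses.
    weights = torch.softmax(per_example.detach() / temperature, dim=0)
    return (weights * per_example).sum()
```

Training then minimizes this reweighted loss on clean batches only, which is what removes the need for multi-step gradient ascent or word substitution.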
Related papers
- Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off [107.35833747750446]
Adversarial examples can be crafted by adding imperceptible perturbations to legitimate documents.
This vulnerability raises significant concerns about the reliability of neural ranking models (NRMs) and hinders their widespread deployment.
In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs.
arXiv Detail & Related papers (2023-12-16T05:38:39Z)
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly-robust instance reweighted adversarial framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
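For intuition, a KL-divergence regularized reweighting objective of this general shape (our schematic reading of the summary, not necessarily the paper's exact objective) admits a closed-form solution:

```latex
\max_{w \in \Delta_n}\; \sum_{i=1}^{n} w_i\, \ell_i \;-\; \tau\, \mathrm{KL}\!\left(w \,\middle\|\, \tfrac{1}{n}\mathbf{1}\right)
\;\;\Longrightarrow\;\;
w_i^{\star} \;=\; \frac{\exp(\ell_i/\tau)}{\sum_{j=1}^{n} \exp(\ell_j/\tau)},
```

where \ell_i is the (robust) loss of instance i and \tau controls how sharply hard instances are upweighted.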
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
- Boundary Adversarial Examples Against Adversarial Overfitting [4.391102490444538]
Adversarial training approaches suffer from robust overfitting, where robust accuracy decreases when models are adversarially trained for too long.
Several approaches, including early stopping, temporal ensembling, and weight memorizations, have been proposed to mitigate robust overfitting.
In this paper, we investigate whether these mitigation approaches are complementary to each other in improving adversarial training performance.
arXiv Detail & Related papers (2022-11-25T13:16:53Z)
- Efficient Adversarial Training With Data Pruning [26.842714298874192]
We show that data pruning leads to improvements in convergence and reliability of adversarial training.
In some settings, data pruning brings benefits from both worlds: it improves both adversarial accuracy and training time.
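As a rough sketch of how pruning can pay for itself (the loss-based selection rule here is an illustrative assumption, not necessarily the paper's criterion):

```python
import torch

def prune_training_set(per_example_losses, keep_fraction=0.5):
    """Sketch: retain only the highest-loss fraction of examples, so the
    expensive inner attack of adversarial training runs on fewer points.
    `per_example_losses` is a 1-D tensor of clean-loss values."""
    k = max(1, int(keep_fraction * per_example_losses.numel()))
    kept_indices = torch.topk(per_example_losses, k).indices
    return kept_indices
```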
arXiv Detail & Related papers (2022-07-01T23:54:46Z)
- Adaptive perturbation adversarial training: based on reinforcement learning [9.563820241076103]
One of the shortcomings of adversarial training is that it reduces accuracy on normal (clean) samples.
Adaptive adversarial training is proposed to alleviate this problem.
It trains on marginal adversarial samples that are close to, but do not cross, the decision boundary.
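A minimal way to produce such marginal samples (an illustrative iterative-FGSM sketch; the paper's actual procedure selects perturbations via reinforcement learning):

```python
import torch
import torch.nn.functional as F

def marginal_adversarial_sample(model, x, y, step_size=0.01, max_steps=10):
    """Sketch: take small gradient-ascent steps and keep the last perturbed
    input the model still classifies correctly, i.e., a sample close to,
    but not across, the decision boundary."""
    x_adv = x.clone()
    for _ in range(max_steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        candidate = x_adv.detach() + step_size * grad.sign()
        if (model(candidate).argmax(dim=-1) != y).any():
            break  # the next step would cross the boundary; stop here
        x_adv = candidate
    return x_adv.detach()
```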
arXiv Detail & Related papers (2021-08-30T13:49:55Z)
- Understanding the Logit Distributions of Adversarially-Trained Deep Neural Networks [6.439477789066243]
Adversarial defenses train deep neural networks to be invariant to the input perturbations from adversarial attacks.
Although adversarial training is successful at mitigating adversarial attacks, the behavioral differences between adversarially-trained (AT) models and standard models are still poorly understood.
We identify three logit characteristics essential to learning adversarial robustness.
arXiv Detail & Related papers (2021-08-26T19:09:15Z)
- Self-Progressing Robust Training [146.8337017922058]
Current robust training methods such as adversarial training explicitly use an "attack" to generate adversarial examples.
We propose a new framework called SPROUT, self-progressing robust training.
Our results shed new light on scalable, effective and attack-independent robust training methods.
arXiv Detail & Related papers (2020-12-22T00:45:24Z)
- On the Generalization Properties of Adversarial Training [21.79888306754263]
This paper studies the generalization performance of a generic adversarial training algorithm.
A series of numerical studies demonstrates how smoothness and L1 penalization help improve the adversarial robustness of models.
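The penalized objective such analyses consider typically has the following schematic form (notation ours; the exact smoothing of the inner maximum varies across formulations):

```latex
\min_{\theta}\; \frac{1}{n} \sum_{i=1}^{n} \max_{\|\delta_i\| \le \epsilon} \ell\big(f_{\theta}(x_i + \delta_i),\, y_i\big) \;+\; \lambda \, \|\theta\|_1 .
```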
arXiv Detail & Related papers (2020-08-15T02:32:09Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
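Schematically, the label-free attack perturbs an input until the encoder can no longer match it to its own augmented view. The PGD-on-InfoNCE sketch below is an assumed formulation (names, temperature, and budget are illustrative), not the authors' exact objective:

```python
import torch
import torch.nn.functional as F

def instance_confusion_attack(encoder, x, x_aug, eps=8 / 255, alpha=2 / 255,
                              steps=5, temperature=0.1):
    """Sketch: PGD that maximizes an InfoNCE-style loss so each perturbed
    input stops aligning with its own augmentation (no class labels)."""
    delta = torch.zeros_like(x, requires_grad=True)
    targets = torch.arange(x.size(0), device=x.device)  # positives: diagonal
    for _ in range(steps):
        z = F.normalize(encoder(x + delta), dim=-1)
        z_aug = F.normalize(encoder(x_aug), dim=-1)
        logits = z @ z_aug.t() / temperature  # instance-similarity matrix
        loss = F.cross_entropy(logits, targets)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # ascend: confuse identities
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (x + delta).detach()
```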
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
- Precise Tradeoffs in Adversarial Training for Linear Regression [55.764306209771405]
We provide a precise and comprehensive understanding of the role of adversarial training in the context of linear regression with Gaussian features.
We precisely characterize the standard/robust accuracy and the corresponding tradeoff achieved by a contemporary mini-max adversarial training approach.
Our theory for adversarial training algorithms also facilitates the rigorous study of how a variety of factors (size and quality of training data, model overparametrization etc.) affect the tradeoff between these two competing accuracies.
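Concretely, for L2-bounded perturbations the inner maximization in this setting can be solved in closed form (a standard derivation; notation is ours):

```latex
\min_{\theta}\; \frac{1}{n}\sum_{i=1}^{n} \max_{\|\delta_i\|_2 \le \epsilon} \big(y_i - \langle x_i + \delta_i,\, \theta\rangle\big)^2
\;=\;
\min_{\theta}\; \frac{1}{n}\sum_{i=1}^{n} \big(\,|y_i - \langle x_i,\, \theta\rangle| + \epsilon\,\|\theta\|_2\,\big)^2 ,
```

which makes the accuracy-robustness tradeoff explicit: a larger budget \epsilon inflates every residual by \epsilon\|\theta\|_2.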
arXiv Detail & Related papers (2020-02-24T19:01:47Z)
- Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to unreliable robustness against other, unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
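Schematically, ADT replaces the point-wise inner maximization of standard AT with a maximization over a distribution of perturbations, often with an entropy term that keeps that distribution diverse (our schematic rendering, not necessarily the paper's exact objective):

```latex
\min_{\theta}\; \mathbb{E}_{(x,y)}\!\left[\;\max_{p \in \mathcal{P}}\; \mathbb{E}_{\delta \sim p}\big[\ell\big(f_{\theta}(x+\delta),\, y\big)\big] \;+\; \lambda\, \mathcal{H}(p)\right].
```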
arXiv Detail & Related papers (2020-02-14T12:36:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.