Alleviating Robust Overfitting of Adversarial Training With Consistency
Regularization
- URL: http://arxiv.org/abs/2205.11744v1
- Date: Tue, 24 May 2022 03:18:43 GMT
- Title: Alleviating Robust Overfitting of Adversarial Training With Consistency
Regularization
- Authors: Shudong Zhang, Haichang Gao, Tianwei Zhang, Yunyi Zhou and Zihui Wu
- Abstract summary: Adversarial training (AT) has proven to be one of the most effective ways to defend Deep Neural Networks (DNNs) against adversarial attacks.
robustness will drop sharply at a certain stage, always exists during AT.
consistency regularization, a popular technique in semi-supervised learning, has a similar goal as AT and can be used to alleviate robust overfitting.
- Score: 9.686724616328874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial training (AT) has proven to be one of the most effective ways to
defend Deep Neural Networks (DNNs) against adversarial attacks. However, the
phenomenon of robust overfitting, i.e., the robustness will drop sharply at a
certain stage, always exists during AT. It is of great importance to decrease
this robust generalization gap in order to obtain a robust model. In this
paper, we present an in-depth study towards the robust overfitting from a new
angle. We observe that consistency regularization, a popular technique in
semi-supervised learning, has a similar goal as AT and can be used to alleviate
robust overfitting. We empirically validate this observation, and find a
majority of prior solutions have implicit connections to consistency
regularization. Motivated by this, we introduce a new AT solution, which
integrates the consistency regularization and Mean Teacher (MT) strategy into
AT. Specifically, we introduce a teacher model, coming from the average weights
of the student models over the training steps. Then we design a consistency
loss function to make the prediction distribution of the student models over
adversarial examples consistent with that of the teacher model over clean
samples. Experiments show that our proposed method can effectively alleviate
robust overfitting and improve the robustness of DNN models against common
adversarial attacks.
Related papers
- New Paradigm of Adversarial Training: Breaking Inherent Trade-Off between Accuracy and Robustness via Dummy Classes [11.694880978089852]
Adversarial Training (AT) is one of the most effective methods to enhance the robustness of DNNs.
Existing AT methods suffer from an inherent trade-off between adversarial robustness and clean accuracy.
We propose a new AT paradigm by introducing an additional dummy class for each original class.
arXiv Detail & Related papers (2024-10-16T15:36:10Z) - The Effectiveness of Random Forgetting for Robust Generalization [21.163070161951868]
We introduce a novel learning paradigm called "Forget to Mitigate Overfitting" (FOMO)
FOMO alternates between the forgetting phase, which randomly forgets a subset of weights, and the relearning phase, which emphasizes learning generalizable features.
Our experiments show that FOMO alleviates robust overfitting by significantly reducing the gap between the best and last robust test accuracy.
arXiv Detail & Related papers (2024-02-18T23:14:40Z) - Learn from the Past: A Proxy Guided Adversarial Defense Framework with
Self Distillation Regularization [53.04697800214848]
Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models.
AT methods, relying on direct iterative updates for target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting.
We present a general proxy guided defense framework, LAST' (bf Learn from the Pbf ast)
arXiv Detail & Related papers (2023-10-19T13:13:41Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - Latent Boundary-guided Adversarial Training [61.43040235982727]
Adrial training is proved to be the most effective strategy that injects adversarial examples into model training.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining.
arXiv Detail & Related papers (2022-06-08T07:40:55Z) - Self-Ensemble Adversarial Training for Improved Robustness [14.244311026737666]
Adversarial training is the strongest strategy against various adversarial attacks among all sorts of defense methods.
Recent works mainly focus on developing new loss functions or regularizers, attempting to find the unique optimal point in the weight space.
We devise a simple but powerful emphSelf-Ensemble Adversarial Training (SEAT) method for yielding a robust classifier by averaging weights of history models.
arXiv Detail & Related papers (2022-03-18T01:12:18Z) - A Unified Wasserstein Distributional Robustness Framework for
Adversarial Training [24.411703133156394]
This paper presents a unified framework that connects Wasserstein distributional robustness with current state-of-the-art AT methods.
We introduce a new Wasserstein cost function and a new series of risk functions, with which we show that standard AT methods are special cases of their counterparts in our framework.
This connection leads to an intuitive relaxation and generalization of existing AT methods and facilitates the development of a new family of distributional robustness AT-based algorithms.
arXiv Detail & Related papers (2022-02-27T19:40:29Z) - Interpolated Joint Space Adversarial Training for Robust and
Generalizable Defenses [82.3052187788609]
Adversarial training (AT) is considered to be one of the most reliable defenses against adversarial attacks.
Recent works show generalization improvement with adversarial samples under novel threat models.
We propose a novel threat model called Joint Space Threat Model (JSTM)
Under JSTM, we develop novel adversarial attacks and defenses.
arXiv Detail & Related papers (2021-12-12T21:08:14Z) - Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose the adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features of arbitrary attacking strength.
arXiv Detail & Related papers (2021-05-31T17:01:05Z) - Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to the unreliable robustness against other unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.