Enhance Diffusion to Improve Robust Generalization
- URL: http://arxiv.org/abs/2306.02618v2
- Date: Thu, 17 Aug 2023 22:43:01 GMT
- Title: Enhance Diffusion to Improve Robust Generalization
- Authors: Jianhui Sun and Sanchit Sinha and Aidong Zhang
- Abstract summary: Adversarial Training (AT) is one of the strongest defense mechanisms against adversarial perturbations.
This paper focuses on the primary AT framework, Projected Gradient Descent Adversarial Training (PGD-AT).
We propose a novel approach, Diffusion Enhanced Adversarial Training (DEAT), which manipulates the diffusion term to improve robust generalization with virtually no extra computational burden.
- Score: 39.9012723077658
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks are susceptible to human-imperceptible adversarial
perturbations. One of the strongest defense mechanisms is \emph{Adversarial
Training} (AT). In this paper, we aim to address two predominant problems in
AT. First, there is still little consensus on how to set AT hyperparameters
with a performance guarantee, and customized settings impede fair comparison
between different model designs. Second, robustly trained neural networks
struggle to generalize well and suffer from tremendous overfitting. This paper
focuses on the primary AT framework, Projected Gradient Descent Adversarial
Training (PGD-AT). We approximate the dynamics of
PGD-AT by a continuous-time Stochastic Differential Equation (SDE), and show
that the diffusion term of this SDE determines the robust generalization. An
immediate implication of this theoretical finding is that robust generalization
is positively correlated with the ratio between learning rate and batch size.
We further propose a novel approach, \emph{Diffusion Enhanced Adversarial
Training} (DEAT), to manipulate the diffusion term to improve robust
generalization with virtually no extra computational burden. We theoretically
show that DEAT obtains a tighter generalization bound than PGD-AT. Our
extensive empirical investigation firmly attests that DEAT universally
outperforms PGD-AT by a significant margin.
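To make the learning-rate-to-batch-size claim concrete, here is a minimal sketch of the standard continuous-time SDE approximation of minibatch gradient dynamics. This is the generic form from the SDE-approximation literature, not necessarily the paper's exact derivation for PGD-AT:

```latex
% Generic SDE approximation of minibatch gradient descent with
% learning rate \eta and batch size B (illustrative; the paper's SDE
% for PGD-AT may differ in its drift and covariance):
d\theta_t = -\nabla L(\theta_t)\,dt + \sqrt{\tfrac{\eta}{B}}\,\Sigma(\theta_t)^{1/2}\,dW_t
% \Sigma(\theta) is the covariance of the minibatch gradient noise and
% W_t is a standard Wiener process. The diffusion coefficient scales
% with \eta/B, which is why robust generalization would track the
% ratio between learning rate and batch size.
```

And a hedged Python (PyTorch) sketch of a PGD-AT training step with one cheap way to amplify the diffusion term: re-injecting the difference between the current and previous minibatch gradients. The function names, the `gamma` knob, and the stale-gradient trick are illustrative assumptions in the spirit of "more diffusion at near-zero extra cost", not the authors' exact DEAT update:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Inner maximization of PGD-AT: l_inf PGD around the clean input x."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def diffusion_enhanced_step(model, opt, x, y, prev_grads, gamma=0.5):
    """One outer-minimization step with a hypothetical diffusion boost:
    add gamma * (current grad - previous grad) to each parameter's
    gradient. The stale gradient is already on hand, so the extra
    compute is negligible. Illustrative only; not claimed to be the
    paper's exact DEAT update rule."""
    x_adv = pgd_attack(model, x, y)
    opt.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()
    raw = [None if p.grad is None else p.grad.detach().clone()
           for p in model.parameters()]
    for p, g_now, g_prev in zip(model.parameters(), raw, prev_grads):
        if g_now is not None and g_prev is not None:
            p.grad.add_(gamma * (g_now - g_prev))  # extra gradient noise
    opt.step()
    return raw  # feed back in as prev_grads on the next step

# Usage: prev = [None] * len(list(model.parameters()))
#        for x, y in loader:
#            prev = diffusion_enhanced_step(model, opt, x, y, prev)
```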
Related papers
- Exploiting the Layered Intrinsic Dimensionality of Deep Models for Practical Adversarial Training [31.495803865226158]
Adversarial Training (AT) is rarely, if ever, deployed in practical AI systems for two primary reasons.
AT results in a drop in generalization in vision models whereas, in encoder-based language models, generalization either improves or remains unchanged.
We show that SMAAT requires only 25-33% of the GPU time compared to standard AT, while significantly improving robustness across all applications.
arXiv Detail & Related papers (2024-05-27T12:48:30Z)
- The Effectiveness of Random Forgetting for Robust Generalization [21.163070161951868]
We introduce a novel learning paradigm called "Forget to Mitigate Overfitting" (FOMO).
FOMO alternates between the forgetting phase, which randomly forgets a subset of weights, and the relearning phase, which emphasizes learning generalizable features.
Our experiments show that FOMO alleviates robust overfitting by significantly reducing the gap between the best and last robust test accuracy.
arXiv Detail & Related papers (2024-02-18T23:14:40Z)
- Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off [107.35833747750446]
Adversarial examples can be crafted by adding imperceptible perturbations to legitimate documents.
This vulnerability raises significant concerns about the reliability of neural ranking models (NRMs) and hinders their widespread deployment.
In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs.
arXiv Detail & Related papers (2023-12-16T05:38:39Z)
- Alleviating Robust Overfitting of Adversarial Training With Consistency Regularization [9.686724616328874]
Adversarial training (AT) has proven to be one of the most effective ways to defend Deep Neural Networks (DNNs) against adversarial attacks.
Robust overfitting, in which robustness drops sharply at a certain stage, always exists during AT.
Consistency regularization, a popular technique in semi-supervised learning, has a similar goal to AT and can be used to alleviate robust overfitting.
arXiv Detail & Related papers (2022-05-24T03:18:43Z)
- Sparsity Winning Twice: Better Robust Generalization from More Efficient Training [94.92954973680914]
We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find both methods yield a win-win: substantially shrinking the robust generalization gap and alleviating robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
arXiv Detail & Related papers (2022-02-20T15:52:08Z)
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features across arbitrary attacking strengths.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- Consistency Regularization for Adversarial Robustness [88.65786118562005]
Adversarial training (AT) is one of the most successful methods for obtaining adversarial robustness in deep neural networks.
However, a significant generalization gap in the robustness obtained from AT has been problematic.
In this paper, we investigate data augmentation techniques to address the issue.
arXiv Detail & Related papers (2021-03-08T09:21:41Z)
- Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
Communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new detached error feedback (DEF) algorithm, which shows better convergence than error feedback for non-convex distributed problems.
We also propose DEFA to improve the generalization of DEF, which enjoys better generalization bounds than DEF.
arXiv Detail & Related papers (2020-04-11T03:50:59Z)
- Adversarial Vertex Mixup: Toward Better Adversarially Robust Generalization [28.072758856453106]
Adversarial examples cause neural networks to produce incorrect outputs with high confidence.
We show that adversarial training can overshoot the optimal point in terms of robust generalization, leading to Adversarial Feature Overfitting (AFO).
We propose Adversarial Vertex mixup (AVmixup), a soft-labeled data augmentation approach for improving adversarially robust generalization; a hedged sketch of the idea follows this list.
arXiv Detail & Related papers (2020-03-05T08:47:46Z)
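As referenced in the AVmixup entry above, here is a hedged Python sketch of the soft-labeled augmentation idea: interpolate each input between its clean version and an extrapolated "adversarial vertex", and soften the label accordingly. The `gamma` and `smooth` values and the exact label-smoothing scheme are illustrative assumptions, not the paper's verbatim recipe:

```python
import torch
import torch.nn.functional as F

def avmixup_batch(x, x_adv, y, num_classes, gamma=2.0, smooth=0.1):
    """Build AVmixup-style training pairs (sketch).

    x     : clean inputs in [0, 1], shape (B, C, H, W)
    x_adv : adversarial examples for x (e.g., from PGD)
    The vertex extrapolates beyond x_adv by factor gamma; the label at
    the vertex is smoothed, and both input and label are interpolated."""
    x_av = x + gamma * (x_adv - x)                # adversarial vertex
    lam = torch.rand(x.size(0), device=x.device)  # per-example mix weight
    x_mix = (1 - lam.view(-1, 1, 1, 1)) * x + lam.view(-1, 1, 1, 1) * x_av
    y_onehot = F.one_hot(y, num_classes).float()
    y_vertex = (1 - smooth) * y_onehot + smooth / num_classes
    y_mix = (1 - lam.view(-1, 1)) * y_onehot + lam.view(-1, 1) * y_vertex
    return x_mix.clamp(0, 1), y_mix

# Training minimizes soft cross-entropy on the mixed pair:
#   logits = model(x_mix)
#   loss = -(y_mix * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```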