Stable and Efficient Adversarial Training through Local Linearization
- URL: http://arxiv.org/abs/2210.05373v1
- Date: Tue, 11 Oct 2022 11:57:37 GMT
- Title: Stable and Efficient Adversarial Training through Local Linearization
- Authors: Zhuorong Li and Daiwei Yu
- Abstract summary: A phenomenon referred to as "catastrophic overfitting" has been observed, which is prevalent in single-step defenses.
We propose a novel method, Stable and Efficient Adversarial Training (SEAT), which mitigates catastrophic overfitting.
- Our single-step method can reach 51% robust accuracy for CIFAR-10 with $l_\infty$ perturbations of radius $8/255$ under a strong PGD-50 attack.
- Score: 0.5076419064097734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been a recent surge in single-step adversarial training as it shows
robustness and efficiency. However, a phenomenon referred to as "catastrophic
overfitting" has been observed, which is prevalent in single-step defenses and
may frustrate attempts to use FGSM adversarial training. To address this issue,
we propose a novel method, Stable and Efficient Adversarial Training (SEAT),
which mitigates catastrophic overfitting by harnessing local properties that
distinguish a robust model from a catastrophically overfitted one. The
proposed SEAT has strong theoretical justifications, in that minimizing the
SEAT loss can be shown to favour smooth empirical risk, thereby leading to
robustness. Experimental results demonstrate that the proposed method
successfully mitigates catastrophic overfitting, yielding superior performance
amongst efficient defenses. Our single-step method can reach 51% robust
accuracy for CIFAR-10 with $l_\infty$ perturbations of radius $8/255$ under a
strong PGD-50 attack, matching the performance of 10-step iterative
adversarial training at merely 3% of the computational cost.
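The setting described in the abstract, training with a single FGSM step per batch and evaluating robustness with a 50-step PGD attack at $l_\infty$ radius $8/255$, can be sketched in a few lines of PyTorch. The following is a minimal, generic sketch of that setup, not the authors' SEAT method; the model, data loaders, and hyperparameters (`ALPHA_FGSM`, `ALPHA_PGD`, etc.) are illustrative assumptions.

```python
# Generic sketch of single-step (FGSM) adversarial training evaluated under a
# multi-step PGD attack, as in the setting described above. This is NOT the
# SEAT method; model, loaders, and all hyperparameters are placeholders.
import torch
import torch.nn.functional as F

EPS = 8 / 255         # l_inf perturbation radius
ALPHA_FGSM = 8 / 255  # FGSM step size (assumed equal to the radius)
ALPHA_PGD = 2 / 255   # PGD step size used for evaluation
PGD_STEPS = 50        # "PGD-50" evaluation attack

def fgsm_example(model, x, y, eps=EPS, alpha=ALPHA_FGSM):
    """Single-step adversarial example used during training."""
    delta = torch.zeros_like(x, requires_grad=True)
    loss = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(loss, delta)[0]
    delta = (alpha * grad.sign()).clamp(-eps, eps)
    return (x + delta).clamp(0, 1).detach()

def pgd_example(model, x, y, eps=EPS, alpha=ALPHA_PGD, steps=PGD_STEPS):
    """Multi-step PGD adversarial example used only for evaluation."""
    delta = ((torch.rand_like(x) * 2 - 1) * eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta += alpha * grad.sign()
            delta.clamp_(-eps, eps)
    return (x + delta).clamp(0, 1).detach()

def train_epoch(model, loader, optimizer, device="cuda"):
    """One epoch of single-step adversarial training."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = fgsm_example(model, x, y)       # one extra forward/backward
        optimizer.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()
        optimizer.step()

def robust_accuracy(model, loader, device="cuda"):
    """Robust accuracy under the PGD evaluation attack (e.g. PGD-50)."""
    model.eval()
    correct = total = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_example(model, x, y)        # attack itself needs gradients
        with torch.no_grad():
            correct += (model(x_adv).argmax(1) == y).sum().item()
        total += y.numel()
    return correct / total
```

The efficiency claim in the abstract rests on this asymmetry: training pays for a single attack step per batch (versus ten for 10-step iterative adversarial training), while the expensive PGD-50 attack is only run at evaluation time.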
Related papers
- Perturbation-Invariant Adversarial Training for Neural Ranking Models:
Improving the Effectiveness-Robustness Trade-Off [107.35833747750446]
Adversarial examples can be crafted by adding imperceptible perturbations to legitimate documents.
This vulnerability raises significant concerns about the reliability of neural ranking models (NRMs) and hinders their widespread deployment.
In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs.
arXiv Detail & Related papers (2023-12-16T05:38:39Z) - Reducing Adversarial Training Cost with Gradient Approximation [0.3916094706589679]
We propose a new and efficient adversarial training method, adversarial training with gradient approximation (GAAT), to reduce the cost of building robust models.
Our proposed method saves up to 60% of the training time with comparable model test accuracy across datasets.
arXiv Detail & Related papers (2023-09-18T03:55:41Z) - Enhancing Adversarial Training via Reweighting Optimization Trajectory [72.75558017802788]
A number of approaches, such as adding extra regularization, perturbing adversarial weights, and training with more data, have been proposed to address the drawbacks of adversarial training.
We propose a new method named Weighted Optimization Trajectories (WOT) that leverages the optimization trajectories of adversarial training over time.
Our results show that WOT integrates seamlessly with the existing adversarial training methods and consistently overcomes the robust overfitting issue.
arXiv Detail & Related papers (2023-06-25T15:53:31Z) - Revisiting and Advancing Adversarial Training Through A Simple Baseline [7.226961695849204]
We introduce a simple baseline approach, termed SimpleAT, that performs competitively with recent methods and mitigates robust overfitting.
We conduct extensive experiments on CIFAR-10/100 and Tiny-ImageNet, which validate the robustness of SimpleAT against state-of-the-art adversarial attackers.
Our results also reveal the connections between SimpleAT and many advanced state-of-the-art adversarial defense methods.
arXiv Detail & Related papers (2023-06-13T08:12:52Z) - Raising the Bar for Certified Adversarial Robustness with Diffusion
Models [9.684141378657522]
In this work, we demonstrate that a similar approach can substantially improve deterministic certified defenses.
One of our main insights is that the difference between the training and test accuracy of the original model is a good predictor of the magnitude of the improvement.
Our approach achieves state-of-the-art deterministic robustness certificates on CIFAR-10 for the $\ell_2$ ($\epsilon = 36/255$) and $\ell_\infty$ ($\epsilon = 8/255$) threat models.
arXiv Detail & Related papers (2023-05-17T17:29:10Z) - Enhancing Adversarial Robustness for Deep Metric Learning [77.75152218980605]
The adversarial robustness of deep metric learning models needs to be improved.
To avoid model collapse due to excessively hard examples, existing defenses dismiss min-max adversarial training.
We propose Hardness Manipulation to efficiently perturb the training triplet until it reaches a specified level of hardness for adversarial training.
arXiv Detail & Related papers (2022-03-02T22:27:44Z) - Robust Single-step Adversarial Training with Regularizer [11.35007968593652]
We propose a novel Fast Gradient Sign Method with PGD Regularization (FGSMPR) to boost the efficiency of adversarial training without catastrophic overfitting.
Experiments demonstrate that our proposed method can train a robust deep network for $L_\infty$-perturbations with FGSM adversarial training.
arXiv Detail & Related papers (2021-02-05T19:07:10Z) - A Simple Fine-tuning Is All You Need: Towards Robust Deep Learning Via
Adversarial Fine-tuning [90.44219200633286]
We propose a simple yet very effective adversarial fine-tuning approach based on a "slow start, fast decay" learning rate scheduling strategy.
Experimental results show that the proposed adversarial fine-tuning approach outperforms the state-of-the-art methods on CIFAR-10, CIFAR-100 and ImageNet datasets.
arXiv Detail & Related papers (2020-12-25T20:50:15Z) - To be Robust or to be Fair: Towards Fairness in Adversarial Training [83.42241071662897]
We find that adversarial training algorithms tend to introduce severe disparity of accuracy and robustness between different groups of data.
We propose a Fair-Robust-Learning (FRL) framework to mitigate this unfairness problem when doing adversarial defenses.
arXiv Detail & Related papers (2020-10-13T02:21:54Z) - Bag of Tricks for Adversarial Training [50.53525358778331]
Adversarial training (AT) is one of the most effective strategies for promoting model robustness.
Recent benchmarks show that most of the proposed improvements on AT are less effective than simply early stopping the training procedure.
arXiv Detail & Related papers (2020-10-01T15:03:51Z) - Fast is better than free: Revisiting adversarial training [86.11788847990783]
We show that it is possible to train empirically robust models using a much weaker and cheaper adversary.
We identify a failure mode referred to as "catastrophic overfitting" which may have caused previous attempts to use FGSM adversarial training to fail.
arXiv Detail & Related papers (2020-01-12T20:30:22Z)