Preventing Catastrophic Overfitting in Fast Adversarial Training: A Bi-level Optimization Perspective
- URL: http://arxiv.org/abs/2407.12443v1
- Date: Wed, 17 Jul 2024 09:53:20 GMT
- Title: Preventing Catastrophic Overfitting in Fast Adversarial Training: A Bi-level Optimization Perspective
- Authors: Zhaoxin Wang, Handing Wang, Cong Tian, Yaochu Jin
- Abstract summary: Adversarial training (AT) has become an effective defense method against adversarial examples (AEs).
Fast AT (FAT) employs a single-step attack strategy to guide the training process.
FAT methods suffer from the catastrophic overfitting problem.
- Score: 20.99874786089634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial training (AT) has become an effective defense method against adversarial examples (AEs) and it is typically framed as a bi-level optimization problem. Among various AT methods, fast AT (FAT), which employs a single-step attack strategy to guide the training process, can achieve good robustness against adversarial attacks at a low cost. However, FAT methods suffer from the catastrophic overfitting problem, especially on complex tasks or with large-parameter models. In this work, we propose a FAT method termed FGSM-PCO, which mitigates catastrophic overfitting by averting the collapse of the inner optimization problem in the bi-level optimization process. FGSM-PCO generates current-stage AEs from the historical AEs and incorporates them into the training process using an adaptive mechanism. This mechanism determines an appropriate fusion ratio according to the performance of the AEs on the training model. Coupled with a loss function tailored to the training framework, FGSM-PCO can alleviate catastrophic overfitting and help the recovery of an overfitted model to effective training. We evaluate our algorithm across three models and three datasets to validate its effectiveness. Comparative empirical studies against other FAT algorithms demonstrate that our proposed method effectively addresses unresolved overfitting issues in existing algorithms.
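The core mechanism the abstract describes, fusing historical adversarial examples with freshly generated single-step AEs under an adaptive ratio, can be illustrated with a minimal NumPy sketch. All function names are illustrative, and the adaptive computation of the fusion ratio (which the paper derives from the AEs' performance on the training model) is omitted here and passed in directly; this is not the authors' implementation.

```python
import numpy as np

def fgsm_step(x_clean, grad_sign, epsilon):
    # Single-step FGSM: perturb the input along the sign of the loss
    # gradient, then clip back into the epsilon-ball around the clean input.
    return np.clip(x_clean + epsilon * grad_sign,
                   x_clean - epsilon, x_clean + epsilon)

def fuse_adversarial_examples(x_clean, x_hist, grad_sign, epsilon, ratio):
    """Blend a historical AE with a fresh single-step AE.

    `ratio` stands in for the paper's adaptive fusion coefficient; in the
    actual method it is chosen based on how the AEs perform on the current
    model, a step omitted in this sketch.
    """
    x_new = fgsm_step(x_clean, grad_sign, epsilon)
    x_fused = ratio * x_hist + (1.0 - ratio) * x_new
    # Project the fused example back into the valid perturbation ball.
    return np.clip(x_fused, x_clean - epsilon, x_clean + epsilon)
```

The final projection guarantees the fused example stays a valid AE within the epsilon-ball, regardless of how the two components are mixed.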
Related papers
- Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails.
We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses.
C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
arXiv Detail & Related papers (2024-05-24T14:20:09Z) - Reducing Adversarial Training Cost with Gradient Approximation [0.3916094706589679]
We propose a new and efficient adversarial training method, adversarial training with gradient approximation (GAAT) to reduce the cost of building up robust models.
Our proposed method saves up to 60% of the training time with comparable model test accuracy on datasets.
arXiv Detail & Related papers (2023-09-18T03:55:41Z) - Improving Fast Adversarial Training with Prior-Guided Knowledge [80.52575209189365]
We investigate the relationship between adversarial example quality and catastrophic overfitting by comparing the training processes of standard adversarial training and Fast adversarial training.
We find that catastrophic overfitting occurs when the attack success rate of adversarial examples degrades.
arXiv Detail & Related papers (2023-04-01T02:18:12Z) - Prior-Guided Adversarial Initialization for Fast Adversarial Training [84.56377396106447]
We investigate the difference between the training processes of adversarial examples (AEs) in fast adversarial training (FAT) and standard adversarial training (SAT).
We observe that the attack success rate of adversarial examples (AEs) of FAT gets worse gradually in the late training stage, resulting in overfitting.
Based on the observation, we propose a prior-guided FGSM initialization method to avoid overfitting.
The proposed method can prevent catastrophic overfitting and outperform state-of-the-art FAT methods.
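The prior-guided initialization described above can be illustrated with a minimal sketch (the function name and NumPy formulation are illustrative, not the authors' implementation): instead of starting FGSM from random noise, reuse the perturbation that produced the previous AE as the initialization.

```python
import numpy as np

def prior_guided_fgsm(x_clean, prior_delta, grad_sign, epsilon, alpha):
    # Initialize from the prior perturbation (e.g. the one carried over
    # from the previous epoch) rather than uniform random noise, then
    # take one FGSM step of size alpha and clip to the epsilon-ball.
    delta = np.clip(prior_delta + alpha * grad_sign, -epsilon, epsilon)
    return x_clean + delta
```

The intuition is that the prior perturbation encodes attack directions that already worked, so a single step from it yields stronger AEs than a single step from a random start, which helps delay the degradation in attack success rate that precedes catastrophic overfitting.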
arXiv Detail & Related papers (2022-07-18T18:13:10Z) - Revisiting and Advancing Fast Adversarial Training Through The Lens of Bi-Level Optimization [60.72410937614299]
We propose a new tractable bi-level optimization problem, and design and analyze a new set of algorithms termed Bi-level AT (FAST-BAT).
FAST-BAT is capable of defending against sign-based projected gradient descent (PGD) attacks without calling any gradient sign method or explicit robust regularization.
arXiv Detail & Related papers (2021-12-23T06:25:36Z) - Boosting Adversarial Training with Hypersphere Embedding [53.75693100495097]
Adversarial training is one of the most effective defenses against adversarial attacks for deep learning models.
In this work, we advocate incorporating the hypersphere embedding mechanism into the AT procedure.
We validate our methods under a wide range of adversarial attacks on the CIFAR-10 and ImageNet datasets.
arXiv Detail & Related papers (2020-02-20T08:42:29Z) - Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to the unreliable robustness against other unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.