Towards Rapid and Robust Adversarial Training with One-Step Attacks
- URL: http://arxiv.org/abs/2002.10097v4
- Date: Tue, 17 Mar 2020 07:52:57 GMT
- Title: Towards Rapid and Robust Adversarial Training with One-Step Attacks
- Authors: Leo Schwinn, René Raab, Björn Eskofier
- Abstract summary: Adversarial training is the most successful method for increasing the robustness of neural networks against adversarial attacks.
We present two ideas that enable adversarial training with the computationally less expensive Fast Gradient Sign Method (FGSM).
We show that noise injection in conjunction with FGSM-based adversarial training achieves comparable results to adversarial training with PGD while being considerably faster.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial training is the most successful empirical method for increasing
the robustness of neural networks against adversarial attacks. However, the
most effective approaches, like training with Projected Gradient Descent (PGD)
are accompanied by high computational complexity. In this paper, we present two
ideas that, in combination, enable adversarial training with the
computationally less expensive Fast Gradient Sign Method (FGSM). First, we add
uniform noise to the initial data point of the FGSM attack, which creates a
wider variety of adversaries, thus prohibiting overfitting to one particular
perturbation bound. Further, we add a learnable regularization step prior to
the neural network, which we call Pixelwise Noise Injection Layer (PNIL).
Inputs propagated through the PNIL are resampled from a learned Gaussian
distribution. The regularization induced by the PNIL prevents the model from
learning to obfuscate its gradients, a factor that hindered prior approaches
from successfully applying one-step methods for adversarial training. We show
that noise injection in conjunction with FGSM-based adversarial training
achieves comparable results to adversarial training with PGD while being
considerably faster. Moreover, we outperform PGD-based adversarial training by
combining noise injection and PNIL.
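The abstract describes two concrete mechanisms: a uniform-noise initialization of the one-step FGSM attack and a learnable pixelwise noise injection layer placed in front of the network. The PyTorch sketch below illustrates both under stated assumptions; it is not the authors' implementation, and the PNIL parameterization (per-pixel learnable mean and log-variance with a reparameterized Gaussian sample), the projection back onto the epsilon-ball, and all parameter choices are assumptions made for illustration only.

```python
import torch
import torch.nn as nn


class PixelwiseNoiseInjectionLayer(nn.Module):
    """Assumed form of the PNIL: resample each pixel from a learned Gaussian."""

    def __init__(self, input_shape):
        super().__init__()
        # Learnable per-pixel mean offset and log-variance (assumption).
        self.mu = nn.Parameter(torch.zeros(input_shape))
        self.log_var = nn.Parameter(torch.zeros(input_shape))

    def forward(self, x):
        std = torch.exp(0.5 * self.log_var)
        eps = torch.randn_like(x)
        # Reparameterized Gaussian sample, differentiable w.r.t. mu and log_var.
        return x + self.mu + std * eps


def fgsm_with_uniform_init(model, loss_fn, x, y, epsilon):
    """One-step FGSM attack started from a uniformly perturbed data point."""
    # Uniform-noise start: creates a wider variety of adversaries per example.
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon)
    x_adv = (x.detach() + delta).clamp(0.0, 1.0).requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    # Single gradient-sign step, then projection back into the epsilon-ball
    # around the clean input (the projection is an assumption of this sketch).
    x_adv = x_adv.detach() + epsilon * grad.sign()
    x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon)
    return x_adv.clamp(0.0, 1.0)
```

A hypothetical usage would prepend the layer to a backbone, e.g. nn.Sequential(PixelwiseNoiseInjectionLayer((3, 32, 32)), backbone), and train on the adversaries returned by fgsm_with_uniform_init; the input shape and epsilon = 8/255 are illustrative choices, not values taken from the paper.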
Related papers
- Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails.
We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses.
C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
arXiv Detail & Related papers (2024-05-24T14:20:09Z)
- Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders [101.42201747763178]
Unlearnable examples (UEs) seek to maximize testing error by making subtle modifications to training examples that are correctly labeled.
Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method.
arXiv Detail & Related papers (2024-05-02T16:49:25Z)
- Fast Propagation is Better: Accelerating Single-Step Adversarial Training via Sampling Subnetworks [69.54774045493227]
A drawback of adversarial training is the computational overhead introduced by the generation of adversarial examples.
We propose to exploit the interior building blocks of the model to improve efficiency.
Compared with previous methods, our method not only reduces the training cost but also achieves better model robustness.
arXiv Detail & Related papers (2023-10-24T01:36:20Z)
- Adversarial Coreset Selection for Efficient Robust Training [11.510009152620666]
We show how selecting a small subset of training data provides a principled approach to reducing the time complexity of robust training.
We conduct extensive experiments to demonstrate that our approach speeds up adversarial training by 2-3 times.
arXiv Detail & Related papers (2022-09-13T07:37:53Z)
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach known as adversarial training (AT) has been shown to improve model robustness.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
- Robust Single-step Adversarial Training with Regularizer [11.35007968593652]
We propose a novel Fast Gradient Sign Method with PGD Regularization (FGSMPR) to boost the efficiency of adversarial training without catastrophic overfitting.
Experiments demonstrate that our proposed method can train a robust deep network for $L_\infty$-perturbations with FGSM adversarial training.
arXiv Detail & Related papers (2021-02-05T19:07:10Z)
- Self-Progressing Robust Training [146.8337017922058]
Current robust training methods such as adversarial training explicitly use an "attack" to generate adversarial examples.
We propose a new framework called SPROUT, self-progressing robust training.
Our results shed new light on scalable, effective and attack-independent robust training methods.
arXiv Detail & Related papers (2020-12-22T00:45:24Z)
- Feature Purification: How Adversarial Training Performs Robust Deep Learning [66.05472746340142]
We present a principle that we call Feature Purification: one cause of the existence of adversarial examples is the accumulation of certain small dense mixtures in the hidden weights during the training of a neural network.
We present experiments on the CIFAR-10 dataset to illustrate this principle, and a theoretical result proving that, for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z)
- Initializing Perturbations in Multiple Directions for Fast Adversarial Training [1.8638865257327277]
In image classification, an adversarial example can fool a well-trained deep neural network by adding barely imperceptible perturbations to a clean image.
Adversarial training, one of the most direct and effective methods, minimizes the loss on perturbed data.
We propose Diversified Initialized Perturbations Adversarial Training (DIP-FAT).
arXiv Detail & Related papers (2020-05-15T15:52:33Z)
- Improving the affordability of robustness training for DNNs [11.971637253035107]
We show that the initial phase of adversarial training is redundant and can be replaced with natural training, which significantly improves computational efficiency.
We show that our proposed method can reduce the training time by a factor of up to 2.5 with comparable or better model test accuracy and generalization on various strengths of adversarial attacks.
arXiv Detail & Related papers (2020-02-11T07:29:45Z)
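The last entry above describes a simple schedule that is easy to sketch: skip adversarial example generation during an initial natural-training phase and switch to adversarial batches afterwards. The sketch below, in the same PyTorch style as the earlier one, only illustrates that idea; warmup_epochs, the attack used after the warm-up, and epsilon are hypothetical choices, not values from the cited paper.

```python
def train_with_natural_warmup(model, loader, optimizer, loss_fn,
                              epochs, warmup_epochs, attack):
    """Natural training for the first epochs, adversarial training afterwards."""
    for epoch in range(epochs):
        for x, y in loader:
            if epoch >= warmup_epochs:
                # After the warm-up, train on adversarial examples (e.g. from the
                # fgsm_with_uniform_init sketch above); epsilon is illustrative.
                x = attack(model, loss_fn, x, y, epsilon=8 / 255)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
```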
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.