Attacking Adversarial Attacks as A Defense
- URL: http://arxiv.org/abs/2106.04938v1
- Date: Wed, 9 Jun 2021 09:31:10 GMT
- Title: Attacking Adversarial Attacks as A Defense
- Authors: Boxi Wu, Heng Pan, Li Shen, Jindong Gu, Shuai Zhao, Zhifeng Li, Deng Cai, Xiaofei He, Wei Liu
- Abstract summary: Adversarial attacks can fool deep neural networks with imperceptible perturbations.
On adversarially-trained models, perturbing adversarial examples with a small random noise may invalidate their misled predictions.
We propose to counter attacks by crafting more effective defensive perturbations.
- Score: 40.8739589617252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is well known that adversarial attacks can fool deep neural networks with
imperceptible perturbations. Although adversarial training significantly
improves model robustness, failure cases of defense still broadly exist. In
this work, we find that adversarial attacks can themselves be vulnerable to small
perturbations. Namely, on adversarially-trained models, perturbing adversarial
examples with a small random noise may invalidate their misled predictions.
After carefully examining state-of-the-art attacks of various kinds, we find
that all these attacks have this deficiency to different extents. Enlightened
by this finding, we propose to counter attacks by crafting more effective
defensive perturbations. Our defensive perturbations leverage the advantage
that adversarial training endows the ground-truth class with smaller local
Lipschitzness. By simultaneously attacking all the classes, the misled
predictions with larger Lipschitzness can be flipped into correct ones. We
verify our defensive perturbation with both empirical experiments and
theoretical analyses on a linear model. On CIFAR10, it boosts the
state-of-the-art model from 66.16% to 72.66% against the four attacks of
AutoAttack, including 71.76% to 83.30% against the Square attack. On ImageNet,
the top-1 robust accuracy of FastAT is improved from 33.18% to 38.54% under the
100-step PGD attack.
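As a rough illustration of the abstract's idea of "simultaneously attacking all the classes", the sketch below applies a PGD-style defensive perturbation before classification. It assumes a PyTorch image classifier with inputs in [0, 1]; the loss construction (sum of cross-entropy terms over every label) and the hyper-parameters (eps, steps, step_size) are illustrative assumptions inferred from the abstract, not the authors' exact algorithm or settings.

```python
import torch
import torch.nn.functional as F

def defensive_perturbation(model, x, eps=8/255, steps=5, step_size=2/255):
    """Sketch of a defensive perturbation: push the (possibly adversarial)
    input away from *all* classes at once. Under adversarial training the
    ground-truth class has smaller local Lipschitzness, so its prediction is
    expected to survive while misled predictions get flipped back.
    Hyper-parameters are illustrative, not the paper's settings."""
    model.eval()
    with torch.no_grad():
        num_classes = model(x).shape[1]
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        logits = model(x + delta)
        # Sum of cross-entropy losses against every possible label:
        # maximizing it "attacks" all classes simultaneously.
        loss = sum(
            F.cross_entropy(
                logits,
                torch.full((x.shape[0],), c, dtype=torch.long, device=x.device),
            )
            for c in range(num_classes)
        )
        loss.backward()
        with torch.no_grad():
            delta += step_size * delta.grad.sign()    # signed-gradient ascent step
            delta.clamp_(-eps, eps)                   # stay inside the L_inf ball
            delta.data = (x + delta).clamp(0, 1) - x  # keep the image valid
        delta.grad.zero_()
    return (x + delta).detach()

# Usage: classify the defensively perturbed input instead of the raw input.
# preds = model(defensive_perturbation(model, x_batch)).argmax(dim=1)
```

At test time the defender simply classifies the perturbed input; the intuition from the abstract is that classes with larger local Lipschitzness (the misled ones) are flipped by the perturbation while the ground-truth class is not.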
Related papers
- Protecting against simultaneous data poisoning attacks [14.893813906644153]
Current backdoor defense methods are evaluated against a single attack at a time.
We show that simultaneously executed data poisoning attacks can effectively install multiple backdoors in a single model.
We develop a new defense, BaDLoss, that is effective in the multi-attack setting.
arXiv Detail & Related papers (2024-08-23T16:57:27Z)
- PubDef: Defending Against Transfer Attacks From Public Models [6.0012551318569285]
We propose a new practical threat model where the adversary relies on transfer attacks through publicly available surrogate models.
We evaluate the transfer attacks in this setting and propose a specialized defense method based on a game-theoretic perspective.
Under this threat model, our defense, PubDef, outperforms state-of-the-art white-box adversarial training by a large margin with almost no loss in normal accuracy.
arXiv Detail & Related papers (2023-10-26T17:58:08Z)
- Guidance Through Surrogate: Towards a Generic Diagnostic Attack [101.36906370355435]
We develop a guided mechanism to avoid local minima during attack optimization, leading to a novel attack dubbed Guided Projected Gradient Attack (G-PGA).
Our modified attack does not require random restarts, a large number of attack iterations, or a search for an optimal step size.
More than an effective attack, G-PGA can be used as a diagnostic tool to reveal elusive robustness due to gradient masking in adversarial defenses.
arXiv Detail & Related papers (2022-12-30T18:45:23Z)
- Utilizing Adversarial Targeted Attacks to Boost Adversarial Robustness [10.94463750304394]
Adversarial attacks have been shown to be highly effective at degrading the performance of deep neural networks (DNNs).
We propose a novel solution by adopting the recently suggested Predictive Normalized Maximum Likelihood.
We extensively evaluate our approach on 16 adversarial attack benchmarks using ResNet-50, WideResNet-28, and a 2-layer ConvNet trained on ImageNet, CIFAR10, and MNIST.
arXiv Detail & Related papers (2021-09-04T22:30:49Z)
- Adversarial Attack and Defense in Deep Ranking [100.17641539999055]
We propose two attacks against deep ranking systems that can raise or lower the rank of chosen candidates by adversarial perturbations.
Conversely, an anti-collapse triplet defense is proposed to improve the ranking model robustness against all proposed attacks.
Our adversarial ranking attacks and defenses are evaluated on MNIST, Fashion-MNIST, CUB200-2011, CARS196 and Stanford Online Products datasets.
arXiv Detail & Related papers (2021-06-07T13:41:45Z)
- Unified Detection of Digital and Physical Face Attacks [61.6674266994173]
State-of-the-art defense mechanisms against face attacks achieve near-perfect accuracy within one of three attack categories, namely adversarial, digital manipulation, or physical spoofs.
We propose a unified attack detection framework, namely UniFAD, that can automatically cluster 25 coherent attack types belonging to the three categories.
arXiv Detail & Related papers (2021-04-05T21:08:28Z)
- Lagrangian Objective Function Leads to Improved Unforeseen Attack Generalization in Adversarial Training [0.0]
Adversarial training (AT) has been shown to be effective at producing models that are robust to the attack used during training.
We propose a simple modification to AT that mitigates its limited generalization to unseen attacks.
We show that our attack is faster than other attack schemes that are designed for unseen attack generalization.
arXiv Detail & Related papers (2021-03-29T07:23:46Z)
- Optimal Transport as a Defense Against Adversarial Attacks [4.6193503399184275]
Adversarial attacks can find a human-imperceptible perturbation for a given image that will mislead a trained model.
Previous work aimed to improve robustness by aligning original and adversarial image representations, much as in domain adaptation.
We propose to use a loss between distributions that faithfully reflects the ground distance.
This leads to SAT (Sinkhorn Adversarial Training), a more robust defense against adversarial attacks.
arXiv Detail & Related papers (2021-02-05T13:24:36Z)
- Are Adversarial Examples Created Equal? A Learnable Weighted Minimax Risk for Robustness under Non-uniform Attacks [70.11599738647963]
Adversarial Training is one of the few defenses that withstand strong attacks.
Traditional defense mechanisms assume a uniform attack over the examples according to the underlying data distribution.
We present a weighted minimax risk optimization that defends against non-uniform attacks.
arXiv Detail & Related papers (2020-10-24T21:20:35Z)
- Perceptual Adversarial Robustness: Defense Against Unseen Threat Models [58.47179090632039]
A key challenge in adversarial robustness is the lack of a precise mathematical characterization of human perception.
Under the neural perceptual threat model, we develop novel perceptual adversarial attacks and defenses.
Because the NPTM is very broad, we find that Perceptual Adversarial Training (PAT) against a perceptual attack gives robustness against many other types of adversarial attacks.
arXiv Detail & Related papers (2020-06-22T22:40:46Z)