Attacking Adversarial Attacks as A Defense
- URL: http://arxiv.org/abs/2106.04938v1
- Date: Wed, 9 Jun 2021 09:31:10 GMT
- Title: Attacking Adversarial Attacks as A Defense
- Authors: Boxi Wu, Heng Pan, Li Shen, Jindong Gu, Shuai Zhao, Zhifeng Li, Deng Cai, Xiaofei He, Wei Liu
- Abstract summary: Adversarial attacks can fool deep neural networks with imperceptible perturbations.
On adversarially-trained models, perturbing adversarial examples with a small random noise may invalidate their misled predictions.
We propose to counter attacks by crafting more effective defensive perturbations.
- Score: 40.8739589617252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is well known that adversarial attacks can fool deep neural networks with
imperceptible perturbations. Although adversarial training significantly
improves model robustness, failure cases of defense still broadly exist. In
this work, we find that adversarial attacks can themselves be vulnerable to small
perturbations. Namely, on adversarially-trained models, perturbing adversarial
examples with a small random noise may invalidate their misled predictions.
After carefully examining state-of-the-art attacks of various kinds, we find
that all these attacks have this deficiency to different extents. Enlightened
by this finding, we propose to counter attacks by crafting more effective
defensive perturbations. Our defensive perturbations leverage the advantage
that adversarial training endows the ground-truth class with smaller local
Lipschitzness. By simultaneously attacking all the classes, the misled
predictions with larger Lipschitzness can be flipped into correct ones. We
verify our defensive perturbation with both empirical experiments and
theoretical analyses on a linear model. On CIFAR10, it boosts the
state-of-the-art model from 66.16% to 72.66% against the four attacks of
AutoAttack, including 71.76% to 83.30% against the Square attack. On ImageNet,
the top-1 robust accuracy of FastAT is improved from 33.18% to 38.54% under the
100-step PGD attack.
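As a rough illustration of the abstract's idea of "simultaneously attacking all the classes", the sketch below applies a PGD-style defensive perturbation before classification. It assumes a PyTorch image classifier with inputs in [0, 1]; the loss construction (sum of cross-entropy terms over every label) and the hyper-parameters (eps, steps, step_size) are illustrative assumptions inferred from the abstract, not the authors' exact algorithm or settings.

```python
import torch
import torch.nn.functional as F

def defensive_perturbation(model, x, eps=8/255, steps=5, step_size=2/255):
    """Sketch of a defensive perturbation: push the (possibly adversarial)
    input away from *all* classes at once. Under adversarial training the
    ground-truth class has smaller local Lipschitzness, so its prediction is
    expected to survive while misled predictions get flipped back.
    Hyper-parameters are illustrative, not the paper's settings."""
    model.eval()
    with torch.no_grad():
        num_classes = model(x).shape[1]
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        logits = model(x + delta)
        # Sum of cross-entropy losses against every possible label:
        # maximizing it "attacks" all classes simultaneously.
        loss = sum(
            F.cross_entropy(
                logits,
                torch.full((x.shape[0],), c, dtype=torch.long, device=x.device),
            )
            for c in range(num_classes)
        )
        loss.backward()
        with torch.no_grad():
            delta += step_size * delta.grad.sign()    # signed-gradient ascent step
            delta.clamp_(-eps, eps)                   # stay inside the L_inf ball
            delta.data = (x + delta).clamp(0, 1) - x  # keep the image valid
        delta.grad.zero_()
    return (x + delta).detach()

# Usage: classify the defensively perturbed input instead of the raw input.
# preds = model(defensive_perturbation(model, x_batch)).argmax(dim=1)
```

At test time the defender simply classifies the perturbed input; the intuition from the abstract is that classes with larger local Lipschitzness (the misled ones) are flipped by the perturbation while the ground-truth class is not.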
Related papers
- Protecting against simultaneous data poisoning attacks [14.893813906644153]
Current backdoor defense methods are evaluated against a single attack at a time.
We show that simultaneously executed data poisoning attacks can effectively install multiple backdoors in a single model.
We develop a new defense, BaDLoss, that is effective in the multi-attack setting.
arXiv Detail & Related papers (2024-08-23T16:57:27Z)
- PubDef: Defending Against Transfer Attacks From Public Models [6.0012551318569285]
We propose a new practical threat model where the adversary relies on transfer attacks through publicly available surrogate models.
We evaluate the transfer attacks in this setting and propose a specialized defense method based on a game-theoretic perspective.
Under this threat model, our defense, PubDef, outperforms state-of-the-art white-box adversarial training by a large margin with almost no loss in normal accuracy.
arXiv Detail & Related papers (2023-10-26T17:58:08Z)
- Guidance Through Surrogate: Towards a Generic Diagnostic Attack [101.36906370355435]
We develop a guided mechanism to avoid local minima during attack optimization, leading to a novel attack dubbed Guided Projected Gradient Attack (G-PGA).
Our modified attack does not require random restarts, a large number of attack iterations, or a search for an optimal step size.
More than an effective attack, G-PGA can be used as a diagnostic tool to reveal elusive robustness due to gradient masking in adversarial defenses.
arXiv Detail & Related papers (2022-12-30T18:45:23Z)
- Utilizing Adversarial Targeted Attacks to Boost Adversarial Robustness [10.94463750304394]
Adversarial attacks have been shown to be highly effective at degrading the performance of deep neural networks (DNNs).
We propose a novel solution by adopting the recently suggested Predictive Normalized Maximum Likelihood.
We extensively evaluate our approach on 16 adversarial attack benchmarks using ResNet-50, WideResNet-28, and a 2-layer ConvNet trained on ImageNet, CIFAR10, and MNIST.
arXiv Detail & Related papers (2021-09-04T22:30:49Z)
- Adversarial Attack and Defense in Deep Ranking [100.17641539999055]
We propose two attacks against deep ranking systems that can raise or lower the rank of chosen candidates by adversarial perturbations.
Conversely, an anti-collapse triplet defense is proposed to improve the ranking model robustness against all proposed attacks.
Our adversarial ranking attacks and defenses are evaluated on MNIST, Fashion-MNIST, CUB200-2011, CARS196 and Stanford Online Products datasets.
arXiv Detail & Related papers (2021-06-07T13:41:45Z)
- Unified Detection of Digital and Physical Face Attacks [61.6674266994173]
State-of-the-art defense mechanisms against face attacks achieve near-perfect accuracy within one of three attack categories, namely adversarial, digital manipulation, or physical spoofs.
We propose a unified attack detection framework, namely UniFAD, that can automatically cluster 25 coherent attack types belonging to the three categories.
arXiv Detail & Related papers (2021-04-05T21:08:28Z)
- Lagrangian Objective Function Leads to Improved Unforeseen Attack Generalization in Adversarial Training [0.0]
Adversarial training (AT) has been shown to be effective at producing models that are robust to the attack used during training.
We propose a simple modification to AT that mitigates its limited generalization to unseen attacks.
We show that our attack is faster than other attack schemes that are designed for unseen attack generalization.
arXiv Detail & Related papers (2021-03-29T07:23:46Z)
- Optimal Transport as a Defense Against Adversarial Attacks [4.6193503399184275]
Adversarial attacks can find a human-imperceptible perturbation for a given image that will mislead a trained model.
Previous work aimed to improve robustness by aligning original and adversarial image representations, much as in domain adaptation.
We propose to use a loss between distributions that faithfully reflects the ground distance.
This leads to SAT (Sinkhorn Adversarial Training), a more robust defense against adversarial attacks.
arXiv Detail & Related papers (2021-02-05T13:24:36Z)
- Are Adversarial Examples Created Equal? A Learnable Weighted Minimax Risk for Robustness under Non-uniform Attacks [70.11599738647963]
Adversarial Training is one of the few defenses that withstand strong attacks.
Traditional defense mechanisms assume a uniform attack over the examples according to the underlying data distribution.
We present a weighted minimax risk optimization that defends against non-uniform attacks.
arXiv Detail & Related papers (2020-10-24T21:20:35Z)
- Perceptual Adversarial Robustness: Defense Against Unseen Threat Models [58.47179090632039]
A key challenge in adversarial robustness is the lack of a precise mathematical characterization of human perception.
Under the neural perceptual threat model, we develop novel perceptual adversarial attacks and defenses.
Because the NPTM is very broad, we find that Perceptual Adversarial Training (PAT) against a perceptual attack gives robustness against many other types of adversarial attacks.
arXiv Detail & Related papers (2020-06-22T22:40:46Z)