Tricking Adversarial Attacks To Fail
- URL: http://arxiv.org/abs/2006.04504v1
- Date: Mon, 8 Jun 2020 12:22:07 GMT
- Title: Tricking Adversarial Attacks To Fail
- Authors: Blerta Lindqvist
- Abstract summary: Our white-box defense tricks untargeted attacks into becoming attacks targeted at designated target classes.
Our Target Training defense tricks the minimization at the core of untargeted, gradient-based adversarial attacks.
- Score: 0.05076419064097732
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent adversarial defense approaches have failed. Untargeted gradient-based
attacks cause classifiers to choose any wrong class. Our novel white-box
defense tricks untargeted attacks into becoming attacks targeted at designated
target classes. From these target classes, we can derive the real classes. Our
Target Training defense tricks the minimization at the core of untargeted,
gradient-based adversarial attacks: minimize the sum of (1) perturbation and
(2) classifier adversarial loss. Target Training changes the classifier
minimally, and trains it with additional duplicated points (at 0 distance)
labeled with designated classes. These differently-labeled duplicated samples
minimize both terms (1) and (2) of the minimization, steering attack
convergence to samples of designated classes, from which correct classification
is derived. Importantly, Target Training eliminates the need to know the attack
and the overhead of generating adversarial samples of attacks that minimize
perturbations. We obtain an 86.2% accuracy for CW-L2 (confidence=0) in CIFAR10,
exceeding even unsecured classifier accuracy on non-adversarial samples. Target
Training presents a fundamental change in adversarial defense strategy.
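To make the data-augmentation idea in the abstract concrete, the sketch below duplicates each training point at zero distance and relabels the copy with a designated class, then maps predictions that land in designated classes back to real classes. This is a minimal illustrative sketch: the specific mapping (real class y to designated class y + K, giving a 2K-way output layer) and the helper names are assumptions inferred from the abstract, not details confirmed by the paper.

```python
import numpy as np

def target_training_augment(x_train, y_train, num_classes):
    """Duplicate every training point at zero distance and relabel the copy
    with a designated class. Assumed mapping: real class y -> designated
    class y + num_classes, so the classifier is given 2 * num_classes outputs."""
    x_dup = x_train.copy()            # identical points: zero perturbation
    y_dup = y_train + num_classes     # designated (decoy) labels
    x_aug = np.concatenate([x_train, x_dup])
    y_aug = np.concatenate([y_train, y_dup])
    return x_aug, y_aug

def recover_real_class(predicted_class, num_classes):
    """If an attack steers the prediction into a designated class,
    map it back to the real class it encodes (y + num_classes -> y)."""
    return predicted_class % num_classes

# Usage sketch (assumed workflow):
# x_aug, y_aug = target_training_augment(x_train, y_train, num_classes=10)
# ... train a 20-way classifier on (x_aug, y_aug), then at test time:
# real = recover_real_class(np.argmax(model(x), axis=-1), num_classes=10)
```

The intuition, as stated in the abstract, is that the duplicated points sit at zero distance from the originals and their designated labels already satisfy the attack's misclassification term, so an attack that minimizes perturbation plus adversarial loss converges toward them, and the real class is then recovered by the mapping above.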
Related papers
- Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection [83.72430401516674]
GAKer is able to construct adversarial examples to any target class.
Our method achieves an approximately 14.13% higher attack success rate for unknown classes.
arXiv Detail & Related papers (2024-07-17T03:24:09Z) - Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks [62.036798488144306]
Current defenses mainly focus on known attacks, but adversarial robustness to unknown attacks is seriously overlooked.
We propose an attack-agnostic defense method named Meta Invariance Defense (MID).
We show that MID simultaneously achieves robustness to the imperceptible adversarial perturbations in high-level image classification and attack-suppression in low-level robust image regeneration.
arXiv Detail & Related papers (2024-04-04T10:10:38Z) - A Generative Approach to Surrogate-based Black-box Attacks [18.37537526008645]
State-of-the-art surrogate-based attacks involve training a discriminative surrogate that mimics the target's outputs.
We propose a generative surrogate that learns the distribution of samples residing on or close to the target's decision boundaries.
The proposed generative approach results in attacks with remarkably high attack success rates on various targets and datasets.
arXiv Detail & Related papers (2024-02-05T05:22:58Z) - Object-fabrication Targeted Attack for Object Detection [54.10697546734503]
Adversarial attacks on object detection include targeted and untargeted attacks.
A new object-fabrication targeted attack mode can mislead detectors to fabricate extra false objects with specific target labels.
arXiv Detail & Related papers (2022-12-13T08:42:39Z) - Constrained Gradient Descent: A Powerful and Principled Evasion Attack
Against Neural Networks [19.443306494201334]
We introduce several innovations that make white-box targeted attacks follow the intuition of the attacker's goal.
First, we propose a new loss function that explicitly captures the goal of targeted attacks.
Second, we propose a new attack method that uses a refined version of our loss function, capturing both the misclassification objective and the $L_\infty$ distance limit.
arXiv Detail & Related papers (2021-12-28T17:36:58Z) - Towards A Conceptually Simple Defensive Approach for Few-shot
classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z) - Target Training Does Adversarial Training Without Adversarial Samples [0.10152838128195464]
Adversarial samples are not optimal for steering attack convergence, based on the minimization at the core of adversarial attacks.
Target Training eliminates the need to generate adversarial samples for training against all attacks that minimize perturbation.
Using adversarial samples against attacks that do not minimize perturbation, Target Training exceeds the current best defense (69.1%) with 76.4% against CW-L$_2$ ($\kappa=40$) in CIFAR10.
arXiv Detail & Related papers (2021-02-09T14:17:57Z) - Untargeted, Targeted and Universal Adversarial Attacks and Defenses on
Time Series [0.0]
We have performed untargeted, targeted and universal adversarial attacks on UCR time series datasets.
Our results show that deep learning based time series classification models are vulnerable to these attacks.
We also show that universal adversarial attacks have good generalization properties, as they need only a fraction of the training data.
arXiv Detail & Related papers (2021-01-13T13:00:51Z) - CD-UAP: Class Discriminative Universal Adversarial Perturbation [83.60161052867534]
A single universal adversarial perturbation (UAP) can be added to all natural images to change most of their predicted class labels.
We propose a new universal attack method to generate a single perturbation that fools a target network to misclassify only a chosen group of classes.
arXiv Detail & Related papers (2020-10-07T09:26:42Z)