Deflecting Adversarial Attacks
- URL: http://arxiv.org/abs/2002.07405v1
- Date: Tue, 18 Feb 2020 06:59:13 GMT
- Title: Deflecting Adversarial Attacks
- Authors: Yao Qin, Nicholas Frosst, Colin Raffel, Garrison Cottrell and Geoffrey Hinton
- Abstract summary: We present a new approach towards ending this cycle where we "deflect" adversarial attacks by causing the attacker to produce an input that resembles the attack's target class.
We first propose a stronger defense based on Capsule Networks that combines three detection mechanisms to achieve state-of-the-art detection performance.
- Score: 94.85315681223702
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been an ongoing cycle where stronger defenses against adversarial
attacks are subsequently broken by a more advanced defense-aware attack. We
present a new approach towards ending this cycle where we "deflect"
adversarial attacks by causing the attacker to produce an input that
semantically resembles the attack's target class. To this end, we first propose
a stronger defense based on Capsule Networks that combines three detection
mechanisms to achieve state-of-the-art detection performance on both standard
and defense-aware attacks. We then show that undetected attacks against our
defense often perceptually resemble the adversarial target class by performing
a human study where participants are asked to label images produced by the
attack. These attack images can no longer be called "adversarial" because our
network classifies them the same way as humans do.
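The abstract's defense rests on detecting inputs that do not look like any clean input the model knows. As a minimal, hypothetical sketch of one generic detection idea of this kind (not the paper's capsule architecture), the example below flags an input as suspicious when its reconstruction from a low-dimensional "clean" subspace differs too much from the input itself; the PCA reconstructor, toy data, and percentile threshold are all illustrative assumptions:

```python
# Illustrative reconstruction-error detector: flag inputs whose distance
# to the clean-data subspace exceeds a threshold calibrated on clean data.
# Everything here (PCA reconstructor, toy data, threshold) is an assumption
# for illustration, not the capsule-based mechanism from the paper.
import numpy as np

def fit_pca(X, k):
    """Return the mean and top-k principal components of clean data X."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:k]

def recon_error(x, mu, comps):
    """L2 distance between x and its projection onto the clean subspace."""
    z = comps @ (x - mu)
    x_hat = mu + comps.T @ z
    return float(np.linalg.norm(x - x_hat))

rng = np.random.default_rng(1)
# Toy "clean" data living near a 2-D subspace of a 10-D space.
basis = rng.normal(size=(2, 10))
clean = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 10))
mu, comps = fit_pca(clean, k=2)

# Calibrate the detection threshold on clean reconstruction errors.
errs = [recon_error(x, mu, comps) for x in clean]
tau = np.percentile(errs, 99)

off_manifold = rng.normal(size=10) * 5.0  # a grossly off-manifold input
print(recon_error(off_manifold, mu, comps) > tau)  # detected: far from clean data
```

The intuition connecting this to "deflection": to slip past such a detector, an attack must push the input back toward the clean data of the target class, which is exactly what makes the resulting image perceptually resemble that class.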
Related papers
- Counter-Samples: A Stateless Strategy to Neutralize Black Box Adversarial Attacks [2.9815109163161204]
Our paper presents a novel defence against black box attacks, where attackers use the victim model as an oracle to craft their adversarial examples.
Unlike traditional preprocessing defences that rely on sanitizing input samples, our strategy counters the attack process itself.
We demonstrate that our approach is remarkably effective against state-of-the-art black box attacks and outperforms existing defences for both the CIFAR-10 and ImageNet datasets.
arXiv Detail & Related papers (2024-03-14T10:59:54Z)
- On the Difficulty of Defending Contrastive Learning against Backdoor Attacks [58.824074124014224]
We show how contrastive backdoor attacks operate through distinctive mechanisms.
Our findings highlight the need for defenses tailored to the specificities of contrastive backdoor attacks.
arXiv Detail & Related papers (2023-12-14T15:54:52Z)
- The Best Defense is a Good Offense: Adversarial Augmentation against Adversarial Attacks [91.56314751983133]
$A5$ is a framework for crafting a defensive perturbation that guarantees any attack on the input at hand will fail.
We show effective on-the-fly defensive augmentation with a robustifier network that ignores the ground truth label.
We also show how to apply $A5$ to create certifiably robust physical objects.
arXiv Detail & Related papers (2023-05-23T16:07:58Z)
- Game Theoretic Mixed Experts for Combinational Adversarial Machine Learning [10.368343314144553]
We provide a game-theoretic framework for ensemble adversarial attacks and defenses.
We propose three new attack algorithms, specifically designed to target defenses with randomized transformations, multi-model voting schemes, and adversarial detector architectures.
arXiv Detail & Related papers (2022-11-26T21:35:01Z)
- Ares: A System-Oriented Wargame Framework for Adversarial ML [3.197282271064602]
Ares is an evaluation framework for adversarial ML that allows researchers to explore attacks and defenses in a realistic wargame-like environment.
Ares frames the conflict between the attacker and defender as two agents in a reinforcement learning environment with opposing objectives.
This allows the introduction of system-level evaluation metrics such as time to failure and evaluation of complex strategies.
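The wargame framing above can be sketched as a toy episode loop between two agents with opposing objectives. The environment, success probability, and "time to failure" metric below are illustrative assumptions, not the Ares API:

```python
# Toy sketch of an attacker-vs-defender wargame episode with a
# system-level metric (time to failure). The per-step success model is a
# made-up stand-in; Ares itself uses a full RL environment.
import random

def episode(attack_strength, defense_strength, max_steps=100, seed=0):
    """Run one episode; return the number of steps the defender survives."""
    rng = random.Random(seed)
    for t in range(1, max_steps + 1):
        # Attacker breaks through on a step with probability proportional
        # to its strength relative to the defense (opposing objectives:
        # the attacker minimizes this count, the defender maximizes it).
        p_break = attack_strength / (attack_strength + defense_strength)
        if rng.random() < p_break:
            return t          # defender fails at step t
    return max_steps          # defender survives the whole episode

weak = episode(attack_strength=1.0, defense_strength=9.0, seed=42)
strong = episode(attack_strength=9.0, defense_strength=1.0, seed=42)
print(weak, strong)  # stronger attacks tend to shorten time to failure
```

Averaging `episode` over many seeds gives the kind of aggregate time-to-failure statistic that a system-level evaluation can report, which single-input robustness metrics cannot.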
arXiv Detail & Related papers (2022-10-24T04:55:18Z)
- Contributor-Aware Defenses Against Adversarial Backdoor Attacks [2.830541450812474]
Adversarial backdoor attacks have demonstrated the capability to cause targeted misclassification of specific examples.
We propose a contributor-aware universal defensive framework for learning in the presence of multiple, potentially adversarial data sources.
Our empirical studies demonstrate the robustness of the proposed framework against adversarial backdoor attacks from multiple simultaneous adversaries.
arXiv Detail & Related papers (2022-05-28T20:25:34Z)
- Zero-Query Transfer Attacks on Context-Aware Object Detectors [95.18656036716972]
Adversarial attacks perturb images such that a deep neural network produces incorrect classification results.
A promising approach to defend against adversarial attacks on natural multi-object scenes is to impose a context-consistency check.
We present the first approach for generating context-consistent adversarial attacks that can evade the context-consistency check.
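The basic perturbation mechanism this entry refers to can be illustrated with the fast gradient sign method (FGSM), the canonical way an adversarial example is crafted: step the input in the direction of the sign of the loss gradient. The toy logistic-regression model below is an illustrative stand-in, not any cited paper's setup:

```python
# Hypothetical FGSM illustration on a toy linear classifier: perturb the
# input by eps * sign(gradient of the loss w.r.t. the input), which
# reduces the model's confidence in the original label.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """Return x perturbed in the direction that increases the loss."""
    p = sigmoid(w @ x + b)   # predicted probability of class 1
    grad_x = (p - y) * w     # gradient of cross-entropy loss w.r.t. x
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=16)      # toy classifier weights
b = 0.0
x = rng.normal(size=16)      # a "clean" input
y = 1.0 if sigmoid(w @ x + b) > 0.5 else 0.0  # its current predicted label

x_adv = fgsm_perturb(x, y, w, b, eps=0.5)
print(sigmoid(w @ x + b), sigmoid(w @ x_adv + b))  # confidence in y drops
```

For a linear model the logit shift is exactly `eps * sign(p - y) * sum(|w|)`, so even a small `eps` moves the prediction away from the original label; deep networks behave analogously to first order.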
arXiv Detail & Related papers (2022-03-29T04:33:06Z)
- Adversarial Attack and Defense in Deep Ranking [100.17641539999055]
We propose two attacks against deep ranking systems that can raise or lower the rank of chosen candidates by adversarial perturbations.
Conversely, an anti-collapse triplet defense is proposed to improve the ranking model robustness against all proposed attacks.
Our adversarial ranking attacks and defenses are evaluated on MNIST, Fashion-MNIST, CUB200-2011, CARS196 and Stanford Online Products datasets.
arXiv Detail & Related papers (2021-06-07T13:41:45Z)
- Defenses Against Multi-Sticker Physical Domain Attacks on Classifiers [24.809185168969066]
One important attack can fool a classifier by placing black and white stickers on an object such as a road sign.
There are currently no defenses designed to protect against this attack.
In this paper, we propose new defenses that can protect against multi-sticker attacks.
arXiv Detail & Related papers (2021-01-26T19:59:28Z)
- Guided Adversarial Attack for Evaluating and Enhancing Adversarial Defenses [59.58128343334556]
We introduce a relaxation term into the standard loss that finds more suitable gradient directions, increases attack efficacy, and leads to more efficient adversarial training.
We propose Guided Adversarial Margin Attack (GAMA), which utilizes function mapping of the clean image to guide the generation of adversaries.
We also propose Guided Adversarial Training (GAT), which achieves state-of-the-art performance amongst single-step defenses.
arXiv Detail & Related papers (2020-11-30T16:39:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.