Adversarial Example Games
- URL: http://arxiv.org/abs/2007.00720v6
- Date: Sat, 9 Jan 2021 01:44:02 GMT
- Title: Adversarial Example Games
- Authors: Avishek Joey Bose, Gauthier Gidel, Hugo Berard, Andre Cianflone,
Pascal Vincent, Simon Lacoste-Julien and William L. Hamilton
- Abstract summary: Adversarial Example Games (AEG) is a framework that models the crafting of adversarial examples.
AEG provides a new way to design adversarial examples by adversarially training a generator and a classifier from a given hypothesis class.
We demonstrate the efficacy of AEG on the MNIST and CIFAR-10 datasets.
- Score: 51.92698856933169
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The existence of adversarial examples capable of fooling trained neural
network classifiers calls for a much better understanding of possible attacks
to guide the development of safeguards against them. This includes attack
methods in the challenging non-interactive blackbox setting, where adversarial
attacks are generated without any access, including queries, to the target
model. Prior attacks in this setting have relied mainly on algorithmic
innovations derived from empirical observations (e.g., that momentum helps),
lacking principled transferability guarantees. In this work, we provide a
theoretical foundation for crafting transferable adversarial examples to entire
hypothesis classes. We introduce Adversarial Example Games (AEG), a framework
that models the crafting of adversarial examples as a min-max game between a
generator of attacks and a classifier. AEG provides a new way to design
adversarial examples by adversarially training a generator and a classifier
from a given hypothesis class (e.g., architecture). We prove that this game has
an equilibrium, and that the optimal generator is able to craft adversarial
examples that can attack any classifier from the corresponding hypothesis
class. We demonstrate the efficacy of AEG on the MNIST and CIFAR-10 datasets,
outperforming prior state-of-the-art approaches with an average relative
improvement of $29.9\%$ and $47.2\%$ against undefended and robust models
(Table 2 & 3) respectively.
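To make the min-max structure described above concrete, here is a minimal PyTorch-style sketch of alternating generator/classifier updates. It is an illustration only: the architectures, optimizer settings, and the epsilon budget are assumptions, not the paper's actual implementation.

```python
# Illustrative AEG-style min-max loop (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

EPS = 0.3  # L-infinity budget; 0.3 is a common MNIST choice (assumed here)

class Generator(nn.Module):
    """Maps a clean input to a perturbed one inside the epsilon ball."""
    def __init__(self, dim=784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                 nn.Linear(256, dim))
    def forward(self, x):
        # tanh bounds the perturbation; clamp keeps pixels in [0, 1]
        return (x + EPS * torch.tanh(self.net(x))).clamp(0.0, 1.0)

classifier = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
generator = Generator()
opt_f = torch.optim.Adam(classifier.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)

def aeg_step(x, y):
    # Inner player: the classifier minimizes its loss on attacked inputs.
    loss_f = F.cross_entropy(classifier(generator(x).detach()), y)
    opt_f.zero_grad()
    loss_f.backward()
    opt_f.step()
    # Outer player: the generator maximizes the classifier's loss.
    loss_g = -F.cross_entropy(classifier(generator(x)), y)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_f.item(), -loss_g.item()

# Dummy batch standing in for flattened MNIST images:
x, y = torch.rand(64, 784), torch.randint(0, 10, (64,))
print(aeg_step(x, y))
```

In the framework itself the classifier stands in for an entire hypothesis class, so the optimal generator is meant to transfer to any classifier from that class rather than only the one it was trained against.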
Related papers
- Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks [62.036798488144306]
Current defenses mainly focus on known attacks, while adversarial robustness to unknown attacks is seriously overlooked.
We propose an attack-agnostic defense method named Meta Invariance Defense (MID).
We show that MID simultaneously achieves robustness to imperceptible adversarial perturbations in high-level image classification and attack suppression in low-level robust image regeneration.
arXiv Detail & Related papers (2024-04-04T10:10:38Z)
- AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models [7.406040859734522]
Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques.
Previous attack methods often directly inject Projected Gradient Descent (PGD) gradients into the sampling of generative models.
We propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models.
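For context, the PGD gradients mentioned above come from the standard projected gradient descent attack. A self-contained sketch follows; the step size, budget, and iteration count are common defaults, and this does not reproduce AdvDiff's actual integration with diffusion sampling.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Standard L-infinity PGD (assumed defaults): ascend the loss gradient,
    # projecting back into the eps-ball around the clean input each step.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x.detach() + (x_adv - x.detach()).clamp(-eps, eps)  # project
        x_adv = x_adv.clamp(0.0, 1.0)  # keep a valid pixel range
    return x_adv.detach()
```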
arXiv Detail & Related papers (2023-07-24T03:10:02Z)
- Adversarial Attacks are a Surprisingly Strong Baseline for Poisoning Few-Shot Meta-Learners [28.468089304148453]
We attack amortized meta-learners, which allows us to craft colluding sets of inputs that fool the system's learning algorithm.
We show that in a white box setting, these attacks are very successful and can cause the target model's predictions to become worse than chance.
We explore two hypotheses to explain this: 'overfitting' by the attack, and mismatch between the model on which the attack is generated and that to which the attack is transferred.
arXiv Detail & Related papers (2022-11-23T14:55:44Z)
- Understanding the Vulnerability of Skeleton-based Human Activity Recognition via Black-box Attack [53.032801921915436]
Human Activity Recognition (HAR) has been employed in a wide range of applications, e.g. self-driving cars.
Recently, the robustness of skeleton-based HAR methods has been questioned due to their vulnerability to adversarial attacks.
We show such threats exist, even when the attacker only has access to the input/output of the model.
We propose the first black-box adversarial attack approach for skeleton-based HAR, called BASAR.
arXiv Detail & Related papers (2022-11-21T09:51:28Z)
- The Space of Adversarial Strategies [6.295859509997257]
Adversarial examples, inputs designed to induce worst-case behavior in machine learning models, have been extensively studied over the past decade.
We propose a systematic approach to characterize worst-case (i.e., optimal) adversaries.
arXiv Detail & Related papers (2022-09-09T20:53:11Z)
- Adversarial Robustness of Deep Reinforcement Learning based Dynamic Recommender Systems [50.758281304737444]
We propose to explore adversarial examples and attack detection on reinforcement learning-based interactive recommendation systems.
We first craft different types of adversarial examples by adding perturbations to the input and intervening on the causal factors.
Then, we augment recommendation systems by detecting potential attacks with a deep learning-based classifier based on the crafted data.
arXiv Detail & Related papers (2021-12-02T04:12:24Z)
- Towards A Conceptually Simple Defensive Approach for Few-shot classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z)
- Are Adversarial Examples Created Equal? A Learnable Weighted Minimax Risk for Robustness under Non-uniform Attacks [70.11599738647963]
Adversarial Training is one of the few defenses that withstand strong attacks.
Traditional defense mechanisms assume a uniform attack over the examples according to the underlying data distribution.
We present a weighted minimax risk optimization that defends against non-uniform attacks.
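As a rough illustration of the idea (not the paper's exact formulation), a weighted risk replaces the uniform average over examples with adversarially chosen weights; a hypothetical sketch:

```python
import torch
import torch.nn.functional as F

def weighted_risk(model, x, y, w):
    # Per-example losses reweighted by a distribution w over the batch;
    # the defender minimizes this while the adversary chooses w.
    per_example = F.cross_entropy(model(x), y, reduction="none")
    return (w * per_example).sum()

def adversarial_weights(model, x, y, temperature=1.0):
    # One simple inner-max heuristic (an assumption, not from the paper):
    # concentrate mass on the examples the model currently handles worst.
    with torch.no_grad():
        per_example = F.cross_entropy(model(x), y, reduction="none")
    return torch.softmax(per_example / temperature, dim=0)
```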
arXiv Detail & Related papers (2020-10-24T21:20:35Z)
- ATRO: Adversarial Training with a Rejection Option [10.36668157679368]
This paper proposes a classification framework with a rejection option to mitigate the performance deterioration caused by adversarial examples.
Applying the adversarial training objective to both a classifier and a rejection function simultaneously, we can choose to abstain from classification when the classifier has insufficient confidence to classify a test data point.
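A minimal sketch of the abstention mechanism, assuming a fixed softmax-confidence threshold; ATRO itself learns the rejection function jointly with the classifier, which this does not reproduce:

```python
import torch
import torch.nn.functional as F

def predict_with_rejection(model, x, threshold=0.8):
    # Abstain (-1) when top softmax confidence is below the threshold.
    # The fixed threshold is a placeholder; ATRO instead trains a
    # rejection function under the adversarial training objective.
    probs = F.softmax(model(x), dim=1)
    conf, pred = probs.max(dim=1)
    pred = pred.clone()
    pred[conf < threshold] = -1
    return pred
```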
arXiv Detail & Related papers (2020-10-24T14:05:03Z)
- Challenging the adversarial robustness of DNNs based on error-correcting output codes [33.46319608673487]
ECOC-based networks can be attacked quite easily by introducing a small adversarial perturbation.
Adversarial examples can be generated in such a way as to achieve high probabilities for the predicted target class.
arXiv Detail & Related papers (2020-03-26T12:14:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.