Attack to Fool and Explain Deep Networks
- URL: http://arxiv.org/abs/2106.10606v1
- Date: Sun, 20 Jun 2021 03:07:36 GMT
- Title: Attack to Fool and Explain Deep Networks
- Authors: Naveed Akhtar, Muhammad A. A. K. Jalwana, Mohammed Bennamoun, Ajmal
Mian
- Abstract summary: We counter-argue by providing evidence of human-meaningful patterns in adversarial perturbations.
Our major contribution is a novel pragmatic adversarial attack that is subsequently transformed into a tool to interpret the visual models.
- Score: 59.97135687719244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep visual models are susceptible to adversarial perturbations to inputs.
Although these signals are carefully crafted, they still appear as noise-like
patterns to humans. This observation has led to the argument that deep visual
representation is misaligned with human perception. We counter-argue by
providing evidence of human-meaningful patterns in adversarial perturbations.
We first propose an attack that fools a network to confuse a whole category of
objects (source class) with a target label. Our attack also limits the
unintended fooling by samples from non-source classes, thereby circumscribing
human-defined semantic notions for network fooling. We show that the proposed
attack not only leads to the emergence of regular geometric patterns in the
perturbations, but also reveals insightful information about the decision
boundaries of deep models. Exploring this phenomenon further, we alter the
`adversarial' objective of our attack to use it as a tool to `explain' deep
visual representation. We show that by careful channeling and projection of the
perturbations computed by our method, we can visualize a model's understanding
of human-defined semantic notions. Finally, we exploit the explainability
properties of our perturbations to perform image generation, inpainting and
interactive image manipulation by attacking adversarially robust
`classifiers'. In all, our major contribution is a novel pragmatic adversarial
attack that is subsequently transformed into a tool to interpret the visual
models. The article also makes secondary contributions in terms of establishing
the utility of our attack beyond the adversarial objective with multiple
interesting applications.
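To make the attack objective above concrete, the following is a minimal PyTorch sketch of a class-targeted, image-agnostic perturbation: a single perturbation is optimized so that source-class images are pushed toward the target label, while a second term discourages label changes on non-source images. The model, data loaders, loss weighting and hyperparameters are illustrative assumptions, not the authors' exact algorithm.

```python
# Minimal sketch (assumed PyTorch setup): one perturbation `delta` is shared by
# all source-class images and optimized under an l_inf budget. The loaders,
# model, and hyperparameters are placeholders, not the paper's implementation.
import torch
import torch.nn.functional as F

def craft_class_perturbation(model, source_loader, other_loader, target_label,
                             eps=10 / 255, step_size=1 / 255, steps=200,
                             device="cpu"):
    model.eval().to(device)
    for p in model.parameters():          # the attack only updates `delta`
        p.requires_grad_(False)

    x0, _ = next(iter(source_loader))     # infer the input shape
    delta = torch.zeros_like(x0[:1]).to(device).requires_grad_(True)

    for _ in range(steps):
        xs, _ = next(iter(source_loader))                 # source-class batch
        xo, _ = next(iter(other_loader))                  # non-source batch
        xs, xo = xs.to(device), xo.to(device)
        tgt = torch.full((xs.size(0),), target_label,
                         dtype=torch.long, device=device)

        # Fooling term: source-class inputs should be labelled as the target.
        fool = F.cross_entropy(model(xs + delta), tgt)

        # Leakage term: non-source predictions should stay unchanged.
        with torch.no_grad():
            clean = model(xo).argmax(dim=1)
        leak = F.cross_entropy(model(xo + delta), clean)

        loss = fool + leak
        loss.backward()
        with torch.no_grad():
            delta -= step_size * delta.grad.sign()        # signed gradient step
            delta.clamp_(-eps, eps)                       # stay in the budget
        delta.grad.zero_()

    return delta.detach()
```

Because the same `delta` is accumulated over many source-class batches, the resulting pattern is class-wide rather than image-specific, which is what allows it to be inspected for human-meaningful structure.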
Related papers
- Investigating Human-Identifiable Features Hidden in Adversarial
Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z)
- Mitigating Adversarial Attacks in Deepfake Detection: An Exploration of
Perturbation and AI Techniques [1.0718756132502771]
Adversarial examples are subtle perturbations artfully injected into clean images or videos.
Deepfakes have emerged as a potent tool to manipulate public opinion and tarnish the reputations of public figures.
This article delves into the multifaceted world of adversarial examples, elucidating the underlying principles behind their capacity to deceive deep learning algorithms.
arXiv Detail & Related papers (2023-02-22T23:48:19Z)
- Robust Feature-Level Adversaries are Interpretability Tools [17.72884349429452]
Recent work that manipulates latent representations of image generators to create "feature-level" adversarial perturbations gives us an opportunity to explore perceptible, interpretable adversarial attacks.
We show that these adversaries are uniquely versatile and highly robust.
They can be used to produce targeted, universal, disguised, physically-realizable, and black-box attacks at the ImageNet scale.
arXiv Detail & Related papers (2021-10-07T16:33:11Z)
- Online Alternate Generator against Adversarial Attacks [144.45529828523408]
Deep learning models are notoriously sensitive to adversarial examples, which are synthesized by adding quasi-perceptible noise to real images.
We propose a portable defense method, online alternate generator, which does not need to access or modify the parameters of the target networks.
The proposed method works by online synthesizing another image from scratch for an input image, instead of removing or destroying adversarial noises.
arXiv Detail & Related papers (2020-09-17T07:11:16Z)
- Bluff: Interactively Deciphering Adversarial Attacks on Deep Neural
Networks [21.074988013822566]
Bluff is an interactive system for visualizing, characterizing, and deciphering adversarial attacks on vision-based neural networks.
It reveals mechanisms that adversarial attacks employ to inflict harm on a model.
arXiv Detail & Related papers (2020-09-05T22:08:35Z)
- Stylized Adversarial Defense [105.88250594033053]
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model.
We propose to exploit additional information from the feature space to craft stronger adversaries.
Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
arXiv Detail & Related papers (2020-07-29T08:38:10Z)
- Adversarial Attacks and Defenses: An Interpretation Perspective [80.23908920686625]
We review recent work on adversarial attacks and defenses, particularly from the perspective of machine learning interpretation.
The goal of model interpretation, or interpretable machine learning, is to explain the working mechanisms of models in human-understandable terms.
For each type of interpretation, we elaborate on how it could be used for adversarial attacks and defenses.
arXiv Detail & Related papers (2020-04-23T23:19:00Z)
- Towards Achieving Adversarial Robustness by Enforcing Feature
Consistency Across Bit Planes [51.31334977346847]
We train networks to form coarse impressions based on the information in higher bit planes, and use the lower bit planes only to refine their prediction.
We demonstrate that, by imposing consistency on the representations learned across differently quantized images, the adversarial robustness of networks improves significantly (a minimal sketch of this consistency idea appears after this list).
arXiv Detail & Related papers (2020-04-01T09:31:10Z)
- Generating Semantic Adversarial Examples via Feature Manipulation [23.48763375455514]
We propose a more practical adversarial attack by designing structured perturbation with semantic meanings.
Our proposed technique manipulates the semantic attributes of images via the disentangled latent codes.
We demonstrate the existence of a universal, image-agnostic semantic adversarial example.
arXiv Detail & Related papers (2020-01-06T06:28:31Z)
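As flagged in the bit-plane consistency entry above, here is a minimal sketch of that idea under assumed details: the classifier sees both the full image and a coarsely quantized view that keeps only the most significant bit planes, and a consistency term ties the two predictions together. Using logits (rather than intermediate features), KL divergence, and a fixed weight are simplifications for illustration.

```python
# Minimal sketch (assumed PyTorch setup) of consistency across bit planes:
# standard cross-entropy on the full image plus a KL term between predictions
# on the full image and on a view that keeps only the top `n_bits` bit planes.
import torch
import torch.nn.functional as F

def keep_high_bit_planes(x, n_bits=4):
    """Zero the (8 - n_bits) least significant bit planes of an image in [0, 1]."""
    step = 2 ** (8 - n_bits)
    return torch.floor(x * 255.0 / step) * step / 255.0

def bitplane_consistency_loss(model, x, y, n_bits=4, weight=1.0):
    logits_full = model(x)
    logits_coarse = model(keep_high_bit_planes(x, n_bits))
    ce = F.cross_entropy(logits_full, y)
    consistency = F.kl_div(F.log_softmax(logits_coarse, dim=1),
                           F.softmax(logits_full, dim=1),
                           reduction="batchmean")
    return ce + weight * consistency
```

In practice the consistency weight and the number of retained bit planes would be tuned per dataset; the sketch only conveys the structure of the loss.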
This list is automatically generated from the titles and abstracts of the papers on this site.