Adversarial Doodles: Interpretable and Human-drawable Attacks Provide
Describable Insights
- URL: http://arxiv.org/abs/2311.15994v2
- Date: Tue, 28 Nov 2023 03:59:35 GMT
- Title: Adversarial Doodles: Interpretable and Human-drawable Attacks Provide
Describable Insights
- Authors: Ryoya Nara and Yusuke Matsui
- Abstract summary: We propose Adversarial Doodles, which have interpretable shapes.
We obtain compact attacks that cause misclassification even when humans replicate them by hand.
- Score: 14.832208701208414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: DNN-based image classification models are susceptible to adversarial attacks.
Most previous adversarial attacks do not focus on the interpretability of the
generated adversarial examples, and we cannot gain insights into the mechanism
of the target classifier from the attacks. Therefore, we propose Adversarial
Doodles, which have interpretable shapes. We optimize black Bézier curves to
Doodles, which have interpretable shapes. We optimize black b\'ezier curves to
fool the target classifier by overlaying them onto the input image. By
introducing random perspective transformation and regularizing the doodled
area, we obtain compact attacks that cause misclassification even when humans
replicate them by hand. Adversarial doodles provide describable and intriguing
insights into the relationship between our attacks and the classifier's output.
We utilize adversarial doodles and discover the bias inherent in the target
classifier, such as "We add two strokes on its head, a triangle onto its body,
and two lines inside the triangle on a bird image. Then, the classifier
misclassifies the image as a butterfly."
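To make the optimization described in the abstract concrete, here is a minimal, hypothetical sketch in PyTorch (not the authors' code): it renders a single black quadratic Bézier curve onto an image via a soft, differentiable mask and optimizes the curve's control points so that a placeholder ImageNet classifier's score for the original class drops, with a small penalty on the doodled area standing in for the paper's area regularization. The random perspective transformation and multi-curve setup are omitted; the model, input image, learning rate, and loss weights are illustrative assumptions, and the distance-based soft mask is only one possible differentiable renderer.

# Hypothetical sketch (not the authors' implementation): optimize the control
# points of one black quadratic Bezier curve overlaid on an image so that a
# placeholder classifier's score for the original class drops.
import torch
import torchvision

def bezier_points(ctrl, n=64):
    # Sample n points along a quadratic Bezier curve from 3 control points (3, 2).
    t = torch.linspace(0.0, 1.0, n, device=ctrl.device).unsqueeze(1)
    return (1 - t) ** 2 * ctrl[0] + 2 * (1 - t) * t * ctrl[1] + t ** 2 * ctrl[2]

def render_doodle(img, ctrl, radius=0.02):
    # Softly draw a black stroke on img (1, 3, H, W); curve coordinates lie in [0, 1].
    _, _, h, w = img.shape
    ys = torch.linspace(0, 1, h, device=img.device).view(h, 1, 1)
    xs = torch.linspace(0, 1, w, device=img.device).view(1, w, 1)
    pts = bezier_points(torch.sigmoid(ctrl))  # sigmoid keeps the curve inside the image
    # Distance from each pixel to the nearest sampled curve point -> soft stroke mask.
    d = ((ys - pts[:, 1]) ** 2 + (xs - pts[:, 0]) ** 2).min(dim=-1).values.sqrt()
    mask = torch.sigmoid((radius - d) / 0.002)
    return img * (1 - mask), mask

model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()  # placeholder classifier
img = torch.rand(1, 3, 224, 224)                                     # placeholder input image
orig_class = model(img).argmax(dim=1).item()

ctrl = torch.randn(3, 2, requires_grad=True)  # control points of a single curve
opt = torch.optim.Adam([ctrl], lr=0.05)
for _ in range(200):
    doodled, mask = render_doodle(img, ctrl)
    logits = model(doodled)
    # Untargeted objective: push down the original class; the area term is a
    # stand-in for the paper's regularization that keeps the doodle compact.
    loss = logits[0, orig_class] + 0.1 * mask.mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("prediction changed:", model(render_doodle(img, ctrl)[0]).argmax(dim=1).item() != orig_class)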
Related papers
- Understanding the Vulnerability of Skeleton-based Human Activity Recognition via Black-box Attack [53.032801921915436]
Human Activity Recognition (HAR) has been employed in a wide range of applications, e.g. self-driving cars.
Recently, the robustness of skeleton-based HAR methods has been questioned due to their vulnerability to adversarial attacks.
We show such threats exist, even when the attacker only has access to the input/output of the model.
We propose the very first black-box adversarial attack approach in skeleton-based HAR called BASAR.
arXiv Detail & Related papers (2022-11-21T09:51:28Z) - GAMA: Generative Adversarial Multi-Object Scene Attacks [48.33120361498787]
This paper presents the first approach to using generative models for adversarial attacks on multi-object scenes.
We call this attack approach Generative Adversarial Multi-object scene Attacks (GAMA)
arXiv Detail & Related papers (2022-09-20T06:40:54Z) - Robust Feature-Level Adversaries are Interpretability Tools [17.72884349429452]
Recent work that manipulates latent representations of image generators to create "feature-level" adversarial perturbations gives us an opportunity to explore perceptible, interpretable adversarial attacks.
We show that these adversaries are uniquely versatile and highly robust.
They can be used to produce targeted, universal, disguised, physically-realizable, and black-box attacks at the ImageNet scale.
arXiv Detail & Related papers (2021-10-07T16:33:11Z) - Attack to Fool and Explain Deep Networks [59.97135687719244]
We counter-argue by providing evidence of human-meaningful patterns in adversarial perturbations.
Our major contribution is a novel pragmatic adversarial attack that is subsequently transformed into a tool to interpret the visual models.
arXiv Detail & Related papers (2021-06-20T03:07:36Z) - Poisoned classifiers are not only backdoored, they are fundamentally
broken [84.67778403778442]
Under a commonly-studied backdoor poisoning attack against classification models, an attacker adds a small trigger to a subset of the training data.
It is often assumed that the poisoned classifier is vulnerable exclusively to the adversary who possesses the trigger.
In this paper, we show empirically that this view of backdoored classifiers is incorrect.
arXiv Detail & Related papers (2020-10-18T19:42:44Z) - Learning to Attack with Fewer Pixels: A Probabilistic Post-hoc Framework
for Refining Arbitrary Dense Adversarial Attacks [21.349059923635515]
Deep neural network image classifiers are reported to be susceptible to adversarial evasion attacks.
We propose a probabilistic post-hoc framework that refines given dense attacks by significantly reducing the number of perturbed pixels.
Our framework performs adversarial attacks much faster than existing sparse attacks.
arXiv Detail & Related papers (2020-10-13T02:51:10Z) - Double Targeted Universal Adversarial Perturbations [83.60161052867534]
We introduce double targeted universal adversarial perturbations (DT-UAPs) to bridge the gap between instance-discriminative, image-dependent perturbations and generic universal perturbations.
We show the effectiveness of the proposed DTA algorithm on a wide range of datasets and also demonstrate its potential as a physical attack.
arXiv Detail & Related papers (2020-10-07T09:08:51Z) - Towards Feature Space Adversarial Attack [18.874224858723494]
We propose a new adversarial attack on Deep Neural Networks for image classification.
Our attack focuses on perturbing abstract features, more specifically, features that denote styles.
We show that our attack can generate adversarial samples that are more natural-looking than the state-of-the-art attacks.
arXiv Detail & Related papers (2020-04-26T13:56:31Z) - Deflecting Adversarial Attacks [94.85315681223702]
We present a new approach towards ending this cycle where we "deflect" adversarial attacks by causing the attacker to produce an input that resembles the attack's target class.
We first propose a stronger defense based on Capsule Networks that combines three detection mechanisms to achieve state-of-the-art detection performance.
arXiv Detail & Related papers (2020-02-18T06:59:13Z)