Adversarial Doodles: Interpretable and Human-drawable Attacks Provide
Describable Insights
- URL: http://arxiv.org/abs/2311.15994v2
- Date: Tue, 28 Nov 2023 03:59:35 GMT
- Title: Adversarial Doodles: Interpretable and Human-drawable Attacks Provide
Describable Insights
- Authors: Ryoya Nara and Yusuke Matsui
- Abstract summary: We propose Adversarial Doodles, which have interpretable shapes.
We obtain compact attacks that cause misclassification even when humans replicate them by hand.
- Score: 14.832208701208414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: DNN-based image classification models are susceptible to adversarial attacks.
Most previous adversarial attacks do not focus on the interpretability of the
generated adversarial examples, and we cannot gain insights into the mechanism
of the target classifier from the attacks. Therefore, we propose Adversarial
Doodles, which have interpretable shapes. We optimize black Bézier curves to
Doodles, which have interpretable shapes. We optimize black b\'ezier curves to
fool the target classifier by overlaying them onto the input image. By
introducing random perspective transformation and regularizing the doodled
area, we obtain compact attacks that cause misclassification even when humans
replicate them by hand. Adversarial doodles provide describable and intriguing
insights into the relationship between our attacks and the classifier's output.
We utilize adversarial doodles and discover the bias inherent in the target
classifier, such as "We add two strokes on its head, a triangle onto its body,
and two lines inside the triangle on a bird image. Then, the classifier
misclassifies the image as a butterfly."
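To make the optimization described in the abstract concrete, here is a minimal, hypothetical sketch in PyTorch (not the authors' code): it renders a single black quadratic Bézier curve onto an image via a soft, differentiable mask and optimizes the curve's control points so that a placeholder ImageNet classifier's score for the original class drops, with a small penalty on the doodled area standing in for the paper's area regularization. The random perspective transformation and multi-curve setup are omitted; the model, input image, learning rate, and loss weights are illustrative assumptions, and the distance-based soft mask is only one possible differentiable renderer.

# Hypothetical sketch (not the authors' implementation): optimize the control
# points of one black quadratic Bezier curve overlaid on an image so that a
# placeholder classifier's score for the original class drops.
import torch
import torchvision

def bezier_points(ctrl, n=64):
    # Sample n points along a quadratic Bezier curve from 3 control points (3, 2).
    t = torch.linspace(0.0, 1.0, n, device=ctrl.device).unsqueeze(1)
    return (1 - t) ** 2 * ctrl[0] + 2 * (1 - t) * t * ctrl[1] + t ** 2 * ctrl[2]

def render_doodle(img, ctrl, radius=0.02):
    # Softly draw a black stroke on img (1, 3, H, W); curve coordinates lie in [0, 1].
    _, _, h, w = img.shape
    ys = torch.linspace(0, 1, h, device=img.device).view(h, 1, 1)
    xs = torch.linspace(0, 1, w, device=img.device).view(1, w, 1)
    pts = bezier_points(torch.sigmoid(ctrl))  # sigmoid keeps the curve inside the image
    # Distance from each pixel to the nearest sampled curve point -> soft stroke mask.
    d = ((ys - pts[:, 1]) ** 2 + (xs - pts[:, 0]) ** 2).min(dim=-1).values.sqrt()
    mask = torch.sigmoid((radius - d) / 0.002)
    return img * (1 - mask), mask

model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()  # placeholder classifier
img = torch.rand(1, 3, 224, 224)                                     # placeholder input image
orig_class = model(img).argmax(dim=1).item()

ctrl = torch.randn(3, 2, requires_grad=True)  # control points of a single curve
opt = torch.optim.Adam([ctrl], lr=0.05)
for _ in range(200):
    doodled, mask = render_doodle(img, ctrl)
    logits = model(doodled)
    # Untargeted objective: push down the original class; the area term is a
    # stand-in for the paper's regularization that keeps the doodle compact.
    loss = logits[0, orig_class] + 0.1 * mask.mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("prediction changed:", model(render_doodle(img, ctrl)[0]).argmax(dim=1).item() != orig_class)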
Related papers
- Understanding the Vulnerability of Skeleton-based Human Activity Recognition via Black-box Attack [53.032801921915436]
Human Activity Recognition (HAR) has been employed in a wide range of applications, e.g. self-driving cars.
Recently, the robustness of skeleton-based HAR methods has been questioned due to their vulnerability to adversarial attacks.
We show such threats exist, even when the attacker only has access to the input/output of the model.
We propose the very first black-box adversarial attack approach in skeleton-based HAR called BASAR.
arXiv Detail & Related papers (2022-11-21T09:51:28Z) - GAMA: Generative Adversarial Multi-Object Scene Attacks [48.33120361498787]
This paper presents the first approach to using generative models for adversarial attacks on multi-object scenes.
We call this attack approach Generative Adversarial Multi-object scene Attacks (GAMA)
arXiv Detail & Related papers (2022-09-20T06:40:54Z) - Robust Feature-Level Adversaries are Interpretability Tools [17.72884349429452]
Recent work that manipulates latent representations of image generators to create "feature-level" adversarial perturbations gives us an opportunity to explore perceptible, interpretable adversarial attacks.
We show that these adversaries are uniquely versatile and highly robust.
They can be used to produce targeted, universal, disguised, physically-realizable, and black-box attacks at the ImageNet scale.
arXiv Detail & Related papers (2021-10-07T16:33:11Z) - Attack to Fool and Explain Deep Networks [59.97135687719244]
We counter-argue by providing evidence of human-meaningful patterns in adversarial perturbations.
Our major contribution is a novel pragmatic adversarial attack that is subsequently transformed into a tool to interpret the visual models.
arXiv Detail & Related papers (2021-06-20T03:07:36Z) - Poisoned classifiers are not only backdoored, they are fundamentally
broken [84.67778403778442]
Under a commonly-studied backdoor poisoning attack against classification models, an attacker adds a small trigger to a subset of the training data.
It is often assumed that the poisoned classifier is vulnerable exclusively to the adversary who possesses the trigger.
In this paper, we show empirically that this view of backdoored classifiers is incorrect.
arXiv Detail & Related papers (2020-10-18T19:42:44Z) - Learning to Attack with Fewer Pixels: A Probabilistic Post-hoc Framework
for Refining Arbitrary Dense Adversarial Attacks [21.349059923635515]
Deep neural network image classifiers are reported to be susceptible to adversarial evasion attacks.
We propose a probabilistic post-hoc framework that refines given dense attacks by significantly reducing the number of perturbed pixels.
Our framework performs adversarial attacks much faster than existing sparse attacks.
arXiv Detail & Related papers (2020-10-13T02:51:10Z) - Double Targeted Universal Adversarial Perturbations [83.60161052867534]
We introduce double targeted universal adversarial perturbations (DT-UAPs) to bridge the gap between instance-discriminative, image-dependent perturbations and generic universal perturbations.
We show the effectiveness of the proposed DTA algorithm on a wide range of datasets and also demonstrate its potential as a physical attack.
arXiv Detail & Related papers (2020-10-07T09:08:51Z) - Towards Feature Space Adversarial Attack [18.874224858723494]
We propose a new adversarial attack on Deep Neural Networks for image classification.
Our attack focuses on perturbing abstract features, more specifically, features that denote styles.
We show that our attack can generate adversarial samples that are more natural-looking than the state-of-the-art attacks.
arXiv Detail & Related papers (2020-04-26T13:56:31Z) - Deflecting Adversarial Attacks [94.85315681223702]
We present a new approach towards ending this cycle where we "deflect" adversarial attacks by causing the attacker to produce an input that resembles the attack's target class.
We first propose a stronger defense based on Capsule Networks that combines three detection mechanisms to achieve state-of-the-art detection performance.
arXiv Detail & Related papers (2020-02-18T06:59:13Z)