Two Souls in an Adversarial Image: Towards Universal Adversarial Example
Detection using Multi-view Inconsistency
- URL: http://arxiv.org/abs/2109.12459v1
- Date: Sat, 25 Sep 2021 23:47:13 GMT
- Title: Two Souls in an Adversarial Image: Towards Universal Adversarial Example
Detection using Multi-view Inconsistency
- Authors: Sohaib Kiani, Sana Awan, Chao Lan, Fengjun Li, Bo Luo
- Abstract summary: In evasion attacks against deep neural networks (DNNs), the attacker generates adversarial instances that are visually indistinguishable from benign samples.
We propose a novel multi-view adversarial image detector, namely Argos, based on a novel observation.
Argos significantly outperforms two representative adversarial detectors in both detection accuracy and robustness.
- Score: 10.08837640910022
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In evasion attacks against deep neural networks (DNNs), the attacker
generates adversarial instances that are visually indistinguishable from benign
samples and sends them to the target DNN to trigger misclassifications. In this
paper, we propose a novel multi-view adversarial image detector, namely Argos,
based on a novel observation. That is, there exist two "souls" in an
adversarial instance, i.e., the visually unchanged content, which corresponds
to the true label, and the added invisible perturbation, which corresponds to
the misclassified label. Such inconsistencies could be further amplified
through an autoregressive generative approach that generates images with seed
pixels selected from the original image, a selected label, and pixel
distributions learned from the training data. The generated images (i.e., the
"views") will deviate significantly from the original one if the label is
adversarial, demonstrating inconsistencies that Argos expects to detect. To
this end, Argos first uses a set of regeneration mechanisms to amplify the
discrepancies between the visual content of an image and the misclassified label
induced by the attack, and then identifies an image as adversarial if the
reproduced views deviate from the original beyond a preset threshold. Our
experimental results show that
Argos significantly outperforms two representative adversarial detectors in
both detection accuracy and robustness against six well-known adversarial
attacks. Code is available at:
https://github.com/sohaib730/Argos-Adversarial_Detection
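For concreteness, the following is a minimal sketch of the deviation-threshold test described above; it is not the authors' released implementation. The conditional generator regenerate_view, the number of views, and the threshold tau are hypothetical placeholders standing in for the paper's autoregressive regeneration mechanisms, which re-synthesize an image from seed pixels of the original and the (possibly adversarial) predicted label.

import numpy as np

def detect_adversarial(image, predicted_label, regenerate_view, num_views=4, tau=0.1):
    """Flag `image` as adversarial when views regenerated under `predicted_label`
    deviate from the original beyond the calibrated threshold `tau`.

    `regenerate_view(image, label, seed)` is a hypothetical stand-in for a
    conditional autoregressive generator (e.g., a PixelCNN-style model) that
    re-synthesizes the image from a few seed pixels and the given class label.
    """
    deviations = []
    for seed in range(num_views):
        view = regenerate_view(image, predicted_label, seed)  # one regenerated "view"
        # Mean absolute pixel deviation between the regenerated view and the original.
        deviations.append(np.mean(np.abs(view.astype(np.float32) -
                                         image.astype(np.float32))))
    # If the predicted label is adversarial, the regenerated views drift away from
    # the visual content, so the average deviation should exceed the threshold.
    return float(np.mean(deviations)) > tau

In practice the threshold would be calibrated on benign validation images, and the simple pixel distance used here could be replaced by a perceptual or feature-space metric.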
Related papers
- AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning [93.77763753231338]
Adversarial Contrastive Prompt Tuning (ACPT) is proposed to fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries.
We show that ACPT can detect 7 state-of-the-art query-based attacks with a >99% detection rate within 5 shots.
We also show that ACPT is robust to 3 types of adaptive attacks.
arXiv Detail & Related papers (2024-08-04T09:53:50Z) - Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks [17.87119255294563]
We investigate the vulnerability of state-of-the-art AIGI detectors against adversarial attack under white-box and black-box settings.
We propose a new attack containing two main parts. First, inspired by the obvious difference between real images and fake images in the frequency domain, we add perturbations under the frequency domain to push the image away from its original frequency distribution.
We show that adversarial attacks are a real threat to AIGI detectors, because FPBA can deliver successful black-box attacks across models, generators, and defense methods, and can even evade cross-generator detection.
arXiv Detail & Related papers (2024-07-30T14:07:17Z) - Transparency Attacks: How Imperceptible Image Layers Can Fool AI
Perception [0.0]
This paper investigates a novel algorithmic vulnerability when imperceptible image layers confound vision models into arbitrary label assignments and captions.
We explore image preprocessing methods to introduce stealth transparency, which triggers AI misinterpretation of what the human eye perceives.
The stealth transparency confounds established vision systems, enabling evasion of facial recognition and surveillance, digital watermarking, content filtering, and dataset curation, as well as attacks on automotive and drone autonomy, forensic evidence tampering, and retail product classification.
arXiv Detail & Related papers (2024-01-29T00:52:01Z) - I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models [0.0]
We present a gray-box adversarial attack on image-to-text models, in both untargeted and targeted settings.
Our attack operates in a gray-box manner, requiring no knowledge about the decoder module.
We also show that our attacks fool the popular open-source platform Hugging Face.
arXiv Detail & Related papers (2023-06-13T07:35:28Z) - Human-imperceptible, Machine-recognizable Images [76.01951148048603]
A major conflict is exposed for software engineers between building better AI systems and keeping their distance from sensitive training data.
This paper proposes an efficient privacy-preserving learning paradigm, where images are encrypted to become "human-imperceptible, machine-recognizable".
We show that the proposed paradigm can ensure the encrypted images have become human-imperceptible while preserving machine-recognizable information.
arXiv Detail & Related papers (2023-06-06T13:41:37Z) - Black-Box Attack against GAN-Generated Image Detector with Contrastive
Perturbation [0.4297070083645049]
We propose a new black-box attack method against GAN-generated image detectors.
A novel contrastive learning strategy is adopted to train the encoder-decoder network based anti-forensic model.
The proposed attack effectively reduces the accuracy of three state-of-the-art detectors on six popular GANs.
arXiv Detail & Related papers (2022-11-07T12:56:14Z) - Context-Aware Transfer Attacks for Object Detection [51.65308857232767]
We present a new approach to generate context-aware attacks for object detectors.
We show that by using co-occurrence of objects and their relative locations and sizes as context information, we can successfully generate targeted mis-categorization attacks.
arXiv Detail & Related papers (2021-12-06T18:26:39Z) - Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Generative-based adversarial attacks can get rid of this limitation.
A Symmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations.
The adversarial examples generated by SSAE not only make the widely-used models collapse, but also achieve good visual quality.
arXiv Detail & Related papers (2021-07-20T01:55:21Z) - Detecting Adversarial Examples by Input Transformations, Defense
Perturbations, and Voting [71.57324258813674]
Convolutional neural networks (CNNs) have been shown to reach super-human performance in visual recognition tasks.
CNNs can easily be fooled by adversarial examples, i.e., maliciously-crafted images that force the networks to predict an incorrect output.
This paper extensively explores the detection of adversarial examples via image transformations and proposes a novel methodology.
arXiv Detail & Related papers (2021-01-27T14:50:41Z) - Breaking certified defenses: Semantic adversarial examples with spoofed
robustness certificates [57.52763961195292]
We present a new attack that exploits not only the labelling function of a classifier, but also the certificate generator.
The proposed method applies large perturbations that place images far from a class boundary while maintaining the imperceptibility property of adversarial examples.
arXiv Detail & Related papers (2020-03-19T17:59:44Z) - Generating Semantic Adversarial Examples via Feature Manipulation [23.48763375455514]
We propose a more practical adversarial attack by designing structured perturbations with semantic meaning.
Our proposed technique manipulates the semantic attributes of images via the disentangled latent codes; a minimal illustrative sketch of this idea follows at the end of this list.
We demonstrate the existence of a universal, image-agnostic semantic adversarial example.
arXiv Detail & Related papers (2020-01-06T06:28:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.