DAFAR: Detecting Adversaries by Feedback-Autoencoder Reconstruction
- URL: http://arxiv.org/abs/2103.06487v1
- Date: Thu, 11 Mar 2021 06:18:50 GMT
- Title: DAFAR: Detecting Adversaries by Feedback-Autoencoder Reconstruction
- Authors: Haowen Liu, Ping Yi, Hsiao-Ying Lin, Jie Shi
- Abstract summary: DAFAR allows deep learning models to detect adversarial examples with high accuracy and universality.
It transforms an imperceptible-perturbation attack on the target network directly into an obvious reconstruction-error attack on the feedback autoencoder.
Experiments show that DAFAR is effective against popular and arguably the most advanced attacks without losing performance on legitimate samples.
- Score: 7.867922462470315
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning has shown impressive performance on challenging perceptual
tasks. However, researchers found deep neural networks vulnerable to
adversarial examples. Since then, many methods have been proposed to defend
against or detect adversarial examples, but they are either attack-dependent or
have been shown to be ineffective against new attacks.
We propose DAFAR, a feedback framework that allows deep learning models to
detect adversarial examples with high accuracy and universality. DAFAR has a
relatively simple structure, which contains a target network, a plug-in
feedback network and an autoencoder-based detector. The key idea is to capture
the high-level features extracted by the target network, and then reconstruct
the input using the feedback network. These two parts constitute a feedback
autoencoder. It transforms the imperceptible-perturbation attack on the target
network directly into an obvious reconstruction-error attack on the feedback
autoencoder. Finally, the detector gives an anomaly score and determines whether
the input is adversarial according to the reconstruction errors. Experiments
are conducted on the MNIST and CIFAR-10 datasets. Experimental results show that
DAFAR is effective against popular and arguably the most advanced attacks without
losing performance on legitimate samples, with high accuracy and universality
across attack methods and parameters.
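To make the feedback-autoencoder idea concrete, here is a minimal PyTorch sketch of the pipeline the abstract describes: the target network's high-level features are fed to a plug-in feedback decoder that reconstructs the input, and an input is flagged as adversarial when its reconstruction error is anomalously large. The module names, layer sizes, MNIST-shaped inputs, and the plain error threshold (standing in for the paper's autoencoder-based detector) are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class TargetNet(nn.Module):
    """Toy classifier; its penultimate features feed the feedback decoder."""
    def __init__(self, feat_dim=64, num_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 256), nn.ReLU(),
            nn.Linear(256, feat_dim), nn.ReLU(),
        )
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        feats = self.encoder(x)          # high-level features of the target network
        return self.head(feats), feats

class FeedbackDecoder(nn.Module):
    """Plug-in feedback network: together with the target's encoder it
    forms the feedback autoencoder that reconstructs the input."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 28 * 28), nn.Sigmoid(),
        )

    def forward(self, feats):
        return self.decoder(feats).view(-1, 1, 28, 28)

def detect(target, decoder, x, threshold):
    """Flag inputs whose reconstruction error exceeds a threshold calibrated
    on legitimate data (a simplified stand-in for the paper's
    autoencoder-based detector)."""
    logits, feats = target(x)
    recon = decoder(feats)
    # per-sample reconstruction error serves as the anomaly score
    err = ((recon - x) ** 2).flatten(1).mean(dim=1)
    return logits.argmax(dim=1), err > threshold

if __name__ == "__main__":
    target, decoder = TargetNet(), FeedbackDecoder()
    x = torch.rand(8, 1, 28, 28)         # stand-in MNIST-sized batch
    preds, is_adv = detect(target, decoder, x, threshold=0.1)
    print(preds, is_adv)
```

In this reading, an imperceptible perturbation that changes the target's high-level features also changes what the decoder reconstructs, so the per-sample reconstruction error becomes a visible signal even though the input perturbation itself is small.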
Related papers
- AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning [93.77763753231338]
Adversarial Contrastive Prompt Tuning (ACPT) is proposed to fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries.
We show that ACPT can detect 7 state-of-the-art query-based attacks with a detection rate above 99% within 5 shots.
We also show that ACPT is robust to 3 types of adaptive attacks.
arXiv Detail & Related papers (2024-08-04T09:53:50Z) - New Adversarial Image Detection Based on Sentiment Analysis [37.139957973240264]
Adversarial attack models, e.g., DeepFool, are on the rise and are outrunning adversarial example detection techniques.
This paper presents a new adversarial example detector that outperforms state-of-the-art detectors in identifying the latest adversarial attacks on image datasets.
arXiv Detail & Related papers (2023-05-03T14:32:21Z) - Object-fabrication Targeted Attack for Object Detection [54.10697546734503]
Adversarial attacks for object detection include targeted and untargeted attacks.
A new object-fabrication targeted attack mode can mislead detectors to fabricate extra false objects with specific target labels.
arXiv Detail & Related papers (2022-12-13T08:42:39Z) - Nowhere to Hide: A Lightweight Unsupervised Detector against Adversarial
Examples [14.332434280103667]
Adversarial examples are generated by adding slight but maliciously crafted perturbations to benign images.
In this paper, we propose an AutoEncoder-based Adversarial Examples (AEAE) detector.
We show empirically that the AEAE is unsupervised and inexpensive against most state-of-the-art attacks.
arXiv Detail & Related papers (2022-10-16T16:29:47Z) - On Trace of PGD-Like Adversarial Attacks [77.75152218980605]
Adversarial attacks pose safety and security concerns for deep learning applications.
We construct Adversarial Response Characteristics (ARC) features to reflect the model's gradient consistency.
Our method is intuitive, lightweight, non-intrusive, and data-undemanding.
arXiv Detail & Related papers (2022-05-19T14:26:50Z) - Universal Adversarial Examples in Remote Sensing: Methodology and
Benchmark [17.13291434132985]
We propose a novel black-box adversarial attack method, namely Mixup-Attack, and its simple variant Mixcut-Attack, for remote sensing data.
Despite their simplicity, the proposed methods can generate transferable adversarial examples that deceive most of the state-of-the-art deep neural networks.
We provide the generated universal adversarial examples in the dataset named UAE-RS, which is the first dataset that provides black-box adversarial samples in the remote sensing field.
arXiv Detail & Related papers (2022-02-14T21:52:45Z) - Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning [95.60856995067083]
This work is among the first to perform adversarial defense for automatic speaker verification (ASV) without knowing the specific attack algorithms.
We propose to perform adversarial defense from two perspectives: 1) adversarial perturbation purification and 2) adversarial perturbation detection.
Experimental results show that our detection module effectively shields the ASV by detecting adversarial samples with an accuracy of around 80%.
arXiv Detail & Related papers (2021-06-01T07:10:54Z) - Self-Supervised Adversarial Example Detection by Disentangled
Representation [16.98476232162835]
We train an autoencoder, assisted by a discriminator network, over both correctly paired and incorrectly paired class/semantic features to reconstruct benign examples and counterexamples.
This mimics the behavior of adversarial examples and reduces the unnecessary generalization ability of the autoencoder.
Compared with the state-of-the-art self-supervised detection methods, our method exhibits better performance in various measurements.
arXiv Detail & Related papers (2021-05-08T12:48:18Z) - Detection of Adversarial Supports in Few-shot Classifiers Using Feature
Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We make use of feature-preserving autoencoder filtering and the concept of self-similarity of a support set to perform this detection.
Our method is attack-agnostic and, to the best of our knowledge, the first to explore detection for few-shot classifiers.
arXiv Detail & Related papers (2020-12-09T14:13:41Z) - Open-set Adversarial Defense [93.25058425356694]
We show that open-set recognition systems are vulnerable to adversarial attacks.
Motivated by this observation, we emphasize the need for an Open-Set Adversarial Defense (OSAD) mechanism.
This paper proposes an Open-Set Defense Network (OSDN) as a solution to the OSAD problem.
arXiv Detail & Related papers (2020-09-02T04:35:33Z) - Category-wise Attack: Transferable Adversarial Examples for Anchor Free
Object Detection [38.813947369401525]
We present an effective and efficient algorithm to generate adversarial examples to attack anchor-free object detection models.
Surprisingly, the generated adversarial examples are not only able to effectively attack the targeted anchor-free object detector but can also be transferred to attack other object detectors.
arXiv Detail & Related papers (2020-02-10T04:49:03Z)