Nowhere to Hide: A Lightweight Unsupervised Detector against Adversarial
Examples
- URL: http://arxiv.org/abs/2210.08579v1
- Date: Sun, 16 Oct 2022 16:29:47 GMT
- Title: Nowhere to Hide: A Lightweight Unsupervised Detector against Adversarial
Examples
- Authors: Hui Liu, Bo Zhao, Kehuan Zhang, Peng Liu
- Abstract summary: Adversarial examples are generated by adding slight but maliciously crafted perturbations to benign images.
In this paper, we propose an AutoEncoder-based Adversarial Examples (AEAE) detector.
We show empirically that the AEAE is unsupervised, computationally inexpensive, and effective against state-of-the-art attacks.
- Score: 14.332434280103667
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although deep neural networks (DNNs) have shown impressive performance on
many perceptual tasks, they are vulnerable to adversarial examples that are
generated by adding slight but maliciously crafted perturbations to benign
images. Adversarial detection is an important technique for identifying
adversarial examples before they are entered into target DNNs. Previous studies
to detect adversarial examples either targeted specific attacks or required
expensive computation. How to design a lightweight unsupervised detector is still
a challenging problem. In this paper, we propose an AutoEncoder-based
Adversarial Examples (AEAE) detector that can guard DNN models by detecting
adversarial examples with low computation in an unsupervised manner. The AEAE
includes only a shallow autoencoder but plays two roles. First, a well-trained
autoencoder has learned the manifold of benign examples. This autoencoder can
produce a large reconstruction error for adversarial images with large
perturbations, so we can detect significantly perturbed adversarial examples
based on the reconstruction error. Second, the autoencoder can filter out the
small noise and change the DNN's prediction on adversarial examples with small
perturbations. It helps to detect slightly perturbed adversarial examples based
on the prediction distance. To cover these two cases, we utilize the
reconstruction error and prediction distance from benign images to construct a
two-tuple feature set and train an adversarial detector using the isolation
forest algorithm. We show empirically that the AEAE is unsupervised,
computationally inexpensive, and effective against state-of-the-art attacks.
Through the detection in these two cases, adversarial examples have nowhere to hide.
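As a concrete illustration of the pipeline sketched in the abstract, the snippet below computes the two-tuple feature (reconstruction error, prediction distance) and fits an isolation forest on benign images only. It is a minimal sketch under stated assumptions, not the authors' implementation: the `autoencoder` and `target_model` objects, the mean-squared reconstruction error, the L1 prediction distance, and all hyperparameters are illustrative choices.

```python
# Minimal sketch of an AEAE-style two-tuple detector (illustrative, not the authors' code).
# Assumed: `autoencoder` is a shallow autoencoder already trained on benign images,
# `target_model` is the protected DNN classifier, inputs are (N, C, H, W) tensors.
import torch
import torch.nn.functional as F
from sklearn.ensemble import IsolationForest

@torch.no_grad()
def two_tuple_features(images, autoencoder, target_model):
    """Return the (reconstruction error, prediction distance) pair per image."""
    recon = autoencoder(images)

    # Feature 1: reconstruction error. Strongly perturbed adversarial examples
    # lie off the benign manifold the autoencoder has learned, so this is large.
    recon_err = F.mse_loss(recon, images, reduction="none").flatten(1).mean(dim=1)

    # Feature 2: prediction distance. For slightly perturbed adversarial examples
    # the autoencoder filters the small noise, so the target DNN's prediction
    # shifts between the input and its reconstruction.
    p_orig = F.softmax(target_model(images), dim=1)
    p_recon = F.softmax(target_model(recon), dim=1)
    pred_dist = (p_orig - p_recon).abs().sum(dim=1)  # L1 distance (an assumption)

    return torch.stack([recon_err, pred_dist], dim=1).cpu().numpy()

def fit_detector(benign_images, autoencoder, target_model):
    """Unsupervised training: the isolation forest sees only benign features."""
    feats = two_tuple_features(benign_images, autoencoder, target_model)
    return IsolationForest(n_estimators=100, contamination=0.01,
                           random_state=0).fit(feats)

def flag_adversarial(images, detector, autoencoder, target_model):
    """IsolationForest.predict returns -1 for outliers, i.e. suspected adversarial inputs."""
    feats = two_tuple_features(images, autoencoder, target_model)
    return detector.predict(feats) == -1
```

Because the forest is fitted only on benign features, no adversarial examples or attack labels are needed, which matches the unsupervised, low-cost setting the abstract describes.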
Related papers
- HOLMES: to Detect Adversarial Examples with Multiple Detectors [1.455585466338228]
HOLMES is able to distinguish unseen adversarial examples from multiple attacks with high accuracy and low false positive rates.
Our effective and inexpensive strategies neither modify the original DNN models nor require access to their internal parameters.
arXiv Detail & Related papers (2024-05-30T11:22:55Z) - New Adversarial Image Detection Based on Sentiment Analysis [37.139957973240264]
Adversarial attack models, e.g., DeepFool, are on the rise and outrunning adversarial example detection techniques.
This paper presents a new adversarial example detector that outperforms state-of-the-art detectors in identifying the latest adversarial attacks on image datasets.
arXiv Detail & Related papers (2023-05-03T14:32:21Z) - ADC: Adversarial attacks against object Detection that evade Context
consistency checks [55.8459119462263]
We show that even context consistency checks can be brittle to properly crafted adversarial examples.
We propose an adaptive framework to generate examples that subvert such defenses.
Our results suggest that how to robustly model context and check its consistency, is still an open problem.
arXiv Detail & Related papers (2021-10-24T00:25:09Z) - Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Generative-based adversarial attacks can get rid of this limitation.
A Symmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations.
The adversarial examples generated by SSAE not only make the widely-used models collapse, but also achieve good visual quality.
arXiv Detail & Related papers (2021-07-20T01:55:21Z) - Self-Supervised Adversarial Example Detection by Disentangled
Representation [16.98476232162835]
We train an autoencoder, assisted by a discriminator network, over both correctly paired class/semantic features and incorrectly paired class/semantic features to reconstruct benign and counterexamples.
This mimics the behavior of adversarial examples and can reduce the unnecessary generalization ability of the autoencoder.
Compared with the state-of-the-art self-supervised detection methods, our method exhibits better performance in various measurements.
arXiv Detail & Related papers (2021-05-08T12:48:18Z) - DAFAR: Detecting Adversaries by Feedback-Autoencoder Reconstruction [7.867922462470315]
DAFAR allows deep learning models to detect adversarial examples with high accuracy and universality.
It transforms imperceptible-perturbation attack on the target network directly into obvious reconstruction-error attack on the feedback autoencoder.
Experiments show that DAFAR is effective against popular and arguably most advanced attacks without losing performance on legitimate samples.
arXiv Detail & Related papers (2021-03-11T06:18:50Z) - Object Detection Made Simpler by Eliminating Heuristic NMS [70.93004137521946]
We show a simple NMS-free, end-to-end object detection framework.
We attain on par or even improved detection accuracy compared with the original one-stage detector.
arXiv Detail & Related papers (2021-01-28T02:38:29Z) - Detecting Adversarial Examples by Input Transformations, Defense
Perturbations, and Voting [71.57324258813674]
Convolutional neural networks (CNNs) have proved to reach super-human performance in visual recognition tasks.
CNNs can easily be fooled by adversarial examples, i.e., maliciously-crafted images that force the networks to predict an incorrect output.
This paper extensively explores the detection of adversarial examples via image transformations and proposes a novel methodology; a generic sketch of the transformation-and-voting idea appears after this list.
arXiv Detail & Related papers (2021-01-27T14:50:41Z) - Learning to Separate Clusters of Adversarial Representations for Robust
Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by a recently introduced non-robust feature.
In this paper, we consider the non-robust features as a common property of adversarial examples, and we deduce it is possible to find a cluster in representation space corresponding to the property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster, and to leverage the distribution for a likelihood-based adversarial detector.
arXiv Detail & Related papers (2020-12-07T07:21:18Z) - A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z) - Category-wise Attack: Transferable Adversarial Examples for Anchor Free
Object Detection [38.813947369401525]
We present an effective and efficient algorithm to generate adversarial examples to attack anchor-free object models.
Surprisingly, the generated adversarial examples are not only able to effectively attack the targeted anchor-free object detector but can also be transferred to attack other object detectors.
arXiv Detail & Related papers (2020-02-10T04:49:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
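For the entry "Detecting Adversarial Examples by Input Transformations, Defense Perturbations, and Voting" above, the following is a generic sketch of the transformation-and-voting idea only, not that paper's specific methodology: an input is flagged when the classifier's predictions across a few cheap input transformations disagree with its prediction on the original image. The transformations, the model interface, and the agreement threshold are all assumptions for illustration.

```python
# Generic transformation-and-voting detection sketch (illustrative only;
# not the cited paper's method). Inputs are (N, C, H, W) tensors in [0, 1].
import torch
import torch.nn.functional as F

def _transforms():
    """A few cheap, roughly label-preserving transformations (assumptions)."""
    return [
        lambda x: torch.flip(x, dims=[-1]),                      # horizontal flip
        lambda x: (x + 0.02 * torch.randn_like(x)).clamp(0, 1),  # mild Gaussian noise
        lambda x: F.interpolate(                                 # downscale, then restore
            F.interpolate(x, scale_factor=0.5, mode="bilinear", align_corners=False),
            size=x.shape[-2:], mode="bilinear", align_corners=False),
    ]

@torch.no_grad()
def flag_by_voting(images, model, min_agreement=0.5):
    """Flag inputs whose transformed predictions disagree too often with the original."""
    base = model(images).argmax(dim=1)                            # votes on originals
    votes = torch.stack([model(t(images)).argmax(dim=1) for t in _transforms()])
    agreement = (votes == base.unsqueeze(0)).float().mean(dim=0)  # fraction agreeing
    return agreement < min_agreement                              # True = suspected adversarial
```

Benign images tend to keep their label under such transformations, while adversarial perturbations are often brittle to them, so low agreement is treated as a detection signal.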