Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them
- URL: http://arxiv.org/abs/2107.11630v1
- Date: Sat, 24 Jul 2021 15:14:53 GMT
- Title: Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them
- Authors: Florian Tramèr
- Abstract summary: We prove a general hardness reduction between detection and classification of adversarial examples.
Our reduction is computationally inefficient, and thus cannot be used to build practical classifiers.
It is a useful sanity check to test whether empirical detection results imply something much stronger than the authors presumably anticipated.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Making classifiers robust to adversarial examples is hard. Thus, many
defenses tackle the seemingly easier task of detecting perturbed inputs. We
show a barrier towards this goal. We prove a general hardness reduction between
detection and classification of adversarial examples: given a robust detector
for attacks at distance ε (in some metric), we can build a similarly
robust (but inefficient) classifier for attacks at distance ε/2. Our
reduction is computationally inefficient, and thus cannot be used to build
practical classifiers. Instead, it is a useful sanity check to test whether
empirical detection results imply something much stronger than the authors
presumably anticipated. To illustrate, we revisit 13 detector defenses. For
11/13 cases, we show that the claimed detection results would imply an
inefficient classifier with robustness far beyond the state-of-the-art.
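The reduction can be pictured as an exhaustive search: to classify an input x, look for any point within distance ε/2 of x that the detector accepts, and return the base classifier's label for that point. If x lies within ε/2 of some clean input x0, every accepted point in that search ball lies within ε of x0, so the assumed detection guarantee at radius ε forces the returned label to be x0's true label. The sketch below is a toy, brute-force illustration of this flavor of reduction over a discretized input space; the `classify`, `detect`, `dist`, and `candidates` objects are hypothetical stand-ins, and the construction is not claimed to match the paper's exact proof.

```python
import numpy as np

def build_inefficient_classifier(classify, detect, eps, candidates, dist):
    """Wrap a (classifier, detector) pair assumed robust to perturbations of
    size eps into a classifier intended to be robust at radius eps / 2.
    The exhaustive search over `candidates` is what makes the construction
    computationally infeasible on realistic input spaces."""
    def robust_classify(x):
        for z in candidates:
            # Look for any point within eps/2 of x that the detector accepts.
            if dist(z, x) <= eps / 2 and not detect(z):
                # If x lies within eps/2 of some clean input x0, then z lies
                # within eps of x0, so the assumed detection guarantee implies
                # classify(z) returns x0's true label.
                return classify(z)
        return None  # abstain if every nearby point is flagged
    return robust_classify

# Toy usage on a discretized 1-D input space (purely illustrative).
if __name__ == "__main__":
    eps = 0.2
    clean_x, true_label = 0.5, 1
    classify = lambda z: 1 if abs(z - clean_x) <= eps else 0
    detect = lambda z: abs(z - clean_x) > eps   # flag anything far from clean data
    dist = lambda a, b: abs(a - b)
    candidates = np.linspace(0.0, 1.0, 101)

    h = build_inefficient_classifier(classify, detect, eps, candidates, dist)
    x_adv = clean_x + eps / 2                   # attack at distance eps/2
    print(h(x_adv))                             # expected: 1 (the true label)
```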
Related papers
- Towards Robust Domain Generation Algorithm Classification [1.4542411354617986]
We implement 32 white-box attacks, 19 of which are very effective and induce a false-negative rate (FNR) of approximately 100% on unhardened classifiers.
We propose a novel training scheme that leverages adversarial latent space vectors and discretized adversarial domains to significantly improve robustness.
arXiv Detail & Related papers (2024-04-09T11:56:29Z)
- On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework in which normal and adversarial samples induce different responses to UAPs (see the sketch after this entry).
Experimental results show that our method achieves competitive detection performance on various text classification tasks.
arXiv Detail & Related papers (2023-06-27T02:54:07Z)
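For the UAP-based entry above, here is a minimal sketch of the general idea only (not the paper's actual method, which targets text classifiers and may use a different response measure): add a fixed universal adversarial perturbation to an input and measure how much the model's output distribution shifts. The `model`, `uap`, and `threshold` names are hypothetical placeholders.

```python
import numpy as np

def uap_response_score(model, x, uap):
    """Measure how much the model's output distribution shifts when a fixed
    universal adversarial perturbation (UAP) is added to the input.  The
    premise is that adversarial inputs react differently to UAPs than
    normal inputs do."""
    p_clean = np.clip(model(x), 1e-12, 1.0)
    p_perturbed = np.clip(model(x + uap), 1e-12, 1.0)
    # KL divergence between the two output distributions as a response score.
    return float(np.sum(p_clean * np.log(p_clean / p_perturbed)))

def detect_adversarial(model, x, uap, threshold):
    # Flag the input as adversarial when its response to the UAP crosses a
    # threshold calibrated on held-out normal data (the direction of the
    # comparison would depend on how the score behaves in practice).
    return uap_response_score(model, x, uap) > threshold
```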
- Adversarial Examples Detection with Enhanced Image Difference Features based on Local Histogram Equalization [20.132066800052712]
We propose an adversarial example detection framework based on a high-frequency information enhancement strategy.
This framework can effectively extract and amplify the feature differences between adversarial examples and normal examples.
arXiv Detail & Related papers (2023-05-08T03:14:01Z)
- Identifying Adversarially Attackable and Robust Samples [1.4213973379473654]
Adversarial attacks add small, imperceptible perturbations to input samples that cause large, undesired changes in the output of deep learning models.
This work introduces the notion of sample attackability, where we aim to identify samples that are most susceptible to adversarial attacks.
We propose a deep-learning-based detector to identify the adversarially attackable and robust samples in an unseen dataset for an unseen target model.
arXiv Detail & Related papers (2023-01-30T13:58:14Z)
- Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks [76.35478518372692]
We introduce epsilon-illusory, a novel form of adversarial attack on sequential decision-makers.
Compared to existing attacks, we empirically find epsilon-illusory to be significantly harder to detect with automated methods.
Our findings suggest the need for better anomaly detectors, as well as effective hardware- and system-level defenses.
arXiv Detail & Related papers (2022-07-20T19:49:09Z)
- Adversarial Detector with Robust Classifier [14.586106862913553]
We propose a novel adversarial detector, consisting of a robust classifier and a plain one, to detect adversarial examples with high accuracy (see the sketch after this entry).
In an experiment, the proposed detector is demonstrated to outperform a state-of-the-art detector that does not use a robust classifier.
arXiv Detail & Related papers (2022-02-05T07:21:05Z)
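One plausible reading of the two-classifier detector above, sketched below for illustration only: flag an input when a plain classifier and an adversarially robust classifier disagree on its label. The models and the disagreement rule are assumptions, not the paper's exact design.

```python
import numpy as np

def two_classifier_detector(plain_model, robust_model, x):
    """Flag an input as adversarial when a plain classifier and a robust
    classifier disagree.  Both models are assumed to return a vector of
    class scores for input x."""
    plain_pred = int(np.argmax(plain_model(x)))
    robust_pred = int(np.argmax(robust_model(x)))
    # Small perturbations tend to flip the plain model's prediction while the
    # robust model's prediction stays stable, so disagreement signals an attack.
    return plain_pred != robust_pred
```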
- Towards A Conceptually Simple Defensive Approach for Few-shot classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z)
- Adversarially Robust One-class Novelty Detection [83.1570537254877]
We show that existing novelty detectors are susceptible to adversarial examples.
We propose a defense strategy that manipulates the latent space of novelty detectors to improve the robustness against adversarial examples.
arXiv Detail & Related papers (2021-08-25T10:41:29Z)
- Detection of Adversarial Supports in Few-shot Classifiers Using Feature Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We make use of feature preserving autoencoder filtering and also the concept of self-similarity of a support set to perform this detection.
Our method is attack-agnostic and, to the best of our knowledge, the first to explore detection for few-shot classifiers (see the sketch after this entry).
arXiv Detail & Related papers (2020-12-09T14:13:41Z)
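A toy sketch of the self-similarity idea from the entry above: score a support set by the mean pairwise cosine similarity of its feature vectors and flag sets whose score falls below a threshold calibrated on clean data. The feature extractor, the autoencoder filtering step, and the threshold are assumed to exist and are not shown.

```python
import numpy as np

def self_similarity(features):
    """Mean pairwise cosine similarity of a support set's feature vectors.
    `features` has shape (n_samples, dim); extraction and filtering are
    assumed to have been applied already."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(features)
    off_diag = sims[~np.eye(n, dtype=bool)]
    return float(off_diag.mean())

def flag_support_set(features, threshold):
    # A support set whose members are unusually dissimilar to one another is
    # flagged as potentially adversarial.
    return self_similarity(features) < threshold
```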
- Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by the recently introduced notion of non-robust features.
We consider non-robust features to be a common property of adversarial examples, and deduce that it is possible to find a cluster in representation space corresponding to this property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster, and to leverage this distribution for a likelihood-based adversarial detector (see the sketch below).
arXiv Detail & Related papers (2020-12-07T07:21:18Z)
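A hedged sketch of the likelihood-based detector described in the last entry: fit one Gaussian to representations of known adversarial examples and one to normal representations, then flag inputs that are more likely under the adversarial cluster. Access to a labeled set of adversarial representations, and the single-Gaussian modeling choice, are simplifying assumptions for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_cluster(representations):
    """Fit a single Gaussian to a cluster of representations (a toy stand-in
    for the representation-space cluster described above)."""
    mean = representations.mean(axis=0)
    cov = np.cov(representations, rowvar=False) + 1e-6 * np.eye(representations.shape[1])
    return multivariate_normal(mean=mean, cov=cov)

def likelihood_based_detector(normal_reps, adv_reps):
    """Return a detector that flags a representation as adversarial when it is
    more likely under the adversarial cluster than under the normal one."""
    normal_dist = fit_cluster(normal_reps)
    adv_dist = fit_cluster(adv_reps)
    def detect(rep):
        return adv_dist.logpdf(rep) > normal_dist.logpdf(rep)
    return detect
```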
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.