What You See is Not What the Network Infers: Detecting Adversarial
Examples Based on Semantic Contradiction
- URL: http://arxiv.org/abs/2201.09650v1
- Date: Mon, 24 Jan 2022 13:15:31 GMT
- Title: What You See is Not What the Network Infers: Detecting Adversarial
Examples Based on Semantic Contradiction
- Authors: Yijun Yang, Ruiyuan Gao, Yu Li, Qiuxia Lai and Qiang Xu
- Abstract summary: Adversarial examples (AEs) pose severe threats to the applications of deep neural networks (DNNs) to safety-critical domains.
We propose a novel AE detection framework based on the very nature of AEs.
We show that ContraNet outperforms existing solutions by a large margin, especially under adaptive attacks.
- Score: 14.313178290347293
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Adversarial examples (AEs) pose severe threats to the applications of deep
neural networks (DNNs) to safety-critical domains, e.g., autonomous driving.
While there has been a vast body of AE defense solutions, to the best of our
knowledge, they all suffer from some weaknesses, e.g., defending against only a
subset of AEs or causing a relatively high accuracy loss for legitimate inputs.
Moreover, most existing solutions cannot defend against adaptive attacks,
wherein attackers are knowledgeable about the defense mechanisms and craft AEs
accordingly. In this paper, we propose a novel AE detection framework based on
the very nature of AEs, i.e., their semantic information is inconsistent with
the discriminative features extracted by the target DNN model. To be specific,
the proposed solution, namely ContraNet, models such contradiction by first
taking both the input and the inference result to a generator to obtain a
synthetic output and then comparing it against the original input. For
legitimate inputs that are correctly inferred, the synthetic output tries to
reconstruct the input. On the contrary, for AEs, instead of reconstructing the
input, the synthetic output would be created to conform to the wrong label
whenever possible. Consequently, by measuring the distance between the input
and the synthetic output with metric learning, we can differentiate AEs from
legitimate inputs. We perform comprehensive evaluations under various AE attack
scenarios, and experimental results show that ContraNet outperforms existing
solutions by a large margin, especially under adaptive attacks. Moreover, our
analysis shows that successful AEs that can bypass ContraNet tend to have
much-weakened adversarial semantics. We have also shown that ContraNet can be
easily combined with adversarial training techniques to achieve further
improved AE defense capabilities.
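To make the detection pipeline concrete, below is a minimal sketch of the ContraNet-style decision rule described in the abstract. The names `classifier`, `generator`, `embed`, and `threshold` are placeholders assumed for illustration, and cosine distance stands in for whatever metric-learning distance the authors actually train; this is a sketch of the idea, not the paper's implementation.
```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def contra_detect(x, classifier, generator, embed, threshold):
    """Flag inputs whose visual semantics contradict the model's prediction.

    x:         batch of images, shape (B, C, H, W)
    threshold: distance above which an input is rejected as adversarial
    returns:   the model's predictions and a boolean rejection mask
    """
    # 1. Run the target DNN to obtain its (possibly wrong) prediction.
    y_hat = classifier(x).argmax(dim=1)

    # 2. Synthesize an image conditioned on both the input and the prediction.
    #    For a legitimate, correctly classified input the synthesis should
    #    roughly reconstruct x; for an AE it should conform to the wrong label.
    x_syn = generator(x, y_hat)

    # 3. Compare input and synthesis in an embedding space learned with
    #    metric learning; a large distance signals a semantic contradiction.
    dist = 1.0 - F.cosine_similarity(embed(x), embed(x_syn), dim=1)

    return y_hat, dist > threshold
```
A legitimate, correctly classified image yields a synthesis close to the input and a small distance; an AE pushes the synthesis toward the wrong label, so the distance exceeds the threshold and the input is rejected.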
Related papers
- Eliminating Catastrophic Overfitting Via Abnormal Adversarial Examples Regularization [50.43319961935526]
Single-step adversarial training (SSAT) has demonstrated the potential to achieve both efficiency and robustness.
However, SSAT suffers from catastrophic overfitting (CO), a phenomenon that leads to a severely distorted classifier.
In this work, we observe that some adversarial examples generated on the SSAT-trained network exhibit anomalous behaviour.
arXiv Detail & Related papers (2024-04-11T22:43:44Z)
- LAMBO: Large AI Model Empowered Edge Intelligence [71.56135386994119]
Next-generation edge intelligence is anticipated to benefit various applications via offloading techniques.
Traditional offloading architectures face several issues, including heterogeneous constraints, partial perception, uncertain generalization, and lack of tractability.
We propose a Large AI Model-Based Offloading (LAMBO) framework with over one billion parameters for solving these problems.
arXiv Detail & Related papers (2023-08-29T07:25:42Z)
- On the Transferability of Adversarial Examples between Encrypted Models [20.03508926499504]
We investigate, for the first time, the transferability of adversarial examples between models encrypted for adversarially robust defense.
In an image-classification experiment, the use of encrypted models is confirmed not only to be robust against AEs but also to reduce the influence of AEs.
arXiv Detail & Related papers (2022-09-07T08:50:26Z)
- Be Your Own Neighborhood: Detecting Adversarial Example by the Neighborhood Relations Built on Self-Supervised Learning [64.78972193105443]
This paper presents a novel AE detection framework for trustworthy predictions.
It performs the detection by distinguishing an AE's abnormal relation with its augmented versions (a minimal sketch of this idea follows the related-papers list).
An off-the-shelf Self-Supervised Learning (SSL) model is used to extract the representation and predict the label.
arXiv Detail & Related papers (2022-08-31T08:18:44Z)
- Detecting and Recovering Adversarial Examples from Extracting Non-robust and Highly Predictive Adversarial Perturbations [15.669678743693947]
Deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples (AEs), which are maliciously designed to fool target models.
We propose a model-free AE detection method whose whole process is free from querying the victim model.
arXiv Detail & Related papers (2022-06-30T08:48:28Z)
- Do autoencoders need a bottleneck for anomaly detection? [78.24964622317634]
Learning the identity function renders autoencoders useless for anomaly detection.
In this work, we investigate the value of non-bottlenecked autoencoders.
We propose infinitely-wide autoencoders as an extreme example of the non-bottlenecked case.
arXiv Detail & Related papers (2022-02-25T11:57:58Z)
- Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Generative adversarial attacks can get rid of this limitation.
A Symmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations.
The adversarial examples generated by SSAE not only make the widely-used models collapse, but also achieve good visual quality.
arXiv Detail & Related papers (2021-07-20T01:55:21Z)
- The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems.
In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
arXiv Detail & Related papers (2021-06-26T10:50:07Z)
- MixDefense: A Defense-in-Depth Framework for Adversarial Example Detection Based on Statistical and Semantic Analysis [14.313178290347293]
We propose a multilayer defense-in-depth framework for AE detection, namely MixDefense.
We leverage the 'noise' features extracted from the inputs to discover the statistical difference between natural images and tampered ones for AE detection.
We show that the proposed MixDefense solution outperforms the existing AE detection techniques by a considerable margin.
arXiv Detail & Related papers (2021-04-20T15:57:07Z)
- SLAP: Improving Physical Adversarial Examples with Short-Lived Adversarial Perturbations [19.14079118174123]
Short-Lived Adversarial Perturbations (SLAP) is a novel technique that allows adversaries to realize physically robust real-world AEs by using a light projector.
SLAP allows the adversary greater control over the attack compared to adversarial patches.
We study the feasibility of SLAP in the self-driving scenario, targeting both object detection and traffic sign recognition tasks.
arXiv Detail & Related papers (2020-07-08T14:11:21Z)
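As referenced in the "Be Your Own Neighborhood" entry above, the neighborhood-relation idea can also be sketched in a few lines: an off-the-shelf SSL encoder embeds the input and several augmented copies, and an input whose representation or predicted label disagrees too strongly with its augmented neighbors is flagged. All names (`ssl_encoder`, `ssl_head`, `augment`) and thresholds are assumptions for illustration, not that paper's actual interface.
```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def neighborhood_detect(x, classifier, ssl_encoder, ssl_head, augment,
                        k=8, sim_threshold=0.8, agree_threshold=0.5):
    """Flag inputs whose relation to their augmented neighbors is abnormal."""
    y_hat = classifier(x).argmax(dim=1)      # target model's prediction
    z = ssl_encoder(x)                       # SSL representation of the input

    sims, agrees = [], []
    for _ in range(k):                       # build k augmented "neighbors"
        z_aug = ssl_encoder(augment(x))
        # Representation consistency between input and its neighbor.
        sims.append(F.cosine_similarity(z, z_aug, dim=1))
        # Label consistency between the SSL head's prediction and y_hat.
        agrees.append((ssl_head(z_aug).argmax(dim=1) == y_hat).float())

    mean_sim = torch.stack(sims).mean(dim=0)
    mean_agree = torch.stack(agrees).mean(dim=0)

    # An AE tends to break at least one of the two relations with its neighbors.
    return y_hat, (mean_sim < sim_threshold) | (mean_agree < agree_threshold)
```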