AdvCheck: Characterizing Adversarial Examples via Local Gradient
Checking
- URL: http://arxiv.org/abs/2303.18131v1
- Date: Sat, 25 Mar 2023 17:46:09 GMT
- Title: AdvCheck: Characterizing Adversarial Examples via Local Gradient
Checking
- Authors: Ruoxi Chen, Haibo Jin, Jinyin Chen, Haibin Zheng
- Abstract summary: We introduce the concept of local gradient, and reveal that adversarial examples have a larger bound of local gradient than the benign ones.
Specifically, by calculating the local gradient from a few benign examples and noise-added misclassified examples to train a detector, adversarial examples and even misclassified natural inputs can be precisely distinguished from benign ones.
We have validated AdvCheck's superior performance over state-of-the-art (SOTA) baselines, with roughly $1.2\times$ their detection rate on general adversarial attacks and roughly $1.4\times$ on misclassified natural inputs.
- Score: 3.425727850372357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) are vulnerable to adversarial examples, which may
lead to catastrophe in security-critical domains. Numerous detection methods
are proposed to characterize the feature uniqueness of adversarial examples, or
to distinguish DNN's behavior activated by the adversarial examples. Detections
based on features cannot handle adversarial examples with large perturbations, and
they require a large number of specific adversarial examples. The other mainstream
approach, model-based detection, which characterizes input properties by model
behaviors, suffers from heavy computation costs. To address these issues, we
introduce the concept of local gradient, and reveal that adversarial examples
have a considerably larger local-gradient bound than benign ones. Inspired by
the observation, we leverage local gradient for detecting adversarial examples,
and propose a general framework AdvCheck. Specifically, by calculating the
local gradient from a few benign examples and noise-added misclassified
examples to train a detector, adversarial examples and even misclassified
natural inputs can be precisely distinguished from benign ones. Through
extensive experiments, we have validated AdvCheck's superior performance over
state-of-the-art (SOTA) baselines, with roughly $1.2\times$ their detection rate
on general adversarial attacks and roughly $1.4\times$ on misclassified natural
inputs on average, at about 1/500 of their time cost. We also provide interpretable
results for successful detection.
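The abstract does not spell out how the local gradient is computed, so the sketch below is only one plausible reading, not the authors' implementation: it treats the local gradient as the finite-difference sensitivity of the model's logits to small input noise, and trains a small detector on benign versus noise-added misclassified examples, as the abstract describes. PyTorch and all function and class names here are illustrative assumptions.

```python
# Hedged sketch of a local-gradient-style detector; NOT the authors' implementation.
# Assumption: "local gradient" is read as the finite-difference sensitivity of the
# model's logits to small input noise. The detector is a small MLP trained on
# benign examples vs. noise-added misclassified examples, as the abstract describes.
import torch
import torch.nn as nn


def local_gradient_features(model, x, sigma=0.01, n_samples=8):
    """Average absolute change in logits per unit of input noise (finite-difference proxy)."""
    model.eval()
    with torch.no_grad():
        base = model(x)                                    # (B, num_classes)
        diffs = []
        for _ in range(n_samples):
            noise = sigma * torch.randn_like(x)
            diffs.append((model(x + noise) - base).abs() / sigma)
        return torch.stack(diffs).mean(dim=0)              # (B, num_classes)


class LocalGradientDetector(nn.Module):
    """Binary head over local-gradient features: 0 = benign, 1 = adversarial."""

    def __init__(self, num_classes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(num_classes, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, feats):
        return self.net(feats)


def train_detector(model, benign_x, noisy_misclassified_x, num_classes, epochs=20):
    """Fit the detector on features of benign vs. noise-added misclassified inputs."""
    feats = torch.cat([local_gradient_features(model, benign_x),
                       local_gradient_features(model, noisy_misclassified_x)])
    labels = torch.cat([torch.zeros(len(benign_x), dtype=torch.long),
                        torch.ones(len(noisy_misclassified_x), dtype=torch.long)])
    detector = LocalGradientDetector(num_classes)
    opt = torch.optim.Adam(detector.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(detector(feats), labels)
        loss.backward()
        opt.step()
    return detector
```

At test time, an input would be flagged as adversarial when the detector assigns it to class 1; the paper's claim is that such local-gradient features separate adversarial and misclassified inputs from benign ones without needing attack-specific adversarial examples.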
Related papers
- Adversarial Examples Detection with Enhanced Image Difference Features
based on Local Histogram Equalization [20.132066800052712]
We propose an adversarial example detection framework based on a high-frequency information enhancement strategy.
This framework can effectively extract and amplify the feature differences between adversarial examples and normal examples.
arXiv Detail & Related papers (2023-05-08T03:14:01Z) - Latent Feature Relation Consistency for Adversarial Robustness [80.24334635105829]
Misclassification occurs when deep neural networks predict adversarial examples, which add human-imperceptible adversarial noise to natural examples.
We propose Latent Feature Relation Consistency (LFRC).
LFRC constrains the relation of adversarial examples in latent space to be consistent with the natural examples.
arXiv Detail & Related papers (2023-03-29T13:50:01Z) - Latent Boundary-guided Adversarial Training [61.43040235982727]
Adversarial training, which injects adversarial examples into model training, has proven to be the most effective defense strategy.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining.
arXiv Detail & Related papers (2022-06-08T07:40:55Z) - On Trace of PGD-Like Adversarial Attacks [77.75152218980605]
Adversarial attacks pose safety and security concerns for deep learning applications.
We construct Adversarial Response Characteristics (ARC) features to reflect the model's gradient consistency (a generic sketch of this gradient-consistency idea appears after this list).
Our method is intuitive, lightweight, non-intrusive, and data-undemanding.
arXiv Detail & Related papers (2022-05-19T14:26:50Z) - ADC: Adversarial attacks against object Detection that evade Context
consistency checks [55.8459119462263]
We show that even context consistency checks can be brittle to properly crafted adversarial examples.
We propose an adaptive framework to generate examples that subvert such defenses.
Our results suggest that how to robustly model context and check its consistency, is still an open problem.
arXiv Detail & Related papers (2021-10-24T00:25:09Z) - Unsupervised Detection of Adversarial Examples with Model Explanations [0.6091702876917279]
We propose a simple yet effective method to detect adversarial examples using methods developed to explain the model's behavior.
Our evaluations on the MNIST handwritten digit dataset show that our method is capable of detecting adversarial examples with high confidence.
arXiv Detail & Related papers (2021-07-22T06:54:18Z) - Adversarial Examples Detection with Bayesian Neural Network [57.185482121807716]
We propose a new framework to detect adversarial examples motivated by the observations that random components can improve the smoothness of predictors.
We propose a novel Bayesian adversarial example detector, short for BATer, to improve the performance of adversarial example detection.
arXiv Detail & Related papers (2021-05-18T15:51:24Z) - Beating Attackers At Their Own Games: Adversarial Example Detection
Using Adversarial Gradient Directions [16.993439721743478]
The proposed method is based on the observation that the directions of adversarial gradients play a key role in characterizing the adversarial space.
Experiments conducted on two different databases, CIFAR-10 and ImageNet, show that the proposed detection method achieves 97.9% and 98.6% AUC-ROC on five different adversarial attacks.
arXiv Detail & Related papers (2020-12-31T01:12:24Z) - Learning to Separate Clusters of Adversarial Representations for Robust
Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by a recently introduced non-robust feature.
In this paper, we consider the non-robust features as a common property of adversarial examples, and we deduce it is possible to find a cluster in representation space corresponding to the property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster, and to leverage the distribution for a likelihood-based adversarial detector.
arXiv Detail & Related papers (2020-12-07T07:21:18Z) - Are L2 adversarial examples intrinsically different? [14.77179227968466]
We unravel the properties that can intrinsically differentiate adversarial examples and normal inputs through theoretical analysis.
We achieve a recovered classification accuracy of up to 99% on MNIST, 89% on CIFAR, and 87% on ImageNet subsets against $L_2$ attacks.
arXiv Detail & Related papers (2020-02-28T03:42:52Z) - Defending Adversarial Attacks via Semantic Feature Manipulation [23.48763375455514]
We propose a one-off and attack-agnostic Feature Manipulation (FM)-Defense to detect and purify adversarial examples.
To enable manipulation of features, a combo-variational autoencoder is applied to learn disentangled latent codes that reveal semantic features.
Experiments show FM-Defense can detect nearly $100\%$ of adversarial examples produced by different state-of-the-art adversarial attacks.
arXiv Detail & Related papers (2020-02-03T23:24:32Z)