Beating Attackers At Their Own Games: Adversarial Example Detection
Using Adversarial Gradient Directions
- URL: http://arxiv.org/abs/2012.15386v1
- Date: Thu, 31 Dec 2020 01:12:24 GMT
- Title: Beating Attackers At Their Own Games: Adversarial Example Detection
Using Adversarial Gradient Directions
- Authors: Yuhang Wu, Sunpreet S. Arora, Yanhong Wu, Hao Yang
- Abstract summary: The proposed method is based on the observation that the directions of adversarial gradients play a key role in characterizing the adversarial space.
Experiments conducted on two different databases, CIFAR-10 and ImageNet, show that the proposed detection method achieves 97.9% and 98.6% AUC-ROC (on average), respectively, across five different adversarial attacks.
- Score: 16.993439721743478
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial examples are input examples that are specifically crafted to
deceive machine learning classifiers. State-of-the-art adversarial example
detection methods characterize an input example as adversarial either by
quantifying the magnitude of feature variations under multiple perturbations or
by measuring its distance from the estimated benign example distribution. Instead
of using such metrics, the proposed method is based on the observation that the
directions of adversarial gradients when crafting (new) adversarial examples
play a key role in characterizing the adversarial space. Compared to detection
methods that use multiple perturbations, the proposed method is efficient as it
applies only a single random perturbation to the input example. Experiments
conducted on two different databases, CIFAR-10 and ImageNet, show that the
proposed detection method achieves, respectively, 97.9% and 98.6% AUC-ROC (on
average) on five different adversarial attacks, and outperforms multiple
state-of-the-art detection methods. Results demonstrate the effectiveness of
using adversarial gradient directions for adversarial example detection.
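A minimal sketch of this idea is given below, assuming a differentiable PyTorch classifier `model`; the helper names, the noise scale `sigma`, the cross-entropy adversarial loss, and the use of cosine similarity as the detection feature are illustrative assumptions, not a reproduction of the paper's exact algorithm.

```python
# Hypothetical sketch of detection via adversarial gradient directions.
# Assumes a differentiable PyTorch classifier `model`; names, sigma, and the
# final threshold are illustrative choices, not the paper's exact method.
import torch
import torch.nn.functional as F

def adversarial_gradient(model, x, y):
    """Gradient of the classification loss w.r.t. the input (FGSM-style direction)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return grad.flatten(start_dim=1)

def gradient_direction_score(model, x, sigma=0.02):
    """Cosine similarity between adversarial gradient directions computed at the
    original input and at a single randomly perturbed copy."""
    model.eval()
    with torch.no_grad():
        y = model(x).argmax(dim=1)                  # predicted label of the input
    x_noisy = x + sigma * torch.randn_like(x)       # one random perturbation
    g_orig = adversarial_gradient(model, x, y)
    g_noisy = adversarial_gradient(model, x_noisy, y)
    return F.cosine_similarity(g_orig, g_noisy, dim=1)

# Usage: low direction consistency can be taken as evidence of an adversarial input,
# with the threshold calibrated on benign data.
# is_adversarial = gradient_direction_score(model, x_batch) < threshold
```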
Related papers
- Indiscriminate Disruption of Conditional Inference on Multivariate Gaussians [60.22542847840578]
Despite advances in adversarial machine learning, inference for Gaussian models in the presence of an adversary is notably understudied.
We consider a self-interested attacker who wishes to disrupt a decision-maker's conditional inference and subsequent actions by corrupting a set of evidentiary variables.
To avoid detection, the attacker also desires the attack to appear plausible, where plausibility is determined by the density of the corrupted evidence.
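For context, the quantity being attacked is ordinary conditional inference on a multivariate Gaussian. The sketch below shows that computation and how a corrupted evidence vector shifts the conditional mean, with plausibility measured by the density of the evidence; the corruption shown here is purely illustrative, whereas the paper casts the attacker's choice as an optimization problem.

```python
# Minimal numpy sketch of the quantity under attack: conditional inference on a
# multivariate Gaussian. The corruption rule (a fixed shift of the evidence) is
# purely illustrative, not the paper's attack.
import numpy as np
from scipy.stats import multivariate_normal

def conditional_gaussian(mu, Sigma, idx_z, z):
    """Mean and covariance of x_u | x_z = z for a joint Gaussian N(mu, Sigma)."""
    idx_u = [i for i in range(len(mu)) if i not in idx_z]
    mu_u, mu_z = mu[idx_u], mu[idx_z]
    S_uu = Sigma[np.ix_(idx_u, idx_u)]
    S_uz = Sigma[np.ix_(idx_u, idx_z)]
    S_zz = Sigma[np.ix_(idx_z, idx_z)]
    K = S_uz @ np.linalg.inv(S_zz)
    return mu_u + K @ (z - mu_z), S_uu - K @ S_uz.T

mu = np.zeros(3)
Sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])
z_clean = np.array([0.2, -0.1])              # observed evidence (indices 1 and 2)
z_corrupt = z_clean + np.array([1.0, 0.5])   # attacker's corruption (illustrative)

m_clean, _ = conditional_gaussian(mu, Sigma, [1, 2], z_clean)
m_corrupt, _ = conditional_gaussian(mu, Sigma, [1, 2], z_corrupt)
plausibility = multivariate_normal(mu[[1, 2]], Sigma[np.ix_([1, 2], [1, 2])]).pdf(z_corrupt)
print(m_clean, m_corrupt, plausibility)      # shifted inference vs. evidence density
```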
arXiv Detail & Related papers (2024-11-21T17:46:55Z) - On the Universal Adversarial Perturbations for Efficient Data-free
Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework that uses universal adversarial perturbations (UAPs) to induce different responses from normal and adversarial samples.
Experimental results show that our method achieves competitive detection performance on various text classification tasks.
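A rough, hedged sketch of the response-shift idea follows, assuming a fixed universal adversarial perturbation `uap` in a continuous input space; in the paper's text-classification setting the perturbation acts at the embedding or token level, and the detector calibration is not reproduced here.

```python
# Hypothetical sketch of UAP-based detection: normal and adversarial inputs tend to
# respond differently when a fixed universal adversarial perturbation is added.
# This illustrative version perturbs a continuous input representation directly.
import torch
import torch.nn.functional as F

def uap_response_shift(model, x, uap):
    """KL divergence between predictions with and without the UAP applied."""
    model.eval()
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)
        q = F.softmax(model(x + uap), dim=1)
    return F.kl_div(q.log(), p, reduction="none").sum(dim=1)

# Usage: flag inputs whose response shift is atypical relative to a benign
# calibration set; the direction of the decision depends on how the detector
# is calibrated.
```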
arXiv Detail & Related papers (2023-06-27T02:54:07Z) - Towards Black-box Adversarial Example Detection: A Data
Reconstruction-based Method [9.857570123016213]
Black-box attacks are a more realistic threat and have led to various black-box adversarial-training-based defense methods.
To tackle the black-box adversarial example detection (BAD) problem, we propose a data reconstruction-based adversarial example detection method.
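A hedged sketch of a reconstruction-based detector follows, assuming a classifier `model` and an `autoencoder` trained on benign data; the score definitions are illustrative, not the paper's exact pipeline.

```python
# Hypothetical sketch of reconstruction-based detection: reconstruct the input with a
# generative model trained on benign data, then flag inputs with large reconstruction
# error or a large prediction shift. Names and scores are illustrative assumptions.
import torch
import torch.nn.functional as F

def reconstruction_scores(model, autoencoder, x):
    model.eval(); autoencoder.eval()
    with torch.no_grad():
        x_rec = autoencoder(x)
        rec_err = (x - x_rec).flatten(start_dim=1).pow(2).mean(dim=1)
        p = F.softmax(model(x), dim=1)
        p_rec = F.softmax(model(x_rec), dim=1)
        pred_shift = (p - p_rec).abs().sum(dim=1)   # total-variation-style shift
    return rec_err, pred_shift

# Usage: thresholds for both scores are calibrated on benign data; inputs exceeding
# either threshold are treated as likely adversarial.
```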
arXiv Detail & Related papers (2023-06-03T06:34:17Z) - Adversarial Examples Detection with Enhanced Image Difference Features
based on Local Histogram Equalization [20.132066800052712]
We propose an adversarial example detection framework based on a high-frequency information enhancement strategy.
This framework can effectively extract and amplify the feature differences between adversarial examples and normal examples.
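The enhancement step can be sketched as follows using OpenCV's CLAHE for local histogram equalization; the downstream detector trained on the resulting difference features is not reproduced, and the CLAHE parameters are illustrative.

```python
# Illustrative sketch of the enhancement idea: local histogram equalization (CLAHE)
# amplifies high-frequency content, so the difference between the enhanced and the
# original image can expose adversarial perturbation artifacts.
import cv2
import numpy as np

def enhanced_difference(gray_u8, clip_limit=2.0, tile=(8, 8)):
    """Difference between a locally histogram-equalized image and the original."""
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile)
    enhanced = clahe.apply(gray_u8)          # expects a uint8 grayscale image
    return cv2.absdiff(enhanced, gray_u8)

# Usage: feed enhanced_difference(img) (per channel, for color images) to a small
# classifier trained to separate adversarial from benign examples.
```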
arXiv Detail & Related papers (2023-05-08T03:14:01Z) - AdvCheck: Characterizing Adversarial Examples via Local Gradient
Checking [3.425727850372357]
We introduce the concept of local gradient, and reveal that adversarial examples have a larger bound of local gradient than the benign ones.
Specifically, by calculating the local gradient from a few benign examples and noise-added misclassified examples to train a detector, adversarial examples and even misclassified natural inputs can be precisely distinguished from benign ones.
We have validated AdvCheck's superior performance over state-of-the-art (SOTA) baselines, with detection rates roughly $1.2\times$ those of the baselines on general adversarial attacks and roughly $1.4\times$ on misclassified natural inputs.
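A hedged sketch of the local-gradient idea follows; the feature below (mean input-gradient norm of the predicted-class logit under a few small random probes) and its parameters are assumptions for illustration, not AdvCheck's exact definition.

```python
# Hedged sketch of the "local gradient" idea: adversarial inputs tend to sit where the
# model's output changes faster locally. Feature definition and parameters are
# illustrative, not AdvCheck's exact formulation.
import torch

def local_gradient_feature(model, x, sigma=0.01, n_probes=8):
    model.eval()
    feats = []
    for _ in range(n_probes):
        xp = (x + sigma * torch.randn_like(x)).detach().requires_grad_(True)
        logits = model(xp)
        top = logits.gather(1, logits.argmax(dim=1, keepdim=True)).sum()
        grad, = torch.autograd.grad(top, xp)
        feats.append(grad.flatten(start_dim=1).norm(dim=1))
    return torch.stack(feats, dim=0).mean(dim=0)   # larger => more likely adversarial

# A simple detector (e.g., logistic regression) can then be trained on this feature
# computed for benign examples and for noise-added misclassified examples.
```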
arXiv Detail & Related papers (2023-03-25T17:46:09Z) - TREATED:Towards Universal Defense against Textual Adversarial Attacks [28.454310179377302]
We propose TREATED, a universal adversarial detection method that can defend against attacks of various perturbation levels without making any assumptions.
Extensive experiments on three competitive neural networks and two widely used datasets show that our method achieves better detection performance than baselines.
arXiv Detail & Related papers (2021-09-13T03:31:20Z) - Adversarial Examples Detection beyond Image Space [88.7651422751216]
We find that there is a consistent relationship between perturbations and prediction confidence, which guides us to detect few-perturbation attacks from the perspective of prediction confidence.
We propose a method beyond image space by a two-stream architecture, in which the image stream focuses on the pixel artifacts and the gradient stream copes with the confidence artifacts.
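A skeleton of such a two-stream detector is sketched below; the layer sizes and the choice of gradient signal are assumptions, not the architecture from the paper.

```python
# Illustrative two-stream detector skeleton: an image stream for pixel artifacts and a
# gradient stream for confidence-related artifacts, fused into a binary decision.
import torch
import torch.nn as nn

class TwoStreamDetector(nn.Module):
    def __init__(self, in_ch=3):
        super().__init__()
        def stream():
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.image_stream = stream()
        self.gradient_stream = stream()
        self.head = nn.Linear(64, 2)      # benign vs. adversarial

    def forward(self, image, input_gradient):
        f = torch.cat([self.image_stream(image),
                       self.gradient_stream(input_gradient)], dim=1)
        return self.head(f)

# `input_gradient` can be the gradient of the classifier's confidence w.r.t. the
# input, computed as in the earlier sketches; the two streams are trained jointly.
```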
arXiv Detail & Related papers (2021-02-23T09:55:03Z) - Learning to Separate Clusters of Adversarial Representations for Robust
Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by the recently introduced notion of non-robust features.
In this paper, we consider non-robust features to be a common property of adversarial examples, and we deduce that it is possible to find a cluster in representation space corresponding to this property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster and to leverage this distribution for a likelihood-based adversarial detector.
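A hedged sketch of a likelihood-based detector over representation space follows, using single-component Gaussian mixtures from scikit-learn as stand-ins for the paper's probabilistic model.

```python
# Hedged sketch of a likelihood-based detector on representation space: fit Gaussians
# to benign and adversarial representations and score new inputs by a log-likelihood
# ratio. The paper's specific probabilistic model and training are not reproduced.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_detector(benign_feats, adv_feats):
    g_b = GaussianMixture(n_components=1, covariance_type="full").fit(benign_feats)
    g_a = GaussianMixture(n_components=1, covariance_type="full").fit(adv_feats)
    return g_b, g_a

def adversarial_score(g_b, g_a, feats):
    # Higher score => representation lies closer to the adversarial cluster.
    return g_a.score_samples(feats) - g_b.score_samples(feats)

# `benign_feats` / `adv_feats` are penultimate-layer features (numpy arrays of shape
# [n_samples, d]) extracted from the classifier under inspection.
```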
arXiv Detail & Related papers (2020-12-07T07:21:18Z) - A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack
and Learning [122.49765136434353]
We present an effective method, called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM), aiming to generate a sequence of adversarial examples.
We also propose a new generative method called Contrastive Adversarial Training (CAT), which approaches the equilibrium distribution of adversarial examples.
Both quantitative and qualitative analyses on several natural image datasets and practical systems confirm the superiority of the proposed algorithm.
arXiv Detail & Related papers (2020-10-15T16:07:26Z) - Effective and Robust Detection of Adversarial Examples via
Benford-Fourier Coefficients [40.9343499298864]
Adversarial examples are well known to be a serious threat to deep neural networks (DNNs).
In this work, we study the detection of adversarial examples based on the assumption that the output and internal responses of one model, for both adversarial and benign examples, follow the generalized Gaussian distribution (GGD).
We propose to construct discriminative features for adversarial detection via the shape factor, employing the magnitude of Benford-Fourier coefficients (MBF).
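One ingredient can be sketched as follows: a moment-matching estimate of the GGD shape factor from a layer's responses; the MBF feature construction built on top of the shape factor in the paper is not reproduced.

```python
# Illustrative sketch of one ingredient: estimating the generalized Gaussian (GGD)
# shape factor of a layer's responses by moment matching. The paper's MBF features
# derived from this shape factor are not reproduced here.
import numpy as np
from scipy.special import gamma

def ggd_shape_factor(responses, betas=np.linspace(0.1, 5.0, 2000)):
    """Moment-matching estimate of the GGD shape parameter beta."""
    x = np.asarray(responses).ravel()
    rho_hat = np.mean(np.abs(x)) ** 2 / np.mean(x ** 2)      # E|x|^2 / E[x^2]
    rho = gamma(2.0 / betas) ** 2 / (gamma(1.0 / betas) * gamma(3.0 / betas))
    return betas[np.argmin(np.abs(rho - rho_hat))]

# Benign and adversarial inputs tend to yield different shape factors, which the
# paper turns into MBF features and feeds to a per-layer binary classifier.
```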
arXiv Detail & Related papers (2020-05-12T05:20:59Z) - Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial
Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.