Defending Adversarial Attacks via Semantic Feature Manipulation
- URL: http://arxiv.org/abs/2002.02007v2
- Date: Wed, 22 Apr 2020 13:14:48 GMT
- Title: Defending Adversarial Attacks via Semantic Feature Manipulation
- Authors: Shuo Wang, Tianle Chen, Surya Nepal, Carsten Rudolph, Marthie Grobler,
Shangyu Chen
- Abstract summary: We propose a one-off and attack-agnostic Feature Manipulation (FM)-Defense to detect and purify adversarial examples.
To enable manipulation of features, a combo-variational autoencoder is applied to learn disentangled latent codes that reveal semantic features.
Experiments show FM-Defense can detect nearly $100\%$ of adversarial examples produced by different state-of-the-art adversarial attacks.
- Score: 23.48763375455514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning models have demonstrated vulnerability to adversarial
attacks, more specifically misclassification of adversarial examples. In this
paper, we propose a one-off and attack-agnostic Feature Manipulation
(FM)-Defense to detect and purify adversarial examples in an interpretable and
efficient manner. The intuition is that the classification result of a normal
image is generally resistant to non-significant intrinsic feature changes,
e.g., varying thickness of handwritten digits. In contrast, adversarial
examples are sensitive to such changes since the perturbation lacks
transferability. To enable manipulation of features, a combo-variational
autoencoder is applied to learn disentangled latent codes that reveal semantic
features. The resistance to classification change over the morphs, derived by
varying and reconstructing latent codes, is used to detect suspicious inputs.
Further, combo-VAE is enhanced to purify the adversarial examples with good
quality by considering both class-shared and class-unique features. We
empirically demonstrate the effectiveness of detection and the quality of
purified instances. Our experiments on three datasets show that FM-Defense can
detect nearly $100\%$ of adversarial examples produced by different
state-of-the-art adversarial attacks. It achieves more than $99\%$ overall
purification accuracy on suspicious instances that lie close to the manifold of
normal examples.
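The detection mechanism described in the abstract can be illustrated with a short, hedged sketch. A generic VAE interface stands in for the paper's combo-VAE; `vae`, `classifier`, the number of varied dimensions, and the threshold are assumed placeholders, not the authors' implementation. The idea is to vary individual latent codes, reconstruct the resulting morphs, and measure how often the predicted class survives; inputs whose predictions flip easily are flagged as suspicious.

```python
# Minimal sketch of morph-based detection, assuming a pretrained VAE with
# encode/decode methods and a pretrained classifier (hypothetical interfaces).
# This is not the authors' combo-VAE implementation.
import torch

@torch.no_grad()
def morph_resistance_score(x, vae, classifier, n_dims=8, deltas=(-1.0, -0.5, 0.5, 1.0)):
    """Fraction of latent-code morphs whose predicted class matches the input's."""
    x = x.unsqueeze(0)                            # single image -> batch of one
    z = vae.encode(x)                             # latent code, shape (1, d)
    base_pred = classifier(x).argmax(dim=1).item()
    consistent, total = 0, 0
    for dim in range(min(n_dims, z.shape[1])):
        for delta in deltas:
            z_morph = z.clone()
            z_morph[0, dim] += delta              # nudge one semantic latent dimension
            morph = vae.decode(z_morph)           # reconstruct the morphed image
            pred = classifier(morph).argmax(dim=1).item()
            consistent += int(pred == base_pred)
            total += 1
    return consistent / total                     # resistance to classification change

def is_suspicious(x, vae, classifier, threshold=0.8):
    # Normal images tend to keep their label under small semantic changes
    # (e.g., stroke thickness); adversarial examples tend to flip.
    return morph_resistance_score(x, vae, classifier) < threshold
```

In practice the threshold would be calibrated on held-out normal examples; the purification step (choosing a high-quality reconstruction using class-shared and class-unique features) is omitted from this sketch.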
Related papers
- Indiscriminate Disruption of Conditional Inference on Multivariate Gaussians [60.22542847840578]
Despite advances in adversarial machine learning, inference for Gaussian models in the presence of an adversary is notably understudied.
We consider a self-interested attacker who wishes to disrupt a decision-maker's conditional inference and subsequent actions by corrupting a set of evidentiary variables.
To avoid detection, the attacker also desires the attack to appear plausible, wherein plausibility is determined by the density of the corrupted evidence.
arXiv Detail & Related papers (2024-11-21T17:46:55Z)
- Spatial-Frequency Discriminability for Revealing Adversarial Perturbations [53.279716307171604]
Vulnerability of deep neural networks to adversarial perturbations has been widely perceived in the computer vision community.
Current algorithms typically detect adversarial patterns through discriminative decomposition for natural and adversarial data.
We propose a discriminative detector relying on a spatial-frequency Krawtchouk decomposition.
arXiv Detail & Related papers (2023-05-18T10:18:59Z)
- Adversarial Examples Detection with Enhanced Image Difference Features based on Local Histogram Equalization [20.132066800052712]
We propose an adversarial example detection framework based on a high-frequency information enhancement strategy.
This framework can effectively extract and amplify the feature differences between adversarial examples and normal examples.
arXiv Detail & Related papers (2023-05-08T03:14:01Z)
- AdvCheck: Characterizing Adversarial Examples via Local Gradient Checking [3.425727850372357]
We introduce the concept of local gradient, and reveal that adversarial examples have a larger bound of local gradient than the benign ones.
Specifically, by calculating the local gradient from a few benign examples and noise-added misclassified examples to train a detector, adversarial examples and even misclassified natural inputs can be precisely distinguished from benign ones.
We validate AdvCheck's superior performance over state-of-the-art (SOTA) baselines, with detection rates roughly $1.2\times$ those of the baselines on general adversarial attacks and roughly $1.4\times$ on misclassified natural inputs. A rough gradient-norm sketch in the spirit of this local-gradient check appears after this list.
arXiv Detail & Related papers (2023-03-25T17:46:09Z)
- On the Effect of Adversarial Training Against Invariance-based Adversarial Examples [0.23624125155742057]
This work addresses the impact of adversarial training with invariance-based adversarial examples on a convolutional neural network (CNN).
We show that when adversarial training with invariance-based and perturbation-based adversarial examples is applied, it should be conducted simultaneously and not consecutively.
arXiv Detail & Related papers (2023-02-16T12:35:37Z)
- Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard method in adversarial robustness assumes a framework to defend against samples crafted by minimally perturbing a sample.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
arXiv Detail & Related papers (2022-11-04T13:54:02Z)
- On Trace of PGD-Like Adversarial Attacks [77.75152218980605]
Adversarial attacks pose safety and security concerns for deep learning applications.
We construct Adversarial Response Characteristics (ARC) features to reflect the model's gradient consistency.
Our method is intuitive, light-weighted, non-intrusive, and data-undemanding.
arXiv Detail & Related papers (2022-05-19T14:26:50Z)
- Towards Defending against Adversarial Examples via Attack-Invariant Features [147.85346057241605]
Deep neural networks (DNNs) are vulnerable to adversarial noise.
Adversarial robustness can be improved by exploiting adversarial examples.
Models trained on seen types of adversarial examples generally cannot generalize well to unseen types of adversarial examples.
arXiv Detail & Related papers (2021-06-09T12:49:54Z)
- Self-Supervised Adversarial Example Detection by Disentangled Representation [16.98476232162835]
We train an autoencoder, assisted by a discriminator network, on both correctly paired and incorrectly paired class/semantic features to reconstruct benign examples and counterexamples.
This mimics the behavior of adversarial examples and reduces the unnecessary generalization ability of the autoencoder.
Compared with the state-of-the-art self-supervised detection methods, our method exhibits better performance in various measurements.
arXiv Detail & Related papers (2021-05-08T12:48:18Z)
- Are L2 adversarial examples intrinsically different? [14.77179227968466]
We unravel the properties that can intrinsically differentiate adversarial examples and normal inputs through theoretical analysis.
We achieve a recovered classification accuracy of up to 99% on MNIST, 89% on CIFAR, and 87% on ImageNet subsets against $L_2$ attacks.
arXiv Detail & Related papers (2020-02-28T03:42:52Z)
- Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)
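As referenced in the AdvCheck entry above, the local-gradient idea can be loosely illustrated with a gradient-norm detector: score each input by the norm of the loss gradient with respect to the input and calibrate a threshold on a few benign examples. This is a generic stand-in under stated assumptions, not AdvCheck's exact definition of the local gradient; `model`, the batches, and the quantile are hypothetical placeholders.

```python
# Hedged sketch of a gradient-norm detector in the spirit of local-gradient
# checking; not the AdvCheck algorithm itself.
import torch
import torch.nn.functional as F

def input_gradient_norm(model, x):
    """Per-example L2 norm of d(loss)/d(input), used as a sensitivity score."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    loss = F.cross_entropy(logits, logits.argmax(dim=1))  # loss w.r.t. predicted labels
    grad, = torch.autograd.grad(loss, x)
    return grad.flatten(1).norm(dim=1)

def fit_threshold(model, benign_batch, quantile=0.95):
    # Calibrate on a few benign examples: scores far above the benign range
    # are treated as suspicious, mirroring the larger local-gradient bound
    # reported for adversarial and misclassified inputs.
    return torch.quantile(input_gradient_norm(model, benign_batch), quantile).item()

def detect(model, x, threshold):
    return input_gradient_norm(model, x) > threshold      # True = flag as adversarial
```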