Are L2 adversarial examples intrinsically different?
- URL: http://arxiv.org/abs/2002.12527v2
- Date: Mon, 7 Sep 2020 04:37:21 GMT
- Title: Are L2 adversarial examples intrinsically different?
- Authors: Mingxuan Li, Jingyuan Wang, Yufan Wu
- Abstract summary: We unravel the properties that can intrinsically differentiate adversarial examples and normal inputs through theoretical analysis.
We achieve a recovered classification accuracy of up to 99% on MNIST, 89% on CIFAR-10, and 87% on ImageNet subsets against $L_2$ attacks.
- Score: 14.77179227968466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks (DNNs) have achieved notable success in various tasks,
including many security-critical scenarios. However, a considerable amount of
work has proved their vulnerability to adversaries. We unravel the properties
that can intrinsically differentiate adversarial examples and normal inputs
through theoretical analysis. That is, adversarial examples generated by $L_2$
attacks usually have larger input sensitivity which can be used to identify
them efficiently. We also found that those generated by $L_\infty$ attacks will
be different enough in the pixel domain to be detected empirically. To verify
our analysis, we propose a Guided Complementary Defense module (GCD) integrating detection and recovery
processes. When compared with adversarial detection methods, our detector
achieves a detection AUC of over 0.98 against most of the attacks. When
comparing our guided rectifier with commonly used adversarial training methods
and other rectification methods, our rectifier outperforms them by a large
margin. We achieve a recovered classification accuracy of up to 99% on MNIST,
89% on CIFAR-10, and 87% on ImageNet subsets against $L_2$ attacks.
Furthermore, under the white-box setting, our holistic defensive module shows a
promising degree of robustness. Thus, we confirm that at least $L_2$
adversarial examples are intrinsically different enough from normal inputs both
theoretically and empirically. We also shed light on designing simple yet
effective defensive methods based on these properties.
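A minimal sketch of the input-sensitivity idea above (not the authors' GCD implementation): if $L_2$ adversarial examples tend to have larger input sensitivity, one illustrative detector scores each input by the gradient norm of the classification loss with respect to that input and flags scores above a threshold calibrated on clean data. The exact sensitivity measure, model, and threshold below are placeholder assumptions.

```python
# Illustrative only: a gradient-norm "input sensitivity" score and a simple
# threshold detector, assuming a PyTorch classifier. This is not the paper's
# GCD module; the sensitivity measure and threshold are assumptions.
import torch
import torch.nn.functional as F

def input_sensitivity(model, x):
    """L2 norm of d(loss)/d(input), using the model's own predictions as labels."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    y_pred = logits.argmax(dim=1)
    loss = F.cross_entropy(logits, y_pred, reduction="sum")
    grad = torch.autograd.grad(loss, x)[0]
    return grad.flatten(start_dim=1).norm(p=2, dim=1)

def detect_l2_adversarial(model, x, threshold):
    """Flag inputs whose sensitivity score exceeds the calibrated threshold."""
    return input_sensitivity(model, x) > threshold
```

In practice the threshold could be calibrated as a high percentile of sensitivity scores on a held-out set of clean inputs.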
Related papers
- Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection [83.72430401516674]
GAKer is able to construct adversarial examples for any target class.
Our method achieves an approximately 14.13% higher attack success rate for unknown classes.
arXiv Detail & Related papers (2024-07-17T03:24:09Z) - Robust width: A lightweight and certifiable adversarial defense [0.0]
Adversarial examples are intentionally constructed to cause the model to make incorrect predictions or classifications.
In this work, we study an adversarial defense based on the robust width property (RWP), which was recently introduced for compressed sensing.
We show that a specific input purification scheme based on the RWP gives theoretical robustness guarantees for images that are approximately sparse.
arXiv Detail & Related papers (2024-05-24T22:50:50Z) - Latent Feature Relation Consistency for Adversarial Robustness [80.24334635105829]
Misclassification occurs when deep neural networks predict adversarial examples, which add human-imperceptible adversarial noise to natural examples.
We propose Latent Feature Relation Consistency (LFRC).
LFRC constrains the relation of adversarial examples in latent space to be consistent with the natural examples.
arXiv Detail & Related papers (2023-03-29T13:50:01Z) - Detection and Mitigation of Byzantine Attacks in Distributed Training [24.951227624475443]
Abnormal Byzantine behavior of the worker nodes can derail training and compromise the quality of inference.
Recent work considers a wide range of attack models and has explored robust aggregation and/or computational redundancy to correct the distorted gradients.
In this work, we consider attack models ranging from strong ones ($q$ omniscient adversaries with full knowledge of the defense protocol, which can change from iteration to iteration) to weak ones ($q$ randomly chosen adversaries with limited collusion abilities).
arXiv Detail & Related papers (2022-08-17T05:49:52Z) - Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm, which modifies model parameters in the deployment stage.
Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack.
We present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA)
arXiv Detail & Related papers (2022-07-25T03:24:58Z) - ADC: Adversarial attacks against object Detection that evade Context
consistency checks [55.8459119462263]
We show that even context consistency checks can be brittle to properly crafted adversarial examples.
We propose an adaptive framework to generate examples that subvert such defenses.
Our results suggest that how to robustly model context and check its consistency is still an open problem.
arXiv Detail & Related papers (2021-10-24T00:25:09Z) - Sample Efficient Detection and Classification of Adversarial Attacks via
Self-Supervised Embeddings [40.332149464256496]
Adversarial robustness of deep models is pivotal for ensuring safe deployment in real-world settings.
We propose a self-supervised method to detect adversarial attacks and classify them to their respective threat models.
We use a SimCLR encoder in our experiments, since we show the SimCLR embedding distance is a good proxy for human perceptibility.
arXiv Detail & Related papers (2021-08-30T16:39:52Z) - Self-Supervised Adversarial Example Detection by Disentangled
Representation [16.98476232162835]
We train an autoencoder, assisted by a discriminator network, over both correctly and incorrectly paired class/semantic features to reconstruct benign examples and counterexamples.
This mimics the behavior of adversarial examples and can reduce the unnecessary generalization ability of the autoencoder.
Compared with the state-of-the-art self-supervised detection methods, our method exhibits better performance in various measurements.
arXiv Detail & Related papers (2021-05-08T12:48:18Z) - A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z) - Toward Adversarial Robustness via Semi-supervised Robust Training [93.36310070269643]
Adversarial examples have been shown to be a severe threat to deep neural networks (DNNs).
We propose a novel defense method, robust training (RT), which jointly minimizes two separate risks ($R_{stand}$ and $R_{rob}$).
arXiv Detail & Related papers (2020-03-16T02:14:08Z) - Defending Adversarial Attacks via Semantic Feature Manipulation [23.48763375455514]
We propose a one-off and attack-agnostic Feature Manipulation (FM)-Defense to detect and purify adversarial examples.
To enable manipulation of features, a combo-variational autoencoder is applied to learn disentangled latent codes that reveal semantic features.
Experiments show FM-Defense can detect nearly 100% of adversarial examples produced by different state-of-the-art adversarial attacks.
arXiv Detail & Related papers (2020-02-03T23:24:32Z)