Are L2 adversarial examples intrinsically different?
- URL: http://arxiv.org/abs/2002.12527v2
- Date: Mon, 7 Sep 2020 04:37:21 GMT
- Title: Are L2 adversarial examples intrinsically different?
- Authors: Mingxuan Li, Jingyuan Wang, Yufan Wu
- Abstract summary: We unravel the properties that can intrinsically differentiate adversarial examples and normal inputs through theoretical analysis.
We achieve a recovered classification accuracy of up to 99% on MNIST, 89% on CIFAR-10, and 87% on ImageNet subsets against $L_2$ attacks.
- Score: 14.77179227968466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks (DNNs) have achieved notable success in various tasks,
including many security-critical scenarios. However, a considerable amount of
work has proved their vulnerability to adversaries. We unravel the properties
that can intrinsically differentiate adversarial examples and normal inputs
through theoretical analysis. That is, adversarial examples generated by $L_2$
attacks usually have larger input sensitivity which can be used to identify
them efficiently. We also found that those generated by $L_\infty$ attacks will
be different enough in the pixel domain to be detected empirically. To verify
our analysis, we propose a Guided Complementary Defense module (GCD) integrating detection and recovery
processes. When compared with adversarial detection methods, our detector
achieves a detection AUC of over 0.98 against most of the attacks. When
comparing our guided rectifier with commonly used adversarial training methods
and other rectification methods, our rectifier outperforms them by a large
margin. We achieve a recovered classification accuracy of up to 99% on MNIST,
89% on CIFAR-10, and 87% on ImageNet subsets against $L_2$ attacks.
Furthermore, under the white-box setting, our holistic defensive module shows a
promising degree of robustness. Thus, we confirm that at least $L_2$
adversarial examples are intrinsically different enough from normal inputs both
theoretically and empirically. We also shed light on designing simple yet
effective defensive methods based on these properties.
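A minimal sketch of the input-sensitivity idea above (not the authors' GCD implementation): if $L_2$ adversarial examples tend to have larger input sensitivity, one illustrative detector scores each input by the gradient norm of the classification loss with respect to that input and flags scores above a threshold calibrated on clean data. The exact sensitivity measure, model, and threshold below are placeholder assumptions.

```python
# Illustrative only: a gradient-norm "input sensitivity" score and a simple
# threshold detector, assuming a PyTorch classifier. This is not the paper's
# GCD module; the sensitivity measure and threshold are assumptions.
import torch
import torch.nn.functional as F

def input_sensitivity(model, x):
    """L2 norm of d(loss)/d(input), using the model's own predictions as labels."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    y_pred = logits.argmax(dim=1)
    loss = F.cross_entropy(logits, y_pred, reduction="sum")
    grad = torch.autograd.grad(loss, x)[0]
    return grad.flatten(start_dim=1).norm(p=2, dim=1)

def detect_l2_adversarial(model, x, threshold):
    """Flag inputs whose sensitivity score exceeds the calibrated threshold."""
    return input_sensitivity(model, x) > threshold
```

In practice the threshold could be calibrated as a high percentile of sensitivity scores on a held-out set of clean inputs.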
Related papers
- Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection [83.72430401516674]
GAKer is able to construct adversarial examples for any target class.
Our method achieves an approximately 14.13% higher attack success rate for unknown classes.
arXiv Detail & Related papers (2024-07-17T03:24:09Z) - Robust width: A lightweight and certifiable adversarial defense [0.0]
Adversarial examples are intentionally constructed to cause the model to make incorrect predictions or classifications.
In this work, we study an adversarial defense based on the robust width property (RWP), which was recently introduced for compressed sensing.
We show that a specific input purification scheme based on the RWP gives theoretical robustness guarantees for images that are approximately sparse.
arXiv Detail & Related papers (2024-05-24T22:50:50Z) - Latent Feature Relation Consistency for Adversarial Robustness [80.24334635105829]
Misclassification occurs when deep neural networks predict adversarial examples, which add human-imperceptible adversarial noise to natural examples.
We propose Latent Feature Relation Consistency (LFRC).
LFRC constrains the relation of adversarial examples in latent space to be consistent with the natural examples.
arXiv Detail & Related papers (2023-03-29T13:50:01Z) - Detection and Mitigation of Byzantine Attacks in Distributed Training [24.951227624475443]
Abnormal Byzantine behavior of the worker nodes can derail training and compromise the quality of inference.
Recent work considers a wide range of attack models and has explored robust aggregation and/or computational redundancy to correct the distorted gradients.
In this work, we consider attack models ranging from strong ones ($q$ omniscient adversaries with full knowledge of the defense protocol, which can change from iteration to iteration) to weak ones ($q$ randomly chosen adversaries with limited collusion abilities).
arXiv Detail & Related papers (2022-08-17T05:49:52Z) - Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm, which modifies model parameters in the deployment stage.
Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack.
We present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA)
arXiv Detail & Related papers (2022-07-25T03:24:58Z) - ADC: Adversarial attacks against object Detection that evade Context
consistency checks [55.8459119462263]
We show that even context consistency checks can be brittle to properly crafted adversarial examples.
We propose an adaptive framework to generate examples that subvert such defenses.
Our results suggest that how to robustly model context and check its consistency is still an open problem.
arXiv Detail & Related papers (2021-10-24T00:25:09Z) - Sample Efficient Detection and Classification of Adversarial Attacks via
Self-Supervised Embeddings [40.332149464256496]
Adversarial robustness of deep models is pivotal for ensuring safe deployment in real-world settings.
We propose a self-supervised method to detect adversarial attacks and classify them to their respective threat models.
We use a SimCLR encoder in our experiments, since we show the SimCLR embedding distance is a good proxy for human perceptibility.
arXiv Detail & Related papers (2021-08-30T16:39:52Z) - Self-Supervised Adversarial Example Detection by Disentangled
Representation [16.98476232162835]
We train an autoencoder, assisted by a discriminator network, over both correctly and incorrectly paired class/semantic features to reconstruct benign examples and counterexamples.
This mimics the behavior of adversarial examples and can reduce the unnecessary generalization ability of the autoencoder.
Compared with the state-of-the-art self-supervised detection methods, our method exhibits better performance in various measurements.
arXiv Detail & Related papers (2021-05-08T12:48:18Z) - A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z) - Toward Adversarial Robustness via Semi-supervised Robust Training [93.36310070269643]
Adversarial examples have been shown to be a severe threat to deep neural networks (DNNs).
We propose a novel defense method, robust training (RT), which jointly minimizes two separate risks ($R_{stand}$ and $R_{rob}$).
arXiv Detail & Related papers (2020-03-16T02:14:08Z) - Defending Adversarial Attacks via Semantic Feature Manipulation [23.48763375455514]
We propose a one-off and attack-agnostic Feature Manipulation (FM)-Defense to detect and purify adversarial examples.
To enable manipulation of features, a combo-variational autoencoder is applied to learn disentangled latent codes that reveal semantic features.
Experiments show FM-Defense can detect nearly 100% of adversarial examples produced by different state-of-the-art adversarial attacks.
arXiv Detail & Related papers (2020-02-03T23:24:32Z)