On the reversibility of adversarial attacks
- URL: http://arxiv.org/abs/2206.00772v1
- Date: Wed, 1 Jun 2022 21:18:11 GMT
- Title: On the reversibility of adversarial attacks
- Authors: Chau Yi Li, Ricardo Sánchez-Matilla, Ali Shahin Shamsabadi, Riccardo Mazzon, Andrea Cavallaro
- Abstract summary: Adversarial attacks modify images with perturbations that change the prediction of classifiers.
We investigate the predictability of the mapping between the classes predicted for original images and for their corresponding adversarial examples.
We quantify reversibility as the accuracy in retrieving the original class or the true class of an adversarial example.
- Score: 41.94594666541757
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial attacks modify images with perturbations that change the
prediction of classifiers. These modified images, known as adversarial
examples, expose the vulnerabilities of deep neural network classifiers. In
this paper, we investigate the predictability of the mapping between the
classes predicted for original images and for their corresponding adversarial
examples. This predictability relates to the possibility of retrieving the
original predictions and hence reversing the induced misclassification. We
refer to this property as the reversibility of an adversarial attack, and
quantify reversibility as the accuracy in retrieving the original class or the
true class of an adversarial example. We present an approach that reverses the
effect of an adversarial attack on a classifier using a prior set of
classification results. We analyse the reversibility of state-of-the-art
adversarial attacks on benchmark classifiers and discuss the factors that
affect the reversibility.
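To make the idea concrete, below is a minimal Python sketch of one way to read this abstract: estimate the class mapping from a prior set of (original prediction, adversarial prediction) pairs, reverse an adversarial prediction by a simple majority vote over that mapping, and report reversibility as the accuracy of retrieving the original class. The function names and the majority-vote reversal rule are illustrative assumptions, not the authors' exact method.

```python
import numpy as np


def build_class_mapping(orig_preds, adv_preds, num_classes):
    """Count how often original prediction i becomes adversarial prediction j
    on a prior set of classification results."""
    counts = np.zeros((num_classes, num_classes), dtype=np.int64)
    for i, j in zip(orig_preds, adv_preds):
        counts[i, j] += 1
    return counts


def reverse_prediction(adv_pred, counts):
    """Return the original class that most often mapped to `adv_pred`."""
    return int(np.argmax(counts[:, adv_pred]))


def reversibility(orig_preds, adv_preds, counts):
    """Reversibility as the accuracy of retrieving the original class."""
    recovered = [reverse_prediction(j, counts) for j in adv_preds]
    return float(np.mean(np.array(recovered) == np.asarray(orig_preds)))


# Hypothetical usage with held-out prediction pairs:
# counts = build_class_mapping(prior_orig, prior_adv, num_classes=1000)
# score = reversibility(test_orig, test_adv, counts)
```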
Related papers
- On the Effect of Adversarial Training Against Invariance-based Adversarial Examples [0.23624125155742057]
This work addresses the impact of adversarial training with invariance-based adversarial examples on a convolutional neural network (CNN).
We show that when adversarial training with invariance-based and perturbation-based adversarial examples is applied, it should be conducted simultaneously and not consecutively.
arXiv Detail & Related papers (2023-02-16T12:35:37Z)
- Towards A Conceptually Simple Defensive Approach for Few-shot Classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z)
- Localized Uncertainty Attacks [9.36341602283533]
We present localized uncertainty attacks against deep learning models.
We create adversarial examples by perturbing only regions in the inputs where a classifier is uncertain.
Unlike $\ell_p$-ball or functional attacks, which perturb inputs indiscriminately, our targeted changes can be less perceptible (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2021-06-17T03:07:22Z)
- Towards Defending against Adversarial Examples via Attack-Invariant Features [147.85346057241605]
Deep neural networks (DNNs) are vulnerable to adversarial noise.
Adversarial robustness can be improved by exploiting adversarial examples.
Models trained on seen types of adversarial examples generally cannot generalize well to unseen types of adversarial examples.
arXiv Detail & Related papers (2021-06-09T12:49:54Z)
- Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by the recently introduced notion of non-robust features.
In this paper, we consider non-robust features as a common property of adversarial examples, and we deduce that it is possible to find a cluster in representation space corresponding to this property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster, and to leverage this distribution for a likelihood-based adversarial detector.
arXiv Detail & Related papers (2020-12-07T07:21:18Z)
- On the Transferability of Adversarial Attacks against Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z)
- ATRO: Adversarial Training with a Rejection Option [10.36668157679368]
This paper proposes a classification framework with a rejection option to mitigate the performance deterioration caused by adversarial examples.
By applying the adversarial training objective to both a classifier and a rejection function simultaneously, the framework can abstain from classification when the classifier has insufficient confidence to classify a test data point.
arXiv Detail & Related papers (2020-10-24T14:05:03Z)
- Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)
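As referenced in the Localized Uncertainty Attacks entry above, here is a minimal PyTorch sketch of perturbing only the regions where a classifier is uncertain. The entropy-gradient masking and the single FGSM-style step are assumptions made for illustration, not the paper's exact attack.

```python
import torch
import torch.nn.functional as F


def localized_uncertainty_attack(model, x, epsilon=8 / 255, top_frac=0.1):
    """Perturb only the pixels that most affect predictive entropy.

    Assumes `model` is a torch.nn.Module in eval mode that returns logits
    for a batch of images `x` shaped (N, C, H, W) with values in [0, 1].
    """
    x = x.clone().detach().requires_grad_(True)
    probs = F.softmax(model(x), dim=1)
    # Predictive entropy as a simple uncertainty measure.
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1).mean()
    entropy.backward()

    grad = x.grad.detach()
    # Keep only the top fraction of pixels by gradient magnitude, per sample.
    flat = grad.abs().flatten(1)
    k = max(1, int(top_frac * flat.shape[1]))
    thresh = flat.topk(k, dim=1).values[:, -1].view(-1, 1, 1, 1)
    mask = (grad.abs() >= thresh).float()

    # FGSM-style step restricted to the uncertain regions.
    x_adv = x + epsilon * grad.sign() * mask
    return x_adv.clamp(0, 1).detach()
```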