Identifying Adversarially Attackable and Robust Samples
- URL: http://arxiv.org/abs/2301.12896v3
- Date: Sun, 25 Jun 2023 01:53:15 GMT
- Title: Identifying Adversarially Attackable and Robust Samples
- Authors: Vyas Raina and Mark Gales
- Abstract summary: Adversarial attacks insert small, imperceptible perturbations to input samples that cause large, undesired changes to the output of deep learning models.
This work introduces the notion of sample attackability, where we aim to identify samples that are most susceptible to adversarial attacks.
We propose a deep-learning-based detector to identify the adversarially attackable and robust samples in an unseen dataset for an unseen target model.
- Score: 1.4213973379473654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial attacks insert small, imperceptible perturbations to input
samples that cause large, undesired changes to the output of deep learning
models. Despite extensive research on generating adversarial attacks and
building defense systems, there has been limited research on understanding
adversarial attacks from an input-data perspective. This work introduces the
notion of sample attackability, where we aim to identify samples that are most
susceptible to adversarial attacks (attackable samples) and conversely also
identify the least susceptible samples (robust samples). We propose a
deep-learning-based detector to identify the adversarially attackable and
robust samples in an unseen dataset for an unseen target model. Experiments on
standard image classification datasets enable us to assess the portability of
the deep attackability detector across a range of architectures. We find that
the deep attackability detector performs better than simple model
uncertainty-based measures for identifying the attackable/robust samples. This
suggests that uncertainty is an inadequate proxy for measuring sample distance
to a decision boundary. In addition to better understanding adversarial attack
theory, it is found that the ability to identify the adversarially attackable
and robust samples has implications for improving the efficiency of
sample-selection tasks.
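The contrast the abstract draws between model uncertainty and distance to a decision boundary can be illustrated with a small sketch. This is not the paper's deep attackability detector. It assumes a hypothetical multi-class linear classifier, for which the smallest L2 perturbation that changes the prediction has a closed form, and compares that distance with predictive entropy as an uncertainty measure. The function names and the two thresholds in `label_sample` are illustrative assumptions, not values or definitions taken from the paper.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predictive_entropy(logits):
    # Uncertainty proxy: entropy of the predicted class distribution.
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


def linear_logits(x, weights, biases):
    # Logits of a linear classifier: one weight vector and one bias per class.
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def min_l2_perturbation(x, weights, biases):
    # Exact L2 distance from x to the nearest decision boundary of a
    # linear classifier: min over other classes j of
    # (logit_pred - logit_j) / ||w_pred - w_j||.
    logits = linear_logits(x, weights, biases)
    pred = max(range(len(logits)), key=logits.__getitem__)
    best = math.inf
    for j, w_j in enumerate(weights):
        if j == pred:
            continue
        margin = logits[pred] - logits[j]
        diff_norm = math.sqrt(sum((a - c) ** 2
                                  for a, c in zip(weights[pred], w_j)))
        if diff_norm > 0.0:
            best = min(best, margin / diff_norm)
    return best


def label_sample(distance, attackable_below, robust_above):
    # Hypothetical thresholds (an assumption of this sketch): small
    # distances mark attackable samples, large distances mark robust ones.
    if distance < attackable_below:
        return "attackable"
    if distance > robust_above:
        return "robust"
    return "neither"
```

For a linear model, entropy depends only on the logits, while the boundary distance also depends on the norms of the weight differences between classes, so the two measures need not rank samples in the same order. For deep networks the boundary distance has no closed form and is typically estimated by running attacks.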
Related papers
- Detecting Adversarial Data via Perturbation Forgery [28.637963515748456]
Adversarial detection aims to identify and filter out adversarial data from the data flow based on discrepancies in distribution and noise patterns between natural and adversarial data.
New attacks based on generative models with imbalanced and anisotropic noise patterns evade detection.
We propose Perturbation Forgery, which includes noise distribution perturbation, sparse mask generation, and pseudo-adversarial data production, to train an adversarial detector capable of detecting unseen gradient-based, generative-model-based, and physical adversarial attacks.
arXiv Detail & Related papers (2024-05-25T13:34:16Z)
- Confidence-driven Sampling for Backdoor Attacks [49.72680157684523]
Backdoor attacks aim to surreptitiously insert malicious triggers into DNN models, granting unauthorized control during testing scenarios.
Existing methods lack robustness against defense strategies and predominantly focus on enhancing trigger stealthiness while randomly selecting poisoned samples.
We introduce a straightforward yet highly effective sampling methodology that leverages confidence scores. Specifically, it selects samples with lower confidence scores, significantly increasing the challenge for defenders in identifying and countering these attacks.
arXiv Detail & Related papers (2023-10-08T18:57:36Z)
- On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework, which induces different responses between normal and adversarial samples to UAPs.
Experimental results show that our method achieves competitive detection performance on various text classification tasks.
arXiv Detail & Related papers (2023-06-27T02:54:07Z)
- Sample Attackability in Natural Language Adversarial Attacks [1.4213973379473654]
This work formally extends the definition of sample attackability/robustness for NLP attacks.
Experiments are conducted on two popular NLP datasets, four state-of-the-art models, and four different NLP adversarial attack methods.
arXiv Detail & Related papers (2023-06-21T06:20:51Z)
- Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, namely, Memorization Discrepancy, to explore the defense via the model-level information.
By implicitly transferring the changes in the data manipulation to that in the model outputs, Memorization Discrepancy can discover the imperceptible poison samples.
We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
arXiv Detail & Related papers (2023-06-06T14:45:24Z)
- Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard method in adversarial robustness assumes a framework to defend against samples crafted by minimally perturbing a sample.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
arXiv Detail & Related papers (2022-11-04T13:54:02Z)
- Towards Generating Adversarial Examples on Mixed-type Data [32.41305735919529]
We propose a novel attack algorithm M-Attack, which can effectively generate adversarial examples in mixed-type data.
Based on M-Attack, attackers can attempt to mislead the targeted classification model's prediction, by only slightly perturbing both the numerical and categorical features in the given data samples.
Our generated adversarial examples can evade potential detection models, which makes the attack indeed insidious.
arXiv Detail & Related papers (2022-10-17T20:17:21Z)
- Identifying a Training-Set Attack's Target Using Renormalized Influence Estimation [11.663072799764542]
This work proposes the task of target identification, which determines whether a specific test instance is the target of a training-set attack.
Rather than focusing on a single attack method or data modality, we build on influence estimation, which quantifies each training instance's contribution to a model's prediction.
arXiv Detail & Related papers (2022-01-25T02:36:34Z)
- Learning and Certification under Instance-targeted Poisoning [49.55596073963654]
We study PAC learnability and certification under instance-targeted poisoning attacks.
We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable.
We empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets.
arXiv Detail & Related papers (2021-05-18T17:48:15Z)
- Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by a recently introduced non-robust feature.
In this paper, we consider the non-robust features as a common property of adversarial examples, and we deduce it is possible to find a cluster in representation space corresponding to the property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster, and to leverage that distribution for a likelihood-based adversarial detector.
arXiv Detail & Related papers (2020-12-07T07:21:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.