A Generative Approach to Surrogate-based Black-box Attacks
- URL: http://arxiv.org/abs/2402.02732v1
- Date: Mon, 5 Feb 2024 05:22:58 GMT
- Title: A Generative Approach to Surrogate-based Black-box Attacks
- Authors: Raha Moraffah, Huan Liu
- Abstract summary: State-of-the-art surrogate-based attacks involve training a discriminative surrogate that mimics the target's outputs.
We propose a generative surrogate that learns the distribution of samples residing on or close to the target's decision boundaries.
The proposed generative approach results in attacks with remarkably high attack success rates on various targets and datasets.
- Score: 18.37537526008645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Surrogate-based black-box attacks have exposed the heightened vulnerability
of DNNs. These attacks are designed to craft adversarial examples for any
sample, using black-box feedback from the target on only a given set of samples.
State-of-the-art surrogate-based attacks involve training a discriminative
surrogate that mimics the target's outputs. The goal is to learn the decision
boundaries of the target. The surrogate is then attacked with white-box attacks
to craft adversarial examples that are similar to the original samples but
belong to other classes. With limited samples, the discriminative surrogate fails to
accurately learn the target's decision boundaries, and these surrogate-based
attacks suffer from low success rates. Different from the discriminative
approach, we propose a generative surrogate that learns the distribution of
samples residing on or close to the target's decision boundaries. The
distribution learned by the generative surrogate can be used to craft
adversarial examples that have imperceptible differences from the original
samples but belong to other classes. The proposed generative approach results
in attacks with remarkably high attack success rates on various targets and
datasets.
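To make the baseline concrete, below is a minimal sketch (in PyTorch) of the discriminative surrogate pipeline the abstract describes, not the authors' generative method: a small surrogate is trained to mimic the black-box target's outputs on the available samples, and a standard white-box attack (PGD) on the surrogate then produces candidate adversarial examples to transfer to the target. The architecture, the `query_target` interface, and all hyperparameters are illustrative assumptions.

```python
# Sketch of the discriminative-surrogate baseline (not the paper's generative method).
# Assumptions: `query_target(x)` returns the black-box target's probability vectors;
# SmallCNN and all hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    """Hypothetical surrogate architecture for 32x32 RGB inputs."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Linear(64 * 16, num_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def train_surrogate(surrogate, loader, query_target, epochs=10, lr=1e-3):
    """Fit the surrogate to mimic the target's outputs on the available samples.

    With label-only feedback, cross-entropy on hard labels would replace the KL term.
    """
    opt = torch.optim.Adam(surrogate.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in loader:
            with torch.no_grad():
                target_probs = query_target(x)          # black-box feedback
            log_probs = F.log_softmax(surrogate(x), dim=1)
            loss = F.kl_div(log_probs, target_probs, reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
    return surrogate

def pgd_attack(surrogate, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Craft adversarial examples with white-box PGD on the surrogate,
    then transfer them to the black-box target."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(surrogate(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball around x and into the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```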
Related papers
- Indiscriminate Disruption of Conditional Inference on Multivariate Gaussians [60.22542847840578]
Despite advances in adversarial machine learning, inference for Gaussian models in the presence of an adversary is notably understudied.
We consider a self-interested attacker who wishes to disrupt a decision-maker's conditional inference and subsequent actions by corrupting a set of evidentiary variables.
To avoid detection, the attacker also desires the attack to appear plausible wherein plausibility is determined by the density of the corrupted evidence.
arXiv Detail & Related papers (2024-11-21T17:46:55Z) - DALA: A Distribution-Aware LoRA-Based Adversarial Attack against
Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data.
Recent attack methods can achieve a relatively high attack success rate (ASR).
We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z) - Object-fabrication Targeted Attack for Object Detection [54.10697546734503]
Adversarial attacks for object detection include targeted attacks and untargeted attacks.
A new object-fabrication targeted attack mode can mislead detectors to fabricate extra false objects with specific target labels.
arXiv Detail & Related papers (2022-12-13T08:42:39Z) - Transferability Ranking of Adversarial Examples [20.41013432717447]
This paper introduces a ranking strategy that refines the transfer attack process.
By leveraging a set of diverse surrogate models, our method can predict transferability of adversarial examples.
Using our strategy, we were able to raise the transferability of adversarial examples from a mere 20% (akin to random selection) up to near upper-bound levels (see the sketch after this list).
arXiv Detail & Related papers (2022-08-23T11:25:16Z) - Adversarial Pixel Restoration as a Pretext Task for Transferable
Perturbations [54.1807206010136]
Transferable adversarial attacks optimize adversaries from a pretrained surrogate model and known label space to fool the unknown black-box models.
We propose Adversarial Pixel Restoration as a self-supervised alternative to train an effective surrogate model from scratch.
Our training approach is based on a min-max objective which reduces overfitting via an adversarial objective.
arXiv Detail & Related papers (2022-07-18T17:59:58Z) - Identifying a Training-Set Attack's Target Using Renormalized Influence
Estimation [11.663072799764542]
This work proposes the task of target identification, which determines whether a specific test instance is the target of a training-set attack.
Rather than focusing on a single attack method or data modality, we build on influence estimation, which quantifies each training instance's contribution to a model's prediction.
arXiv Detail & Related papers (2022-01-25T02:36:34Z) - Towards A Conceptually Simple Defensive Approach for Few-shot
classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z) - Boosting Transferability of Targeted Adversarial Examples via
Hierarchical Generative Networks [56.96241557830253]
Transfer-based adversarial attacks can effectively evaluate model robustness in the black-box setting.
We propose a conditional generative attacking model, which can generate adversarial examples targeted at different classes.
Our method improves the success rates of targeted black-box attacks by a significant margin over the existing methods.
arXiv Detail & Related papers (2021-07-05T06:17:47Z) - Tricking Adversarial Attacks To Fail [0.05076419064097732]
Our white-box defense tricks untargeted attacks into becoming attacks targeted at designated target classes.
Our Target Training defense tricks the minimization at the core of untargeted, gradient-based adversarial attacks.
arXiv Detail & Related papers (2020-06-08T12:22:07Z)