Training Meta-Surrogate Model for Transferable Adversarial Attack
- URL: http://arxiv.org/abs/2109.01983v2
- Date: Tue, 7 Sep 2021 05:13:52 GMT
- Title: Training Meta-Surrogate Model for Transferable Adversarial Attack
- Authors: Yunxiao Qin, Yuanhao Xiong, Jinfeng Yi, Cho-Jui Hsieh
- Abstract summary: We consider adversarial attacks on a black-box model when no queries are allowed.
In this setting, many methods directly attack surrogate models and transfer the obtained adversarial examples to fool the target model.
We show that we can obtain a Meta-Surrogate Model (MSM) such that attacks on this model transfer more easily to other models.
- Score: 98.13178217557193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider adversarial attacks on a black-box model when no queries are allowed. In this setting, many methods directly attack surrogate models and transfer the obtained adversarial examples to fool the target model. Many previous works have investigated what kinds of attacks on the surrogate model generate more transferable adversarial examples, but their performance is still limited by the mismatch between the surrogate models and the target model. In this paper, we tackle this problem from a novel angle: instead of using the original surrogate models, can we obtain a Meta-Surrogate Model (MSM) such that attacks on this model transfer more easily to other models? We show that this goal can be mathematically formulated as a well-posed (bi-level-like) optimization problem, and we design a differentiable attacker to make training feasible. Given one or a set of surrogate models, our method can thus obtain an MSM such that adversarial examples generated on the MSM enjoy excellent transferability. Comprehensive experiments on CIFAR-10 and ImageNet demonstrate that by attacking the MSM, we can obtain stronger transferable adversarial examples that fool black-box models, including adversarially trained ones, with much higher success rates than existing methods. The proposed method reveals significant security challenges for deep models and is a promising benchmark for evaluating the robustness of deep models in the black-box setting.
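The bi-level idea is concrete enough to sketch. Below is a minimal, hypothetical PyTorch illustration (not the authors' code): the inner step crafts an adversarial example on the meta-surrogate with a smoothed, differentiable FGSM-style attacker, and the outer step updates the meta-surrogate so that those examples also raise the loss of the fixed surrogate models. The function names, the normalized-gradient stand-in for sign(), and the single-step attacker are all assumptions made for brevity.

```python
import torch
import torch.nn.functional as F

def differentiable_attack(msm, x, y, eps):
    """Inner step: one FGSM-like step on the meta-surrogate (MSM).

    A hard sign() would have zero gradient w.r.t. the MSM parameters, so this
    sketch substitutes a normalized-gradient step to keep the attack
    differentiable end to end (the paper's attacker may differ).
    Assumes NCHW image batches with pixels in [0, 1].
    """
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(msm(x), y)
    grad = torch.autograd.grad(loss, x, create_graph=True)[0]
    step = grad / (grad.abs().mean(dim=(1, 2, 3), keepdim=True) + 1e-12)
    return (x + eps * step).clamp(0, 1)

def train_msm(msm, surrogates, loader, eps=8 / 255, lr=1e-4):
    """Outer step: update the MSM so attacks crafted on it fool the surrogates."""
    opt = torch.optim.Adam(msm.parameters(), lr=lr)
    for m in surrogates:                  # the surrogate models stay fixed
        m.eval()
        for p in m.parameters():
            p.requires_grad_(False)
    for x, y in loader:
        x_adv = differentiable_attack(msm, x, y, eps)
        # Maximize the surrogates' loss on x_adv by minimizing its negation;
        # gradients reach the MSM parameters through x_adv.
        outer_loss = -sum(F.cross_entropy(m(x_adv), y) for m in surrogates)
        opt.zero_grad()
        outer_loss.backward()
        opt.step()
```

After training, standard transfer attacks would be run against the MSM instead of the original surrogates.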
Related papers
- Scaling Laws for Black-box Adversarial Attacks [37.744814957775965]
Adversarial examples exhibit cross-model transferability, enabling attacks on black-box models.
Model ensembling is an effective strategy to improve transferability by attacking multiple surrogate models simultaneously (see the sketch after this summary).
We show that scaled attacks bring better interpretability in semantics, indicating that the common features of models are captured.
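As a point of reference for the ensembling strategy mentioned above, here is a minimal, hypothetical loss-ensemble I-FGSM in PyTorch; it is a generic illustration of ensemble attacks, not this paper's scaled attack.

```python
import torch
import torch.nn.functional as F

def ensemble_ifgsm(models, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Iterative FGSM that averages the loss over an ensemble of surrogates.

    Attacking the averaged loss pushes the perturbation toward features the
    surrogates share, the usual explanation for improved transferability.
    """
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = sum(F.cross_entropy(m(x_adv), y) for m in models) / len(models)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project to L_inf ball
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```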
arXiv Detail & Related papers (2024-11-25T08:14:37Z)
- An Adaptive Model Ensemble Adversarial Attack for Boosting Adversarial Transferability [26.39964737311377]
We propose an adaptive ensemble attack, dubbed AdaEA, to adaptively control the fusion of the outputs from each model.
We achieve considerable improvement over the existing ensemble attacks on various datasets.
arXiv Detail & Related papers (2023-08-05T15:12:36Z)
- Minimizing Maximum Model Discrepancy for Transferable Black-box Targeted Attacks [30.863450425927613]
We study the black-box targeted attack problem from the model discrepancy perspective.
We present a generalization error bound for black-box targeted attacks, which provides a rigorous theoretical basis for guaranteeing the success of the attack.
We derive a new algorithm for black-box targeted attacks based on our theoretical analysis.
arXiv Detail & Related papers (2022-12-18T08:19:08Z)
- Frequency Domain Model Augmentation for Adversarial Attack [91.36850162147678]
For black-box attacks, the gap between the substitute model and the victim model is usually large.
We propose a novel spectrum simulation attack to craft more transferable adversarial examples against both normally trained and defense models.
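The spectrum simulation idea can be sketched as a frequency-domain augmentation applied inside the attack loop. The snippet below is an illustrative approximation: it uses torch.fft where the cited paper works in the DCT domain, and the parameter names and defaults are assumptions.

```python
import torch

def spectrum_augment(x, sigma=0.5, rho=16 / 255):
    """Frequency-domain augmentation in the spirit of spectrum simulation.

    Adds noise to the image, randomly rescales its spectrum, and transforms
    back, yielding a 'simulated model' view of the input. Uses an FFT here;
    the cited paper uses the DCT, so this is an approximation.
    """
    spec = torch.fft.fft2(x + rho * torch.randn_like(x))  # noisy spectrum
    mask = 1 + sigma * torch.randn_like(x)                # random rescaling
    return torch.fft.ifft2(spec * mask).real.clamp(0, 1)
```

In an attack step, the input gradient would typically be averaged over several such augmented copies to simulate a set of diverse models.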
arXiv Detail & Related papers (2022-07-12T08:26:21Z)
- Boosting the Adversarial Transferability of Surrogate Models with Dark Knowledge [5.702679709305404]
Deep neural networks (DNNs) are vulnerable to adversarial examples.
Adversarial examples also exhibit transferability, meaning that an adversarial example crafted for one DNN model can fool another model with non-trivial probability.
This paper proposes a method for training a surrogate model with dark knowledge to boost the transferability of the adversarial examples generated by the surrogate model.
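Training a surrogate on a teacher's softened outputs is standard knowledge distillation, so the "dark knowledge" objective can be sketched as below; treat it as a generic KD loss rather than the cited paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def dark_knowledge_loss(student_logits, teacher_logits, T=4.0):
    """Distillation loss: fit the teacher's softened class probabilities.

    The soft probabilities ('dark knowledge') encode inter-class similarity,
    which is argued to make the resulting surrogate's adversarial examples
    transfer better. Temperature T is a conventional choice, not the paper's.
    """
    soft_t = F.softmax(teacher_logits / T, dim=1)
    log_soft_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_soft_s, soft_t, reduction="batchmean") * T * T
```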
arXiv Detail & Related papers (2022-06-16T17:22:40Z)
- "What's in the box?!": Deflecting Adversarial Attacks by Randomly Deploying Adversarially-Disjoint Models [71.91835408379602]
Adversarial examples have long been considered a real threat to machine learning models.
We propose an alternative deployment-based defense paradigm that goes beyond the traditional white-box and black-box threat models.
arXiv Detail & Related papers (2021-02-09T20:07:13Z)
- Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer Learning [60.784641458579124]
We show that fine-tuning effectively enhances model robustness under white-box FGSM attacks.
We also propose a black-box attack method for transfer learning models which attacks the target model with the adversarial examples produced by its source model.
To systematically measure the effect of both white-box and black-box attacks, we propose a new metric to evaluate how transferable the adversarial examples produced by a source model are to a target model.
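The summary does not spell out the proposed metric, so the sketch below shows only the usual stand-in, the transfer (fooling) rate of source-crafted examples on the target model; the paper's metric may be defined differently.

```python
import torch

@torch.no_grad()
def transfer_rate(target_model, x_adv, y_true):
    """Fraction of adversarial examples, crafted on a source model, that the
    target model misclassifies. A common transferability baseline; shown here
    as an assumption, not the cited paper's metric.
    """
    preds = target_model(x_adv).argmax(dim=1)
    return (preds != y_true).float().mean().item()
```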
arXiv Detail & Related papers (2020-08-25T15:04:32Z)
- Orthogonal Deep Models As Defense Against Black-Box Attacks [71.23669614195195]
We study the inherent weakness of deep models in black-box settings where the attacker may develop the attack using a model similar to the targeted model.
We introduce a novel gradient regularization scheme that encourages the internal representation of a deep model to be orthogonal to that of another model.
We verify the effectiveness of our technique on a variety of large-scale models.
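One plausible reading of such a regularizer is a penalty on the cosine similarity between two models' internal features for the same input; the sketch below is an illustration under that assumption, not the paper's exact scheme.

```python
import torch
import torch.nn.functional as F

def orthogonality_penalty(feat_a, feat_b):
    """Penalty pushing two models' internal representations of the same input
    toward orthogonality (zero cosine similarity). Illustrative only.
    """
    a = F.normalize(feat_a.flatten(1), dim=1)
    b = F.normalize(feat_b.flatten(1), dim=1)
    return (a * b).sum(dim=1).pow(2).mean()  # mean squared cosine similarity
```

During joint training, this penalty would be added to each model's task loss.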
arXiv Detail & Related papers (2020-06-26T08:29:05Z)
- Boosting Black-Box Attack with Partially Transferred Conditional Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs).
We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases.
Experiments on benchmark datasets and attacking against real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.