Training Meta-Surrogate Model for Transferable Adversarial Attack
- URL: http://arxiv.org/abs/2109.01983v2
- Date: Tue, 7 Sep 2021 05:13:52 GMT
- Title: Training Meta-Surrogate Model for Transferable Adversarial Attack
- Authors: Yunxiao Qin, Yuanhao Xiong, Jinfeng Yi, Cho-Jui Hsieh
- Abstract summary: We consider adversarial attacks on a black-box model when no queries are allowed.
In this setting, many methods directly attack surrogate models and transfer the obtained adversarial examples to fool the target model.
We show that we can obtain a Meta-Surrogate Model (MSM) such that attacks on this model transfer more easily to other models.
- Score: 98.13178217557193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider adversarial attacks on a black-box model when no queries are allowed. In this setting, many methods directly attack surrogate models and transfer the obtained adversarial examples to fool the target model. Many previous works have investigated what kinds of attacks on the surrogate model generate more transferable adversarial examples, but their performance is still limited by the mismatch between the surrogate models and the target model. In this paper, we tackle this problem from a novel angle: instead of using the original surrogate models, can we obtain a Meta-Surrogate Model (MSM) such that attacks on this model transfer more easily to other models? We show that this goal can be mathematically formulated as a well-posed (bi-level-like) optimization problem, and we design a differentiable attacker to make training feasible. Given one or a set of surrogate models, our method can thus obtain an MSM such that adversarial examples generated on the MSM enjoy excellent transferability. Comprehensive experiments on CIFAR-10 and ImageNet demonstrate that by attacking the MSM, we can obtain stronger transferable adversarial examples that fool black-box models, including adversarially trained ones, with much higher success rates than existing methods. The proposed method reveals significant security challenges for deep models and is a promising benchmark for evaluating the robustness of deep models in the black-box setting.
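The bi-level idea is concrete enough to sketch. Below is a minimal, hypothetical PyTorch illustration (not the authors' code): the inner step crafts an adversarial example on the meta-surrogate with a smoothed, differentiable FGSM-style attacker, and the outer step updates the meta-surrogate so that those examples also raise the loss of the fixed surrogate models. The function names, the normalized-gradient stand-in for sign(), and the single-step attacker are all assumptions made for brevity.

```python
import torch
import torch.nn.functional as F

def differentiable_attack(msm, x, y, eps):
    """Inner step: one FGSM-like step on the meta-surrogate (MSM).

    A hard sign() would have zero gradient w.r.t. the MSM parameters, so this
    sketch substitutes a normalized-gradient step to keep the attack
    differentiable end to end (the paper's attacker may differ).
    Assumes NCHW image batches with pixels in [0, 1].
    """
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(msm(x), y)
    grad = torch.autograd.grad(loss, x, create_graph=True)[0]
    step = grad / (grad.abs().mean(dim=(1, 2, 3), keepdim=True) + 1e-12)
    return (x + eps * step).clamp(0, 1)

def train_msm(msm, surrogates, loader, eps=8 / 255, lr=1e-4):
    """Outer step: update the MSM so attacks crafted on it fool the surrogates."""
    opt = torch.optim.Adam(msm.parameters(), lr=lr)
    for m in surrogates:                  # the surrogate models stay fixed
        m.eval()
        for p in m.parameters():
            p.requires_grad_(False)
    for x, y in loader:
        x_adv = differentiable_attack(msm, x, y, eps)
        # Maximize the surrogates' loss on x_adv by minimizing its negation;
        # gradients reach the MSM parameters through x_adv.
        outer_loss = -sum(F.cross_entropy(m(x_adv), y) for m in surrogates)
        opt.zero_grad()
        outer_loss.backward()
        opt.step()
```

After training, standard transfer attacks would be run against the MSM instead of the original surrogates.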
Related papers
- Scaling Laws for Black-box Adversarial Attacks [37.744814957775965]
Adversarial examples exhibit cross-model transferability, enabling attacks on black-box models.
Model ensembling is an effective strategy to improve transferability by attacking multiple surrogate models simultaneously (see the sketch after this summary).
We show that scaled attacks bring better interpretability in semantics, indicating that the common features of models are captured.
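As a point of reference for the ensembling strategy mentioned above, here is a minimal, hypothetical loss-ensemble I-FGSM in PyTorch; it is a generic illustration of ensemble attacks, not this paper's scaled attack.

```python
import torch
import torch.nn.functional as F

def ensemble_ifgsm(models, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Iterative FGSM that averages the loss over an ensemble of surrogates.

    Attacking the averaged loss pushes the perturbation toward features the
    surrogates share, the usual explanation for improved transferability.
    """
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = sum(F.cross_entropy(m(x_adv), y) for m in models) / len(models)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project to L_inf ball
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```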
arXiv Detail & Related papers (2024-11-25T08:14:37Z)
- An Adaptive Model Ensemble Adversarial Attack for Boosting Adversarial Transferability [26.39964737311377]
We propose an adaptive ensemble attack, dubbed AdaEA, to adaptively control the fusion of the outputs from each model.
We achieve considerable improvement over the existing ensemble attacks on various datasets.
arXiv Detail & Related papers (2023-08-05T15:12:36Z)
- Minimizing Maximum Model Discrepancy for Transferable Black-box Targeted Attacks [30.863450425927613]
We study the black-box targeted attack problem from the model discrepancy perspective.
We present a generalization error bound for black-box targeted attacks, which provides a rigorous theoretical basis for guaranteeing the success of the attack.
We derive a new algorithm for black-box targeted attacks based on our theoretical analysis.
arXiv Detail & Related papers (2022-12-18T08:19:08Z)
- Frequency Domain Model Augmentation for Adversarial Attack [91.36850162147678]
For black-box attacks, the gap between the substitute model and the victim model is usually large.
We propose a novel spectrum simulation attack to craft more transferable adversarial examples against both normally trained and defense models.
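The spectrum simulation idea can be sketched as a frequency-domain augmentation applied inside the attack loop. The snippet below is an illustrative approximation: it uses torch.fft where the cited paper works in the DCT domain, and the parameter names and defaults are assumptions.

```python
import torch

def spectrum_augment(x, sigma=0.5, rho=16 / 255):
    """Frequency-domain augmentation in the spirit of spectrum simulation.

    Adds noise to the image, randomly rescales its spectrum, and transforms
    back, yielding a 'simulated model' view of the input. Uses an FFT here;
    the cited paper uses the DCT, so this is an approximation.
    """
    spec = torch.fft.fft2(x + rho * torch.randn_like(x))  # noisy spectrum
    mask = 1 + sigma * torch.randn_like(x)                # random rescaling
    return torch.fft.ifft2(spec * mask).real.clamp(0, 1)
```

In an attack step, the input gradient would typically be averaged over several such augmented copies to simulate a set of diverse models.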
arXiv Detail & Related papers (2022-07-12T08:26:21Z)
- Boosting the Adversarial Transferability of Surrogate Models with Dark Knowledge [5.702679709305404]
Deep neural networks (DNNs) are vulnerable to adversarial examples.
Adversarial examples also exhibit transferability, meaning that an adversarial example crafted for one DNN model can fool another model with non-trivial probability.
This paper proposes a method for training a surrogate model with dark knowledge to boost the transferability of the adversarial examples generated by the surrogate model.
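Training a surrogate on a teacher's softened outputs is standard knowledge distillation, so the "dark knowledge" objective can be sketched as below; treat it as a generic KD loss rather than the cited paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def dark_knowledge_loss(student_logits, teacher_logits, T=4.0):
    """Distillation loss: fit the teacher's softened class probabilities.

    The soft probabilities ('dark knowledge') encode inter-class similarity,
    which is argued to make the resulting surrogate's adversarial examples
    transfer better. Temperature T is a conventional choice, not the paper's.
    """
    soft_t = F.softmax(teacher_logits / T, dim=1)
    log_soft_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_soft_s, soft_t, reduction="batchmean") * T * T
```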
arXiv Detail & Related papers (2022-06-16T17:22:40Z)
- "What's in the box?!": Deflecting Adversarial Attacks by Randomly Deploying Adversarially-Disjoint Models [71.91835408379602]
Adversarial examples have long been considered a real threat to machine learning models.
We propose an alternative deployment-based defense paradigm that goes beyond the traditional white-box and black-box threat models.
arXiv Detail & Related papers (2021-02-09T20:07:13Z)
- Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer Learning [60.784641458579124]
We show that fine-tuning effectively enhances model robustness under white-box FGSM attacks.
We also propose a black-box attack method for transfer learning models which attacks the target model with the adversarial examples produced by its source model.
To systematically measure the effect of both white-box and black-box attacks, we propose a new metric to evaluate how transferable the adversarial examples produced by a source model are to a target model.
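The summary does not spell out the proposed metric, so the sketch below shows only the usual stand-in, the transfer (fooling) rate of source-crafted examples on the target model; the paper's metric may be defined differently.

```python
import torch

@torch.no_grad()
def transfer_rate(target_model, x_adv, y_true):
    """Fraction of adversarial examples, crafted on a source model, that the
    target model misclassifies. A common transferability baseline; shown here
    as an assumption, not the cited paper's metric.
    """
    preds = target_model(x_adv).argmax(dim=1)
    return (preds != y_true).float().mean().item()
```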
arXiv Detail & Related papers (2020-08-25T15:04:32Z)
- Orthogonal Deep Models As Defense Against Black-Box Attacks [71.23669614195195]
We study the inherent weakness of deep models in black-box settings where the attacker may develop the attack using a model similar to the targeted model.
We introduce a novel gradient regularization scheme that encourages the internal representation of a deep model to be orthogonal to that of another model.
We verify the effectiveness of our technique on a variety of large-scale models.
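One plausible reading of such a regularizer is a penalty on the cosine similarity between two models' internal features for the same input; the sketch below is an illustration under that assumption, not the paper's exact scheme.

```python
import torch
import torch.nn.functional as F

def orthogonality_penalty(feat_a, feat_b):
    """Penalty pushing two models' internal representations of the same input
    toward orthogonality (zero cosine similarity). Illustrative only.
    """
    a = F.normalize(feat_a.flatten(1), dim=1)
    b = F.normalize(feat_b.flatten(1), dim=1)
    return (a * b).sum(dim=1).pow(2).mean()  # mean squared cosine similarity
```

During joint training, this penalty would be added to each model's task loss.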
arXiv Detail & Related papers (2020-06-26T08:29:05Z)
- Boosting Black-Box Attack with Partially Transferred Conditional Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs).
We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases.
Experiments on benchmark datasets and attacking against real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.