Minimizing Maximum Model Discrepancy for Transferable Black-box Targeted Attacks
- URL: http://arxiv.org/abs/2212.09035v1
- Date: Sun, 18 Dec 2022 08:19:08 GMT
- Title: Minimizing Maximum Model Discrepancy for Transferable Black-box Targeted Attacks
- Authors: Anqi Zhao, Tong Chu, Yahao Liu, Wen Li, Jingjing Li, Lixin Duan
- Abstract summary: We study the black-box targeted attack problem from the model discrepancy perspective.
We present a generalization error bound for black-box targeted attacks, which gives a rigorous theoretical analysis for guaranteeing the success of the attack.
We derive a new algorithm for black-box targeted attacks based on our theoretical analysis.
- Score: 30.863450425927613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we study the black-box targeted attack problem from the model
discrepancy perspective. On the theoretical side, we present a generalization
error bound for black-box targeted attacks, which gives a rigorous theoretical
analysis for guaranteeing the success of the attack. We reveal that the attack
error on a target model mainly depends on empirical attack error on the
substitute model and the maximum model discrepancy among substitute models. On
the algorithmic side, we derive a new algorithm for black-box targeted attacks
based on our theoretical analysis, in which we additionally minimize the
maximum model discrepancy (M3D) of the substitute models when training the
generator to generate adversarial examples. In this way, our model is capable
of crafting highly transferable adversarial examples that are robust to
model variation, thus improving the success rate for attacking the black-box
model. We conduct extensive experiments on the ImageNet dataset with different
classification models, and our proposed approach outperforms existing
state-of-the-art methods by a significant margin. Our code will be released.
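To make the min-max idea concrete, here is a minimal PyTorch-style sketch of one M3D training step. The generator G, the two substitute classifiers f1 and f2 (sharing optimizer opt_f), the L1 discrepancy measure, and all hyperparameters are illustrative assumptions based on the abstract, not the authors' released code.

```python
# Hypothetical sketch of an M3D-style training step (assumptions: generator G,
# substitute classifiers f1/f2 with shared optimizer opt_f, L1 discrepancy).
import torch
import torch.nn.functional as F

def discrepancy(logits1, logits2):
    # L1 distance between the substitute models' predicted distributions.
    return (logits1.softmax(dim=1) - logits2.softmax(dim=1)).abs().mean()

def m3d_step(G, f1, f2, x, y, target, opt_g, opt_f, eps=16 / 255):
    # The generator crafts a bounded targeted perturbation.
    x_adv = torch.clamp(x + eps * torch.tanh(G(x)), 0, 1)

    # Max step: update the substitutes to MAXIMIZE their discrepancy on the
    # adversarial examples while staying accurate on clean inputs.
    loss_f = (F.cross_entropy(f1(x), y) + F.cross_entropy(f2(x), y)
              - discrepancy(f1(x_adv.detach()), f2(x_adv.detach())))
    opt_f.zero_grad(); loss_f.backward(); opt_f.step()

    # Min step: update the generator to push both substitutes toward the
    # target class AND to MINIMIZE the maximized discrepancy, which is what
    # makes the crafted examples robust to model variation.
    logits1, logits2 = f1(x_adv), f2(x_adv)
    loss_g = (F.cross_entropy(logits1, target) + F.cross_entropy(logits2, target)
              + discrepancy(logits1, logits2))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_g.item()
```

The alternating max/min structure mirrors the stated bound: the generator drives down the empirical attack error on the substitutes while shrinking the worst-case discrepancy between them.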
Related papers
- Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior [36.101904669291436]
This paper studies the challenging black-box adversarial attack that aims to generate adversarial examples against a black-box model using only the model's output feedback to input queries.
We propose a Prior-guided Bayesian Optimization (P-BO) algorithm that leverages the surrogate model as a global function prior in black-box adversarial attacks.
Our theoretical analysis of the regret bound indicates that the performance of P-BO may be affected by a bad prior.
arXiv Detail & Related papers (2024-05-29T14:05:16Z)
- Defense Against Model Extraction Attacks on Recommender Systems [53.127820987326295]
We introduce Gradient-based Ranking Optimization (GRO) to defend against model extraction attacks on recommender systems.
GRO aims to minimize the loss of the protected target model while maximizing the loss of the attacker's surrogate model.
Results show GRO's superior effectiveness in defending against model extraction attacks.
arXiv Detail & Related papers (2023-10-25T03:30:42Z)
- Understanding the Robustness of Randomized Feature Defense Against Query-Based Adversarial Attacks [23.010308600769545]
Deep neural networks are vulnerable to adversarial examples: samples close to the original image that nonetheless cause the model to misclassify.
We propose a simple and lightweight defense against black-box attacks by adding random noise to hidden features at intermediate layers of the model at inference time.
Our method effectively enhances the model's resilience against both score-based and decision-based black-box attacks.
arXiv Detail & Related papers (2023-10-01T03:53:23Z)
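The mechanism described above is simple enough to sketch as a forward hook that perturbs an intermediate activation at inference time; the layer choice and noise scale below are illustrative assumptions, not the paper's settings.

```python
# Sketch of a randomized feature defense: Gaussian noise injected into an
# intermediate activation at inference time (layer and sigma are assumptions).
import torch
import torchvision.models as models

def feature_noise(sigma):
    def hook(module, inputs, output):
        if not module.training:          # perturb at inference time only
            return output + sigma * torch.randn_like(output)
    return hook

model = models.resnet50(weights=None).eval()
model.layer3.register_forward_hook(feature_noise(sigma=0.05))

with torch.no_grad():
    logits = model(torch.rand(1, 3, 224, 224))  # each query sees fresh noise
```

Because every query sees a different realization of the noise, score-based and decision-based attackers receive unstable feedback.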
- Frequency Domain Model Augmentation for Adversarial Attack [91.36850162147678]
For black-box attacks, the gap between the substitute model and the victim model is usually large.
We propose a novel spectrum simulation attack to craft more transferable adversarial examples against both normally trained and defense models.
arXiv Detail & Related papers (2022-07-12T08:26:21Z)
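In the spirit of that idea, spectrum simulation can be sketched as random scaling and noise applied in the frequency domain; the paper uses a DCT, while the sketch below uses an FFT for brevity, and rho/sigma are assumed values.

```python
# Hypothetical spectrum augmentation: transform the input, randomly scale its
# spectrum, and invert; averaging attack gradients over several such copies
# simulates a set of diverse models (FFT stands in for the paper's DCT).
import torch

def spectrum_augment(x, rho=0.5, sigma=16 / 255):
    spec = torch.fft.fft2(x + sigma * torch.randn_like(x))
    mask = 1 + rho * (2 * torch.rand_like(x) - 1)   # random per-frequency scaling
    return torch.fft.ifft2(spec * mask).real.clamp(0, 1)
```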
- Training Meta-Surrogate Model for Transferable Adversarial Attack [98.13178217557193]
We consider adversarial attacks on a black-box model when no queries are allowed.
In this setting, many methods directly attack surrogate models and transfer the obtained adversarial examples to fool the target model.
We show that we can obtain a Meta-Surrogate Model (MSM) such that attacks on this model transfer more easily to other models.
arXiv Detail & Related papers (2021-09-05T03:27:46Z)
- Query-free Black-box Adversarial Attacks on Graphs [37.88689315688314]
We propose a query-free black-box adversarial attack on graphs, in which the attacker has no knowledge of the target model and no query access to the model.
We prove that the impact of the flipped links on the target model can be quantified by spectral changes, and thus be approximated using the eigenvalue perturbation theory.
Thanks to its simplicity and scalability, the proposed attack is not only generic across various graph-based models but can also be easily extended when different levels of knowledge are accessible.
arXiv Detail & Related papers (2020-12-12T08:52:56Z)
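The key quantity is cheap to compute: for a symmetric adjacency matrix, first-order perturbation theory gives the change in an eigenvalue caused by one link flip. The sketch below scores a flip by its effect on the leading eigenvalue; the paper's exact spectral target may differ.

```python
# Sketch: score a link flip via first-order eigenvalue perturbation, with no
# model queries (symmetric binary adjacency A; spectral target is an assumption).
import numpy as np

def flip_score(A, i, j):
    _, U = np.linalg.eigh(A)
    u = U[:, -1]                    # leading eigenvector
    delta = 1 - 2 * A[i, j]         # +1 adds the edge, -1 removes it
    # dA has `delta` at (i, j) and (j, i), so d(lambda) ~= u^T dA u:
    return 2 * delta * u[i] * u[j]
```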
- Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer Learning [60.784641458579124]
We show that fine-tuning effectively enhances model robustness under white-box FGSM attacks.
We also propose a black-box attack method for transfer learning models which attacks the target model with the adversarial examples produced by its source model.
To systematically measure the effect of both white-box and black-box attacks, we propose a new metric to evaluate how transferable the adversarial examples produced by a source model are to a target model.
arXiv Detail & Related papers (2020-08-25T15:04:32Z)
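The proposed black-box attack reduces to a short recipe: craft the example white-box on the source model, then evaluate it on the target. A minimal FGSM version, with illustrative names and epsilon, looks like this:

```python
# Minimal transfer-attack sketch: FGSM on the white-box source model, then
# black-box evaluation on the target (names and eps are illustrative).
import torch
import torch.nn.functional as F

def transfer_fgsm(source, target, x, y, eps=8 / 255):
    x = x.clone().requires_grad_(True)
    F.cross_entropy(source(x), y).backward()         # gradients from the source only
    x_adv = (x + eps * x.grad.sign()).clamp(0, 1).detach()
    with torch.no_grad():
        return target(x_adv)                          # transfer to the target model
```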
- Orthogonal Deep Models As Defense Against Black-Box Attacks [71.23669614195195]
We study the inherent weakness of deep models in black-box settings where the attacker may develop the attack using a model similar to the targeted model.
We introduce a novel gradient regularization scheme that encourages the internal representation of a deep model to be orthogonal to that of another.
We verify the effectiveness of our technique on a variety of large-scale models.
arXiv Detail & Related papers (2020-06-26T08:29:05Z)
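One way to realize such a regularizer is to penalize alignment between the two models' internal representations; the squared cosine-similarity form below is an illustrative assumption, not necessarily the paper's exact scheme.

```python
# Hypothetical orthogonality regularizer: penalize squared cosine similarity
# between two models' features so surrogate-crafted attacks transfer poorly.
import torch.nn.functional as F

def orthogonality_penalty(feat_a, feat_b):
    a = F.normalize(feat_a.flatten(1), dim=1)
    b = F.normalize(feat_b.flatten(1), dim=1)
    return (a * b).sum(dim=1).pow(2).mean()   # ~0 when representations are orthogonal

# e.g. total_loss = task_loss + lam * orthogonality_penalty(featA, featB)
```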
- Boosting Black-Box Attack with Partially Transferred Conditional Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs).
We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases.
Experiments on benchmark datasets and attacking against real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z)