GNP Attack: Transferable Adversarial Examples via Gradient Norm Penalty
- URL: http://arxiv.org/abs/2307.04099v1
- Date: Sun, 9 Jul 2023 05:21:31 GMT
- Title: GNP Attack: Transferable Adversarial Examples via Gradient Norm Penalty
- Authors: Tao Wu, Tie Luo, Donald C. Wunsch
- Abstract summary: Adversarial examples (AE) with good transferability enable practical black-box attacks on diverse target models.
We propose a novel approach to enhance AE transferability using Gradient Norm Penalty (GNP).
By attacking 11 state-of-the-art deep learning models and 6 advanced defense methods, we empirically show that GNP is very effective in generating AE with high transferability.
- Score: 14.82389560064876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial examples (AE) with good transferability enable practical
black-box attacks on diverse target models, where insider knowledge about the
target models is not required. Previous methods often generate AE with no or
very limited transferability; that is, they easily overfit to the particular
architecture and feature representation of the source (white-box) model, and the
generated AE barely work for target (black-box) models. In this paper, we
propose a novel approach to enhance AE transferability using Gradient Norm
Penalty (GNP). It drives the loss function optimization procedure to converge
to a flat region of local optima in the loss landscape. By attacking 11
state-of-the-art (SOTA) deep learning models and 6 advanced defense methods, we
empirically show that GNP is very effective in generating AE with high
transferability. We also demonstrate that it is very flexible in that it can be
easily integrated with other gradient-based methods for stronger transfer-based
attacks.
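To make the idea concrete, below is a minimal PyTorch sketch of a gradient-norm-penalized iterative attack. It assumes an I-FGSM-style L-infinity attack; the finite-difference approximation of the penalty gradient (a second gradient taken at a nearby point along the ascent direction) and the hyperparameters `r` and `beta` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def gnp_attack(model, x, y, eps=16 / 255, steps=10, r=0.01, beta=0.8):
    """Illustrative I-FGSM-style attack with a gradient norm penalty.

    Idea: maximize the loss L(x_adv) while penalizing ||dL/dx||, so the
    attack converges to a flat region of the loss landscape. The penalty
    gradient is approximated by taking a second gradient at a nearby
    point in the ascent direction (a finite-difference trick); `r`,
    `beta`, and this exact blend are assumptions for illustration.
    """
    alpha = eps / steps                       # per-step size
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        g1 = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
        # Per-sample L2 norm of the gradient, broadcastable over x.
        gnorm = g1.flatten(1).norm(dim=1).view(-1, *([1] * (g1.dim() - 1)))
        # Second gradient at a neighboring point; blending it with g1
        # discourages sharp maxima that overfit the source model.
        x_nb = (x_adv + r * g1 / (gnorm + 1e-12)).detach().requires_grad_(True)
        g2 = torch.autograd.grad(F.cross_entropy(model(x_nb), y), x_nb)[0]
        g = (1 - beta) * g1 + beta * g2
        x_adv = x_adv.detach() + alpha * g.sign()
        # Project back into the L-infinity ball and valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```

Setting `beta = 0` recovers plain I-FGSM, which reflects the flexibility claim above: the blended gradient `g` can stand in for the raw gradient inside momentum- or input-transformation-based attacks.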
Related papers
- Enhancing Transferability of Adversarial Attacks with GE-AdvGAN+: A Comprehensive Framework for Gradient Editing [12.131163373757383]
Transferable adversarial attacks pose significant threats to deep neural networks.
We propose a novel framework for gradient editing-based transferable attacks, named GE-AdvGAN+.
Our framework integrates nearly all mainstream attack methods to enhance transferability while significantly reducing computational resource consumption.
arXiv Detail & Related papers (2024-08-22T18:26:31Z)
- Enhancing Adversarial Transferability with Adversarial Weight Tuning [36.09966860069978]
Adversarial examples (AEs) mislead the model while appearing benign to human observers.
AWT is a data-free tuning method that combines gradient-based and model-based attack methods to enhance the transferability of AEs.
arXiv Detail & Related papers (2024-08-18T13:31:26Z)
- Enhancing targeted transferability via feature space fine-tuning [21.131915084053894]
Adversarial examples (AEs) have been extensively studied due to their potential for privacy protection and for inspiring robust neural networks.
We propose fine-tuning an AE crafted by existing simple iterative attacks to make it transferable across unknown models.
arXiv Detail & Related papers (2024-01-05T09:46:42Z)
- MaskBlock: Transferable Adversarial Examples with Bayes Approach [35.237713022434235]
The transferability of adversarial examples across diverse models is of critical importance for black-box adversarial attacks.
We show that vanilla black-box attacks craft AEs via solving a maximum likelihood estimation (MLE) problem.
We re-formulate crafting transferable AEs as a maximum a posteriori (MAP) estimation problem, which is an effective approach to boost the generalization of results with limited available data.
arXiv Detail & Related papers (2022-08-13T01:20:39Z)
- Query-Efficient Black-box Adversarial Attacks Guided by a Transfer-based Prior [50.393092185611536]
We consider the black-box adversarial setting, where the adversary needs to craft adversarial examples without access to the gradients of a target model.
Previous methods attempted to approximate the true gradient either by using the transfer gradient of a surrogate white-box model or based on the feedback of model queries.
We propose two prior-guided random gradient-free (PRGF) algorithms based on biased sampling and gradient averaging.
arXiv Detail & Related papers (2022-03-13T04:06:27Z)
- DI-AA: An Interpretable White-box Attack for Fooling Deep Neural Networks [6.704751710867746]
White-box Adversarial Example (AE) attacks on Deep Neural Networks (DNNs) have a more powerful destructive capacity than black-box AE attacks.
We propose an interpretable white-box AE attack approach, DI-AA, which explores the application of deep Taylor decomposition as an interpretability method.
arXiv Detail & Related papers (2021-10-14T12:15:58Z)
- Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Generative-based adversarial attacks can get rid of this limitation.
A Symmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations.
The adversarial examples generated by SSAE not only make the widely used models collapse, but also achieve good visual quality.
arXiv Detail & Related papers (2021-07-20T01:55:21Z)
- Boosting Transferability of Targeted Adversarial Examples via Hierarchical Generative Networks [56.96241557830253]
Transfer-based adversarial attacks can effectively evaluate model robustness in the black-box setting.
We propose a conditional generative attacking model, which can generate the adversarial examples targeted at different classes.
Our method improves the success rates of targeted black-box attacks by a significant margin over the existing methods.
arXiv Detail & Related papers (2021-07-05T06:17:47Z)
- Patch-wise++ Perturbation for Adversarial Targeted Attacks [132.58673733817838]
We propose a patch-wise iterative method (PIM) aimed at crafting adversarial examples with high transferability.
Specifically, we introduce an amplification factor to the step size in each iteration, and any portion of a pixel's overall gradient that overflows the $\epsilon$-constraint is properly assigned to its surrounding regions (a sketch of this mechanism appears after this list).
Compared with the current state-of-the-art attack methods, we significantly improve the success rate by 35.9% for defense models and 32.7% for normally trained models.
arXiv Detail & Related papers (2020-12-31T08:40:42Z)
- Decision-based Universal Adversarial Attack [55.76371274622313]
In the black-box setting, current universal adversarial attack methods utilize substitute models to generate the perturbation.
We propose an efficient Decision-based Universal Attack (DUAttack).
The effectiveness of DUAttack is validated through comparisons with other state-of-the-art attacks.
arXiv Detail & Related papers (2020-09-15T12:49:03Z)
- Towards Query-Efficient Black-Box Adversary with Zeroth-Order Natural Gradient Descent [92.4348499398224]
Black-box adversarial attack methods have received special attention owing to their practicality and simplicity.
We propose a zeroth-order natural gradient descent (ZO-NGD) method to design the adversarial attacks.
ZO-NGD can obtain significantly lower model query complexities compared with state-of-the-art attack methods.
arXiv Detail & Related papers (2020-02-18T21:48:54Z)
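As noted in the Patch-wise++ entry above, the amplification-and-overflow mechanism can be sketched in a few lines. The following PyTorch sketch is an illustration in the spirit of PIM, not the authors' implementation; the kernel size, amplification factor, and the depthwise-convolution projection rule are assumptions.

```python
import torch
import torch.nn.functional as F

def patchwise_attack(model, x, y, eps=16 / 255, steps=10, amp=10.0, k=3):
    """Illustrative patch-wise iterative attack (in the spirit of PIM).

    Each step is amplified by `amp`; the portion of the accumulated
    noise that exceeds the eps-ball is "cut" and projected onto the
    surrounding pixels via a uniform depthwise kernel. All values and
    the exact projection rule here are assumptions for illustration.
    """
    alpha = eps / steps
    c = x.size(1)
    # Uniform projection kernel with a zero center: spreads a pixel's
    # overflow to its k*k - 1 neighbors, one kernel per channel.
    w = torch.ones(c, 1, k, k, device=x.device) / (k * k - 1)
    w[:, :, k // 2, k // 2] = 0
    x_adv = x.clone().detach()
    acc = torch.zeros_like(x)                 # accumulated amplified noise
    for _ in range(steps):
        x_adv.requires_grad_(True)
        g = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
        step = amp * alpha * g.sign()
        acc = acc + step
        # Overflow beyond the eps-constraint, kept with its sign.
        cut = (acc.abs() - eps).clamp(min=0) * acc.sign()
        # Reassign the overflow to the surrounding regions.
        proj = amp * alpha * F.conv2d(cut, w, padding=k // 2, groups=c).sign()
        acc = acc + proj
        x_adv = x_adv.detach() + step + proj
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```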