Gradient-based Adversarial Attacks against Text Transformers
- URL: http://arxiv.org/abs/2104.13733v1
- Date: Thu, 15 Apr 2021 17:43:43 GMT
- Title: Gradient-based Adversarial Attacks against Text Transformers
- Authors: Chuan Guo, Alexandre Sablayrolles, Herv\'e J\'egou, Douwe Kiela
- Abstract summary: We propose the first general-purpose gradient-based attack against transformer models.
We empirically demonstrate that our white-box attack attains state-of-the-art attack performance on a variety of natural language tasks.
- Score: 96.73493433809419
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose the first general-purpose gradient-based attack against
transformer models. Instead of searching for a single adversarial example, we
search for a distribution of adversarial examples parameterized by a
continuous-valued matrix, hence enabling gradient-based optimization. We
empirically demonstrate that our white-box attack attains state-of-the-art
attack performance on a variety of natural language tasks. Furthermore, we show
that a powerful black-box transfer attack, enabled by sampling from the
adversarial distribution, matches or exceeds existing methods, while only
requiring hard-label outputs.
Related papers
- Efficient Generation of Targeted and Transferable Adversarial Examples for Vision-Language Models Via Diffusion Models [17.958154849014576]
Adversarial attacks can be used to assess the robustness of large visual-language models (VLMs)
Previous transfer-based adversarial attacks incur high costs due to high iteration counts and complex method structure.
We propose AdvDiffVLM, which uses diffusion models to generate natural, unrestricted and targeted adversarial examples.
arXiv Detail & Related papers (2024-04-16T07:19:52Z) - Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z) - Generalizable Black-Box Adversarial Attack with Meta Learning [54.196613395045595]
In black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful perturbation based on query feedback under a query budget.
We propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability.
The proposed framework with the two types of adversarial transferability can be naturally combined with any off-the-shelf query-based attack methods to boost their performance.
arXiv Detail & Related papers (2023-01-01T07:24:12Z) - Adversarial Pixel Restoration as a Pretext Task for Transferable
Perturbations [54.1807206010136]
Transferable adversarial attacks optimize adversaries from a pretrained surrogate model and known label space to fool the unknown black-box models.
We propose Adversarial Pixel Restoration as a self-supervised alternative to train an effective surrogate model from scratch.
Our training approach is based on a min-max objective which reduces overfitting via an adversarial objective.
arXiv Detail & Related papers (2022-07-18T17:59:58Z) - Adversarially Robust Classification by Conditional Generative Model
Inversion [4.913248451323163]
We propose a classification model that does not obfuscate gradients and is robust by construction without assuming prior knowledge about the attack.
Our method casts classification as an optimization problem where we "invert" a conditional generator trained on unperturbed, natural images.
We demonstrate that our model is extremely robust against black-box attacks and has improved robustness against white-box attacks.
arXiv Detail & Related papers (2022-01-12T23:11:16Z) - Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the
Adversarial Transferability [20.255708227671573]
Black-box adversarial attacks can be transferred from one model to another.
In this work, we propose a novel ensemble attack method called the variance reduced ensemble attack.
Empirical results on the standard ImageNet demonstrate that the proposed method could boost the adversarial transferability and outperforms existing ensemble attacks significantly.
arXiv Detail & Related papers (2021-11-21T06:33:27Z) - Boosting Transferability of Targeted Adversarial Examples via
Hierarchical Generative Networks [56.96241557830253]
Transfer-based adversarial attacks can effectively evaluate model robustness in the black-box setting.
We propose a conditional generative attacking model, which can generate the adversarial examples targeted at different classes.
Our method improves the success rates of targeted black-box attacks by a significant margin over the existing methods.
arXiv Detail & Related papers (2021-07-05T06:17:47Z) - Enhancing the Transferability of Adversarial Attacks through Variance
Tuning [6.5328074334512]
We propose a new method called variance tuning to enhance the class of iterative gradient based attack methods.
Empirical results on the standard ImageNet dataset demonstrate that our method could significantly improve the transferability of gradient-based adversarial attacks.
arXiv Detail & Related papers (2021-03-29T12:41:55Z) - Decision-based Universal Adversarial Attack [55.76371274622313]
In black-box setting, current universal adversarial attack methods utilize substitute models to generate the perturbation.
We propose an efficient Decision-based Universal Attack (DUAttack)
The effectiveness of DUAttack is validated through comparisons with other state-of-the-art attacks.
arXiv Detail & Related papers (2020-09-15T12:49:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.