BERT-ATTACK: Adversarial Attack Against BERT Using BERT
- URL: http://arxiv.org/abs/2004.09984v3
- Date: Fri, 2 Oct 2020 03:08:04 GMT
- Title: BERT-ATTACK: Adversarial Attack Against BERT Using BERT
- Authors: Linyang Li, Ruotian Ma, Qipeng Guo, Xiangyang Xue, Xipeng Qiu
- Abstract summary: Adversarial attacks for discrete data (such as texts) are more challenging than for continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
- Score: 77.82947768158132
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial attacks for discrete data (such as texts) have been proved
significantly more challenging than continuous data (such as images) since it
is difficult to generate adversarial samples with gradient-based methods.
Current successful attack methods for texts usually adopt heuristic replacement
strategies on the character or word level, which remains challenging to find
the optimal solution in the massive space of possible combinations of
replacements while preserving semantic consistency and language fluency. In
this paper, we propose BERT-Attack, a high-quality and effective
method to generate adversarial samples using pre-trained masked language models
exemplified by BERT. We turn BERT against its fine-tuned models and other deep
neural models in downstream tasks so that we can successfully mislead the
target models to predict incorrectly. Our method outperforms state-of-the-art
attack strategies in both success rate and perturb percentage, while the
generated adversarial samples are fluent and semantically preserved. Also, the
cost of calculation is low, thus possible for large-scale generations. The code
is available at https://github.com/LinyangLee/BERT-Attack.
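The word-level replacement idea described in the abstract can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in rather than the authors' implementation: `positive_prob` is a toy bag-of-words classifier in place of a fine-tuned BERT, `propose` is a lookup table in place of the top-k predictions of a masked language model, and the mask-based word ranking, greedy search, and `max_ratio` budget are assumptions of this sketch. The returned ratio corresponds to the "perturb percentage" the abstract mentions.

```python
import math
from typing import List

# Hypothetical toy target model: a bag-of-words sentiment scorer.
# The negative weight on "decent" is a deliberate quirk for the attack to find.
WEIGHTS = {"great": 2.0, "good": 1.0, "fine": 0.3, "decent": -0.2, "bad": -1.0}


def positive_prob(tokens: List[str]) -> float:
    """Return the toy model's probability that the text is positive."""
    score = sum(WEIGHTS.get(tok, 0.0) for tok in tokens)
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical stand-in for the top-k predictions of a masked language model.
SUBSTITUTES = {"good": ["fine", "decent"], "great": ["fine"], "movie": ["film"]}


def propose(tokens: List[str], index: int) -> List[str]:
    """Return candidate replacements for the token at `index`."""
    return SUBSTITUTES.get(tokens[index], [])


def attack(tokens, prob_fn=positive_prob, propose_fn=propose, max_ratio=0.4):
    """Greedy word substitution.

    Returns (adversarial tokens, success flag, fraction of words changed).
    """
    original_positive = prob_fn(tokens) >= 0.5

    def original_label_prob(toks):
        p = prob_fn(toks)
        return p if original_positive else 1.0 - p

    def with_token(toks, i, word):
        return toks[:i] + [word] + toks[i + 1:]

    base = original_label_prob(tokens)
    # Rank positions by how much masking them lowers the original label's probability.
    order = sorted(
        range(len(tokens)),
        key=lambda i: base - original_label_prob(with_token(tokens, i, "[MASK]")),
        reverse=True,
    )

    adversarial = list(tokens)
    changed = 0
    budget = int(max_ratio * len(tokens))  # maximum number of words to replace
    for i in order:
        if changed >= budget:
            break
        candidates = propose_fn(adversarial, i)
        if not candidates:
            continue
        # Pick the candidate that hurts the original label the most.
        best = min(
            candidates,
            key=lambda w: original_label_prob(with_token(adversarial, i, w)),
        )
        trial = with_token(adversarial, i, best)
        if original_label_prob(trial) < original_label_prob(adversarial):
            adversarial = trial
            changed += 1
            if (prob_fn(adversarial) >= 0.5) != original_positive:
                return adversarial, True, changed / len(tokens)
    return adversarial, False, changed / len(tokens)
```

With the toy tables above, `attack("the movie was good".split())` replaces "good" with "decent", flips the toy prediction, and reports a perturb ratio of 0.25. In a realistic setting the candidate list would come from a masked language model and the target would be a fine-tuned classifier, so the numbers here say nothing about the paper's results.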
Related papers
- Microbial Genetic Algorithm-based Black-box Attack against Interpretable Deep Learning Systems [16.13790238416691]
In white-box environments, interpretable deep learning systems (IDLSes) have been shown to be vulnerable to malicious manipulations.
We propose a Query-efficient Score-based black-box attack against IDLSes, QuScore, which requires no knowledge of the target model and its coupled interpretation model.
arXiv Detail & Related papers (2023-07-13T00:08:52Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
- Learning to Ignore Adversarial Attacks [14.24585085013907]
We introduce the use of rationale models that can explicitly learn to ignore attack tokens.
We find that the rationale models can successfully ignore over 90% of attack tokens.
arXiv Detail & Related papers (2022-05-23T18:01:30Z)
- Bridge the Gap Between CV and NLP! A Gradient-based Textual Adversarial Attack Framework [17.17479625646699]
We propose a unified framework to craft textual adversarial samples.
In this paper, we instantiate our framework with an attack algorithm named Textual Projected Gradient Descent (T-PGD).
arXiv Detail & Related papers (2021-10-28T17:31:51Z)
- Virtual Data Augmentation: A Robust and General Framework for Fine-tuning Pre-trained Models [51.46732511844122]
Powerful pre-trained language models (PLM) can be fooled by small perturbations or intentional attacks.
We present Virtual Data Augmentation (VDA), a general framework for robustly fine-tuning PLMs.
Our approach is able to improve the robustness of PLMs and alleviate the performance degradation under adversarial attacks.
arXiv Detail & Related papers (2021-09-13T09:15:28Z)
- Self-Supervised Contrastive Learning with Adversarial Perturbations for Robust Pretrained Language Models [18.726529370845256]
This paper improves the robustness of the pretrained language model BERT against word substitution-based adversarial attacks.
We also create an adversarial attack for word-level adversarial training on BERT.
arXiv Detail & Related papers (2021-07-15T21:03:34Z)
- Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method can achieve 33.18 BLEU score on IWSLT14 German-English translation, achieving an improvement of 1.47 over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z)
- Targeted Attack against Deep Neural Networks via Flipping Limited Weight Bits [55.740716446995805]
We study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes.
Our goal is to misclassify a specific sample into a target class without any sample modification.
By utilizing the latest technique in integer programming, we equivalently reformulate this binary integer programming (BIP) problem as a continuous optimization problem.
arXiv Detail & Related papers (2021-02-21T03:13:27Z)
- Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble [163.3333439344695]
Dirichlet Neighborhood Ensemble (DNE) is a randomized smoothing method for training a robust model to defend against substitution-based attacks.
DNE forms virtual sentences by sampling embedding vectors for each word in an input sentence from a convex hull spanned by the word and its synonyms, and it augments them with the training data.
We demonstrate through extensive experimentation that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.
arXiv Detail & Related papers (2020-06-20T18:01:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.