Generating Natural Language Adversarial Examples through An Improved
Beam Search Algorithm
- URL: http://arxiv.org/abs/2110.08036v1
- Date: Fri, 15 Oct 2021 12:09:04 GMT
- Title: Generating Natural Language Adversarial Examples through An Improved
Beam Search Algorithm
- Authors: Tengfei Zhao, Zhaocheng Ge, Hanping Hu, Dingmeng Shi
- Abstract summary: This paper proposes a novel attack model whose attack success rate surpasses that of the benchmark attack methods.
The novel method is empirically evaluated by attacking WordCNN, LSTM, BiLSTM, and BERT on four benchmark datasets.
It achieves a 100% attack success rate, higher than the state-of-the-art method, when attacking BERT and BiLSTM on IMDB.
- Score: 0.5735035463793008
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The research of adversarial attacks in the text domain has attracted
much interest in the last few years, and many methods with a high attack success
rate have been proposed. However, these attack methods are inefficient, as they
require a large number of queries to the victim model when crafting text
adversarial examples. In this paper, a novel attack model is proposed whose
attack success rate surpasses that of the benchmark attack methods; more
importantly, its attack efficiency is much higher. The novel method is
empirically evaluated by attacking WordCNN, LSTM, BiLSTM, and BERT on four
benchmark datasets. For instance, it achieves a 100% attack success rate, higher
than the state-of-the-art method, when attacking BERT and BiLSTM on IMDB, while
the number of queries to the victim models is only 1/4 and 1/6.5 of that
required by the state-of-the-art method, respectively. Further experiments also
show that the generated adversarial examples have good transferability.
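The abstract does not spell out the algorithm, but the core idea named in the title, a beam search over word substitutions that tracks how many victim-model queries are spent, can be illustrated with a minimal sketch. This is not the authors' implementation; `victim_predict`, `get_synonyms`, `beam_width`, and `threshold` are hypothetical placeholders.

```python
# Minimal illustrative sketch of a beam-search word-substitution attack.
# Not the paper's exact algorithm; all helper names are placeholders.
from typing import Callable, List, Tuple


def beam_search_attack(
    words: List[str],
    victim_predict: Callable[[List[str]], float],  # returns P(original label | text)
    get_synonyms: Callable[[str], List[str]],      # candidate substitutes for a word
    beam_width: int = 5,
    threshold: float = 0.5,
) -> Tuple[List[str], int]:
    """Expand a small beam of substituted candidates position by position.

    Returns the best candidate found and the number of victim-model queries,
    the quantity the paper aims to minimize.
    """
    queries = 1
    beam = [(victim_predict(words), list(words))]  # (score to minimize, candidate)

    for i in range(len(words)):
        expansions = []
        for _, cand in beam:
            for sub in get_synonyms(cand[i]):
                new_cand = cand[:i] + [sub] + cand[i + 1:]
                expansions.append((victim_predict(new_cand), new_cand))
                queries += 1
        # Keep the original beam entries so an unhelpful position can be skipped.
        beam = sorted(beam + expansions, key=lambda x: x[0])[:beam_width]
        # For a binary task, P(original label) < 0.5 means the prediction has flipped.
        if beam[0][0] < threshold:
            break

    best_cand = beam[0][1]
    return best_cand, queries
```

In practice, `victim_predict` would wrap the classifier's probability for the original label and `get_synonyms` would draw candidates from a synonym resource; the paper's improved beam search further optimizes how the beam is expanded to cut the query count, which this sketch does not reproduce.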
Related papers
- A Realistic Threat Model for Large Language Model Jailbreaks [87.64278063236847]
In this work, we propose a unified threat model for the principled comparison of jailbreak attacks.
Our threat model combines constraints in perplexity, measuring how far a jailbreak deviates from natural text.
We adapt popular attacks to this new, realistic threat model, with which we, for the first time, benchmark these attacks on equal footing.
arXiv Detail & Related papers (2024-10-21T17:27:01Z)
- Revisiting Character-level Adversarial Attacks for Language Models [53.446619686108754]
We introduce Charmer, an efficient query-based adversarial attack capable of achieving a high attack success rate (ASR).
Our method successfully targets both small (BERT) and large (Llama 2) models.
arXiv Detail & Related papers (2024-05-07T14:23:22Z)
- DALA: A Distribution-Aware LoRA-Based Adversarial Attack against Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data.
Recent attack methods can achieve a relatively high attack success rate (ASR).
We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z)
- Sample Attackability in Natural Language Adversarial Attacks [1.4213973379473654]
This work formally extends the definition of sample attackability/robustness for NLP attacks.
Experiments are conducted on two popular NLP datasets, four state-of-the-art models, and four different NLP adversarial attack methods.
arXiv Detail & Related papers (2023-06-21T06:20:51Z)
- Multi-granularity Textual Adversarial Attack with Behavior Cloning [4.727534308759158]
We propose MAYA, a Multi-grAnularitY Attack model to generate high-quality adversarial samples with fewer queries to victim models.
We conduct comprehensive experiments to evaluate our attack models by attacking BiLSTM, BERT and RoBERTa in two different black-box attack settings and three benchmark datasets.
arXiv Detail & Related papers (2021-09-09T15:46:45Z)
- Boosting Transferability of Targeted Adversarial Examples via Hierarchical Generative Networks [56.96241557830253]
Transfer-based adversarial attacks can effectively evaluate model robustness in the black-box setting.
We propose a conditional generative attacking model, which can generate the adversarial examples targeted at different classes.
Our method improves the success rates of targeted black-box attacks by a significant margin over the existing methods.
arXiv Detail & Related papers (2021-07-05T06:17:47Z)
- Adversarial example generation with AdaBelief Optimizer and Crop Invariance [8.404340557720436]
Adversarial attacks can be an important method to evaluate and select robust models in safety-critical applications.
We propose AdaBelief Iterative Fast Gradient Method (ABI-FGM) and Crop-Invariant attack Method (CIM) to improve the transferability of adversarial examples.
Our method has higher success rates than state-of-the-art gradient-based attack methods.
arXiv Detail & Related papers (2021-02-07T06:00:36Z)
- Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacks aim to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks on discrete data (such as text) are more challenging than those on continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.