LimeAttack: Local Explainable Method for Textual Hard-Label Adversarial
Attack
- URL: http://arxiv.org/abs/2308.00319v2
- Date: Wed, 10 Jan 2024 13:26:18 GMT
- Title: LimeAttack: Local Explainable Method for Textual Hard-Label Adversarial
Attack
- Authors: Hai Zhu and Zhaoqing Yang and Weiwei Shang and Yuren Wu
- Abstract summary: We propose a novel hard-label attack algorithm named LimeAttack.
We show that LimeAttack achieves better attack performance than existing hard-label attacks.
Adversarial examples crafted by LimeAttack are highly transferable and effectively improve model robustness in adversarial training.
- Score: 3.410883081705873
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural language processing models are vulnerable to adversarial examples.
Previous textual adversarial attacks adopt gradients or confidence scores to
calculate word importance ranking and generate adversarial examples. However,
this information is unavailable in the real world. Therefore, we focus on a
more realistic and challenging setting, named hard-label attack, in which the
attacker can only query the model and obtain a discrete prediction label.
Existing hard-label attack algorithms tend to initialize adversarial examples
by random substitution and then utilize complex heuristic algorithms to
optimize the adversarial perturbation. These methods require a lot of model
queries and the attack success rate is restricted by adversary initialization.
In this paper, we propose a novel hard-label attack algorithm named LimeAttack,
which leverages a local explainable method to approximate word importance
ranking, and then adopts beam search to find the optimal solution. Extensive
experiments show that LimeAttack achieves better attack performance than
existing hard-label attacks under the same query budget. In
addition, we evaluate the effectiveness of LimeAttack on large language models,
and results indicate that adversarial examples remain a significant threat to
large language models. The adversarial examples crafted by LimeAttack are
highly transferable and effectively improve model robustness in adversarial
training.
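The abstract describes a two-stage pipeline: approximate word importance with a local explainable (LIME-style) surrogate fit only on hard labels, then search for a successful substitution with beam search. The snippet below is a minimal, illustrative sketch of that idea, not the authors' implementation: `query_label`, `get_synonyms`, the word-dropout rate, the difference-of-means scoring used in place of LIME's weighted linear regression, and the first-k beam pruning (the paper would rank candidates, e.g. by semantic similarity) are all assumptions made for illustration.

```python
# Minimal sketch of the two LimeAttack stages summarized above, under the
# hard-label constraint (the victim model only returns a class id).
# `query_label` and `get_synonyms` are hypothetical stand-ins.
import random

def lime_word_importance(words, query_label, n_samples=100, keep_prob=0.7):
    """Rank word positions with a LIME-style local surrogate fit on hard labels."""
    orig = query_label(" ".join(words))
    masks, labels = [], []
    for _ in range(n_samples):
        mask = [random.random() < keep_prob for _ in words]       # random word dropout
        text = " ".join(w for w, keep in zip(words, mask) if keep)
        masks.append(mask)
        labels.append(1.0 if query_label(text) == orig else 0.0)  # hard label only
    avg = lambda v: sum(v) / len(v) if v else 0.0
    importance = []
    for j in range(len(words)):
        kept    = [lab for m, lab in zip(masks, labels) if m[j]]
        dropped = [lab for m, lab in zip(masks, labels) if not m[j]]
        # Words whose removal flips the label more often score higher.
        importance.append(avg(kept) - avg(dropped))
    return sorted(range(len(words)), key=lambda j: -importance[j])

def beam_search_attack(words, query_label, get_synonyms, beam_width=3):
    """Substitute words in importance order, keeping a small beam of candidates."""
    orig = query_label(" ".join(words))
    beam = [list(words)]
    for j in lime_word_importance(words, query_label):
        expanded = []
        for cand in beam:
            for syn in get_synonyms(cand[j]):
                new = cand[:j] + [syn] + cand[j + 1:]
                if query_label(" ".join(new)) != orig:
                    return " ".join(new)                          # label flipped: success
                expanded.append(new)
        # Simplification: keep the first k candidates; a real attack would rank
        # them (e.g. by semantic similarity to the original sentence).
        beam = expanded[:beam_width] or beam
    return None                                                   # no adversarial example found
```

Under a fixed query budget, one would additionally cap the total number of `query_label` calls across both stages.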
Related papers
- BeamAttack: Generating High-quality Textual Adversarial Examples through
Beam Search and Mixed Semantic Spaces [3.8029070240258678]
Adversarial examples are imperceptible to human readers.
In a black-box setting, an attacker can fool the model without knowing its parameters or architecture.
We propose BeamAttack, a textual attack algorithm that makes use of mixed semantic spaces and improved beam search.
arXiv Detail & Related papers (2023-03-09T03:30:52Z)
- A Word is Worth A Thousand Dollars: Adversarial Attack on Tweets Fools Stock Prediction [100.9772316028191]
In this paper, we experiment with a variety of adversarial attack configurations to fool three stock prediction victim models.
Our results show that the proposed attack method achieves consistent success rates and causes significant monetary loss in a trading simulation.
arXiv Detail & Related papers (2022-05-01T05:12:22Z)
- Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, namely the hard-label attack, in which the attacker can only access the prediction label.
We propose a novel hard-label attack, the Learning-based Hybrid Local Search (LHLS) algorithm.
LHLS significantly outperforms existing hard-label attacks in both attack performance and adversarial example quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z)
- RamBoAttack: A Robust Query Efficient Deep Neural Network Decision Exploit [9.93052896330371]
We develop a robust, query-efficient attack capable of avoiding entrapment in a local minimum and misdirection from noisy gradients.
RamBoAttack is more robust to the different sample inputs available to an adversary and to the targeted class.
arXiv Detail & Related papers (2021-12-10T01:25:24Z)
- A Differentiable Language Model Adversarial Attack on Text Classifiers [10.658675415759697]
We propose a new black-box sentence-level attack for natural language processing.
Our method fine-tunes a pre-trained language model to generate adversarial examples.
We show that the proposed attack outperforms competitors on a diverse set of NLP problems for both computed metrics and human evaluation.
arXiv Detail & Related papers (2021-07-23T14:43:13Z)
- Poisoning Attack against Estimating from Pairwise Comparisons [140.9033911097995]
Attackers have strong motivation and incentives to manipulate the ranking list.
Data poisoning attacks on pairwise ranking algorithms can be formalized as the dynamic and static games between the ranker and the attacker.
We propose two efficient poisoning attack algorithms and establish the associated theoretical guarantees.
arXiv Detail & Related papers (2021-07-05T08:16:01Z)
- Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method can achieve $33.18$ BLEU score on IWSLT14 German-English translation, achieving an improvement of $1.47$ over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z)
- Generating Natural Language Attacks in a Hard Label Black Box Setting [3.52359746858894]
We study an important and challenging task of attacking natural language processing models in a hard label black box setting.
We propose a decision-based attack strategy that crafts high quality adversarial examples on text classification and entailment tasks.
arXiv Detail & Related papers (2020-12-29T22:01:38Z)
- Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacking aims to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks on discrete data (such as text) are more challenging than those on continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z)