BeamAttack: Generating High-quality Textual Adversarial Examples through
Beam Search and Mixed Semantic Spaces
- URL: http://arxiv.org/abs/2303.07199v1
- Date: Thu, 9 Mar 2023 03:30:52 GMT
- Title: BeamAttack: Generating High-quality Textual Adversarial Examples through
Beam Search and Mixed Semantic Spaces
- Authors: Hai Zhu and Qingyang Zhao and Yuren Wu
- Abstract summary: Adversarial examples are imperceptible to human readers.
In a black-box setting, an attacker can fool the model without knowing the model's parameters and architecture.
We propose BeamAttack, a textual attack algorithm that makes use of mixed semantic spaces and improved beam search.
- Score: 3.8029070240258678
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural language processing models based on neural networks are vulnerable to
adversarial examples. These adversarial examples are imperceptible to human
readers but can mislead models into making wrong predictions. In a black-box
setting, an attacker can fool the model without knowing the model's parameters
and architecture. Previous word-level attacks widely use a single semantic
space and greedy search as the search strategy. However, these methods fail to
balance attack success rate, adversarial example quality, and time
consumption. In this paper, we propose BeamAttack, a textual attack algorithm
that makes use of mixed semantic spaces and improved beam search to craft
high-quality adversarial examples. Extensive experiments demonstrate that
BeamAttack can improve the attack success rate while saving numerous queries
and much time; for example, it improves the attack success rate by up to 7%
over greedy search when attacking examples from the MR dataset. Compared with
heuristic search, BeamAttack can save up to 85% of model queries while
achieving a competitive attack success rate. The adversarial examples crafted
by BeamAttack are highly transferable and can effectively improve a model's
robustness during adversarial training. Code is available at
https://github.com/zhuhai-ustc/beamattack/tree/master
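The abstract contrasts greedy search with beam search over word substitutions. As a rough illustration only, the sketch below shows a generic beam search for a black-box word-substitution attack; it is not the authors' BeamAttack implementation, which lives in the repository linked above. The `candidates` dictionary is a hypothetical stand-in for the substitutes that mixed semantic spaces would supply, and the lexicon "victim", `CountingOracle`, `beam_search_attack`, and `beam_width` are all illustrative names, not part of the paper's code.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy victim: a lexicon-based sentiment scorer standing in for a
# black-box classifier. The attacker only observes P(positive), never the weights.
WEIGHTS: Dict[str, float] = {"great": 2.0, "good": 1.0, "fine": 0.3, "dull": -1.0}


def victim_prob_positive(words: List[str]) -> float:
    z = sum(WEIGHTS.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-z))


class CountingOracle:
    """Wraps a score function and counts how many times it is queried."""

    def __init__(self, fn: Callable[[List[str]], float]) -> None:
        self.fn = fn
        self.queries = 0

    def __call__(self, words: List[str]) -> float:
        self.queries += 1
        return self.fn(words)


def beam_search_attack(
    words: List[str],
    candidates: Dict[str, List[str]],
    oracle: CountingOracle,
    beam_width: int = 2,
    threshold: float = 0.5,
) -> Optional[List[str]]:
    """Generic beam search over word substitutions (not the paper's exact method).

    Positions are visited left to right. At each position that has candidates,
    every hypothesis in the beam is expanded with each substitute, and the beam
    keeps the `beam_width` hypotheses with the lowest true-label probability.
    Returns the first hypothesis whose true-label probability drops below
    `threshold`, or None if no substitution sequence succeeds.
    """
    start_score = oracle(words)
    if start_score < threshold:
        return list(words)  # already misclassified: nothing to attack
    beam: List[Tuple[float, List[str]]] = [(start_score, list(words))]
    for pos, original in enumerate(words):
        subs = candidates.get(original, [])
        if not subs:
            continue
        expanded = list(beam)  # also keep "leave this word unchanged"
        for _, hyp in beam:
            for sub in subs:
                new = hyp[:pos] + [sub] + hyp[pos + 1:]
                score = oracle(new)
                if score < threshold:
                    return new  # label flipped: attack succeeded
                expanded.append((score, new))
        expanded.sort(key=lambda item: item[0])
        beam = expanded[:beam_width]
    return None


if __name__ == "__main__":
    sentence = "a great cast and a good script despite a dull plot".split()
    subs = {"great": ["fine", "solid"], "good": ["fine", "decent"]}
    oracle = CountingOracle(victim_prob_positive)
    adv = beam_search_attack(sentence, subs, oracle, beam_width=2)
    print(" ".join(adv) if adv else "attack failed", "| queries:", oracle.queries)
```

With `beam_width=1` the routine roughly corresponds to a greedy left-to-right substitution search, the kind of baseline the abstract compares against; larger beams spend extra queries to explore more substitution combinations before committing.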
Related papers
- AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning [93.77763753231338]
Adversarial Contrastive Prompt Tuning (ACPT) is proposed to fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries.
We show that ACPT can detect 7 state-of-the-art query-based attacks with a >99% detection rate within 5 shots.
We also show that ACPT is robust to 3 types of adaptive attacks.
Date: 2024-08-04T09:53:50Z
- BruSLeAttack: A Query-Efficient Score-Based Black-Box Sparse Adversarial Attack [22.408968332454062]
We study the unique, less-well understood problem of generating sparse adversarial samples simply by observing the score-based replies to model queries.
We develop BruSLeAttack, a new, faster (more query-efficient) algorithm for the problem.
Our work facilitates faster evaluation of model vulnerabilities and raises our vigilance on the safety, security and reliability of deployed systems.
Date: 2024-04-08T08:59:26Z
- DALA: A Distribution-Aware LoRA-Based Adversarial Attack against Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data.
Recent attack methods can achieve a relatively high attack success rate (ASR).
We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
Date: 2023-11-14T23:43:47Z
- LimeAttack: Local Explainable Method for Textual Hard-Label Adversarial Attack [3.410883081705873]
We propose a novel hard-label attack algorithm named LimeAttack.
We show that LimeAttack achieves better attack performance than existing hard-label attacks.
Adversarial examples crafted by LimeAttack are highly transferable and effectively improve model robustness in adversarial training.
Date: 2023-08-01T06:30:37Z
- Generalizable Black-Box Adversarial Attack with Meta Learning [54.196613395045595]
In black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful perturbation based on query feedback under a query budget.
We propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability.
The proposed framework with the two types of adversarial transferability can be naturally combined with any off-the-shelf query-based attack methods to boost their performance.
Date: 2023-01-01T07:24:12Z
- RamBoAttack: A Robust Query Efficient Deep Neural Network Decision Exploit [9.93052896330371]
We develop a robust, query-efficient attack capable of avoiding entrapment in a local minimum and misdirection from noisy gradients.
The RamBoAttack is more robust to the different sample inputs available to an adversary and the targeted class.
Date: 2021-12-10T01:25:24Z
- Multi-granularity Textual Adversarial Attack with Behavior Cloning [4.727534308759158]
We propose MAYA, a Multi-grAnularitY Attack model to generate high-quality adversarial samples with fewer queries to victim models.
We conduct comprehensive experiments to evaluate our attack models by attacking BiLSTM, BERT and RoBERTa in two black-box attack settings on three benchmark datasets.
Date: 2021-09-09T15:46:45Z
- Adversarial examples are useful too! [47.64219291655723]
I propose a new method to tell whether a model has been subject to a backdoor attack.
The idea is to generate adversarial examples, targeted or untargeted, using conventional attacks such as FGSM.
It is possible to visually locate the perturbed regions and unveil the attack.
Date: 2020-05-13T01:38:56Z
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks on discrete data (such as text) are more challenging than those on continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
Date: 2020-04-21T13:30:02Z
- Adversarial Imitation Attack [63.76805962712481]
A practical adversarial attack should require as little knowledge of the attacked models as possible.
Current substitute attacks need pre-trained models to generate adversarial examples.
In this study, we propose a novel adversarial imitation attack.
Date: 2020-03-28T10:02:49Z