Generating Natural Language Attacks in a Hard Label Black Box Setting
- URL: http://arxiv.org/abs/2012.14956v1
- Date: Tue, 29 Dec 2020 22:01:38 GMT
- Title: Generating Natural Language Attacks in a Hard Label Black Box Setting
- Authors: Rishabh Maheshwary, Saket Maheshwary and Vikram Pudi
- Abstract summary: We study an important and challenging task of attacking natural language processing models in a hard label black box setting.
We propose a decision-based attack strategy that crafts high quality adversarial examples on text classification and entailment tasks.
- Score: 3.52359746858894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study an important and challenging task of attacking natural language
processing models in a hard label black box setting. We propose a
decision-based attack strategy that crafts high quality adversarial examples on
text classification and entailment tasks. Our proposed attack strategy
leverages a population-based optimization algorithm to craft plausible and
semantically similar adversarial examples by observing only the top label
predicted by the target model. At each iteration, the optimization procedure
allows word replacements that maximize the overall semantic similarity between
the original and the adversarial text. Further, our approach does not rely on
using substitute models or any kind of training data. We demonstrate the
efficacy of our proposed approach through extensive experimentation and
ablation studies on five state-of-the-art target models across seven benchmark
datasets. Compared to attacks proposed in prior literature, we achieve a higher
success rate with a lower word perturbation percentage, despite operating in a
highly restricted setting.
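To make the setting concrete, below is a minimal, hypothetical Python sketch of a hard-label, population-based word-substitution attack. The callables predict_label, get_synonyms, and semantic_sim are assumed placeholders (the black-box model's top-label query, a synonym candidate generator, and a sentence-similarity scorer), and the loop is a simplified stand-in for the paper's actual optimization procedure, not a reproduction of it.

```python
import random

def hard_label_attack(text, predict_label, get_synonyms, semantic_sim,
                      pop_size=20, iterations=50, max_init_tries=200):
    """Illustrative population-based hard-label attack (not the paper's exact algorithm).

    predict_label(text) -> top predicted label only (hard-label black box)
    get_synonyms(word)  -> list of candidate replacement words
    semantic_sim(a, b)  -> semantic similarity between two texts in [0, 1]
    """
    words = text.split()
    orig_label = predict_label(text)

    def heavy_perturbation():
        # Replace as many words as possible so the label is likely to flip.
        cand = words[:]
        for i, w in enumerate(cand):
            syns = get_synonyms(w)
            if syns:
                cand[i] = random.choice(syns)
        return cand

    # Seed the population with perturbations that already change the prediction.
    population = []
    for _ in range(max_init_tries):
        if len(population) >= pop_size:
            break
        cand = heavy_perturbation()
        if predict_label(" ".join(cand)) != orig_label:
            population.append(cand)
    if not population:
        return None  # could not flip the label at all

    for _ in range(iterations):
        scored = []
        for cand in population:
            # Local move: restore one original word if the label stays flipped,
            # which pushes the candidate back toward the original text.
            i = random.randrange(len(words))
            trial = cand[:]
            trial[i] = words[i]
            if trial != cand and predict_label(" ".join(trial)) != orig_label:
                cand = trial
            scored.append((semantic_sim(text, " ".join(cand)), cand))
        # Selection: keep the most semantically similar adversarial candidates.
        scored.sort(key=lambda p: p[0], reverse=True)
        population = [c for _, c in scored[:pop_size]]

    best = max(population, key=lambda c: semantic_sim(text, " ".join(c)))
    return " ".join(best)
```

The key constraint is that predict_label returns only the top label: no probabilities or gradients are available, which is why the search starts from a label-flipping perturbation and then works back toward the original text to raise semantic similarity.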
Related papers
- HQA-Attack: Toward High Quality Black-Box Hard-Label Adversarial Attack on Text [40.58680960214544]
Black-box hard-label adversarial attack on text is a practical and challenging task.
We propose a framework to generate high quality adversarial examples under the black-box hard-label attack scenarios, named HQA-Attack.
arXiv Detail & Related papers (2024-02-02T10:06:43Z)
- Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z)
- LimeAttack: Local Explainable Method for Textual Hard-Label Adversarial Attack [3.410883081705873]
We propose a novel hard-label attack algorithm named LimeAttack.
We show that LimeAttack achieves better attack performance than existing hard-label attacks.
Adversarial examples crafted by LimeAttack are highly transferable and effectively improve model robustness in adversarial training.
arXiv Detail & Related papers (2023-08-01T06:30:37Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples (a minimal label-smoothing sketch appears after this list).
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
- Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, namely the hard-label attack, in which the attacker can only access the predicted label.
We propose a novel hard-label attack, the Learning-based Hybrid Local Search (LHLS) algorithm.
Our LHLS significantly outperforms existing hard-label attacks in both attack performance and adversarial example quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z)
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
- Bridge the Gap Between CV and NLP! A Gradient-based Textual Adversarial Attack Framework [17.17479625646699]
We propose a unified framework to craft textual adversarial samples.
In this paper, we instantiate our framework with an attack algorithm named Textual Projected Gradient Descent (T-PGD).
arXiv Detail & Related papers (2021-10-28T17:31:51Z)
- Boosting Transferability of Targeted Adversarial Examples via Hierarchical Generative Networks [56.96241557830253]
Transfer-based adversarial attacks can effectively evaluate model robustness in the black-box setting.
We propose a conditional generative attacking model, which can generate the adversarial examples targeted at different classes.
Our method improves the success rates of targeted black-box attacks by a significant margin over the existing methods.
arXiv Detail & Related papers (2021-07-05T06:17:47Z)
- Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method achieves a BLEU score of $33.18$ on IWSLT14 German-English translation, an improvement of $1.47$ over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z)
- A Context Aware Approach for Generating Natural Language Attacks [3.52359746858894]
We propose an attack strategy that crafts semantically similar adversarial examples on text classification and entailment tasks.
Our proposed attack finds candidate words by considering the information of both the original word and its surrounding context.
arXiv Detail & Related papers (2020-12-24T17:24:54Z)
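As referenced in the label-smoothing entry above, here is a minimal, hypothetical Python sketch of label-smoothed cross-entropy; it illustrates the general technique, not that paper's specific training setup, and the tensor names logits and targets are assumptions.

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, targets, eps=0.1):
    # Smoothed targets: the true class keeps probability 1 - eps,
    # and the remaining eps is spread uniformly over the other classes.
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    smooth = torch.full_like(log_probs, eps / (num_classes - 1))
    smooth.scatter_(-1, targets.unsqueeze(-1), 1.0 - eps)
    return -(smooth * log_probs).sum(dim=-1).mean()

# PyTorch >= 1.10 also exposes a built-in variant (with a slightly
# different smoothing convention):
# loss = F.cross_entropy(logits, targets, label_smoothing=0.1)
```

Softening the targets in this way discourages over-confident predictions, which is the mechanism that entry links to fewer over-confident errors on adversarial examples.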
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.