HQA-Attack: Toward High Quality Black-Box Hard-Label Adversarial Attack on Text
- URL: http://arxiv.org/abs/2402.01806v1
- Date: Fri, 2 Feb 2024 10:06:43 GMT
- Title: HQA-Attack: Toward High Quality Black-Box Hard-Label Adversarial Attack on Text
- Authors: Han Liu, Zhi Xu, Xiaotong Zhang, Feng Zhang, Fenglong Ma, Hongyang Chen, Hong Yu and Xianchao Zhang
- Abstract summary: Black-box hard-label adversarial attack on text is a practical and challenging task.
We propose a framework, named HQA-Attack, to generate high quality adversarial examples under black-box hard-label attack scenarios.
- Score: 40.58680960214544
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Black-box hard-label adversarial attack on text is a practical and challenging task, as the text data space is inherently discrete and non-differentiable, and only the predicted label is accessible. Research on this problem is still in its infancy and only a few methods are available. Moreover, existing methods rely on complex heuristic algorithms or unreliable gradient estimation strategies, which tend to fall into local optima and inevitably consume numerous queries, making it difficult to craft satisfactory adversarial examples with high semantic similarity and a low perturbation rate within a limited query budget. To alleviate the above issues, we propose a simple yet effective framework, named HQA-Attack, to generate high quality textual adversarial examples under black-box hard-label attack scenarios. Specifically, after initializing an adversarial example randomly, HQA-Attack first substitutes back as many original words as possible, thus shrinking the perturbation rate. Then it leverages the synonym set of the remaining changed words to further optimize the adversarial example in a direction that improves the semantic similarity and satisfies the adversarial condition simultaneously. In addition, during the optimization procedure, it searches for a transition synonym for each changed word, thus avoiding traversing the whole synonym set and reducing the number of queries. Extensive experimental results on five text classification datasets, three natural language inference datasets and two real-world APIs show that the proposed HQA-Attack method significantly outperforms other strong baselines.
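To make the procedure concrete, here is a minimal Python sketch of the attack loop as the abstract describes it: random initialization, a phase that substitutes original words back, and a synonym-based optimization phase. This is an illustration under stated assumptions, not the authors' implementation; `model.predict`, `synonyms`, and `similarity` are hypothetical placeholders, and the transition-synonym shortcut that reduces query counts is omitted.

```python
# Minimal sketch of the HQA-Attack loop described in the abstract.
# All helpers (query_label, synonyms, similarity) are hypothetical
# placeholders, not the authors' code.
import random

def query_label(model, words):
    # Hard-label oracle: only the predicted class is observable.
    return model.predict(" ".join(words))

def is_adversarial(model, words, true_label):
    return query_label(model, words) != true_label

def random_init(model, orig, true_label, synonyms, max_tries=100):
    # Randomly swap words for synonyms until the predicted label flips.
    for _ in range(max_tries):
        candidate = [random.choice(list(synonyms(w)) + [w]) for w in orig]
        if is_adversarial(model, candidate, true_label):
            return candidate
    return None

def substitute_back(model, adv, orig, true_label):
    # Phase 1: greedily restore original words while the example
    # remains adversarial, shrinking the perturbation rate.
    adv = list(adv)
    for i, (a, o) in enumerate(zip(adv, orig)):
        if a == o:
            continue
        candidate = adv[:i] + [o] + adv[i + 1:]
        if is_adversarial(model, candidate, true_label):
            adv = candidate
    return adv

def optimize_synonyms(model, adv, orig, true_label, synonyms, similarity):
    # Phase 2: for each remaining changed word, pick the synonym that
    # maximizes semantic similarity to the original text while keeping
    # the example adversarial.
    adv = list(adv)
    for i, (a, o) in enumerate(zip(adv, orig)):
        if a == o:
            continue
        best, best_sim = a, similarity(adv, orig)
        for s in synonyms(o):
            candidate = adv[:i] + [s] + adv[i + 1:]
            if is_adversarial(model, candidate, true_label):
                sim = similarity(candidate, orig)
                if sim > best_sim:
                    best, best_sim = s, sim
        adv[i] = best
    return adv
```

The greedy restore order and the exhaustive per-position synonym scan are design choices of this sketch; the paper's transition-synonym step exists precisely to avoid scanning the full synonym set in that inner loop.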
Related papers
- LimeAttack: Local Explainable Method for Textual Hard-Label Adversarial Attack [3.410883081705873]
We propose a novel hard-label attack algorithm named LimeAttack.
We show that LimeAttack achieves better attack performance than existing hard-label attacks.
Adversarial examples crafted by LimeAttack are highly transferable and effectively improve model robustness in adversarial training.
arXiv Detail & Related papers (2023-08-01T06:30:37Z)
- Block-Sparse Adversarial Attack to Fool Transformer-Based Text Classifiers [49.50163349643615]
In this paper, we propose a gradient-based adversarial attack against transformer-based text classifiers.
Experimental results demonstrate that, while our adversarial attack maintains the semantics of the sentence, it can reduce the accuracy of GPT-2 to less than 5%.
arXiv Detail & Related papers (2022-03-11T14:37:41Z)
- Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, namely the hard-label attack, in which the attacker can only access the prediction label.
We propose a novel hard-label attack, called the Learning-based Hybrid Local Search (LHLS) algorithm.
Our LHLS significantly outperforms existing hard-label attacks in both attack performance and adversary quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z)
- Bridge the Gap Between CV and NLP! A Gradient-based Textual Adversarial Attack Framework [17.17479625646699]
We propose a unified framework to craft textual adversarial samples.
In this paper, we instantiate our framework with an attack algorithm named Textual Projected Gradient Descent (T-PGD).
arXiv Detail & Related papers (2021-10-28T17:31:51Z)
- Semantic-Preserving Adversarial Text Attacks [85.32186121859321]
We propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.
Our method achieves the highest attack success and semantics-preserving rates while changing the smallest number of words compared with existing methods.
arXiv Detail & Related papers (2021-08-23T09:05:18Z)
- Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method achieves a BLEU score of 33.18 on IWSLT14 German-English translation, an improvement of 1.47 over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z)
- Generating Natural Language Attacks in a Hard Label Black Box Setting [3.52359746858894]
We study an important and challenging task of attacking natural language processing models in a hard label black box setting.
We propose a decision-based attack strategy that crafts high quality adversarial examples on text classification and entailment tasks.
arXiv Detail & Related papers (2020-12-29T22:01:38Z)
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks for discrete data (such as text) are more challenging than those for continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturbation percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z)