Learning-based Hybrid Local Search for the Hard-label Textual Attack
- URL: http://arxiv.org/abs/2201.08193v1
- Date: Thu, 20 Jan 2022 14:16:07 GMT
- Title: Learning-based Hybrid Local Search for the Hard-label Textual Attack
- Authors: Zhen Yu, Xiaosen Wang, Wanxiang Che, Kun He
- Abstract summary: We consider a rarely investigated but more rigorous setting, namely the hard-label attack, in which the attacker can only access the prediction label.
Based on the observation that label changes caused by word substitutions reflect word importance, we propose a novel hard-label attack, called the Learning-based Hybrid Local Search (LHLS) algorithm.
Our LHLS significantly outperforms existing hard-label attacks in both attack performance and adversarial example quality.
- Score: 53.92227690452377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks are vulnerable to adversarial examples in
Natural Language Processing. However, existing textual adversarial attacks
usually rely on gradients or prediction confidence to generate adversarial
examples, making them hard to deploy in real-world applications. To this end,
we consider a rarely investigated but more rigorous setting, namely the
hard-label attack, in which the attacker can only access the prediction
label. In particular, we find that the changes in the prediction label caused
by word substitutions on the adversarial example can precisely reflect the
importance of different words. Based on this observation, we propose a novel
hard-label attack, called the Learning-based Hybrid Local Search (LHLS)
algorithm, which effectively estimates word importance from the prediction
labels in the attack history and integrates this information into a hybrid
local search algorithm to optimize the adversarial perturbation. Extensive
evaluations on text classification and textual entailment across various
datasets and models show that our LHLS significantly outperforms existing
hard-label attacks in both attack performance and adversarial example quality.
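To make the core idea concrete, the Python sketch below is a rough reconstruction of a hard-label attack loop of this kind, not the authors' released LHLS implementation. It assumes a black-box `predict(words)` that returns only a label, the original word list `orig`, and an adversarial seed `adv` of the same length that already flips the label; all of these names are hypothetical.

```python
from collections import defaultdict

def refine(predict, orig, adv, orig_label, history, rounds=3):
    # Learn word importance from the attack history: each record is
    # (position, kept_adversarial), i.e. whether a past substitution at that
    # position left the example adversarial. Only labels are ever observed.
    flips = defaultdict(int)
    tries = defaultdict(int)
    for pos, kept_adversarial in history:
        tries[pos] += 1
        flips[pos] += int(not kept_adversarial)

    def importance(pos):
        # Smoothed flip rate: positions whose restoration tends to bring the
        # original label back are more important to the attack.
        return (flips[pos] + 0.5) / (tries[pos] + 1.0)

    best = list(adv)
    for _ in range(rounds):
        perturbed = [i for i, (a, o) in enumerate(zip(best, orig)) if a != o]
        improved = False
        # Greedy local search: restore the least important perturbed words
        # first, shrinking the perturbation while staying adversarial.
        for pos in sorted(perturbed, key=importance):
            trial = best[:pos] + [orig[pos]] + best[pos + 1:]
            kept = predict(trial) != orig_label   # hard-label feedback only
            history.append((pos, kept))
            if kept:
                best, improved = trial, True
        if not improved:
            break
    return best, history
```

A full implementation would also try synonym swaps at important positions and reuse the learned importance across restarts; this sketch only shows the label-driven importance estimate feeding a greedy local search.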
Related papers
- Semantic Stealth: Adversarial Text Attacks on NLP Using Several Methods [0.0]
A text adversarial attack involves the deliberate manipulation of input text to mislead a model's predictions.
The BERT-on-BERT attack, PWWS, and Fraud Bargain's Attack (FBA) are explored in this paper.
PWWS emerges as the most potent adversary, consistently outperforming the other methods across multiple evaluation scenarios.
arXiv Detail & Related papers (2024-04-08T02:55:01Z)
- HQA-Attack: Toward High Quality Black-Box Hard-Label Adversarial Attack on Text [40.58680960214544]
Black-box hard-label adversarial attack on text is a practical and challenging task.
We propose a framework, named HQA-Attack, that generates high-quality adversarial examples under the black-box hard-label attack scenario.
arXiv Detail & Related papers (2024-02-02T10:06:43Z)
- LimeAttack: Local Explainable Method for Textual Hard-Label Adversarial Attack [3.410883081705873]
We propose a novel hard-label attack algorithm named LimeAttack.
We show that LimeAttack achieves better attack performance than existing hard-label attacks.
Adversarial examples crafted by LimeAttack are highly transferable and effectively improve model robustness in adversarial training.
arXiv Detail & Related papers (2023-08-01T06:30:37Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
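For reference, a common formulation of label smoothing mixes the one-hot target with a uniform distribution; the PyTorch sketch below shows this generic loss and is not necessarily the exact variant used in that paper.

```python
import torch.nn.functional as F

# Generic label-smoothed cross-entropy: the target becomes
# (1 - eps) * one_hot + eps * uniform, which penalizes over-confident
# predictions. A standard formulation, not necessarily the paper's variant.
def smoothed_cross_entropy(logits, target, eps=0.1):
    log_probs = F.log_softmax(logits, dim=-1)
    nll = F.nll_loss(log_probs, target)           # true-class term
    uniform = -log_probs.mean(dim=-1).mean()      # uniform-target term
    return (1.0 - eps) * nll + eps * uniform
```

Softening the targets this way bounds how much probability mass the model is pushed to place on the true class, which is one intuition for why it reduces over-confident errors.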
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
- Zero-Query Transfer Attacks on Context-Aware Object Detectors [95.18656036716972]
Adversarial attacks perturb images such that a deep neural network produces incorrect classification results.
A promising approach to defend against adversarial attacks on natural multi-object scenes is to impose a context-consistency check.
We present the first approach for generating context-consistent adversarial attacks that can evade the context-consistency check.
arXiv Detail & Related papers (2022-03-29T04:33:06Z)
- Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms by applying adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
- Generating Natural Language Attacks in a Hard Label Black Box Setting [3.52359746858894]
We study an important and challenging task of attacking natural language processing models in a hard label black box setting.
We propose a decision-based attack strategy that crafts high quality adversarial examples on text classification and entailment tasks.
arXiv Detail & Related papers (2020-12-29T22:01:38Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
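As a rough illustration of the instance-level idea in this entry, the sketch below perturbs an input so its embedding stops agreeing with its own augmented view, in the style of a PGD attack on a contrastive objective. The names `encoder`, `x`, and `x_aug` are hypothetical stand-ins, and this is not the authors' exact method.

```python
import torch
import torch.nn.functional as F

# Sketch of an instance-wise adversarial perturbation: maximize disagreement
# between a sample and its own augmented view, so the model confuses the
# sample's instance-level identity. No class labels are used.
def instance_attack(encoder, x, x_aug, eps=8 / 255, alpha=2 / 255, steps=10):
    with torch.no_grad():
        z_aug = F.normalize(encoder(x_aug), dim=1)  # fixed target view

    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        z = F.normalize(encoder(x + delta), dim=1)
        sim = (z * z_aug).sum(dim=1)   # cosine similarity per sample
        loss = -sim.mean()             # ascending this loss lowers agreement
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # PGD step
            delta.clamp_(-eps, eps)             # stay in the L-inf ball
        delta.grad.zero_()
    return (x + delta).detach()
```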
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
- Universal Adversarial Attacks with Natural Triggers for Text Classification [30.74579821832117]
We develop adversarial attacks that appear closer to natural English phrases and yet confuse classification systems.
Our attacks effectively reduce model accuracy on classification tasks while being less identifiable than prior attack methods.
arXiv Detail & Related papers (2020-05-01T01:58:24Z)
- Adversarial Augmentation Policy Search for Domain and Cross-Lingual Generalization in Reading Comprehension [96.62963688510035]
Reading comprehension models often overfit to nuances of training datasets and fail at adversarial evaluation.
We present several effective adversaries and automated data augmentation policy search methods with the goal of making reading comprehension models more robust to adversarial evaluation.
arXiv Detail & Related papers (2020-04-13T17:20:08Z)