Phrase-level Textual Adversarial Attack with Label Preservation
- URL: http://arxiv.org/abs/2205.10710v2
- Date: Tue, 24 May 2022 08:57:11 GMT
- Title: Phrase-level Textual Adversarial Attack with Label Preservation
- Authors: Yibin Lei, Yu Cao, Dianqi Li, Tianyi Zhou, Meng Fang, Mykola
Pechenizkiy
- Abstract summary: We propose Phrase-Level Textual Adversarial aTtack (PLAT), which generates adversarial samples through phrase-level perturbations.
PLAT achieves superior attack effectiveness and better label consistency than strong baselines.
- Score: 34.42846737465045
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating high-quality textual adversarial examples is critical for
investigating the pitfalls of natural language processing (NLP) models and
further promoting their robustness. Existing attacks are usually realized
through word-level or sentence-level perturbations, which either limit the
perturbation space or sacrifice fluency and textual quality, both affecting the
attack effectiveness. In this paper, we propose Phrase-Level Textual
Adversarial aTtack (PLAT) that generates adversarial samples through
phrase-level perturbations. PLAT first extracts the vulnerable phrases as
attack targets by a syntactic parser, and then perturbs them by a pre-trained
blank-infilling model. This flexible perturbation design substantially expands
the search space for more effective attacks without introducing too many
modifications, while maintaining textual fluency and grammaticality via
contextualized generation conditioned on the surrounding text. Moreover, we develop a
label-preservation filter leveraging the likelihoods of language models
fine-tuned on each class, rather than textual similarity, to rule out those
perturbations that potentially alter the original class label for humans.
Extensive experiments and a human evaluation demonstrate that PLAT achieves
superior attack effectiveness and better label consistency than strong
baselines.
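To make the pipeline concrete, here is a minimal sketch of the phrase-infilling step, assuming a T5-style span-infilling model as the blank-infilling component; the attacked phrase is hand-picked below, whereas PLAT selects vulnerable phrases with a syntactic parser and filters candidates with class-conditional language models, which are omitted here.

```python
# A minimal sketch of PLAT-style phrase infilling, assuming a T5 span-infilling
# model; the attacked phrase is hand-picked, whereas PLAT selects vulnerable
# phrases with a syntactic parser and filters candidates with fine-tuned
# class-conditional language models (not shown).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")
t5 = T5ForConditionalGeneration.from_pretrained("t5-small")

def infill_candidates(text: str, phrase: str, k: int = 5) -> list[str]:
    """Blank out one phrase and let the model rewrite it in context."""
    masked = text.replace(phrase, "<extra_id_0>", 1)   # T5's span sentinel
    ids = tok(masked, return_tensors="pt").input_ids
    outs = t5.generate(ids, num_beams=k, num_return_sequences=k, max_new_tokens=10)
    candidates = []
    for seq in outs:
        decoded = tok.decode(seq, skip_special_tokens=False)
        # keep only the text generated for the blank
        span = decoded.split("<extra_id_0>")[-1].split("<extra_id_1>")[0].strip()
        if span:
            candidates.append(masked.replace("<extra_id_0>", span))
    return candidates

# Each candidate would then be scored against the victim model and the
# label-preservation filter before being kept as an adversarial example.
print(infill_candidates("The movie was a complete waste of time.",
                        "a complete waste of time"))
```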
Related papers
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
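As background for the label-smoothing strategies this paper studies, below is a minimal PyTorch version of the smoothed cross-entropy loss; the smoothing weight eps=0.1 is a common default, not a value taken from the paper.

```python
# A minimal label-smoothing cross-entropy in PyTorch; eps=0.1 is a common
# default, not a value from the paper. The target distribution mixes
# (1 - eps) on the gold class with eps spread uniformly over all classes.
import torch
import torch.nn.functional as F

def smoothed_ce(logits: torch.Tensor, target: torch.Tensor, eps: float = 0.1):
    logp = F.log_softmax(logits, dim=-1)
    nll = -logp.gather(-1, target.unsqueeze(-1)).squeeze(-1)  # gold-class term
    uniform = -logp.mean(dim=-1)                              # smoothing term
    return ((1.0 - eps) * nll + eps * uniform).mean()

# Recent PyTorch builds this in: F.cross_entropy(..., label_smoothing=0.1)
logits = torch.randn(4, 3)
target = torch.tensor([0, 2, 1, 0])
print(smoothed_ce(logits, target),
      F.cross_entropy(logits, target, label_smoothing=0.1))  # identical values
```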
- Rethinking Textual Adversarial Defense for Pre-trained Language Models [79.18455635071817]
A literature review shows that pre-trained language models (PrLMs) are vulnerable to adversarial attacks.
We propose a novel metric (Degree of Anomaly) to enable current adversarial attack approaches to generate more natural and imperceptible adversarial examples.
We show that our universal defense framework achieves comparable or even higher after-attack accuracy with other specific defenses.
arXiv Detail & Related papers (2022-07-21T07:51:45Z)
- Adversarial Text Normalization [2.9434930072968584]
Adversarial Text Normalizer restores baseline performance on attacked content with low computational overhead.
We find that text normalization provides a task-agnostic defense against character-level attacks.
arXiv Detail & Related papers (2022-06-08T19:44:03Z)
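A toy illustration of this style of character-level defense is to fold Unicode confusables back to ASCII before classification; the homoglyph table below is a tiny illustrative sample, not the Adversarial Text Normalizer's actual mapping.

```python
# A toy character-level normalizer in the spirit of the paper's defense:
# Unicode NFKC folding plus a small homoglyph map. The mapping is a tiny
# illustrative sample, not the Adversarial Text Normalizer's actual table.
import unicodedata

HOMOGLYPHS = {"а": "a", "е": "e", "о": "o", "р": "p", "с": "c"}  # Cyrillic -> Latin
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def normalize(text: str) -> str:
    text = unicodedata.normalize("NFKC", text)                  # fold compatibility forms
    text = "".join(ch for ch in text if ch not in ZERO_WIDTH)   # drop invisible splitters
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

print(normalize("frее оffеr"))  # Cyrillic lookalikes folded back to "free offer"
```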
- Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, the hard-label attack, in which the attacker can access only the predicted label.
We propose a novel hard-label attack, the Learning-based Hybrid Local Search (LHLS) algorithm.
LHLS significantly outperforms existing hard-label attacks in both attack performance and adversarial example quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z)
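To clarify what the hard-label constraint means in practice, here is a bare-bones random-search attack that observes only predicted labels; this is a naive baseline for exposition, not the LHLS algorithm, and the classifier and substitute table are illustrative stand-ins.

```python
# A bare-bones random-search attack under the hard-label constraint: the
# attacker sees only the predicted label, never probabilities or gradients.
# This is a naive baseline for exposition, not the paper's LHLS algorithm.
import random

def hard_label_attack(predict, words, substitutes, budget=200, seed=0):
    """predict: text -> label; substitutes: word -> candidate replacements."""
    rng = random.Random(seed)
    original = predict(" ".join(words))
    for _ in range(budget):                        # one model query per trial
        trial = list(words)
        for i in rng.sample(range(len(words)), k=min(3, len(words))):
            options = substitutes.get(words[i])
            if options:
                trial[i] = rng.choice(options)
        if predict(" ".join(trial)) != original:   # success: the label flipped
            return " ".join(trial)
    return None

# Toy usage with a keyword-based stand-in for the victim model.
clf = lambda t: "neg" if "terrible" in t else "pos"
subs = {"terrible": ["awful", "mediocre"], "plot": ["story", "script"]}
print(hard_label_attack(clf, "a terrible plot".split(), subs))
```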
- Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
Conducting textual adversarial attacks on natural language processing tasks is non-trivial due to the discreteness of the data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method achieves a 33.18 BLEU score on IWSLT14 German-English translation, an improvement of 1.47 over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z)
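For reference, corpus-level BLEU figures like the one above are typically computed with a standard tool such as sacrebleu; the snippet below shows generic usage, not the paper's exact evaluation script.

```python
# Generic sacrebleu usage showing how a corpus-level BLEU figure is
# typically computed; this is not the paper's exact evaluation script.
import sacrebleu

hypotheses = ["the cat sat on the mat"]            # system outputs
references = [["the cat sat on the mat"]]          # one reference stream
print(sacrebleu.corpus_bleu(hypotheses, references).score)  # 100.0 here
```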
- Generating Natural Language Attacks in a Hard Label Black Box Setting [3.52359746858894]
We study the important and challenging task of attacking natural language processing models in a hard-label black-box setting.
We propose a decision-based attack strategy that crafts high quality adversarial examples on text classification and entailment tasks.
arXiv Detail & Related papers (2020-12-29T22:01:38Z)
- Contextualized Perturbation for Textual Adversarial Attack [56.370304308573274]
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models.
This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs.
arXiv Detail & Related papers (2020-09-16T06:53:15Z)
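CLARE's replace action builds on masked-LM infilling; a minimal version with a generic fill-mask model is sketched below. The model choice is an assumption, and CLARE's insert and merge actions plus its similarity constraints are omitted.

```python
# A minimal contextualized replacement in the spirit of CLARE: mask a token
# and let a masked LM propose in-context substitutes. The model choice is an
# assumption; CLARE's insert/merge actions and constraints are not shown.
from transformers import pipeline

fill = pipeline("fill-mask", model="distilroberta-base")
for cand in fill("The plot is <mask> and the acting is wooden.", top_k=5):
    print(cand["token_str"].strip(), round(cand["score"], 3))
```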
- Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding [80.3811072650087]
We study natural language watermarking as a defense to help better mark and trace the provenance of text.
We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training.
AWT is the first end-to-end model to hide data in text by automatically learning -- without ground truth -- word substitutions along with their locations.
arXiv Detail & Related papers (2020-09-07T11:01:24Z)
- Universal Adversarial Attacks with Natural Triggers for Text Classification [30.74579821832117]
We develop adversarial attacks that appear closer to natural English phrases and yet confuse classification systems.
Our attacks effectively reduce model accuracy on classification tasks while being less identifiable than triggers from prior methods.
arXiv Detail & Related papers (2020-05-01T01:58:24Z)
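To show how such universal triggers are evaluated, here is a tiny harness that prepends one fixed phrase to every input and compares accuracy; the trigger, data, and keyword classifier are illustrative stand-ins, not the paper's setup.

```python
# A tiny harness for universal-trigger-style attacks: prepend one fixed
# phrase to every input and compare accuracy. The trigger, data, and the
# keyword classifier are illustrative stand-ins, not the paper's setup.
def accuracy(classify, pairs):
    return sum(classify(x) == y for x, y in pairs) / len(pairs)

def toy_classify(text):  # stand-in victim model
    return "neg" if any(w in text.lower() for w in ("bad", "boring", "worst")) else "pos"

data = [("a great film", "pos"), ("boring plot", "neg"), ("loved every minute", "pos")]
trigger = "despite the worst expectations"  # hypothetical natural-sounding trigger
attacked = [(f"{trigger} {x}", y) for x, y in data]
print(accuracy(toy_classify, data), accuracy(toy_classify, attacked))  # 1.0 vs 0.33
```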