Contextualized Perturbation for Textual Adversarial Attack
- URL: http://arxiv.org/abs/2009.07502v2
- Date: Mon, 15 Mar 2021 04:56:31 GMT
- Title: Contextualized Perturbation for Textual Adversarial Attack
- Authors: Dianqi Li, Yizhe Zhang, Hao Peng, Liqun Chen, Chris Brockett,
Ming-Ting Sun, Bill Dolan
- Abstract summary: Adversarial examples expose the vulnerabilities of natural language processing (NLP) models.
This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs.
- Score: 56.370304308573274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial examples expose the vulnerabilities of natural language
processing (NLP) models, and can be used to evaluate and improve their
robustness. Existing techniques of generating such examples are typically
driven by local heuristic rules that are agnostic to the context, often
resulting in unnatural and ungrammatical outputs. This paper presents CLARE, a
ContextuaLized AdversaRial Example generation model that produces fluent and
grammatical outputs through a mask-then-infill procedure. CLARE builds on a
pre-trained masked language model and modifies the inputs in a context-aware
manner. We propose three contextualized perturbations, Replace, Insert and
Merge, allowing for generating outputs of varied lengths. With a richer range
of available strategies, CLARE is able to attack a victim model more
efficiently with fewer edits. Extensive experiments and human evaluation
demonstrate that CLARE outperforms the baselines in terms of attack success
rate, textual similarity, fluency and grammaticality.
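As an illustration of the mask-then-infill procedure, the sketch below applies the three perturbations with an off-the-shelf masked language model from Hugging Face `transformers` (`roberta-base` is an assumed stand-in for CLARE's backbone; the victim-model scoring, similarity constraints, and search loop from the paper are omitted):

```python
# Sketch of CLARE-style contextualized perturbations (Replace / Insert / Merge)
# using an off-the-shelf masked language model. Illustrative only.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")  # assumed backbone
MASK = fill_mask.tokenizer.mask_token


def replace(words, i):
    """Replace: mask the i-th word and let the LM infill it in context."""
    masked = " ".join(words[:i] + [MASK] + words[i + 1:])
    return [p["sequence"] for p in fill_mask(masked, top_k=5)]


def insert(words, i):
    """Insert: add a mask after the i-th word, lengthening the sentence."""
    masked = " ".join(words[:i + 1] + [MASK] + words[i + 1:])
    return [p["sequence"] for p in fill_mask(masked, top_k=5)]


def merge(words, i):
    """Merge: collapse words i and i+1 into a single mask, shortening it."""
    masked = " ".join(words[:i] + [MASK] + words[i + 2:])
    return [p["sequence"] for p in fill_mask(masked, top_k=5)]


words = "the movie was painfully slow".split()
candidates = replace(words, 3) + insert(words, 3) + merge(words, 3)
# A real attack would keep only candidates that flip the victim model's
# prediction while preserving semantic similarity and fluency.
```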
Related papers
- A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers [10.063169009242682]
We train an encoder-decoder paraphrase model to generate adversarial examples.
We adopt a reinforcement learning algorithm and propose a constraint-enforcing reward.
We show how key design choices impact the generated examples and discuss the strengths and weaknesses of the proposed approach.
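A minimal sketch of what a constraint-enforcing reward could look like, assuming hypothetical `similarity` and `victim_flip_prob` scorers rather than the paper's exact components: the attack reward is only paid out when the paraphrase satisfies a similarity constraint.

```python
# Hypothetical constraint-enforcing reward for an RL-trained paraphrase
# attacker: no attack reward unless the similarity constraint is met.
# `victim_flip_prob` and `similarity` are illustrative stand-ins.

def constraint_enforcing_reward(original: str, paraphrase: str,
                                victim_flip_prob, similarity,
                                sim_threshold: float = 0.8) -> float:
    sim = similarity(original, paraphrase)  # e.g. cosine of sentence embeddings
    if sim < sim_threshold:
        # Constraint violated: penalize proportionally to the violation.
        return sim - sim_threshold
    # Constraint satisfied: reward probability mass moved off the gold label.
    return victim_flip_prob(original, paraphrase)
```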
arXiv Detail & Related papers (2024-05-20T09:33:43Z)
- Click: Controllable Text Generation with Sequence Likelihood Contrastive Learning [69.35360098882606]
We introduce Click for controllable text generation, which needs no modification to the model architecture.
It employs a contrastive loss on sequence likelihood, which fundamentally decreases the generation probability of negative samples.
It also adopts a novel likelihood ranking-based strategy to construct contrastive samples from model generations.
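A minimal sketch of a contrastive loss on sequence likelihood, with length normalization and the margin value as illustrative assumptions rather than Click's exact formulation:

```python
# Sketch: push the (length-normalized) log-likelihood of a negative
# generation below that of a positive one by a margin.
import torch
import torch.nn.functional as F


def sequence_log_likelihood(logits, labels, pad_id):
    # logits: (T, V), labels: (T,)
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    mask = (labels != pad_id).float()
    return (token_logp * mask).sum() / mask.sum()  # length-normalized


def contrastive_likelihood_loss(pos_logits, pos_labels,
                                neg_logits, neg_labels,
                                pad_id, margin=0.5):
    pos_ll = sequence_log_likelihood(pos_logits, pos_labels, pad_id)
    neg_ll = sequence_log_likelihood(neg_logits, neg_labels, pad_id)
    # Zero loss once the negative sample is at least `margin` less likely.
    return torch.clamp(margin - (pos_ll - neg_ll), min=0.0)
```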
arXiv Detail & Related papers (2023-06-06T01:56:44Z)
- Frauds Bargain Attack: Generating Adversarial Text Samples via Word Manipulation Process [9.269657271777527]
This study proposes a new method called the Fraud's Bargain Attack.
It uses a randomization mechanism to expand the search space and produce high-quality adversarial examples.
It outperforms other methods in terms of success rate, imperceptibility and sentence quality.
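The summary does not spell out the randomization mechanism; the sketch below shows one common choice, a Metropolis-Hastings-style acceptance rule that occasionally keeps worse candidates to expand the search space (an assumption for illustration, not necessarily the paper's exact sampler):

```python
# Illustrative randomized acceptance rule for a word-manipulation search:
# worse candidates are sometimes kept, which widens exploration.
# The scoring scale and temperature are assumptions.
import math
import random


def accept(current_score: float, proposed_score: float,
           temperature: float = 0.1) -> bool:
    if proposed_score >= current_score:
        return True
    # Accept a worse candidate with probability decaying in the score gap.
    return random.random() < math.exp((proposed_score - current_score) / temperature)
```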
arXiv Detail & Related papers (2023-03-01T06:04:25Z)
- GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator [114.8954615026781]
We propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator.
GanLM is trained with two pre-training objectives: replaced token detection and replaced token denoising.
Experiments in language generation benchmarks show that GanLM with the powerful language understanding capability outperforms various strong pre-trained language models.
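A minimal sketch of a replaced-token-detection objective, in which an auxiliary discriminator labels each token of a corrupted sequence as original or replaced; the shapes and binary cross-entropy form here are standard assumptions, not GanLM's exact heads:

```python
# Sketch of replaced token detection: per-token binary classification of
# whether a token was swapped in by the generator.
import torch
import torch.nn.functional as F


def replaced_token_detection_loss(disc_logits, corrupted_ids, original_ids, pad_id):
    # disc_logits: (B, T) per-token scores from the discriminator head
    is_replaced = (corrupted_ids != original_ids).float()
    mask = (original_ids != pad_id).float()
    loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced,
                                              reduction="none")
    return (loss * mask).sum() / mask.sum()
```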
arXiv Detail & Related papers (2022-12-20T12:51:11Z)
- Phrase-level Textual Adversarial Attack with Label Preservation [34.42846737465045]
We propose the Phrase-Level Textual Adversarial aTtack (PLAT), which generates adversarial samples through phrase-level perturbations.
PLAT achieves superior attack effectiveness and better label consistency than strong baselines.
arXiv Detail & Related papers (2022-05-22T02:22:38Z)
- Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, the hard-label attack, in which the attacker can only access the predicted label.
We propose a novel hard-label attack, the Learning-based Hybrid Local Search (LHLS) algorithm.
LHLS significantly outperforms existing hard-label attacks in both attack performance and adversarial example quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z)
- Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method achieves a BLEU score of $33.18$ on IWSLT14 German-English translation, an improvement of $1.47$ over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z)
- SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
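A minimal sketch of a self-imitation-style augmentation step under assumed `model.generate` and scoring interfaces (placeholders, not the paper's exact procedure): the model's own best generations are kept as extra MLE training pairs.

```python
# Hypothetical self-imitation augmentation loop: generate candidates with the
# current model, keep the best-scoring ones, and reuse them as training data.

def self_augment(model, inputs, score_fn, threshold=0.7, num_samples=4):
    augmented_pairs = []
    for x in inputs:
        candidates = [model.generate(x) for _ in range(num_samples)]  # assumed API
        best = max(candidates, key=lambda y: score_fn(x, y))
        if score_fn(x, best) >= threshold:
            augmented_pairs.append((x, best))  # treated as extra gold data
    return augmented_pairs
```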
arXiv Detail & Related papers (2021-01-02T01:15:57Z)
- CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation [20.27052525082402]
We present a Controlled Adversarial Text Generation (CAT-Gen) model that generates adversarial texts through controllable attributes.
Experiments on real-world NLP datasets demonstrate that our method can generate more diverse and fluent adversarial texts.
arXiv Detail & Related papers (2020-10-05T21:07:45Z)
- Generating Natural Language Adversarial Examples on a Large Scale with Generative Models [41.85006993382117]
We propose an end-to-end solution to efficiently generate adversarial texts from scratch using generative models.
Specifically, we train a conditional variational autoencoder with an additional adversarial loss to guide the generation of adversarial examples.
To improve the validity of adversarial texts, we utilize discriminators and the training framework of generative adversarial networks.
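A minimal sketch of the combined objective implied here, assuming precomputed reconstruction and KL terms and illustrative weights: the adversarial term penalizes the victim classifier's log-probability of the original label.

```python
# Sketch of a conditional-VAE objective with an added adversarial term.
# Weights and the exact adversarial formulation are assumptions.
import torch


def cvae_adversarial_loss(recon_loss, kl_loss, victim_logits, target_label,
                          kl_weight=1.0, adv_weight=1.0):
    # Adversarial term: minimizing the loss pushes down the victim's
    # log-probability of the original (target) label.
    log_probs = torch.log_softmax(victim_logits, dim=-1)
    adv_loss = log_probs[..., target_label].mean()
    return recon_loss + kl_weight * kl_loss + adv_weight * adv_loss
```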
arXiv Detail & Related papers (2020-03-10T03:21:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.