Contextualized Perturbation for Textual Adversarial Attack
- URL: http://arxiv.org/abs/2009.07502v2
- Date: Mon, 15 Mar 2021 04:56:31 GMT
- Title: Contextualized Perturbation for Textual Adversarial Attack
- Authors: Dianqi Li, Yizhe Zhang, Hao Peng, Liqun Chen, Chris Brockett,
Ming-Ting Sun, Bill Dolan
- Abstract summary: Adversarial examples expose the vulnerabilities of natural language processing (NLP) models.
This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs.
- Score: 56.370304308573274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial examples expose the vulnerabilities of natural language
processing (NLP) models, and can be used to evaluate and improve their
robustness. Existing techniques of generating such examples are typically
driven by local heuristic rules that are agnostic to the context, often
resulting in unnatural and ungrammatical outputs. This paper presents CLARE, a
ContextuaLized AdversaRial Example generation model that produces fluent and
grammatical outputs through a mask-then-infill procedure. CLARE builds on a
pre-trained masked language model and modifies the inputs in a context-aware
manner. We propose three contextualized perturbations, Replace, Insert and
Merge, allowing for generating outputs of varied lengths. With a richer range
of available strategies, CLARE is able to attack a victim model more
efficiently with fewer edits. Extensive experiments and human evaluation
demonstrate that CLARE outperforms the baselines in terms of attack success
rate, textual similarity, fluency and grammaticality.
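As an illustration of the mask-then-infill procedure, the sketch below applies the three perturbations with an off-the-shelf masked language model from Hugging Face `transformers` (`roberta-base` is an assumed stand-in for CLARE's backbone; the victim-model scoring, similarity constraints, and search loop from the paper are omitted):

```python
# Sketch of CLARE-style contextualized perturbations (Replace / Insert / Merge)
# using an off-the-shelf masked language model. Illustrative only.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")  # assumed backbone
MASK = fill_mask.tokenizer.mask_token


def replace(words, i):
    """Replace: mask the i-th word and let the LM infill it in context."""
    masked = " ".join(words[:i] + [MASK] + words[i + 1:])
    return [p["sequence"] for p in fill_mask(masked, top_k=5)]


def insert(words, i):
    """Insert: add a mask after the i-th word, lengthening the sentence."""
    masked = " ".join(words[:i + 1] + [MASK] + words[i + 1:])
    return [p["sequence"] for p in fill_mask(masked, top_k=5)]


def merge(words, i):
    """Merge: collapse words i and i+1 into a single mask, shortening it."""
    masked = " ".join(words[:i] + [MASK] + words[i + 2:])
    return [p["sequence"] for p in fill_mask(masked, top_k=5)]


words = "the movie was painfully slow".split()
candidates = replace(words, 3) + insert(words, 3) + merge(words, 3)
# A real attack would keep only candidates that flip the victim model's
# prediction while preserving semantic similarity and fluency.
```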
Related papers
- A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers [10.063169009242682]
We train an encoder-decoder paraphrase model to generate adversarial examples.
We adopt a reinforcement learning algorithm and propose a constraint-enforcing reward.
We show how key design choices impact the generated examples and discuss the strengths and weaknesses of the proposed approach.
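A minimal sketch of what a constraint-enforcing reward could look like, assuming hypothetical `similarity` and `victim_flip_prob` scorers rather than the paper's exact components: the attack reward is only paid out when the paraphrase satisfies a similarity constraint.

```python
# Hypothetical constraint-enforcing reward for an RL-trained paraphrase
# attacker: no attack reward unless the similarity constraint is met.
# `victim_flip_prob` and `similarity` are illustrative stand-ins.

def constraint_enforcing_reward(original: str, paraphrase: str,
                                victim_flip_prob, similarity,
                                sim_threshold: float = 0.8) -> float:
    sim = similarity(original, paraphrase)  # e.g. cosine of sentence embeddings
    if sim < sim_threshold:
        # Constraint violated: penalize proportionally to the violation.
        return sim - sim_threshold
    # Constraint satisfied: reward probability mass moved off the gold label.
    return victim_flip_prob(original, paraphrase)
```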
arXiv Detail & Related papers (2024-05-20T09:33:43Z)
- Click: Controllable Text Generation with Sequence Likelihood Contrastive Learning [69.35360098882606]
We introduce Click for controllable text generation, which needs no modification to the model architecture.
It employs a contrastive loss on sequence likelihood, which fundamentally decreases the generation probability of negative samples.
It also adopts a novel likelihood ranking-based strategy to construct contrastive samples from model generations.
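A minimal sketch of a contrastive loss on sequence likelihood, with length normalization and the margin value as illustrative assumptions rather than Click's exact formulation:

```python
# Sketch: push the (length-normalized) log-likelihood of a negative
# generation below that of a positive one by a margin.
import torch
import torch.nn.functional as F


def sequence_log_likelihood(logits, labels, pad_id):
    # logits: (T, V), labels: (T,)
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    mask = (labels != pad_id).float()
    return (token_logp * mask).sum() / mask.sum()  # length-normalized


def contrastive_likelihood_loss(pos_logits, pos_labels,
                                neg_logits, neg_labels,
                                pad_id, margin=0.5):
    pos_ll = sequence_log_likelihood(pos_logits, pos_labels, pad_id)
    neg_ll = sequence_log_likelihood(neg_logits, neg_labels, pad_id)
    # Zero loss once the negative sample is at least `margin` less likely.
    return torch.clamp(margin - (pos_ll - neg_ll), min=0.0)
```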
arXiv Detail & Related papers (2023-06-06T01:56:44Z)
- Frauds Bargain Attack: Generating Adversarial Text Samples via Word Manipulation Process [9.269657271777527]
This study proposes a new method called the Fraud's Bargain Attack.
It uses a randomization mechanism to expand the search space and produce high-quality adversarial examples.
It outperforms other methods in terms of success rate, imperceptibility and sentence quality.
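The summary does not spell out the randomization mechanism; the sketch below shows one common choice, a Metropolis-Hastings-style acceptance rule that occasionally keeps worse candidates to expand the search space (an assumption for illustration, not necessarily the paper's exact sampler):

```python
# Illustrative randomized acceptance rule for a word-manipulation search:
# worse candidates are sometimes kept, which widens exploration.
# The scoring scale and temperature are assumptions.
import math
import random


def accept(current_score: float, proposed_score: float,
           temperature: float = 0.1) -> bool:
    if proposed_score >= current_score:
        return True
    # Accept a worse candidate with probability decaying in the score gap.
    return random.random() < math.exp((proposed_score - current_score) / temperature)
```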
arXiv Detail & Related papers (2023-03-01T06:04:25Z)
- GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator [114.8954615026781]
We propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator.
GanLM is trained with two pre-training objectives: replaced token detection and replaced token denoising.
Experiments in language generation benchmarks show that GanLM with the powerful language understanding capability outperforms various strong pre-trained language models.
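A minimal sketch of a replaced-token-detection objective, in which an auxiliary discriminator labels each token of a corrupted sequence as original or replaced; the shapes and binary cross-entropy form here are standard assumptions, not GanLM's exact heads:

```python
# Sketch of replaced token detection: per-token binary classification of
# whether a token was swapped in by the generator.
import torch
import torch.nn.functional as F


def replaced_token_detection_loss(disc_logits, corrupted_ids, original_ids, pad_id):
    # disc_logits: (B, T) per-token scores from the discriminator head
    is_replaced = (corrupted_ids != original_ids).float()
    mask = (original_ids != pad_id).float()
    loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced,
                                              reduction="none")
    return (loss * mask).sum() / mask.sum()
```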
arXiv Detail & Related papers (2022-12-20T12:51:11Z)
- Phrase-level Textual Adversarial Attack with Label Preservation [34.42846737465045]
We propose the Phrase-Level Textual Adversarial aTtack (PLAT), which generates adversarial samples through phrase-level perturbations.
PLAT achieves superior attack effectiveness and better label consistency than strong baselines.
arXiv Detail & Related papers (2022-05-22T02:22:38Z)
- Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, the hard-label attack, in which the attacker can only access the predicted label.
We propose a novel hard-label attack, the Learning-based Hybrid Local Search (LHLS) algorithm.
LHLS significantly outperforms existing hard-label attacks in both attack performance and adversarial example quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z)
- Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method achieves a BLEU score of $33.18$ on IWSLT14 German-English translation, an improvement of $1.47$ over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z)
- SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
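A minimal sketch of a self-imitation-style augmentation step under assumed `model.generate` and scoring interfaces (placeholders, not the paper's exact procedure): the model's own best generations are kept as extra MLE training pairs.

```python
# Hypothetical self-imitation augmentation loop: generate candidates with the
# current model, keep the best-scoring ones, and reuse them as training data.

def self_augment(model, inputs, score_fn, threshold=0.7, num_samples=4):
    augmented_pairs = []
    for x in inputs:
        candidates = [model.generate(x) for _ in range(num_samples)]  # assumed API
        best = max(candidates, key=lambda y: score_fn(x, y))
        if score_fn(x, best) >= threshold:
            augmented_pairs.append((x, best))  # treated as extra gold data
    return augmented_pairs
```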
arXiv Detail & Related papers (2021-01-02T01:15:57Z)
- CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation [20.27052525082402]
We present a Controlled Adversarial Text Generation (CAT-Gen) model that generates adversarial texts through controllable attributes.
Experiments on real-world NLP datasets demonstrate that our method can generate more diverse and fluent adversarial texts.
arXiv Detail & Related papers (2020-10-05T21:07:45Z)
- Generating Natural Language Adversarial Examples on a Large Scale with Generative Models [41.85006993382117]
We propose an end-to-end solution to efficiently generate adversarial texts from scratch using generative models.
Specifically, we train a conditional variational autoencoder with an additional adversarial loss to guide the generation of adversarial examples.
To improve the validity of adversarial texts, we utilize discriminators and the training framework of generative adversarial networks.
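A minimal sketch of the combined objective implied here, assuming precomputed reconstruction and KL terms and illustrative weights: the adversarial term penalizes the victim classifier's log-probability of the original label.

```python
# Sketch of a conditional-VAE objective with an added adversarial term.
# Weights and the exact adversarial formulation are assumptions.
import torch


def cvae_adversarial_loss(recon_loss, kl_loss, victim_logits, target_label,
                          kl_weight=1.0, adv_weight=1.0):
    # Adversarial term: minimizing the loss pushes down the victim's
    # log-probability of the original (target) label.
    log_probs = torch.log_softmax(victim_logits, dim=-1)
    adv_loss = log_probs[..., target_label].mean()
    return recon_loss + kl_weight * kl_loss + adv_weight * adv_loss
```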
arXiv Detail & Related papers (2020-03-10T03:21:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.