Generating Natural Language Adversarial Examples on a Large Scale with Generative Models
- URL: http://arxiv.org/abs/2003.10388v1
- Date: Tue, 10 Mar 2020 03:21:35 GMT
- Title: Generating Natural Language Adversarial Examples on a Large Scale with Generative Models
- Authors: Yankun Ren and Jianbin Lin and Siliang Tang and Jun Zhou and Shuang Yang and Yuan Qi and Xiang Ren
- Abstract summary: We propose an end-to-end solution to efficiently generate adversarial texts from scratch using generative models.
Specifically, we train a conditional variational autoencoder with an additional adversarial loss to guide the generation of adversarial examples.
To improve the validity of adversarial texts, we utilize discriminators and the training framework of generative adversarial networks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text classification models are now widely used. However, these classifiers have been found to be easily fooled by adversarial examples. Fortunately, standard attack methods generate adversarial texts in a pair-wise way; that is, an adversarial text can only be created from a real-world text by replacing a few words. In many applications such real-world texts are limited in number, so their corresponding adversarial examples are often not diverse enough and are sometimes hard to read; they can thus be easily detected by humans and cannot create chaos at a large scale. In this paper, we propose an end-to-end solution to efficiently generate adversarial texts from scratch using generative models, which are not restricted to perturbing given texts. We call this unrestricted adversarial text generation. Specifically, we train a conditional variational autoencoder (VAE) with an additional adversarial loss to guide the generation of adversarial examples. Moreover, to improve the validity of adversarial texts, we utilize discriminators and the training framework of generative adversarial networks (GANs) to make adversarial texts consistent with real data. Experimental results on sentiment analysis demonstrate the scalability and efficiency of our method. It can attack text classification models with a higher success rate than existing methods while maintaining acceptable quality for human readers.
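The abstract names the ingredients of the training objective (a conditional VAE, an adversarial loss against the victim classifier, and GAN-style discriminators) but not its exact form. The sketch below shows one plausible way to combine them as scalar losses; it is not the paper's formulation. The specific terms (a `-log(1 - p)` adversarial term, the non-saturating GAN loss) and the weights `lambda_adv` and `lambda_gan` are assumptions, and all function names are hypothetical. The inputs are probabilities a real model would produce (conditioning on the label y is implicit in which probabilities are passed in), and the sketch omits how gradients pass through discrete token sampling.

```python
import math
from typing import Sequence

EPS = 1e-12  # keeps log() finite when a probability is exactly 0 or 1


def reconstruction_nll(gold_token_probs: Sequence[float]) -> float:
    """Negative log-likelihood of the reference tokens under the decoder."""
    return -sum(math.log(max(p, EPS)) for p in gold_token_probs)


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def adversarial_loss(victim_prob_of_label: float) -> float:
    """Assumed form: small when the victim classifier gives the conditioning
    label y a low probability, i.e. when the generated text fools it."""
    return -math.log(max(1.0 - victim_prob_of_label, EPS))


def generator_gan_loss(disc_prob_real: float) -> float:
    """Non-saturating GAN loss: small when the discriminator judges the
    generated text to be real."""
    return -math.log(max(disc_prob_real, EPS))


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Standard GAN discriminator loss on one real and one generated text."""
    return -math.log(max(d_real, EPS)) - math.log(max(1.0 - d_fake, EPS))


def generator_objective(
    gold_token_probs: Sequence[float],
    mu: Sequence[float],
    logvar: Sequence[float],
    victim_prob_of_label: float,
    disc_prob_real: float,
    lambda_adv: float = 1.0,  # hypothetical weight
    lambda_gan: float = 1.0,  # hypothetical weight
) -> float:
    """VAE loss (reconstruction + KL) plus weighted adversarial and GAN terms."""
    vae = reconstruction_nll(gold_token_probs) + kl_to_standard_normal(mu, logvar)
    return (
        vae
        + lambda_adv * adversarial_loss(victim_prob_of_label)
        + lambda_gan * generator_gan_loss(disc_prob_real)
    )
```

For example, `generator_objective([0.5, 0.25], [0.0], [0.0], 0.5, 0.5)` evaluates to log 8 + 0 + log 2 + log 2 = log 32. The generator minimizes this sum while the discriminator is trained separately with `discriminator_loss`, alternating as in ordinary GAN training.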
Related papers
- A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers
We train an encoder-decoder paraphrase model to generate adversarial examples.
We adopt a reinforcement learning algorithm and propose a constraint-enforcing reward.
We show how key design choices impact the generated examples and discuss the strengths and weaknesses of the proposed approach.
Date: 2024-05-20T09:33:43Z
- Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation
Authorship Verification (AV) is a text classification task concerned with inferring whether a candidate text has been written by one specific author or by someone else.
It has been shown that many AV systems are vulnerable to adversarial attacks, where a malicious author actively tries to fool the classifier by either concealing their writing style, or by imitating the style of another author.
Date: 2024-03-17T16:36:26Z
- Verifying the Robustness of Automatic Credibility Assessment
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
Date: 2023-03-14T16:11:47Z
- Generating Watermarked Adversarial Texts
Adversarial example generation has been an active research topic in recent years because deep neural networks (DNNs) can be made to misclassify the generated examples.
We present a general framework for generating watermarked adversarial text examples.
Date: 2021-10-25T13:37:23Z
- Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification
We report on crowdsourcing studies in which we task humans with iteratively modifying words in an input text.
We analyze how human-generated adversarial examples compare to the recently proposed TextFooler, Genetic, BAE and SememePSO attack algorithms.
Date: 2021-09-09T16:16:04Z
- A Differentiable Language Model Adversarial Attack on Text Classifiers
We propose a new black-box sentence-level attack for natural language processing.
Our method fine-tunes a pre-trained language model to generate adversarial examples.
We show that the proposed attack outperforms competitors on a diverse set of NLP problems for both computed metrics and human evaluation.
Date: 2021-07-23T14:43:13Z
- Contextualized Perturbation for Textual Adversarial Attack
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models.
This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs.
Date: 2020-09-16T06:53:15Z
- Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding
We study natural language watermarking as a defense to help better mark and trace the provenance of text.
We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training.
AWT is the first end-to-end model to hide data in text by automatically learning -- without ground truth -- word substitutions along with their locations.
Date: 2020-09-07T11:01:24Z
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT
Adversarial attacks on discrete data (such as text) are more challenging than attacks on continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturbation percentage.
Date: 2020-04-21T13:30:02Z
- BAE: BERT-based Adversarial Examples for Text Classification
We present BAE, a black box attack for generating adversarial examples using contextual perturbations from a BERT masked language model.
We show that BAE performs a stronger attack, in addition to generating adversarial examples with improved grammaticality and semantic coherence as compared to prior work.
Date: 2020-04-04T16:25:48Z
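The abstract contrasts the proposed generator with pair-wise attacks, which create an adversarial text by replacing a few words of a real text; BAE and BERT-Attack in the list above are examples that draw replacement candidates from a BERT masked language model. The sketch below illustrates that greedy substitution loop under stated assumptions; it is not the implementation of any listed paper. The victim classifier `toy_positive_prob`, its lexicon `WEIGHTS`, and the `CANDIDATES` table (a fixed dictionary standing in for masked-LM proposals) are all hypothetical. Words are ranked by how much deleting them lowers the probability of the original label, a common importance heuristic.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical victim: a logistic sentiment scorer over a tiny lexicon.
WEIGHTS = {"great": 1.5, "good": 1.0, "decent": 0.4, "fine": 0.2, "okay": -0.3}

# Hypothetical substitution candidates, standing in for masked-LM proposals.
CANDIDATES = {"great": ["decent", "fine"], "good": ["fine", "okay"]}


def toy_positive_prob(tokens: List[str]) -> float:
    """Probability that the toy victim labels the text as positive."""
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def greedy_substitution_attack(
    tokens: List[str],
    prob_fn: Callable[[List[str]], float],
    candidates: Dict[str, List[str]],
    max_changes: int = 2,
) -> Optional[Tuple[List[str], int]]:
    """Replace at most max_changes words until the predicted label flips.

    Returns (adversarial tokens, number of changes), or None on failure.
    """
    positive = prob_fn(tokens) >= 0.5

    def true_prob(toks: List[str]) -> float:
        # Probability of the label originally predicted for `tokens`.
        p = prob_fn(toks)
        return p if positive else 1.0 - p

    # Rank positions by how much deleting each word lowers the true-label probability.
    base = true_prob(tokens)
    drops = [(base - true_prob(tokens[:i] + tokens[i + 1:]), i) for i in range(len(tokens))]
    order = [i for _, i in sorted(drops, reverse=True)]

    adv = list(tokens)
    changes = 0
    for i in order:
        if changes >= max_changes:
            break
        options = candidates.get(adv[i], [])
        if not options:
            continue
        # Pick the candidate that lowers the true-label probability the most.
        best = min(options, key=lambda w: true_prob(adv[:i] + [w] + adv[i + 1:]))
        trial = adv[:i] + [best] + adv[i + 1:]
        if true_prob(trial) < true_prob(adv):
            adv = trial
            changes += 1
            if true_prob(adv) < 0.5:  # label flipped: attack succeeded
                return adv, changes
    return None
```

Called on `["a", "great", "movie", "with", "good", "acting"]`, the sketch replaces "great" with "fine" and "good" with "okay" and returns after two changes; with `max_changes=1` it returns `None`. The toy weights are chosen so that near-synonyms flip this victim; a practical attack would also filter candidates for grammaticality and semantic coherence, which BAE and CLARE emphasize.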