BAE: BERT-based Adversarial Examples for Text Classification
- URL: http://arxiv.org/abs/2004.01970v3
- Date: Thu, 8 Oct 2020 00:41:43 GMT
- Title: BAE: BERT-based Adversarial Examples for Text Classification
- Authors: Siddhant Garg, Goutham Ramakrishnan
- Abstract summary: We present BAE, a black box attack for generating adversarial examples using contextual perturbations from a BERT masked language model.
We show that BAE performs a stronger attack, in addition to generating adversarial examples with improved grammaticality and semantic coherence as compared to prior work.
- Score: 9.188318506016898
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern text classification models are susceptible to adversarial examples,
perturbed versions of the original text indiscernible by humans which get
misclassified by the model. Recent works in NLP use rule-based synonym
replacement strategies to generate adversarial examples. These strategies can
lead to out-of-context and unnaturally complex token replacements, which are
easily identifiable by humans. We present BAE, a black box attack for
generating adversarial examples using contextual perturbations from a BERT
masked language model. BAE replaces and inserts tokens in the original text by
masking a portion of the text and leveraging the BERT-MLM to generate
alternatives for the masked tokens. Through automatic and human evaluations, we
show that BAE performs a stronger attack, in addition to generating adversarial
examples with improved grammaticality and semantic coherence as compared to
prior work.
Related papers
- Arabic Synonym BERT-based Adversarial Examples for Text Classification [0.0]
This paper introduces the first word-level study of adversarial attacks in Arabic.
We assess the robustness of the state-of-the-art text classification models to adversarial attacks in Arabic.
We study the transferability of these newly produced Arabic adversarial examples to various models and investigate the effectiveness of defense mechanisms.
arXiv Detail & Related papers (2024-02-05T19:39:07Z) - Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z) - Contrasting Human- and Machine-Generated Word-Level Adversarial Examples
for Text Classification [12.750016480098262]
We report on crowdsourcing studies in which we task humans with iteratively modifying words in an input text.
We analyze how human-generated adversarial examples compare to the recently proposed TextFooler, Genetic, BAE and SememePSO attack algorithms.
arXiv Detail & Related papers (2021-09-09T16:16:04Z) - Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks(VL-Attack)
Our method can achieve $33.18$ BLEU score on IWSLT14 German-English translation, achieving an improvement of $1.47$ over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z) - Generating Adversarial Examples in Chinese Texts Using Sentence-Pieces [60.58900627906269]
We propose a pre-train language model as the substitutes generator using sentence-pieces to craft adversarial examples in Chinese.
The substitutions in the generated adversarial examples are not characters or words but textit'pieces', which are more natural to Chinese readers.
arXiv Detail & Related papers (2020-12-29T14:28:07Z) - Contextualized Perturbation for Textual Adversarial Attack [56.370304308573274]
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models.
This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs.
arXiv Detail & Related papers (2020-09-16T06:53:15Z) - BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adrial attacks for discrete data (such as texts) are more challenging than continuous data (such as images)
We propose textbfBERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z) - PALM: Pre-training an Autoencoding&Autoregressive Language Model for
Context-conditioned Generation [92.7366819044397]
Self-supervised pre-training has emerged as a powerful technique for natural language understanding and generation.
This work presents PALM with a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus.
An extensive set of experiments show that PALM achieves new state-of-the-art results on a variety of language generation benchmarks.
arXiv Detail & Related papers (2020-04-14T06:25:36Z) - Generating Natural Language Adversarial Examples on a Large Scale with
Generative Models [41.85006993382117]
We propose an end to end solution to efficiently generate adversarial texts from scratch using generative models.
Specifically, we train a conditional variational autoencoder with an additional adversarial loss to guide the generation of adversarial examples.
To improve the validity of adversarial texts, we utilize discrimators and the training framework of generative adversarial networks.
arXiv Detail & Related papers (2020-03-10T03:21:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.