BAE: BERT-based Adversarial Examples for Text Classification
- URL: http://arxiv.org/abs/2004.01970v3
- Date: Thu, 8 Oct 2020 00:41:43 GMT
- Title: BAE: BERT-based Adversarial Examples for Text Classification
- Authors: Siddhant Garg, Goutham Ramakrishnan
- Abstract summary: We present BAE, a black box attack for generating adversarial examples using contextual perturbations from a BERT masked language model.
We show that BAE performs a stronger attack, in addition to generating adversarial examples with improved grammaticality and semantic coherence as compared to prior work.
- Score: 9.188318506016898
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern text classification models are susceptible to adversarial examples:
perturbed versions of the original text that are indiscernible to humans yet get
misclassified by the model. Recent works in NLP use rule-based synonym
replacement strategies to generate adversarial examples. These strategies can
lead to out-of-context and unnaturally complex token replacements, which are
easily identifiable by humans. We present BAE, a black box attack for
generating adversarial examples using contextual perturbations from a BERT
masked language model. BAE replaces and inserts tokens in the original text by
masking a portion of the text and leveraging the BERT-MLM to generate
alternatives for the masked tokens. Through automatic and human evaluations, we
show that BAE performs a stronger attack, in addition to generating adversarial
examples with improved grammaticality and semantic coherence as compared to
prior work.
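The core perturbation step is straightforward to sketch. The minimal example below, assuming the HuggingFace transformers library and bert-base-uncased (illustrative choices, not necessarily the authors' exact setup), masks one token and asks the BERT masked language model for contextual replacement candidates. The full BAE attack additionally filters candidates for semantic similarity and part-of-speech consistency and queries the victim model, all omitted here.

```python
# A minimal sketch of BAE-style contextual candidate generation, assuming
# the HuggingFace `transformers` library; the model choice is illustrative.
from transformers import pipeline

# BERT masked language model used to propose context-aware substitutes.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def propose_replacements(tokens, position, top_k=5):
    """Mask the token at `position` and let the BERT-MLM suggest alternatives."""
    masked = list(tokens)
    masked[position] = fill_mask.tokenizer.mask_token  # "[MASK]" for BERT
    # For BAE's insert mode, use masked.insert(position, mask_token) instead.
    candidates = fill_mask(" ".join(masked), top_k=top_k)
    # Drop the original token so every candidate is an actual perturbation.
    return [c["token_str"] for c in candidates
            if c["token_str"].lower() != tokens[position].lower()]

tokens = "the movie was truly wonderful".split()
print(propose_replacements(tokens, position=4))
```

BAE's insert mode works the same way, except the mask token is added adjacent to position `i` rather than replacing it. A hedged sketch of the surrounding black-box attack loop appears after the related-papers list below.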
Related papers
- Arabic Synonym BERT-based Adversarial Examples for Text Classification [0.0]
This paper introduces the first word-level study of adversarial attacks in Arabic.
We assess the robustness of the state-of-the-art text classification models to adversarial attacks in Arabic.
We study the transferability of these newly produced Arabic adversarial examples to various models and investigate the effectiveness of defense mechanisms.
arXiv Detail & Related papers (2024-02-05T19:39:07Z)
- Verifying the Robustness of Automatic Credibility Assessment [79.08422736721764]
Text classification methods have been widely investigated as a way to detect content of low credibility.
In some cases insignificant changes in input text can mislead the models.
We introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
- Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification [12.750016480098262]
We report on crowdsourcing studies in which we task humans with iteratively modifying words in an input text.
We analyze how human-generated adversarial examples compare to the recently proposed TextFooler, Genetic, BAE and SememePSO attack algorithms.
arXiv Detail & Related papers (2021-09-09T16:16:04Z)
- Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method achieves a 33.18 BLEU score on IWSLT14 German-English translation, an improvement of 1.47 over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z)
- Generating Adversarial Examples in Chinese Texts Using Sentence-Pieces [60.58900627906269]
We propose a pre-trained language model as the substitute generator, using sentence-pieces to craft adversarial examples in Chinese.
The substitutions in the generated adversarial examples are not characters or words but 'pieces', which are more natural to Chinese readers.
arXiv Detail & Related papers (2020-12-29T14:28:07Z)
- Contextualized Perturbation for Textual Adversarial Attack [56.370304308573274]
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models.
This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs.
arXiv Detail & Related papers (2020-09-16T06:53:15Z)
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks on discrete data (such as text) are more challenging than on continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturbation percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z)
- PALM: Pre-training an Autoencoding & Autoregressive Language Model for Context-conditioned Generation [92.7366819044397]
Self-supervised pre-training has emerged as a powerful technique for natural language understanding and generation.
This work presents PALM with a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus.
Extensive experiments show that PALM achieves new state-of-the-art results on a variety of language generation benchmarks.
arXiv Detail & Related papers (2020-04-14T06:25:36Z)
- Generating Natural Language Adversarial Examples on a Large Scale with Generative Models [41.85006993382117]
We propose an end-to-end solution to efficiently generate adversarial texts from scratch using generative models.
Specifically, we train a conditional variational autoencoder with an additional adversarial loss to guide the generation of adversarial examples.
To improve the validity of adversarial texts, we utilize discriminators and the training framework of generative adversarial networks.
arXiv Detail & Related papers (2020-03-10T03:21:35Z)
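For completeness, here is a hedged sketch of the black-box attack loop that BAE-style and BERT-ATTACK-style methods share: rank tokens by how much masking them changes the victim's confidence, then try MLM candidates at the most important positions. The victim classifier, the sentiment-analysis stand-in, and the decision threshold are illustrative assumptions, not the released code of either paper.

```python
# A hedged sketch of the shared black-box attack loop; all model choices
# and the `victim_prob` helper are hypothetical, for illustration only.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
victim = pipeline("sentiment-analysis")  # stands in for the attacked model

def victim_prob(text, label):
    """Probability the victim assigns to `label` (black-box access only)."""
    out = victim(text)[0]
    return out["score"] if out["label"] == label else 1.0 - out["score"]

def attack(text, gold_label, top_k=8):
    tokens = text.split()
    mask = fill_mask.tokenizer.mask_token
    base = victim_prob(text, gold_label)
    # Token importance: confidence drop when each token is masked out.
    drops = []
    for i in range(len(tokens)):
        masked = " ".join(tokens[:i] + [mask] + tokens[i + 1:])
        drops.append((base - victim_prob(masked, gold_label), i))
    # Attack the most important positions first.
    for _, i in sorted(drops, reverse=True):
        masked = " ".join(tokens[:i] + [mask] + tokens[i + 1:])
        for cand in fill_mask(masked, top_k=top_k):
            word = cand["token_str"]
            if word.lower() == tokens[i].lower():
                continue  # must actually perturb the text
            trial = " ".join(tokens[:i] + [word] + tokens[i + 1:])
            if victim_prob(trial, gold_label) < 0.5:  # prediction flipped
                return trial
    # BAE greedily commits improving substitutions and continues across
    # positions; that accumulation is omitted here for brevity.
    return None  # attack failed within the budget
```

The real attacks add constraints this sketch omits, notably semantic-similarity filtering of candidates (BAE uses Universal Sentence Encoder scores) to keep perturbations coherent.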
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.