AdvExpander: Generating Natural Language Adversarial Examples by
Expanding Text
- URL: http://arxiv.org/abs/2012.10235v1
- Date: Fri, 18 Dec 2020 13:50:17 GMT
- Title: AdvExpander: Generating Natural Language Adversarial Examples by
Expanding Text
- Authors: Zhihong Shao, Zitao Liu, Jiyong Zhang, Zhongqin Wu, Minlie Huang
- Abstract summary: We present AdvExpander, a method that crafts new adversarial examples by expanding text.
We first utilize linguistic rules to determine which constituents to expand.
We then expand each constituent by inserting an adversarial modifier searched from a CVAE-based generative model.
- Score: 39.09728700494304
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial examples are vital to expose the vulnerability of machine
learning models. Despite the success of the most popular substitution-based
methods, which substitute some characters or words in the original examples,
substitution alone is insufficient to uncover all robustness issues of models.
In this paper, we present AdvExpander, a method that crafts new adversarial
examples by expanding text, which is complementary to previous
substitution-based methods. We first utilize linguistic rules to determine
which constituents to expand and what types of modifiers to expand with. We
then expand each constituent by inserting an adversarial modifier searched from
a CVAE-based generative model that is pre-trained on a large-scale corpus. To
find adversarial modifiers, we directly search for adversarial latent codes in
the latent space without tuning the pre-trained parameters. To ensure that our
adversarial examples are label-preserving for text matching, we also constrain
the modifications with a heuristic rule. Experiments on three classification
tasks verify the effectiveness of AdvExpander and the validity of our
adversarial examples. AdvExpander crafts a new type of adversarial example by
text expansion, thereby promising to reveal new robustness issues.
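
The latent-code search described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a fixed linear map stands in for the pre-trained CVAE decoder, a logistic scorer stands in for the victim model, and every name here (`decode`, `victim_prob`, `search_latent`, `prior_weight`) is hypothetical. The only point it shows is that the search updates the latent code `z` while the decoder and victim parameters stay frozen.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode(z, dec):
    # Frozen toy "decoder" (assumption): a linear map from the latent code z
    # to the feature vector of the modifier that would be inserted.
    return [sum(w * zj for w, zj in zip(row, z)) for row in dec]


def victim_prob(x, m, clf):
    # Toy victim (assumption): probability of the true label given the
    # original text features x plus the modifier features m.
    return sigmoid(sum(c * (xi + mi) for c, xi, mi in zip(clf, x, m)))


def search_latent(x, dec, clf, z0, steps=200, lr=0.5, prior_weight=0.01):
    # Gradient descent on z only; dec and clf are read but never modified.
    z = list(z0)
    for _ in range(steps):
        p = victim_prob(x, decode(z, dec), clf)
        if p < 0.5:  # the victim's prediction has flipped
            break
        for j in range(len(z)):
            # d p / d z_j for the linear decoder and logistic victim
            dp_dz = p * (1.0 - p) * sum(clf[i] * dec[i][j] for i in range(len(clf)))
            # small L2 pull toward the prior mean keeps z from drifting far
            z[j] -= lr * (dp_dz + prior_weight * z[j])
    return z


# Usage with made-up numbers
x = [1.0, 0.5]
clf = [2.0, -1.0]
dec = [[1.0, 0.0], [0.0, 1.0]]
z_adv = search_latent(x, dec, clf, z0=[0.0, 0.0])
print(victim_prob(x, decode(z_adv, dec), clf) < 0.5)
```

In a real setting the decoder would produce a text modifier rather than a feature vector, so the gradient step shown here would need some way to handle the discrete output; the abstract does not say how that is done, and this sketch does not attempt it.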
Related papers
- Attacking Misinformation Detection Using Adversarial Examples Generated by Language Models [0.0]
We investigate the challenge of generating adversarial examples to test the robustness of text classification algorithms.
We focus on simulation of content moderation by setting realistic limits on the number of queries an attacker is allowed to attempt.
arXiv Detail & Related papers (2024-10-28T11:46:30Z)
- A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers [10.063169009242682]
We train an encoder-decoder paraphrase model to generate adversarial examples.
We adopt a reinforcement learning algorithm and propose a constraint-enforcing reward.
We show how key design choices impact the generated examples and discuss the strengths and weaknesses of the proposed approach.
arXiv Detail & Related papers (2024-05-20T09:33:43Z)
- Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
- Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method achieves a BLEU score of 33.18 on IWSLT14 German-English translation, an improvement of 1.47 over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z)
- Generating Adversarial Examples in Chinese Texts Using Sentence-Pieces [60.58900627906269]
We propose a pre-trained language model as the substitute generator, using sentence-pieces to craft adversarial examples in Chinese.
The substitutions in the generated adversarial examples are not characters or words but 'pieces', which are more natural to Chinese readers.
arXiv Detail & Related papers (2020-12-29T14:28:07Z)
- Contextualized Perturbation for Textual Adversarial Attack [56.370304308573274]
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models.
This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs.
arXiv Detail & Related papers (2020-09-16T06:53:15Z)
- Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding [80.3811072650087]
We study natural language watermarking as a defense to help better mark and trace the provenance of text.
We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training.
AWT is the first end-to-end model to hide data in text by automatically learning -- without ground truth -- word substitutions along with their locations.
arXiv Detail & Related papers (2020-09-07T11:01:24Z)
- BAE: BERT-based Adversarial Examples for Text Classification [9.188318506016898]
We present BAE, a black box attack for generating adversarial examples using contextual perturbations from a BERT masked language model.
We show that BAE performs a stronger attack, in addition to generating adversarial examples with improved grammaticality and semantic coherence as compared to prior work.
arXiv Detail & Related papers (2020-04-04T16:25:48Z)
- Generating Natural Language Adversarial Examples on a Large Scale with Generative Models [41.85006993382117]
We propose an end-to-end solution to efficiently generate adversarial texts from scratch using generative models.
Specifically, we train a conditional variational autoencoder with an additional adversarial loss to guide the generation of adversarial examples.
To improve the validity of adversarial texts, we utilize discriminators and the training framework of generative adversarial networks.
arXiv Detail & Related papers (2020-03-10T03:21:35Z)