CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial
Text Generation
- URL: http://arxiv.org/abs/2010.02338v1
- Date: Mon, 5 Oct 2020 21:07:45 GMT
- Title: CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial
Text Generation
- Authors: Tianlu Wang, Xuezhi Wang, Yao Qin, Ben Packer, Kang Li, Jilin Chen,
Alex Beutel, Ed Chi
- Abstract summary: We present a Controlled Adversarial Text Generation (CAT-Gen) model that generates adversarial texts through controllable attributes.
Experiments on real-world NLP datasets demonstrate that our method can generate more diverse and fluent adversarial texts.
- Score: 20.27052525082402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: NLP models are shown to suffer from robustness issues, i.e., a model's
prediction can be easily changed under small perturbations to the input. In
this work, we present a Controlled Adversarial Text Generation (CAT-Gen) model
that, given an input text, generates adversarial texts through controllable
attributes that are known to be invariant to task labels. For example, to
attack a sentiment classifier over product reviews, we can use the product
category as the controllable attribute, since changing it does not change the
sentiment of a review. Experiments on real-world NLP datasets demonstrate
that our method can generate more diverse and fluent adversarial texts,
compared to many existing adversarial text generation approaches. We further
use our generated adversarial examples to improve models through adversarial
training, and we demonstrate that our generated attacks are more robust against
model re-training and different model architectures.
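A rough, hypothetical sketch of the idea described in the abstract (not the authors' implementation): an attribute-controlled generator rewrites each input under a different label-invariant attribute value (e.g. a new product category), rewrites that flip the classifier's prediction are kept as attacks, and the attacks are folded back into training with their original labels. The `ControlledGenerator` and `Classifier` interfaces and all function names below are placeholders.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical interfaces: a classifier maps text -> predicted label,
# a controlled generator maps (text, attribute value) -> rewritten text.
Classifier = Callable[[str], int]
ControlledGenerator = Callable[[str, str], str]

def generate_controlled_attacks(
    text: str,
    label: int,
    attribute_values: Iterable[str],
    generator: ControlledGenerator,
    classifier: Classifier,
) -> List[str]:
    """Rewrite `text` under each label-invariant attribute value and keep
    the rewrites whose prediction no longer matches the original label."""
    attacks = []
    for value in attribute_values:
        rewrite = generator(text, value)      # same content, new attribute (e.g. product category)
        if classifier(rewrite) != label:      # prediction flipped => adversarial rewrite
            attacks.append(rewrite)
    return attacks

def adversarial_training_data(
    data: Iterable[Tuple[str, int]],
    attribute_values: Iterable[str],
    generator: ControlledGenerator,
    classifier: Classifier,
) -> List[Tuple[str, int]]:
    """Original pairs plus adversarial rewrites; the rewrites keep their
    original labels because the controlled attribute is label-invariant."""
    augmented: List[Tuple[str, int]] = []
    for text, label in data:
        augmented.append((text, label))
        for attack in generate_controlled_attacks(text, label, attribute_values, generator, classifier):
            augmented.append((attack, label))
    return augmented
```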
Related papers
- Finding a Needle in the Adversarial Haystack: A Targeted Paraphrasing Approach For Uncovering Edge Cases with Minimal Distribution Distortion [0.0]
Adversarial attacks against language models (LMs) are a significant concern.
We propose Targeted Paraphrasing via RL (TPRL), an approach to automatically learn a policy to generate challenging samples.
arXiv Detail & Related papers (2024-01-21T02:25:29Z)
- SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z)
- SCAT: Robust Self-supervised Contrastive Learning via Adversarial Training for Text Classification [15.932462099791307]
We propose a novel learning framework called SCAT (Self-supervised Contrastive Learning via Adversarial Training)
SCAT modifies random augmentations of the data in a fully label-free manner to generate adversarial examples.
Our results show that SCAT can not only train robust language models from scratch, but it can also significantly improve the robustness of existing pre-trained language models.
arXiv Detail & Related papers (2023-07-04T05:41:31Z)
- Click: Controllable Text Generation with Sequence Likelihood Contrastive Learning [69.35360098882606]
We introduce Click for controllable text generation, which needs no modification to the model architecture.
It employs a contrastive loss on sequence likelihood, which fundamentally decreases the generation probability of negative samples.
It also adopts a novel likelihood ranking-based strategy to construct contrastive samples from model generations.
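A minimal sketch of a margin-based contrastive term over sequence log-likelihoods, in the spirit of the summary above; this is a generic formulation under assumed tensor shapes, not necessarily the paper's exact loss.

```python
import torch

def sequence_log_likelihood(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """Sum of per-token log-probabilities for each sequence.
    logits: (batch, seq_len, vocab_size), tokens: (batch, seq_len)."""
    log_probs = torch.log_softmax(logits, dim=-1)
    token_log_probs = log_probs.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)
    return token_log_probs.sum(dim=-1)

def likelihood_contrastive_loss(pos_loglik: torch.Tensor,
                                neg_loglik: torch.Tensor,
                                margin: float = 1.0) -> torch.Tensor:
    """Hinge-style term that pushes the likelihood of a negative (undesired)
    generation at least `margin` below that of its positive counterpart."""
    return torch.clamp(margin - (pos_loglik - neg_loglik), min=0.0).mean()
```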
arXiv Detail & Related papers (2023-06-06T01:56:44Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
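For reference, label smoothing itself is a one-line change to the training target; a minimal sketch of the standard technique (not code from the paper):

```python
import torch
import torch.nn.functional as F

def smoothed_targets(labels: torch.Tensor, num_classes: int, eps: float = 0.1) -> torch.Tensor:
    """Mix the one-hot target with a uniform distribution:
    (1 - eps) * one_hot + eps / num_classes on every class."""
    one_hot = F.one_hot(labels, num_classes).float()
    return one_hot * (1.0 - eps) + eps / num_classes

def smoothed_cross_entropy(logits: torch.Tensor, labels: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """Cross-entropy against the smoothed targets instead of hard labels."""
    targets = smoothed_targets(labels, logits.size(-1), eps)
    return -(targets * torch.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```

PyTorch also exposes the same behaviour directly via torch.nn.CrossEntropyLoss(label_smoothing=eps).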
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
- Experiments with adversarial attacks on text genres [0.0]
Neural models based on pre-trained transformers, such as BERT or XLM-RoBERTa, demonstrate SOTA results in many NLP tasks.
We show that embedding-based algorithms, which replace some of the "most significant" words with similar words, can change model predictions in a significant proportion of cases.
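A toy version of such an embedding-based substitution attack might look as follows; the word-importance scorer, classifier, and embedding table are hypothetical placeholders, and this is not the authors' exact algorithm.

```python
from typing import Callable, Dict, List, Optional
import numpy as np

def nearest_neighbor(word: str, embeddings: Dict[str, np.ndarray]) -> Optional[str]:
    """Cosine-nearest word in a (hypothetical) embedding table, other than the word itself."""
    if word not in embeddings:
        return None
    v = embeddings[word]
    best, best_sim = None, -1.0
    for candidate, u in embeddings.items():
        if candidate == word:
            continue
        sim = float(np.dot(v, u) / (np.linalg.norm(v) * np.linalg.norm(u) + 1e-9))
        if sim > best_sim:
            best, best_sim = candidate, sim
    return best

def greedy_substitution_attack(
    tokens: List[str],
    label: int,
    classifier: Callable[[List[str]], int],
    importance: Callable[[List[str], int], List[float]],
    embeddings: Dict[str, np.ndarray],
) -> Optional[List[str]]:
    """Replace words in order of importance with embedding-similar words
    until the classifier's prediction no longer matches `label`."""
    order = np.argsort(importance(tokens, label))[::-1]   # most important words first
    attacked = list(tokens)
    for i in order:
        substitute = nearest_neighbor(attacked[i], embeddings)
        if substitute is None:
            continue
        attacked[i] = substitute
        if classifier(attacked) != label:
            return attacked                                # successful adversarial example
    return None                                            # attack failed
```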
arXiv Detail & Related papers (2021-07-05T19:37:59Z)
- On the Transferability of Adversarial Attacks against Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
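A compact, generic genetic-algorithm loop over subsets of model indices, in the spirit of that summary; the fitness function (e.g. how well attacks crafted against the chosen ensemble transfer to held-out models) is a placeholder, and this is not the paper's procedure.

```python
import random
from typing import Callable, List

def evolve_ensemble(num_models: int,
                    fitness: Callable[[List[int]], float],
                    ensemble_size: int = 3,
                    population_size: int = 20,
                    generations: int = 50) -> List[int]:
    """Search for a subset of model indices whose ensemble maximizes `fitness`."""
    def random_member() -> List[int]:
        return sorted(random.sample(range(num_models), ensemble_size))

    population = [random_member() for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)           # fittest ensembles first
        parents = population[: population_size // 2]
        children: List[List[int]] = []
        while len(parents) + len(children) < population_size:
            a, b = random.sample(parents, 2)
            pool = sorted(set(a) | set(b))                   # crossover: merge two parents
            child = sorted(random.sample(pool, min(ensemble_size, len(pool))))
            if random.random() < 0.2:                        # mutation: swap in an unused model
                unused = [m for m in range(num_models) if m not in child]
                if unused:
                    child[random.randrange(len(child))] = random.choice(unused)
            children.append(sorted(child))
        population = parents + children
    return max(population, key=fitness)
```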
arXiv Detail & Related papers (2020-11-17T10:45:05Z)
- Word Shape Matters: Robust Machine Translation with Visual Embedding [78.96234298075389]
We introduce a new encoding of the input symbols for character-level NLP models.
It encodes the shape of each character through images of the characters as they appear when printed.
We name this new strategy visual embedding, and we expect it to improve the robustness of NLP models.
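A rough sketch of the character-shape idea (illustrative only, not the paper's embedding): render each character as a small bitmap and use the pixel intensities as its vector, so visually similar characters receive similar embeddings.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont  # Pillow

def visual_char_embedding(char: str, size: int = 16) -> np.ndarray:
    """Render one character onto a small grayscale canvas and use the
    flattened, normalized pixel intensities as its embedding vector."""
    canvas = Image.new("L", (size, size), color=0)
    draw = ImageDraw.Draw(canvas)
    draw.text((2, 2), char, fill=255, font=ImageFont.load_default())
    return np.asarray(canvas, dtype=np.float32).flatten() / 255.0

def embed_text(text: str, size: int = 16) -> np.ndarray:
    """Character-level input matrix of shape (len(text), size * size)."""
    return np.stack([visual_char_embedding(c, size) for c in text])
```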
arXiv Detail & Related papers (2020-10-20T04:08:03Z)
- Contextualized Perturbation for Textual Adversarial Attack [56.370304308573274]
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models.
This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs.
arXiv Detail & Related papers (2020-09-16T06:53:15Z)