In and Out-of-Domain Text Adversarial Robustness via Label Smoothing
- URL: http://arxiv.org/abs/2212.10258v2
- Date: Tue, 11 Jul 2023 19:33:44 GMT
- Title: In and Out-of-Domain Text Adversarial Robustness via Label Smoothing
- Authors: Yahan Yang, Soham Dan, Dan Roth, Insup Lee
- Abstract summary: We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
- Score: 64.66809713499576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently it has been shown that state-of-the-art NLP models are vulnerable to
adversarial attacks, where the predictions of a model can be drastically
altered by slight modifications to the input (such as synonym substitutions).
While several defense techniques have been proposed and adapted to the
discrete nature of text adversarial attacks, the benefits of general-purpose
regularization methods, such as label smoothing, for language models have not
been studied. In this paper, we study the adversarial robustness provided by
various label smoothing strategies in foundational models for diverse NLP tasks
in both in-domain and out-of-domain settings. Our experiments show that label
smoothing significantly improves adversarial robustness in pre-trained models
like BERT, against various popular attacks. We also analyze the relationship
between prediction confidence and robustness, showing that label smoothing
reduces over-confident errors on adversarial examples.
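The technique studied here, label smoothing, mixes the one-hot training label with a uniform distribution over classes before computing cross-entropy. The abstract does not specify the exact smoothing strategies or hyperparameters, so the sketch below is only an illustration of standard label smoothing during BERT fine-tuning; the checkpoint name, epsilon = 0.1, learning rate, and toy batch are assumptions.

```python
# Minimal sketch: one fine-tuning step for a BERT classifier with
# standard label smoothing (all hyperparameters are illustrative).
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def label_smoothed_nll(logits, targets, eps=0.1):
    """Cross-entropy against smoothed targets q = (1 - eps) * one_hot(y) + eps / K."""
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(dim=-1, index=targets.unsqueeze(-1)).squeeze(-1)
    uniform = -log_probs.mean(dim=-1)  # loss against the uniform component
    return ((1.0 - eps) * nll + eps * uniform).mean()

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Toy sentiment batch, purely for illustration.
batch = tokenizer(["the movie was great", "the movie was awful"],
                  return_tensors="pt", padding=True, truncation=True)
labels = torch.tensor([1, 0])

model.train()
optimizer.zero_grad()
logits = model(**batch).logits
loss = label_smoothed_nll(logits, labels, eps=0.1)
loss.backward()
optimizer.step()
```

PyTorch's built-in `torch.nn.CrossEntropyLoss(label_smoothing=0.1)` produces the same smoothed-target loss; the helper is spelled out only to make the (1 - eps) / eps mixture explicit.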
Related papers
- MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
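The MirrorCheck summary only outlines the pipeline (caption the input with the target VLM, regenerate an image from that caption with a T2I model, and compare the two images), so the sketch below is one plausible reading rather than the authors' implementation; the CLIP checkpoint, cosine-similarity score, and detection threshold are all assumptions.

```python
# Hedged sketch: flag an input image as adversarial if it disagrees with
# the image regenerated from its own caption. The regenerated image is
# assumed to come from an off-the-shelf T2I model run outside this function.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def looks_adversarial(original: Image.Image,
                      regenerated: Image.Image,
                      threshold: float = 0.75) -> bool:
    """Return True when the two images are dissimilar in CLIP space
    (the threshold is an illustrative value, not a tuned one)."""
    inputs = proc(images=[original, regenerated], return_tensors="pt")
    with torch.no_grad():
        feats = clip.get_image_features(**inputs)
    sim = F.cosine_similarity(feats[0:1], feats[1:2]).item()
    return sim < threshold
```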
- SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z)
- Advancing Adversarial Robustness Through Adversarial Logit Update [10.041289551532804]
Adversarial training and adversarial purification are among the most widely recognized defense strategies.
We propose a new principle, namely Adversarial Logit Update (ALU), to infer adversarial samples' labels.
Our solution achieves superior performance compared to state-of-the-art methods against a wide range of adversarial attacks.
arXiv Detail & Related papers (2023-08-29T07:13:31Z)
- Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard approach to adversarial robustness assumes a framework for defending against samples crafted by minimally perturbing a clean input.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
arXiv Detail & Related papers (2022-11-04T13:54:02Z)
- Detecting Word Sense Disambiguation Biases in Machine Translation for Model-Agnostic Adversarial Attacks [84.61578555312288]
We introduce a method for the prediction of disambiguation errors based on statistical data properties.
We develop a simple adversarial attack strategy that minimally perturbs sentences in order to elicit disambiguation errors.
Our findings indicate that disambiguation robustness varies substantially between domains and that different models trained on the same data are vulnerable to different attacks.
arXiv Detail & Related papers (2020-11-03T17:01:44Z)
- CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation [20.27052525082402]
We present a Controlled Adversarial Text Generation (CAT-Gen) model that generates adversarial texts through controllable attributes.
Experiments on real-world NLP datasets demonstrate that our method can generate more diverse and fluent adversarial texts.
arXiv Detail & Related papers (2020-10-05T21:07:45Z)
- Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)