Towards Robustness Against Natural Language Word Substitutions
- URL: http://arxiv.org/abs/2107.13541v1
- Date: Wed, 28 Jul 2021 17:55:08 GMT
- Title: Towards Robustness Against Natural Language Word Substitutions
- Authors: Xinshuai Dong, Anh Tuan Luu, Rongrong Ji, Hong Liu
- Abstract summary: Robustness against word substitutions has a well-defined and widely accepted form, using semantically similar words as substitutions.
Previous defense methods capture word substitutions in vector space using either an $l_2$-ball or a hyper-rectangle.
- Score: 87.56898475512703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robustness against word substitutions has a well-defined and widely
accepted form, i.e., using semantically similar words as substitutions, and
thus it is considered a fundamental stepping-stone towards broader
robustness in natural language processing. Previous defense methods capture
word substitutions in vector space by using either an $l_2$-ball or a
hyper-rectangle, which results in perturbation sets that are either not inclusive
enough or unnecessarily large, and thus impedes mimicry of worst cases for
robust training. In this paper, we introduce a novel \textit{Adversarial Sparse
Convex Combination} (ASCC) method. We model the word substitution attack space
as a convex hull and leverage a regularization term to enforce perturbation
towards an actual substitution, thus aligning our modeling better with the
discrete textual space. Based on the ASCC method, we further propose
ASCC-defense, which leverages ASCC to generate worst-case perturbations and
incorporates adversarial training towards robustness. Experiments show that
ASCC-defense outperforms the current state of the art in terms of robustness
on two prevailing NLP tasks, \emph{i.e.}, sentiment analysis and natural
language inference, against several attacks across multiple model
architectures. In addition, we envision a new class of defense towards
robustness in NLP, where our robustly trained word vectors can be plugged into
a normally trained model and enforce its robustness without applying any other
defense techniques.
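To make the mechanics concrete, here is a minimal PyTorch sketch of the ASCC perturbation step, under our own simplifying assumptions (the function `ascc_perturb`, the entropy-based sparsity term, and all hyperparameter names are illustrative stand-ins, not the authors' reference implementation):

```python
import torch
import torch.nn.functional as F

def ascc_perturb(model, subs_embeds, label, steps=10, lr=1.0, alpha=10.0):
    """Find a worst-case point inside the convex hull of substitutions.

    subs_embeds: (seq_len, num_subs, dim) embeddings of each position's
                 allowed substitutions (the original word included).
    label:       (1,) gold-label tensor.
    Assumes the model's parameters are frozen during this inner loop.
    """
    # Unnormalized coefficients over substitutions at every position.
    w = torch.zeros(subs_embeds.shape[:2], requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        p = F.softmax(w, dim=-1)                    # convex weights
        x = (p.unsqueeze(-1) * subs_embeds).sum(1)  # point in the hull
        entropy = -(p * (p + 1e-12).log()).sum()    # sparsity regularizer
        # Minimizing -CE maximizes the task loss (worst case); the entropy
        # term pushes p toward one-hot, i.e., an actual discrete substitution.
        loss = -F.cross_entropy(model(x.unsqueeze(0)), label) + alpha * entropy
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        p = F.softmax(w, dim=-1)
        return (p.unsqueeze(-1) * subs_embeds).sum(1)
```

ASCC-defense would then feed the returned embeddings back through the model and minimize the task loss on them, i.e., standard adversarial training with these hull-constrained perturbations.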
Related papers
- SemRoDe: Macro Adversarial Training to Learn Representations That are Robust to Word-Level Attacks [29.942001958562567]
We propose a novel approach called Semantic Robust Defence (SemRoDe) to enhance the robustness of language models.
Our method learns a robust representation that bridges the base and adversarial domains.
The results demonstrate promising state-of-the-art robustness.
arXiv Detail & Related papers (2024-03-27T10:24:25Z)
- Fast Adversarial Training against Textual Adversarial Attacks [11.023035222098008]
We propose a Fast Adversarial Training (FAT) method to improve the model robustness in the synonym-unaware scenario.
FAT uses single-step and multi-step gradient ascent to craft adversarial examples in the embedding space (a minimal sketch follows below).
Experiments demonstrate that FAT significantly boosts the robustness of BERT models in the synonym-unaware scenario.
arXiv Detail & Related papers (2024-01-23T03:03:57Z)
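For the single-step variant of FAT above, the idea is essentially FGSM applied to word embeddings; a minimal sketch, with all names (`embedding_ascent`, `eps`) our own:

```python
import torch
import torch.nn.functional as F

def embedding_ascent(model, embeds, label, eps=0.3):
    """One FGSM-style gradient-ascent step on input word embeddings.

    embeds: (batch, seq_len, dim) input embeddings.
    Returns perturbed embeddings for adversarial training.
    """
    delta = torch.zeros_like(embeds, requires_grad=True)
    loss = F.cross_entropy(model(embeds + delta), label)
    loss.backward()
    # Step in the sign direction that increases the loss; multi-step
    # variants would iterate this with a projection after each step.
    return (embeds + eps * delta.grad.sign()).detach()
```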
- Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks [39.51297217854375]
We propose Text-CRS, a certified robustness framework for natural language processing (NLP) based on randomized smoothing (sketched below).
We show that Text-CRS can address all four word-level adversarial operations (synonym substitution, word reordering, insertion, and deletion) and achieve a significant accuracy improvement.
We also provide the first benchmark on certified accuracy and radius for the four word-level operations, besides outperforming the state-of-the-art certification against synonym substitution attacks.
arXiv Detail & Related papers (2023-07-31T13:08:16Z)
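At prediction time, randomized smoothing reduces to a majority vote over randomly perturbed copies of the input. A toy sketch under our own assumptions (the uniform synonym-swap noise and names such as `smoothed_predict` are stand-ins for the paper's operation-specific noise distributions; the certification step via confidence bounds is omitted):

```python
import random
from collections import Counter

def smoothed_predict(classify, tokens, synonyms, n=100, swap_prob=0.3):
    """Majority-vote prediction over randomly perturbed inputs.

    classify: base classifier mapping a token list to a label.
    synonyms: dict mapping a word to its list of synonyms.
    """
    votes = Counter()
    for _ in range(n):
        noisy = [random.choice(synonyms.get(t, [t]))
                 if random.random() < swap_prob else t
                 for t in tokens]
        votes[classify(noisy)] += 1
    return votes.most_common(1)[0][0]
```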
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks (a minimal sketch of the loss follows below).
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
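Label smoothing itself is a one-line change to the training objective; a minimal sketch (recent PyTorch versions also expose this directly via the `label_smoothing` argument of `F.cross_entropy`):

```python
import torch
import torch.nn.functional as F

def smoothed_loss(logits, target, eps=0.1):
    """Cross-entropy against a target mixed with the uniform distribution."""
    log_p = F.log_softmax(logits, dim=-1)
    nll = -log_p.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    # (1 - eps) weight on the gold label, eps spread uniformly over classes.
    return ((1 - eps) * nll - eps * log_p.mean(-1)).mean()
```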
- Certified Robustness Against Natural Language Attacks by Causal Intervention [61.62348826831147]
Causal Intervention by Semantic Smoothing (CISS) is a novel framework towards robustness against natural language attacks.
CISS is provably robust against word substitution attacks, as well as empirically robust even when perturbations are strengthened by unknown attack algorithms.
arXiv Detail & Related papers (2022-05-24T19:20:48Z)
- Quantifying Robustness to Adversarial Word Substitutions [24.164523751390053]
Deep-learning-based NLP models are found to be vulnerable to word substitution perturbations.
We propose a formal framework to evaluate word-level robustness.
The resulting metric helps explain why state-of-the-art models like BERT can be easily fooled by a few word substitutions.
arXiv Detail & Related papers (2022-01-11T08:18:39Z)
- How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness? [121.57551065856164]
We propose Robust Informative Fine-Tuning (RIFT), a novel adversarial fine-tuning method from an information-theoretical perspective (a simplified sketch follows below).
RIFT encourages an objective model to retain the features learned from the pre-trained model throughout the entire fine-tuning process.
Experimental results show that RIFT consistently outperforms the state of the art on two popular NLP tasks.
arXiv Detail & Related papers (2021-12-22T05:04:41Z)
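A deliberately crude sketch of the idea: add a feature-retention term to the adversarial fine-tuning loss. The cosine-similarity regularizer below (`rift_style_loss`, `beta`) is our simplification of RIFT's information-theoretic objective, not the paper's actual formulation:

```python
import torch
import torch.nn.functional as F

def rift_style_loss(logits, label, feats, pre_feats, beta=0.1):
    """Task loss plus a term keeping features close to the pre-trained ones.

    feats:     sentence features from the model being fine-tuned.
    pre_feats: features from a frozen copy of the pre-trained model.
    """
    task = F.cross_entropy(logits, label)
    retain = 1.0 - F.cosine_similarity(feats, pre_feats.detach(), dim=-1).mean()
    return task + beta * retain
```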
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose Adaptive Feature Alignment (AFA), which is trained to automatically align features across arbitrary attacking strengths.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- APo-VAE: Text Generation in Hyperbolic Space [116.11974607497986]
In this paper, we investigate text generation in a hyperbolic latent space to learn continuous hierarchical representations.
An Adversarial Poincare Variational Autoencoder (APo-VAE) is presented, where both the prior and the variational posterior of the latent variables are defined over a Poincare ball via wrapped normal distributions (a sampling sketch follows below).
Experiments on language modeling and dialog-response generation tasks demonstrate the effectiveness of the proposed APo-VAE model.
arXiv Detail & Related papers (2020-04-30T19:05:41Z)
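To illustrate the latent geometry, a small sketch of sampling a wrapped normal on the Poincare ball: draw Gaussian noise in the tangent space at the origin and push it through the exponential map (names `exp_map_origin` and `sample_wrapped_normal` are ours; a wrapped normal centered at an arbitrary point would additionally require parallel transport):

```python
import torch

def exp_map_origin(v, c=1.0):
    """Exponential map at the origin of a Poincare ball with curvature -c."""
    norm = v.norm(dim=-1, keepdim=True).clamp_min(1e-8)
    return torch.tanh(c ** 0.5 * norm) * v / (c ** 0.5 * norm)

def sample_wrapped_normal(batch, dim, sigma=1.0, c=1.0):
    """Sample ball-valued latents by wrapping Euclidean Gaussian noise."""
    v = sigma * torch.randn(batch, dim)  # tangent vector at the origin
    return exp_map_origin(v, c)          # lands strictly inside the ball
```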