Robust Textual Embedding against Word-level Adversarial Attacks
- URL: http://arxiv.org/abs/2202.13817v1
- Date: Mon, 28 Feb 2022 14:25:00 GMT
- Title: Robust Textual Embedding against Word-level Adversarial Attacks
- Authors: Yichen Yang, Xiaosen Wang, Kun He
- Abstract summary: We propose a novel robust training method, termed Fast Triplet Metric Learning (FTML).
We show that FTML can significantly promote the model robustness against various advanced adversarial attacks.
Our work shows the great potential of improving textual robustness through robust word embeddings.
- Score: 15.235449552083043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We attribute the vulnerability of natural language processing models to the
fact that similar inputs are converted to dissimilar representations in the
embedding space, leading to inconsistent outputs, and propose a novel robust
training method, termed Fast Triplet Metric Learning (FTML). Specifically, we
argue that the original sample should have a representation similar to its adversarial counterparts and distinct from those of other samples for better robustness. To this end, we incorporate triplet metric learning into the standard training to pull words closer to their positive samples (i.e.,
synonyms) and push away their negative samples (i.e., non-synonyms) in the
embedding space. Extensive experiments demonstrate that FTML can significantly
promote the model robustness against various advanced adversarial attacks while
keeping competitive classification accuracy on original samples. Moreover, our method is efficient, as it only adjusts the embedding layer and introduces very little overhead over the standard training. Our work shows the great potential of improving textual robustness through robust word embeddings.
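A minimal sketch of the triplet objective described in the abstract is shown below, assuming a PyTorch setup; the function name `triplet_embedding_loss`, the synonym/non-synonym sampling, and the loss weighting are illustrative assumptions rather than the paper's released implementation.

```python
import torch.nn.functional as F

def triplet_embedding_loss(embedding, anchor_ids, synonym_ids, negative_ids, margin=1.0):
    """Pull each word toward a sampled synonym and away from a sampled non-synonym.

    embedding    : torch.nn.Embedding shared with the downstream classifier
    anchor_ids   : (batch,) LongTensor of word ids from the training batch
    synonym_ids  : (batch,) LongTensor of positive-sample ids (synonyms)
    negative_ids : (batch,) LongTensor of negative-sample ids (non-synonyms)
    """
    anchor = embedding(anchor_ids)      # (batch, dim)
    positive = embedding(synonym_ids)   # (batch, dim)
    negative = embedding(negative_ids)  # (batch, dim)
    # Standard triplet margin loss: push d(anchor, negative) above d(anchor, positive) + margin.
    return F.triplet_margin_loss(anchor, positive, negative, margin=margin)

# Illustrative usage: the triplet term is added to the usual task loss so that
# only the embedding layer receives the extra gradient signal, e.g.
#   loss = F.cross_entropy(logits, labels) + lam * triplet_embedding_loss(emb, a, p, n)
```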
Related papers
- SemRoDe: Macro Adversarial Training to Learn Representations That are Robust to Word-Level Attacks [29.942001958562567]
We propose a novel approach called Semantic Robust Defence (SemRoDe) to enhance the robustness of language models.
Our method learns a robust representation that bridges these two domains.
The results demonstrate promising state-of-the-art robustness.
arXiv Detail & Related papers (2024-03-27T10:24:25Z) - Fast Adversarial Training against Textual Adversarial Attacks [11.023035222098008]
We propose a Fast Adversarial Training (FAT) method to improve the model robustness in the synonym-unaware scenario.
FAT uses single-step and multi-step gradient ascent to craft adversarial examples in the embedding space; a minimal sketch of the single-step variant appears after this list.
Experiments demonstrate that FAT significantly boosts the robustness of BERT models in the synonym-unaware scenario.
arXiv Detail & Related papers (2024-01-23T03:03:57Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to construct in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
arXiv Detail & Related papers (2022-12-20T14:06:50Z) - Language Model Pre-training on True Negatives [109.73819321246062]
Discriminative pre-trained language models (PLMs) learn to predict original texts from intentionally corrupted ones.
Existing PLMs simply treat all corrupted texts as equal negatives without any examination.
We design enhanced pre-training methods to counteract false negative predictions and encourage pre-training language models on true negatives.
arXiv Detail & Related papers (2022-12-01T12:24:19Z) - Improving Gradient-based Adversarial Training for Text Classification by Contrastive Learning and Auto-Encoder [18.375585982984845]
We focus on enhancing the model's ability to defend against gradient-based adversarial attacks during training.
We propose two novel adversarial training approaches: CARL and RAR.
Experiments show that the proposed two approaches outperform strong baselines on various text classification datasets.
arXiv Detail & Related papers (2021-09-14T09:08:58Z) - Self-Supervised Contrastive Learning with Adversarial Perturbations for Robust Pretrained Language Models [18.726529370845256]
This paper improves the robustness of the pretrained language model BERT against word substitution-based adversarial attacks.
We also create an adversarial attack for word-level adversarial training on BERT.
arXiv Detail & Related papers (2021-07-15T21:03:34Z) - Disentangled Contrastive Learning for Learning Robust Textual Representations [13.880693856907037]
We introduce the concept of momentum representation consistency to align features and leverage power normalization while preserving uniformity.
Experimental results on NLP benchmarks demonstrate that our approach outperforms the baselines.
arXiv Detail & Related papers (2021-04-11T03:32:49Z) - Contextualized Perturbation for Textual Adversarial Attack [56.370304308573274]
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models.
This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs.
arXiv Detail & Related papers (2020-09-16T06:53:15Z) - Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z) - BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks on discrete data (such as text) are more challenging than those on continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturbation percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z)
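As noted in the Fast Adversarial Training entry above, adversarial examples can be crafted by gradient ascent directly in the embedding space. The sketch below shows a generic single-step (FGSM-style) variant under stated assumptions: the classifier's forward pass accepts `inputs_embeds` and returns logits, and the function name and epsilon value are illustrative, not taken from that paper.

```python
import torch.nn.functional as F

def single_step_embedding_attack(model, input_embeds, labels, epsilon=0.01):
    """One gradient-ascent step on the input embeddings (FGSM-style sketch).

    model        : classifier assumed to accept `inputs_embeds` and return logits
    input_embeds : (batch, seq_len, dim) embeddings of the clean inputs
    labels       : (batch,) gold labels
    """
    input_embeds = input_embeds.detach().requires_grad_(True)
    loss = F.cross_entropy(model(inputs_embeds=input_embeds), labels)
    loss.backward()
    # Move each token embedding in the direction that increases the loss.
    return (input_embeds + epsilon * input_embeds.grad.sign()).detach()

# The perturbed embeddings are then fed back through the model, and the training
# loss is computed on them alongside (or instead of) the clean examples.
```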
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.