A Study on FGSM Adversarial Training for Neural Retrieval
- URL: http://arxiv.org/abs/2301.10576v1
- Date: Wed, 25 Jan 2023 13:28:54 GMT
- Title: A Study on FGSM Adversarial Training for Neural Retrieval
- Authors: Simon Lupart and Stéphane Clinchant
- Abstract summary: Neural retrieval models have acquired significant effectiveness gains over the last few years compared to term-based methods.
However, those models may be brittle when faced with typos or distribution shifts, and vulnerable to malicious attacks.
We show that one of the simplest adversarial training techniques -- the Fast Gradient Sign Method (FGSM) -- can improve the robustness and effectiveness of first-stage rankers.
- Score: 3.2634122554914
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Neural retrieval models have acquired significant effectiveness gains over
the last few years compared to term-based methods. Nevertheless, those models
may be brittle when faced with typos or distribution shifts, and vulnerable to
malicious attacks. For instance, several recent papers demonstrated that such
variations severely impact model performance, and then tried to train more
resilient models. Usual approaches include synonym replacement or typo
injection -- as data augmentation -- and the use of more robust tokenizers
(CharacterBERT, BPE-dropout). To further complement the literature, this paper
investigates adversarial training as another possible solution to this
robustness issue. Our comparison includes the two main families of BERT-based
neural retrievers, i.e. dense and sparse, with and without distillation
techniques. We then demonstrate that one of the simplest adversarial training
techniques -- the Fast Gradient Sign Method (FGSM) -- can improve the
robustness and effectiveness of first-stage rankers. In particular, FGSM
improves model performance on both in-domain and out-of-domain distributions,
as well as on queries with typos, for multiple neural retrievers.
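To make the approach concrete, below is a minimal PyTorch sketch of single-step FGSM adversarial training applied in the token-embedding space of a BERT-based dense retriever with an in-batch-negatives contrastive loss. The checkpoint, the CLS pooling, the loss, and the epsilon value are illustrative assumptions, not the exact recipe of the paper.
```python
# Minimal sketch: FGSM adversarial training for a BERT-based dense retriever.
# Since inputs are discrete tokens, the perturbation is applied to the token
# embeddings. Model, pooling, loss and epsilon are assumptions, not the
# paper's exact setup.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
embed = model.get_input_embeddings()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
epsilon = 1e-3  # assumed perturbation magnitude

def encode(inputs_embeds, attention_mask):
    # CLS-token pooling of the last hidden layer
    out = model(inputs_embeds=inputs_embeds, attention_mask=attention_mask)
    return out.last_hidden_state[:, 0]

def in_batch_loss(q_emb, d_emb):
    # The i-th document is the positive for the i-th query, others are negatives
    scores = q_emb @ d_emb.t()
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)

def fgsm_training_step(queries, documents):
    q = tokenizer(queries, padding=True, truncation=True, return_tensors="pt")
    d = tokenizer(documents, padding=True, truncation=True, return_tensors="pt")

    # 1) Clean pass to obtain gradients w.r.t. the token embeddings
    q_emb = embed(q["input_ids"]).detach().requires_grad_(True)
    d_emb = embed(d["input_ids"]).detach().requires_grad_(True)
    clean_loss = in_batch_loss(encode(q_emb, q["attention_mask"]),
                               encode(d_emb, d["attention_mask"]))
    clean_loss.backward()

    # 2) FGSM: one-step perturbation along the sign of the embedding gradient
    delta_q = (epsilon * q_emb.grad.sign()).detach()
    delta_d = (epsilon * d_emb.grad.sign()).detach()

    # 3) Train on the perturbed embeddings
    optimizer.zero_grad()
    adv_loss = in_batch_loss(
        encode(embed(q["input_ids"]) + delta_q, q["attention_mask"]),
        encode(embed(d["input_ids"]) + delta_d, d["attention_mask"]),
    )
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```
The paper compares both dense and sparse BERT-based first-stage rankers, with and without distillation; the sketch above only illustrates the dense case, and the same one-step perturbation can be combined with other training losses.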
Related papers
- Depression detection in social media posts using transformer-based models and auxiliary features [6.390468088226495]
Detection of depression in social media posts is crucial due to the increasing prevalence of mental health issues.
Traditional machine learning algorithms often fail to capture intricate textual patterns, limiting their effectiveness in identifying depression.
This research proposes a neural network architecture leveraging transformer-based models combined with metadata and linguistic markers.
arXiv Detail & Related papers (2024-09-30T07:53:39Z) - Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has been conventionally believed as a challenging property to encode for neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z) - Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders [101.42201747763178]
Unlearnable examples (UEs) seek to maximize testing error by making subtle modifications to training examples that are correctly labeled.
Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method.
arXiv Detail & Related papers (2024-05-02T16:49:25Z) - Advancing Adversarial Robustness Through Adversarial Logit Update [10.041289551532804]
Adversarial training and adversarial purification are among the most widely recognized defense strategies.
We propose a new principle, namely Adversarial Logit Update (ALU), to infer adversarial sample's labels.
Our solution achieves superior performance compared to state-of-the-art methods against a wide range of adversarial attacks.
arXiv Detail & Related papers (2023-08-29T07:13:31Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples (a minimal label-smoothing sketch follows this list).
arXiv Detail & Related papers (2022-12-20T14:06:50Z) - Distributed Adversarial Training to Robustify Deep Neural Networks at
Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach known as adversarial training (AT) has been shown to improve model robustness.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z) - How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial
Robustness? [121.57551065856164]
We propose Robust Informative Fine-Tuning (RIFT) as a novel adversarial fine-tuning method from an information-theoretical perspective.
RIFT encourages an objective model to retain the features learned from the pre-trained model throughout the entire fine-tuning process.
Experimental results show that RIFT consistently outperforms state-of-the-art methods on two popular NLP tasks.
arXiv Detail & Related papers (2021-12-22T05:04:41Z) - Improving Gradient-based Adversarial Training for Text Classification by
Contrastive Learning and Auto-Encoder [18.375585982984845]
We focus on enhancing the model's ability to defend against gradient-based adversarial attacks during training.
We propose two novel adversarial training approaches: CARL and RAR.
Experiments show that the proposed two approaches outperform strong baselines on various text classification datasets.
arXiv Detail & Related papers (2021-09-14T09:08:58Z) - DVERGE: Diversifying Vulnerabilities for Enhanced Robust Generation of
Ensembles [20.46399318111058]
Adversarial attacks can mislead CNN models with small perturbations, which can effectively transfer between different models trained on the same dataset.
We propose DVERGE, which isolates the adversarial vulnerability in each sub-model by distilling non-robust features.
The novel diversity metric and training procedure enables DVERGE to achieve higher robustness against transfer attacks.
arXiv Detail & Related papers (2020-09-30T14:57:35Z)