A Survey of Adversarial Defences and Robustness in NLP
- URL: http://arxiv.org/abs/2203.06414v4
- Date: Tue, 18 Apr 2023 05:00:29 GMT
- Title: A Survey of Adversarial Defences and Robustness in NLP
- Authors: Shreya Goyal, Sumanth Doddapaneni, Mitesh M. Khapra, Balaraman Ravindran
- Abstract summary: It has become increasingly evident that deep neural networks are not resilient enough to withstand adversarial perturbations in input data.
Several methods for adversarial defense in NLP have been proposed, catering to different NLP tasks.
This survey aims to review the various methods proposed for adversarial defenses in NLP over the past few years by introducing a novel taxonomy.
- Score: 26.299507152320494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the past few years, it has become increasingly evident that deep neural
networks are not resilient enough to withstand adversarial perturbations in
input data, leaving them vulnerable to attack. Various authors have proposed
strong adversarial attacks for computer vision and Natural Language Processing
(NLP) tasks. As a response, many defense mechanisms have also been proposed to
prevent these networks from failing. The significance of defending neural
networks against adversarial attacks lies in ensuring that the model's
predictions remain unchanged even if the input data is perturbed. Several
methods for adversarial defense in NLP have been proposed, catering to
different NLP tasks such as text classification, named entity recognition, and
natural language inference. Some of these methods not only defend neural
networks against adversarial attacks but also act as a regularization mechanism
during training, saving the model from overfitting. This survey aims to review
the various methods proposed for adversarial defenses in NLP over the past few
years by introducing a novel taxonomy. The survey also highlights the fragility
of advanced deep neural networks in NLP and the challenges involved in
defending them.
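As a concrete illustration of the robustness notion above, a defended model should keep its prediction fixed when a word in the input is swapped for a synonym. The sketch below is a hypothetical check, not taken from the survey; `model` (with a predict(text) -> label method) and `synonyms` are placeholder assumptions.

    # Hypothetical robustness check: does any single-word synonym swap flip
    # the predicted label? `model` and `synonyms` are placeholders.
    def is_robust_to_substitution(model, text, synonyms):
        original_label = model.predict(text)
        words = text.split()
        for i, word in enumerate(words):
            for candidate in synonyms.get(word.lower(), []):
                perturbed = " ".join(words[:i] + [candidate] + words[i + 1:])
                if model.predict(perturbed) != original_label:
                    return False  # found an adversarial word substitution
        return True

A defense method is then judged by how often predictions remain unchanged under such perturbations, ideally without sacrificing accuracy on clean inputs.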
Related papers
- Late Breaking Results: Fortifying Neural Networks: Safeguarding Against Adversarial Attacks with Stochastic Computing [1.523100574874007]
In neural network (NN) security, safeguarding model integrity and resilience against adversarial attacks has become paramount.
This study investigates the application of stochastic computing (SC) as a novel mechanism to fortify NN models.
arXiv Detail & Related papers (2024-07-05T20:49:32Z)
- Adversarial Attacks and Defenses in Machine Learning-Powered Networks: A Contemporary Survey [114.17568992164303]
Adversarial attacks and defenses in machine learning and deep neural networks have been gaining significant attention.
This survey provides a comprehensive overview of the recent advancements in the field of adversarial attack and defense techniques.
New avenues of attack are also explored, including search-based, decision-based, drop-based, and physical-world attacks.
arXiv Detail & Related papers (2023-03-11T04:19:31Z)
- Unfolding Local Growth Rate Estimates for (Almost) Perfect Adversarial Detection [22.99930028876662]
Convolutional neural networks (CNNs) define the state-of-the-art solution on many perceptual tasks.
Current CNN approaches largely remain vulnerable to adversarial perturbations of the input that have been crafted specifically to fool the system.
We propose a simple and light-weight detector, which leverages recent findings on the relation between networks' local intrinsic dimensionality (LID) and adversarial attacks.
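As a rough sketch of the LID signal such detectors build on, the snippet below computes a generic maximum-likelihood LID estimate from k-nearest-neighbour distances (a standard estimator, not the authors' exact implementation); `reference_batch` and k are placeholder assumptions.

    import numpy as np

    # Maximum-likelihood LID estimate from the distances of a point to its
    # k nearest neighbours in a reference batch; adversarial inputs tend to
    # exhibit higher LID than clean ones, which a simple detector can exploit.
    def lid_estimate(point, reference_batch, k=20):
        dists = np.linalg.norm(reference_batch - point, axis=1)
        dists = np.sort(dists)[:k]            # k smallest distances; assumes `point` is not in the batch
        dists = np.maximum(dists, 1e-12)      # guard against log(0)
        r_max = dists[-1]
        return -1.0 / np.mean(np.log(dists / r_max))

A detector of this kind would compute such estimates at several layers of the network and feed them to a small classifier that separates clean inputs from adversarial ones.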
arXiv Detail & Related papers (2022-12-13T17:51:32Z)
- Searching for the Essence of Adversarial Perturbations [73.96215665913797]
We show that adversarial perturbations contain human-recognizable information, which is the key conspirator responsible for a neural network's erroneous prediction.
This concept of human-recognizable information allows us to explain key features related to adversarial perturbations.
arXiv Detail & Related papers (2022-05-30T18:04:57Z)
- Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution [83.84968082791444]
Deep neural networks are vulnerable to intentionally crafted adversarial examples.
Various methods have been proposed to defend against adversarial word-substitution attacks for neural NLP models.
arXiv Detail & Related papers (2021-08-29T08:11:36Z)
- Increasing the Confidence of Deep Neural Networks by Coverage Analysis [71.57324258813674]
This paper presents a lightweight monitoring architecture based on coverage paradigms to enhance the model's robustness against different unsafe inputs.
Experimental results show that the proposed approach is effective in detecting both powerful adversarial examples and out-of-distribution inputs.
arXiv Detail & Related papers (2021-01-28T16:38:26Z)
- Adversarial Attack and Defense of Structured Prediction Models [58.49290114755019]
In this paper, we investigate attacks and defenses for structured prediction tasks in NLP.
The structured output of structured prediction models is sensitive to small perturbations in the input.
We propose a novel and unified framework that learns to attack a structured prediction model using a sequence-to-sequence model.
arXiv Detail & Related papers (2020-10-04T15:54:03Z)
- Optimizing Information Loss Towards Robust Neural Networks [0.0]
Neural Networks (NNs) are vulnerable to adversarial examples.
We present a new training approach we call entropic retraining.
Based on an information-theoretic-inspired analysis, entropic retraining mimics the effects of adversarial training without the need for the laborious generation of adversarial examples.
arXiv Detail & Related papers (2020-08-07T10:12:31Z)
- Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble [163.3333439344695]
Dirichlet Neighborhood Ensemble (DNE) is a randomized smoothing method for training a robust model to defend against substitution-based attacks.
DNE forms virtual sentences by sampling embedding vectors for each word in an input sentence from a convex hull spanned by the word and its synonyms, and uses these virtual sentences to augment the training data.
We demonstrate through extensive experimentation that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.
arXiv Detail & Related papers (2020-06-20T18:01:16Z)
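As a rough sketch of the convex-hull sampling described in the DNE entry above (not the paper's implementation; the embedding table, synonym lists, and Dirichlet concentration alpha are placeholder assumptions):

    import numpy as np

    # Replace each word's embedding with a random convex combination of its own
    # embedding and its synonyms' embeddings, i.e. a point inside their convex
    # hull, with weights drawn from a Dirichlet distribution.
    def sample_virtual_embedding(word, embedding, synonyms, alpha=1.0, rng=None):
        rng = rng or np.random.default_rng()
        neighborhood = [word] + synonyms.get(word, [])
        vectors = np.stack([embedding[w] for w in neighborhood])      # (n, dim)
        weights = rng.dirichlet(alpha * np.ones(len(neighborhood)))   # convex weights
        return weights @ vectors

    def sample_virtual_sentence(words, embedding, synonyms, alpha=1.0):
        return np.stack([sample_virtual_embedding(w, embedding, synonyms, alpha)
                         for w in words])

During training, virtual sentences generated this way would be mixed with the original data, matching the augmentation described in the entry above.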
- Defense of Word-level Adversarial Attacks via Random Substitution Encoding [0.5964792400314836]
Adversarial attacks against deep neural networks on computer vision tasks have spawned many new technologies that help protect models from making false predictions.
Recently, word-level adversarial attacks on deep models of Natural Language Processing (NLP) tasks have also demonstrated strong power, e.g., fooling a sentiment classification neural network to make wrong decisions.
We propose a novel framework called Random Substitution Encoding (RSE), which introduces a random substitution into the training process of original neural networks.
arXiv Detail & Related papers (2020-05-01T15:28:43Z)
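As a rough illustration of training-time random word substitution in the spirit of this last entry (a simple augmentation sketch, not the paper's random substitution encoder; `synonyms` and p_sub are placeholder assumptions):

    import random

    # With probability p_sub, replace each word by a randomly chosen synonym,
    # so the model rarely sees the exact same surface form twice during training.
    def randomly_substitute(words, synonyms, p_sub=0.3, rng=random):
        out = []
        for word in words:
            candidates = synonyms.get(word, [])
            if candidates and rng.random() < p_sub:
                out.append(rng.choice(candidates))
            else:
                out.append(word)
        return out

Applied to every training sentence before encoding, this acts as word-level data augmentation intended to harden the model against substitution-based attacks.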