Defense of Word-level Adversarial Attacks via Random Substitution Encoding
- URL: http://arxiv.org/abs/2005.00446v2
- Date: Fri, 12 Jun 2020 05:55:34 GMT
- Title: Defense of Word-level Adversarial Attacks via Random Substitution Encoding
- Authors: Zhaoyang Wang and Hongtao Wang
- Abstract summary: Adversarial attacks against deep neural networks on computer vision tasks have spawned many new technologies that help protect models from making false predictions.
Recently, word-level adversarial attacks on deep models of Natural Language Processing (NLP) tasks have also demonstrated strong power, e.g., fooling a sentiment classification neural network into making wrong decisions.
We propose a novel defense framework called Random Substitution Encoding (RSE), which introduces a random substitution encoder into the training process of original neural networks.
- Score: 0.5964792400314836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The adversarial attacks against deep neural networks on computer vision
tasks have spawned many new technologies that help protect models from making
false predictions. Recently, word-level adversarial attacks on deep models of
Natural Language Processing (NLP) tasks have also demonstrated strong power,
e.g., fooling a sentiment classification neural network into making wrong
decisions. Unfortunately, little prior literature has discussed defenses against
such word-level synonym-substitution-based attacks, since they are hard to
perceive and detect. In this paper, we shed light on this problem and propose a
novel defense framework called Random Substitution Encoding (RSE), which
introduces a random substitution encoder into the training process of original
neural networks. Extensive experiments on text classification tasks demonstrate
the effectiveness of our framework in defending against word-level adversarial
attacks under various base models and attack models.
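The abstract describes the RSE framework only at a high level. As a rough illustration (not the authors' implementation), the Python sketch below shows what a random synonym-substitution step inside a training pipeline might look like; the synonym table, the tokenization by whitespace, and the substitution probability are hypothetical placeholders.

```python
import random

# Hypothetical synonym table; a real implementation would build this from a
# resource such as WordNet or counter-fitted word embeddings.
SYNONYMS = {
    "good": ["great", "fine", "nice"],
    "bad": ["poor", "awful", "terrible"],
    "movie": ["film", "picture"],
}

def random_substitution_encode(tokens, p_sub=0.3):
    """Randomly replace some tokens with one of their synonyms, so the model
    is trained on synonym-perturbed variants of each sentence."""
    encoded = []
    for tok in tokens:
        candidates = SYNONYMS.get(tok.lower())
        if candidates and random.random() < p_sub:
            encoded.append(random.choice(candidates))
        else:
            encoded.append(tok)
    return encoded

# Each call may yield a different perturbed sentence fed to the classifier.
print(random_substitution_encode("this movie is good".split()))
```

In the paper the random substitution is described as an encoder inside the training process of the original network; here it is shown as a standalone preprocessing function purely for illustration.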
Related papers
- GenFighter: A Generative and Evolutive Textual Attack Removal [6.044610337297754]
Adversarial attacks pose significant challenges to deep neural networks (DNNs) such as Transformer models in natural language processing (NLP).
This paper introduces a novel defense strategy, called GenFighter, which enhances adversarial robustness by learning and reasoning on the training classification distribution.
We show that GenFighter outperforms state-of-the-art defenses in accuracy under attack and attack success rate metrics.
arXiv Detail & Related papers (2024-04-17T16:32:13Z) - Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks [62.036798488144306]
Current defense mainly focuses on the known attacks, but the adversarial robustness to the unknown attacks is seriously overlooked.
We propose an attack-agnostic defense method named Meta Invariance Defense (MID).
We show that MID simultaneously achieves robustness to the imperceptible adversarial perturbations in high-level image classification and attack-suppression in low-level robust image regeneration.
arXiv Detail & Related papers (2024-04-04T10:10:38Z) - TextDefense: Adversarial Text Detection based on Word Importance Entropy [38.632552667871295]
We propose TextDefense, a new adversarial example detection framework for NLP models.
Our experiments show that TextDefense can be applied to different architectures, datasets, and attack methods.
We provide our insights into the adversarial attacks in NLP and the principles of our defense method.
arXiv Detail & Related papers (2023-02-12T11:12:44Z) - RSD-GAN: Regularized Sobolev Defense GAN Against Speech-to-Text
Adversarial Attacks [9.868221447090853]
This paper introduces a new synthesis-based defense algorithm for counteracting adversarial attacks that target the performance of speech-to-text transcription systems.
Our algorithm implements a Sobolev-based GAN and proposes a novel regularizer for effectively controlling the functionality of the entire generative model.
arXiv Detail & Related papers (2022-07-14T12:22:19Z) - A Survey of Adversarial Defences and Robustness in NLP [26.299507152320494]
It has become increasingly evident that deep neural networks are not resilient enough to withstand adversarial perturbations in input data.
Several methods for adversarial defense in NLP have been proposed, catering to different NLP tasks.
This survey aims to review the various methods proposed for adversarial defenses in NLP over the past few years by introducing a novel taxonomy.
arXiv Detail & Related papers (2022-03-12T11:37:17Z) - Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, namely hard-label attack, in which the attacker could only access the prediction label.
We propose a novel hard-label attack, called the Learning-based Hybrid Local Search (LHLS) algorithm.
Our LHLS significantly outperforms existing hard-label attacks regarding the attack performance as well as adversary quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z) - Searching for an Effective Defender: Benchmarking Defense against
Adversarial Word Substitution [83.84968082791444]
Deep neural networks are vulnerable to intentionally crafted adversarial examples.
Various methods have been proposed to defend against adversarial word-substitution attacks for neural NLP models.
arXiv Detail & Related papers (2021-08-29T08:11:36Z) - Adversarial Attack and Defense of Structured Prediction Models [58.49290114755019]
In this paper, we investigate attacks and defenses for structured prediction tasks in NLP.
The structured output of structured prediction models is sensitive to small perturbations in the input.
We propose a novel and unified framework that learns to attack a structured prediction model using a sequence-to-sequence model.
arXiv Detail & Related papers (2020-10-04T15:54:03Z) - Online Alternate Generator against Adversarial Attacks [144.45529828523408]
Deep learning models are notoriously sensitive to adversarial examples which are synthesized by adding quasi-perceptible noises on real images.
We propose a portable defense method, online alternate generator, which does not need to access or modify the parameters of the target networks.
The proposed method works by online synthesizing another image from scratch for an input image, instead of removing or destroying adversarial noises.
arXiv Detail & Related papers (2020-09-17T07:11:16Z) - Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood
Ensemble [163.3333439344695]
Dirichlet Neighborhood Ensemble (DNE) is a randomized smoothing method for training a robust model to defend against substitution-based attacks.
DNE forms virtual sentences by sampling embedding vectors for each word in an input sentence from a convex hull spanned by the word and its synonyms, and it augments them with the training data; a minimal illustrative sketch of this sampling step appears after this list.
We demonstrate through extensive experimentation that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.
arXiv Detail & Related papers (2020-06-20T18:01:16Z)