Searching for an Effective Defender: Benchmarking Defense against
Adversarial Word Substitution
- URL: http://arxiv.org/abs/2108.12777v1
- Date: Sun, 29 Aug 2021 08:11:36 GMT
- Title: Searching for an Effective Defender: Benchmarking Defense against
Adversarial Word Substitution
- Authors: Zongyi Li, Jianhan Xu, Jiehang Zeng, Linyang Li, Xiaoqing Zheng, Qi
Zhang, Kai-Wei Chang, Cho-Jui Hsieh
- Abstract summary: Deep neural networks are vulnerable to intentionally crafted adversarial examples.
Various methods have been proposed to defend against adversarial word-substitution attacks for neural NLP models.
- Score: 83.84968082791444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have shown that deep neural networks are vulnerable to
intentionally crafted adversarial examples, and various methods have been
proposed to defend against adversarial word-substitution attacks for neural NLP
models. However, there is a lack of systematic study comparing different
defense approaches under the same attack setting. In this paper, we seek to
fill this gap through a comprehensive study of how neural text classifiers
trained with various defense methods behave under representative adversarial
attacks. In addition, we propose an effective method that further improves the
robustness of neural text classifiers against such attacks, achieving the
highest accuracy on both clean and adversarial examples on the AGNEWS and IMDB
datasets by a significant margin.
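For concreteness, a minimal sketch of the shared evaluation protocol such a benchmark implies: every defended classifier is scored on both clean accuracy and accuracy under the same word-substitution attack. The keyword classifier, synonym table, and greedy attack below are toy assumptions, not the paper's models or attack suite.

```python
from typing import Callable, Dict, List

# Illustrative synonym table (assumption); real attacks draw candidates
# from counter-fitted embeddings or WordNet.
SYNONYMS: Dict[str, List[str]] = {
    "good": ["fine", "decent"],
    "bad": ["poor", "awful"],
    "movie": ["film", "picture"],
}

def toy_classifier(text: str) -> int:
    # Stand-in for a defended model: predicts positive (1) iff "good" appears.
    return 1 if "good" in text.split() else 0

def greedy_attack(text: str, label: int, predict: Callable[[str], int]) -> str:
    # Try synonyms word by word; return the first substitution that flips
    # the prediction, or the original text if the attack fails.
    words = text.split()
    for i, w in enumerate(words):
        for syn in SYNONYMS.get(w, []):
            candidate = " ".join(words[:i] + [syn] + words[i + 1:])
            if predict(candidate) != label:
                return candidate
    return text

data = [("a good movie", 1), ("a bad movie", 0)]
clean = sum(toy_classifier(x) == y for x, y in data) / len(data)
robust = sum(toy_classifier(greedy_attack(x, y, toy_classifier)) == y
             for x, y in data) / len(data)
print(f"clean acc = {clean:.2f}, adversarial acc = {robust:.2f}")
```

Each defense under comparison would plug in where `toy_classifier` sits while the attack and data stay fixed, which is the "same attack setting" the abstract calls for.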
Related papers
- Detecting Adversarial Examples [24.585379549997743]
We propose a novel method to detect adversarial examples by analyzing the layer outputs of Deep Neural Networks.
Our method is highly effective, compatible with any DNN architecture, and applicable across different domains, such as image, video, and audio.
arXiv Detail & Related papers (2024-10-22T21:42:59Z)
- MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
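A hedged sketch of that detection idea: embed the input image and the image a T2I model regenerates from the VLM's caption, and flag the pair as adversarial when their similarity drops. The 512-dimensional random vectors stand in for real (e.g. CLIP-style) embeddings, and the threshold is an illustrative assumption.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def is_adversarial(emb_input: np.ndarray, emb_regenerated: np.ndarray,
                   threshold: float = 0.7) -> bool:
    # A benign image and its caption-conditioned regeneration should stay
    # semantically close; an adversarial input drags the caption (and thus
    # the regenerated image) away from the input's true content.
    return cosine(emb_input, emb_regenerated) < threshold

# Demo with random stand-ins for the two embeddings (assumption).
rng = np.random.default_rng(0)
benign = rng.normal(size=512)
regen_benign = benign + 0.1 * rng.normal(size=512)  # close to the input
regen_adversarial = rng.normal(size=512)            # unrelated content
print(is_adversarial(benign, regen_benign))         # False
print(is_adversarial(benign, regen_adversarial))    # True
```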
- MPAT: Building Robust Deep Neural Networks against Textual Adversarial Attacks [4.208423642716679]
We propose a malicious-perturbation-based adversarial training method (MPAT) for building deep neural networks that are robust to adversarial attacks.
Specifically, we construct a multi-level malicious example generation strategy to generate adversarial examples with malicious perturbations.
We employ a novel training objective function to achieve the defense goal without compromising performance on the original task.
arXiv Detail & Related papers (2024-02-29T01:49:18Z)
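The combined objective this suggests can be sketched as a weighted sum of a clean-task loss and a loss on maliciously perturbed copies; the toy linear model, the precomputed perturbed batch, and the weight `alpha` below are assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def mpat_style_loss(model, x_clean, x_perturbed, y, alpha=0.5):
    # Clean-task loss preserves original performance; the adversarial term
    # pushes the model to classify perturbed copies correctly as well.
    loss_clean = F.cross_entropy(model(x_clean), y)
    loss_adv = F.cross_entropy(model(x_perturbed), y)
    return (1 - alpha) * loss_clean + alpha * loss_adv

# Toy demo: in the paper's setting x_perturbed would come from the
# multi-level malicious example generator; here it is just noise.
model = torch.nn.Linear(8, 2)
x = torch.randn(4, 8)
x_adv = x + 0.1 * torch.randn(4, 8)
y = torch.randint(0, 2, (4,))
mpat_style_loss(model, x, x_adv, y).backward()  # gradients for one step
```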
- Adversarial Attacks and Defenses in Machine Learning-Powered Networks: A Contemporary Survey [114.17568992164303]
Adversarial attacks and defenses in machine learning and deep neural networks have been gaining significant attention.
This survey provides a comprehensive overview of the recent advancements in the field of adversarial attack and defense techniques.
New avenues of attack are also explored, including search-based, decision-based, drop-based, and physical-world attacks.
arXiv Detail & Related papers (2023-03-11T04:19:31Z)
- A Survey of Adversarial Defences and Robustness in NLP [26.299507152320494]
It has become increasingly evident that deep neural networks are not resilient enough to withstand adversarial perturbations in input data.
Several methods for adversarial defense in NLP have been proposed, catering to different NLP tasks.
This survey aims to review the various methods proposed for adversarial defenses in NLP over the past few years by introducing a novel taxonomy.
arXiv Detail & Related papers (2022-03-12T11:37:17Z)
- A Review of Adversarial Attack and Defense for Classification Methods [78.50824774203495]
This paper focuses on the generation of, and defense against, adversarial examples.
It is the hope of the authors that this paper will encourage more statisticians to work on this important and exciting field of generating and defending against adversarial examples.
arXiv Detail & Related papers (2021-11-18T22:13:43Z)
- Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the optimizer in adversarial attacks, parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the generalization ability of the learned optimizer when attacking unseen defenses.
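A minimal sketch of such a learned attack optimizer, where a GRU maps the victim model's input gradient to the next perturbation step; the toy victim, dimensions, and step size are assumptions:

```python
import torch

class LearnedAttackOptimizer(torch.nn.Module):
    def __init__(self, dim: int, hidden: int = 32):
        super().__init__()
        self.rnn = torch.nn.GRUCell(dim, hidden)
        self.head = torch.nn.Linear(hidden, dim)

    def forward(self, grad, h):
        h = self.rnn(grad, h)
        return self.head(h), h  # proposed perturbation step, new RNN state

# One unrolled attack step against a toy victim model.
victim = torch.nn.Linear(8, 2)
opt_net = LearnedAttackOptimizer(dim=8)
x = torch.randn(1, 8, requires_grad=True)
y = torch.tensor([0])
h = torch.zeros(1, 32)
loss = torch.nn.functional.cross_entropy(victim(x), y)
grad, = torch.autograd.grad(loss, x)
step, h = opt_net(grad, h)
x_adv = x + 0.01 * step.tanh()  # bounded update proposed by the learned optimizer
```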
arXiv Detail & Related papers (2021-10-13T13:54:24Z)
- TREATED: Towards Universal Defense against Textual Adversarial Attacks [28.454310179377302]
We propose TREATED, a universal adversarial detection method that can defend against attacks of various perturbation levels without making any assumptions.
Extensive experiments on three competitive neural networks and two widely used datasets show that our method achieves better detection performance than baselines.
arXiv Detail & Related papers (2021-09-13T03:31:20Z)
- Increasing the Confidence of Deep Neural Networks by Coverage Analysis [71.57324258813674]
This paper presents a lightweight monitoring architecture based on coverage paradigms to enhance the model's robustness against different unsafe inputs.
Experimental results show that the proposed approach is effective in detecting both powerful adversarial examples and out-of-distribution inputs.
arXiv Detail & Related papers (2021-01-28T16:38:26Z)
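One hedged reading of such a coverage-based monitor: record the activation envelope each hidden neuron covers on trusted data, then flag inputs whose activations leave that envelope. The toy ReLU layer and margin below are assumptions, not the paper's architecture.

```python
import numpy as np

def hidden(x, W, b):
    return np.maximum(0.0, x @ W + b)  # ReLU hidden layer of the monitored net

rng = np.random.default_rng(0)
W, b = rng.normal(size=(8, 16)), rng.normal(size=16)
train = rng.normal(size=(1000, 8))
acts = hidden(train, W, b)
low, high = acts.min(axis=0), acts.max(axis=0)  # per-neuron coverage envelope

def is_unsafe(x, margin: float = 0.1) -> bool:
    # Flag the input if any neuron fires outside the envelope seen in training.
    a = hidden(x, W, b)
    span = high - low + 1e-9
    return bool(np.any((a < low - margin * span) | (a > high + margin * span)))

print(is_unsafe(rng.normal(size=8)))       # likely False: in-distribution
print(is_unsafe(10 * rng.normal(size=8)))  # likely True: out-of-distribution
```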
- Defense of Word-level Adversarial Attacks via Random Substitution Encoding [0.5964792400314836]
Adversarial attacks against deep neural networks on computer vision tasks have spawned many new technologies that help protect models from making false predictions.
Recently, word-level adversarial attacks on deep models of Natural Language Processing (NLP) tasks have also demonstrated strong power, e.g., fooling a sentiment classification neural network to make wrong decisions.
We propose a novel framework called Random Substitution Encoding (RSE), which introduces a random substitution encoder into the training process of original neural networks.
arXiv Detail & Related papers (2020-05-01T15:28:43Z)
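A minimal sketch of the random-substitution idea during training, with an illustrative synonym table and substitution rate (both assumptions): each pass over the same sentence can yield a different encoded surface form.

```python
import random

# Illustrative synonym table (assumption).
SYNONYMS = {"good": ["fine", "decent"], "bad": ["poor", "awful"]}

def random_substitution(text: str, rate: float = 0.3) -> str:
    # Randomly swap eligible words for synonyms before each forward pass,
    # so the model never trains on a fixed surface form.
    words = text.split()
    out = [random.choice(SYNONYMS[w])
           if w in SYNONYMS and random.random() < rate else w
           for w in words]
    return " ".join(out)

random.seed(0)
for _ in range(3):
    # The same training sentence is encoded differently each time.
    print(random_substitution("a good movie with a bad ending"))
```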