Searching for an Effective Defender: Benchmarking Defense against
Adversarial Word Substitution
- URL: http://arxiv.org/abs/2108.12777v1
- Date: Sun, 29 Aug 2021 08:11:36 GMT
- Title: Searching for an Effective Defender: Benchmarking Defense against
Adversarial Word Substitution
- Authors: Zongyi Li, Jianhan Xu, Jiehang Zeng, Linyang Li, Xiaoqing Zheng, Qi
Zhang, Kai-Wei Chang, Cho-Jui Hsieh
- Abstract summary: Deep neural networks are vulnerable to intentionally crafted adversarial examples.
Various methods have been proposed to defend against adversarial word-substitution attacks for neural NLP models.
- Score: 83.84968082791444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have shown that deep neural networks are vulnerable to
intentionally crafted adversarial examples, and various methods have been
proposed to defend against adversarial word-substitution attacks for neural NLP
models. However, there is a lack of systematic study comparing different
defense approaches under the same attack setting. In this paper, we seek to
fill this gap through a comprehensive study of how neural text classifiers
trained with various defense methods behave under representative adversarial
attacks. In addition, we propose an effective method that further improves the
robustness of neural text classifiers against such attacks, achieving the
highest accuracy on both clean and adversarial examples on the AGNEWS and IMDB
datasets by a significant margin.
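For concreteness, a minimal sketch of the shared evaluation protocol such a benchmark implies: every defended classifier is scored on both clean accuracy and accuracy under the same word-substitution attack. The keyword classifier, synonym table, and greedy attack below are toy assumptions, not the paper's models or attack suite.

```python
from typing import Callable, Dict, List

# Illustrative synonym table (assumption); real attacks draw candidates
# from counter-fitted embeddings or WordNet.
SYNONYMS: Dict[str, List[str]] = {
    "good": ["fine", "decent"],
    "bad": ["poor", "awful"],
    "movie": ["film", "picture"],
}

def toy_classifier(text: str) -> int:
    # Stand-in for a defended model: predicts positive (1) iff "good" appears.
    return 1 if "good" in text.split() else 0

def greedy_attack(text: str, label: int, predict: Callable[[str], int]) -> str:
    # Try synonyms word by word; return the first substitution that flips
    # the prediction, or the original text if the attack fails.
    words = text.split()
    for i, w in enumerate(words):
        for syn in SYNONYMS.get(w, []):
            candidate = " ".join(words[:i] + [syn] + words[i + 1:])
            if predict(candidate) != label:
                return candidate
    return text

data = [("a good movie", 1), ("a bad movie", 0)]
clean = sum(toy_classifier(x) == y for x, y in data) / len(data)
robust = sum(toy_classifier(greedy_attack(x, y, toy_classifier)) == y
             for x, y in data) / len(data)
print(f"clean acc = {clean:.2f}, adversarial acc = {robust:.2f}")
```

Each defense under comparison would plug in where `toy_classifier` sits while the attack and data stay fixed, which is the "same attack setting" the abstract calls for.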
Related papers
- Detecting Adversarial Examples [24.585379549997743]
We propose a novel method to detect adversarial examples by analyzing the layer outputs of Deep Neural Networks.
Our method is highly effective, compatible with any DNN architecture, and applicable across different domains, such as image, video, and audio.
arXiv Detail & Related papers (2024-10-22T21:42:59Z)
- MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
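A hedged sketch of that detection idea: embed the input image and the image a T2I model regenerates from the VLM's caption, and flag the pair as adversarial when their similarity drops. The 512-dimensional random vectors stand in for real (e.g. CLIP-style) embeddings, and the threshold is an illustrative assumption.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def is_adversarial(emb_input: np.ndarray, emb_regenerated: np.ndarray,
                   threshold: float = 0.7) -> bool:
    # A benign image and its caption-conditioned regeneration should stay
    # semantically close; an adversarial input drags the caption (and thus
    # the regenerated image) away from the input's true content.
    return cosine(emb_input, emb_regenerated) < threshold

# Demo with random stand-ins for the two embeddings (assumption).
rng = np.random.default_rng(0)
benign = rng.normal(size=512)
regen_benign = benign + 0.1 * rng.normal(size=512)  # close to the input
regen_adversarial = rng.normal(size=512)            # unrelated content
print(is_adversarial(benign, regen_benign))         # False
print(is_adversarial(benign, regen_adversarial))    # True
```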
- MPAT: Building Robust Deep Neural Networks against Textual Adversarial Attacks [4.208423642716679]
We propose a malicious-perturbation-based adversarial training method (MPAT) for building deep neural networks that are robust to adversarial attacks.
Specifically, we construct a multi-level malicious example generation strategy to generate adversarial examples with malicious perturbations.
We employ a novel training objective function to achieve the defense goal without compromising performance on the original task.
arXiv Detail & Related papers (2024-02-29T01:49:18Z)
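The combined objective this suggests can be sketched as a weighted sum of a clean-task loss and a loss on maliciously perturbed copies; the toy linear model, the precomputed perturbed batch, and the weight `alpha` below are assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def mpat_style_loss(model, x_clean, x_perturbed, y, alpha=0.5):
    # Clean-task loss preserves original performance; the adversarial term
    # pushes the model to classify perturbed copies correctly as well.
    loss_clean = F.cross_entropy(model(x_clean), y)
    loss_adv = F.cross_entropy(model(x_perturbed), y)
    return (1 - alpha) * loss_clean + alpha * loss_adv

# Toy demo: in the paper's setting x_perturbed would come from the
# multi-level malicious example generator; here it is just noise.
model = torch.nn.Linear(8, 2)
x = torch.randn(4, 8)
x_adv = x + 0.1 * torch.randn(4, 8)
y = torch.randint(0, 2, (4,))
mpat_style_loss(model, x, x_adv, y).backward()  # gradients for one step
```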
- Adversarial Attacks and Defenses in Machine Learning-Powered Networks: A Contemporary Survey [114.17568992164303]
Adversarial attacks and defenses in machine learning and deep neural networks have been gaining significant attention.
This survey provides a comprehensive overview of the recent advancements in the field of adversarial attack and defense techniques.
New avenues of attack are also explored, including search-based, decision-based, drop-based, and physical-world attacks.
arXiv Detail & Related papers (2023-03-11T04:19:31Z)
- A Survey of Adversarial Defences and Robustness in NLP [26.299507152320494]
It has become increasingly evident that deep neural networks are not resilient enough to withstand adversarial perturbations in input data.
Several methods for adversarial defense in NLP have been proposed, catering to different NLP tasks.
This survey aims to review the various methods proposed for adversarial defenses in NLP over the past few years by introducing a novel taxonomy.
arXiv Detail & Related papers (2022-03-12T11:37:17Z)
- A Review of Adversarial Attack and Defense for Classification Methods [78.50824774203495]
This paper focuses on the generation of, and defense against, adversarial examples.
It is the hope of the authors that this paper will encourage more statisticians to work on this important and exciting field of generating and defending against adversarial examples.
arXiv Detail & Related papers (2021-11-18T22:13:43Z)
- Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the optimizer in adversarial attacks, parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the generalization ability of the learned optimizer when attacking unseen defenses.
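A minimal sketch of such a learned attack optimizer, where a GRU maps the victim model's input gradient to the next perturbation step; the toy victim, dimensions, and step size are assumptions:

```python
import torch

class LearnedAttackOptimizer(torch.nn.Module):
    def __init__(self, dim: int, hidden: int = 32):
        super().__init__()
        self.rnn = torch.nn.GRUCell(dim, hidden)
        self.head = torch.nn.Linear(hidden, dim)

    def forward(self, grad, h):
        h = self.rnn(grad, h)
        return self.head(h), h  # proposed perturbation step, new RNN state

# One unrolled attack step against a toy victim model.
victim = torch.nn.Linear(8, 2)
opt_net = LearnedAttackOptimizer(dim=8)
x = torch.randn(1, 8, requires_grad=True)
y = torch.tensor([0])
h = torch.zeros(1, 32)
loss = torch.nn.functional.cross_entropy(victim(x), y)
grad, = torch.autograd.grad(loss, x)
step, h = opt_net(grad, h)
x_adv = x + 0.01 * step.tanh()  # bounded update proposed by the learned optimizer
```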
arXiv Detail & Related papers (2021-10-13T13:54:24Z)
- TREATED: Towards Universal Defense against Textual Adversarial Attacks [28.454310179377302]
We propose TREATED, a universal adversarial detection method that can defend against attacks of various perturbation levels without making any assumptions.
Extensive experiments on three competitive neural networks and two widely used datasets show that our method achieves better detection performance than baselines.
arXiv Detail & Related papers (2021-09-13T03:31:20Z)
- Increasing the Confidence of Deep Neural Networks by Coverage Analysis [71.57324258813674]
This paper presents a lightweight monitoring architecture based on coverage paradigms to enhance the model's robustness against different unsafe inputs.
Experimental results show that the proposed approach is effective in detecting both powerful adversarial examples and out-of-distribution inputs.
arXiv Detail & Related papers (2021-01-28T16:38:26Z)
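One hedged reading of such a coverage-based monitor: record the activation envelope each hidden neuron covers on trusted data, then flag inputs whose activations leave that envelope. The toy ReLU layer and margin below are assumptions, not the paper's architecture.

```python
import numpy as np

def hidden(x, W, b):
    return np.maximum(0.0, x @ W + b)  # ReLU hidden layer of the monitored net

rng = np.random.default_rng(0)
W, b = rng.normal(size=(8, 16)), rng.normal(size=16)
train = rng.normal(size=(1000, 8))
acts = hidden(train, W, b)
low, high = acts.min(axis=0), acts.max(axis=0)  # per-neuron coverage envelope

def is_unsafe(x, margin: float = 0.1) -> bool:
    # Flag the input if any neuron fires outside the envelope seen in training.
    a = hidden(x, W, b)
    span = high - low + 1e-9
    return bool(np.any((a < low - margin * span) | (a > high + margin * span)))

print(is_unsafe(rng.normal(size=8)))       # likely False: in-distribution
print(is_unsafe(10 * rng.normal(size=8)))  # likely True: out-of-distribution
```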
- Defense of Word-level Adversarial Attacks via Random Substitution Encoding [0.5964792400314836]
Adversarial attacks against deep neural networks on computer vision tasks have spawned many new technologies that help protect models from making false predictions.
Recently, word-level adversarial attacks on deep models of Natural Language Processing (NLP) tasks have also demonstrated strong power, e.g., fooling a sentiment classification neural network to make wrong decisions.
We propose a novel framework called Random Substitution Encoding (RSE), which introduces a random substitution encoder into the training process of original neural networks.
arXiv Detail & Related papers (2020-05-01T15:28:43Z)
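A minimal sketch of the random-substitution idea during training, with an illustrative synonym table and substitution rate (both assumptions): each pass over the same sentence can yield a different encoded surface form.

```python
import random

# Illustrative synonym table (assumption).
SYNONYMS = {"good": ["fine", "decent"], "bad": ["poor", "awful"]}

def random_substitution(text: str, rate: float = 0.3) -> str:
    # Randomly swap eligible words for synonyms before each forward pass,
    # so the model never trains on a fixed surface form.
    words = text.split()
    out = [random.choice(SYNONYMS[w])
           if w in SYNONYMS and random.random() < rate else w
           for w in words]
    return " ".join(out)

random.seed(0)
for _ in range(3):
    # The same training sentence is encoded differently each time.
    print(random_substitution("a good movie with a bad ending"))
```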