MPAT: Building Robust Deep Neural Networks against Textual Adversarial Attacks
- URL: http://arxiv.org/abs/2402.18792v1
- Date: Thu, 29 Feb 2024 01:49:18 GMT
- Title: MPAT: Building Robust Deep Neural Networks against Textual Adversarial Attacks
- Authors: Fangyuan Zhang, Huichi Zhou, Shuangjiao Li, Hongtao Wang
- Abstract summary: We propose a malicious perturbation-based adversarial training method (MPAT) for building robust deep neural networks against adversarial attacks.
Specifically, we construct a multi-level malicious example generation strategy to generate adversarial examples with malicious perturbations.
We employ a novel training objective function to ensure the defense goal is achieved without compromising performance on the original task.
- Score: 4.208423642716679
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks have been proven to be vulnerable to adversarial
examples and various methods have been proposed to defend against adversarial
attacks for natural language processing tasks. However, previous defense
methods have limitations in maintaining effective defense while ensuring the
performance of the original task. In this paper, we propose a malicious
perturbation-based adversarial training method (MPAT) for building robust deep
neural networks against textual adversarial attacks. Specifically, we construct
a multi-level malicious example generation strategy to generate adversarial
examples with malicious perturbations, which are used instead of original
inputs for model training. Additionally, we employ a novel training objective
function to ensure the defense goal is achieved without compromising
performance on the original task. We conduct comprehensive experiments to
evaluate our defense method by attacking five victim models on three benchmark
datasets. The results demonstrate that our method is more effective against
malicious adversarial attacks than previous defense methods, while
maintaining or even improving performance on the original task.
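The abstract sketches a two-part recipe: train on maliciously perturbed inputs in place of the originals, with an objective term that preserves the original task. Below is a minimal sketch of one such training step, assuming a PyTorch classifier; the toy synonym lexicon, generate_malicious_example, and the lambda_orig weighting are illustrative assumptions, not the authors' implementation.

    # Minimal sketch of perturbation-based adversarial training for text
    # classification, in the spirit of the pipeline described above. All
    # names here are illustrative assumptions, not the authors' code.
    import random
    import torch
    import torch.nn.functional as F

    SYNONYMS = {"good": ["great", "fine"], "bad": ["poor", "awful"]}  # toy lexicon

    def generate_malicious_example(tokens):
        # Stand-in for the paper's multi-level generation strategy:
        # here, a single word-level synonym substitution.
        out = list(tokens)
        candidates = [i for i, t in enumerate(out) if t in SYNONYMS]
        if candidates:
            i = random.choice(candidates)
            out[i] = random.choice(SYNONYMS[out[i]])
        return out

    def training_step(model, encode, tokens, label, optimizer, lambda_orig=1.0):
        # One step: fit the perturbed input, while a second loss term
        # keeps performance on the original input (the "defense without
        # compromising the original task" objective).
        perturbed = generate_malicious_example(tokens)
        logits_adv = model(encode(perturbed))
        logits_orig = model(encode(tokens))
        loss = F.cross_entropy(logits_adv, label) \
             + lambda_orig * F.cross_entropy(logits_orig, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    if __name__ == "__main__":
        vocab = {"good": 0, "great": 1, "fine": 2, "bad": 3, "poor": 4, "awful": 5}

        def encode(tokens):
            # Toy bag-of-words encoding -> (1, vocab_size) float tensor.
            x = torch.zeros(1, len(vocab))
            for t in tokens:
                if t in vocab:
                    x[0, vocab[t]] += 1.0
            return x

        model = torch.nn.Linear(len(vocab), 2)  # two-class toy classifier
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        loss = training_step(model, encode, ["good", "movie"], torch.tensor([1]), opt)
        print(f"combined loss: {loss:.4f}")

In a real NLP setting the word-level substitution would be replaced by the paper's multi-level generation strategy, and encode by a pretrained tokenizer and encoder; the structure of the combined objective stays the same.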
Related papers
- Gradient-Free Adversarial Purification with Diffusion Models [10.917491144598575]
Adversarial training and adversarial purification are effective methods to enhance a model's robustness against adversarial attacks.
We propose an effective and efficient adversarial defense method that counters both perturbation-based and unrestricted adversarial attacks.
arXiv Detail & Related papers (2025-01-23T02:34:14Z)
- Sustainable Self-evolution Adversarial Training [51.25767996364584]
We propose a Sustainable Self-Evolution Adversarial Training (SSEAT) framework for adversarial defense models.
We introduce a continual adversarial defense pipeline to realize learning from various kinds of adversarial examples.
We also propose an adversarial data replay module to select more diverse and key data for relearning.
arXiv Detail & Related papers (2024-12-03T08:41:11Z)
- GenFighter: A Generative and Evolutive Textual Attack Removal [6.044610337297754]
Adversarial attacks pose significant challenges to deep neural networks (DNNs) such as Transformer models in natural language processing (NLP).
This paper introduces a novel defense strategy, called GenFighter, which enhances adversarial robustness by learning and reasoning on the training classification distribution.
We show that GenFighter outperforms state-of-the-art defenses in accuracy under attack and attack success rate metrics.
arXiv Detail & Related papers (2024-04-17T16:32:13Z)
- Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z)
- Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the optimizer in adversarial attacks, parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the generalization ability of the learned optimizer when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z)
- TREATED: Towards Universal Defense against Textual Adversarial Attacks [28.454310179377302]
We propose TREATED, a universal adversarial detection method that can defend against attacks of various perturbation levels without making any assumptions.
Extensive experiments on three competitive neural networks and two widely used datasets show that our method achieves better detection performance than baselines.
arXiv Detail & Related papers (2021-09-13T03:31:20Z)
- Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution [83.84968082791444]
Deep neural networks are vulnerable to intentionally crafted adversarial examples.
Various methods have been proposed to defend against adversarial word-substitution attacks for neural NLP models.
arXiv Detail & Related papers (2021-08-29T08:11:36Z)
- Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning [95.60856995067083]
This work is among the first to perform adversarial defense for ASV without knowing the specific attack algorithms.
We propose to perform adversarial defense from two perspectives: 1) adversarial perturbation purification and 2) adversarial perturbation detection.
Experimental results show that our detection module effectively shields the ASV by detecting adversarial samples with an accuracy of around 80%.
arXiv Detail & Related papers (2021-06-01T07:10:54Z)
- Robust Tracking against Adversarial Attacks [69.59717023941126]
We first attempt to generate adversarial examples on top of video sequences to improve the tracking robustness against adversarial attacks.
We apply the proposed adversarial attack and defense approaches to state-of-the-art deep tracking algorithms.
arXiv Detail & Related papers (2020-07-20T08:05:55Z)