Robust Deep Learning Models Against Semantic-Preserving Adversarial
Attack
- URL: http://arxiv.org/abs/2304.03955v1
- Date: Sat, 8 Apr 2023 08:28:36 GMT
- Title: Robust Deep Learning Models Against Semantic-Preserving Adversarial
Attack
- Authors: Dashan Gao and Yunce Zhao and Yinghua Yao and Zeqi Zhang and Bifei Mao
and Xin Yao
- Abstract summary: Deep learning models can be fooled by small $l_p$-norm adversarial perturbations and natural perturbations in terms of attributes.
We propose a novel attack mechanism named Semantic-Preserving Adversarial (SPA) attack, which can then be used to enhance adversarial training.
- Score: 3.7264705684737893
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models can be fooled by small $l_p$-norm adversarial
perturbations and natural perturbations in terms of attributes. Although the
robustness against each perturbation has been explored, it remains a challenge
to address the robustness against joint perturbations effectively. In this
paper, we study the robustness of deep learning models against joint
perturbations by proposing a novel attack mechanism named Semantic-Preserving
Adversarial (SPA) attack, which can then be used to enhance adversarial
training. Specifically, we introduce an attribute manipulator to generate
natural and human-comprehensible perturbations and a noise generator to
generate diverse adversarial noises. Based on such combined noises, we optimize
both the attribute value and the diversity variable to generate
jointly-perturbed samples. For robust training, we adversarially train the deep
learning model against the generated joint perturbations. Empirical results on
four benchmarks show that the SPA attack causes a larger performance decline
with small $l_{\infty}$ norm-ball constraints compared to existing approaches.
Furthermore, our SPA-enhanced training outperforms existing defense methods
against such joint perturbations.
Related papers
- Adversarial Fine-tuning of Compressed Neural Networks for Joint Improvement of Robustness and Efficiency [3.3490724063380215]
Adrial training has been presented as a mitigation strategy which can result in more robust models.
We explore the effects of two different model compression methods -- structured weight pruning and quantization -- on adversarial robustness.
We show that adversarial fine-tuning of compressed models can achieve robustness performance comparable to adversarially trained models.
arXiv Detail & Related papers (2024-03-14T14:34:25Z) - Extreme Miscalibration and the Illusion of Adversarial Robustness [66.29268991629085]
Adversarial Training is often used to increase model robustness.
We show that this observed gain in robustness is an illusion of robustness (IOR)
We urge the NLP community to incorporate test-time temperature scaling into their robustness evaluations.
arXiv Detail & Related papers (2024-02-27T13:49:12Z) - Mitigating Feature Gap for Adversarial Robustness by Feature
Disentanglement [61.048842737581865]
Adversarial fine-tuning methods aim to enhance adversarial robustness through fine-tuning the naturally pre-trained model in an adversarial training manner.
We propose a disentanglement-based approach to explicitly model and remove the latent features that cause the feature gap.
Empirical evaluations on three benchmark datasets demonstrate that our approach surpasses existing adversarial fine-tuning methods and adversarial training baselines.
arXiv Detail & Related papers (2024-01-26T08:38:57Z) - Improving Adversarial Robustness to Sensitivity and Invariance Attacks
with Deep Metric Learning [80.21709045433096]
A standard method in adversarial robustness assumes a framework to defend against samples crafted by minimally perturbing a sample.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
arXiv Detail & Related papers (2022-11-04T13:54:02Z) - Resisting Adversarial Attacks in Deep Neural Networks using Diverse
Decision Boundaries [12.312877365123267]
Deep learning systems are vulnerable to crafted adversarial examples, which may be imperceptible to the human eye, but can lead the model to misclassify.
We develop a new ensemble-based solution that constructs defender models with diverse decision boundaries with respect to the original model.
We present extensive experimentations using standard image classification datasets, namely MNIST, CIFAR-10 and CIFAR-100 against state-of-the-art adversarial attacks.
arXiv Detail & Related papers (2022-08-18T08:19:26Z) - Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial of perturbation the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z) - Attribute-Guided Adversarial Training for Robustness to Natural
Perturbations [64.35805267250682]
We propose an adversarial training approach which learns to generate new samples so as to maximize exposure of the classifier to the attributes-space.
Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations.
arXiv Detail & Related papers (2020-12-03T10:17:30Z) - Learning to Generate Noise for Multi-Attack Robustness [126.23656251512762]
Adversarial learning has emerged as one of the successful techniques to circumvent the susceptibility of existing methods against adversarial perturbations.
In safety-critical applications, this makes these methods extraneous as the attacker can adopt diverse adversaries to deceive the system.
We propose a novel meta-learning framework that explicitly learns to generate noise to improve the model's robustness against multiple types of attacks.
arXiv Detail & Related papers (2020-06-22T10:44:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.