Robust Deep Learning Models Against Semantic-Preserving Adversarial
Attack
- URL: http://arxiv.org/abs/2304.03955v1
- Date: Sat, 8 Apr 2023 08:28:36 GMT
- Title: Robust Deep Learning Models Against Semantic-Preserving Adversarial
Attack
- Authors: Dashan Gao and Yunce Zhao and Yinghua Yao and Zeqi Zhang and Bifei Mao
and Xin Yao
- Abstract summary: Deep learning models can be fooled by small $l_p$-norm adversarial perturbations and natural perturbations to semantic attributes.
We propose a novel attack mechanism named Semantic-Preserving Adversarial (SPA) attack, which can then be used to enhance adversarial training.
- Score: 3.7264705684737893
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models can be fooled by small $l_p$-norm adversarial
perturbations and natural perturbations to semantic attributes. Although the
robustness against each perturbation has been explored, it remains a challenge
to address the robustness against joint perturbations effectively. In this
paper, we study the robustness of deep learning models against joint
perturbations by proposing a novel attack mechanism named Semantic-Preserving
Adversarial (SPA) attack, which can then be used to enhance adversarial
training. Specifically, we introduce an attribute manipulator to generate
natural and human-comprehensible perturbations and a noise generator to
generate diverse adversarial noises. Based on such combined noises, we optimize
both the attribute value and the diversity variable to generate
jointly-perturbed samples. For robust training, we adversarially train the deep
learning model against the generated joint perturbations. Empirical results on
four benchmarks show that the SPA attack causes a larger performance decline
with small $l_{\infty}$ norm-ball constraints compared to existing approaches.
Furthermore, our SPA-enhanced training outperforms existing defense methods
against such joint perturbations.
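
As a rough illustration of the mechanism described above, the sketch below jointly optimizes an attribute value and a diversity variable and then trains on the resulting samples. `manipulator(x, a)` and `generator(x, z)` are hypothetical stand-ins for the paper's attribute manipulator and noise generator, and all hyperparameters are illustrative; this is a minimal sketch, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def spa_attack(model, manipulator, generator, x, y, eps=8/255, steps=10, lr=0.05):
    """Jointly optimize an attribute value `a` and a diversity variable `z` so the
    combined semantic + bounded-noise perturbation maximizes the classification loss."""
    a = torch.zeros(x.size(0), 1, device=x.device, requires_grad=True)   # attribute value
    z = torch.randn(x.size(0), 16, device=x.device, requires_grad=True)  # diversity variable
    opt = torch.optim.Adam([a, z], lr=lr)
    for _ in range(steps):
        x_sem = manipulator(x, a)                      # natural, attribute-level change
        delta = eps * torch.tanh(generator(x_sem, z))  # adversarial noise in an l_inf ball
        loss = -F.cross_entropy(model((x_sem + delta).clamp(0, 1)), y)
        opt.zero_grad()
        loss.backward()   # ascend the loss through both perturbation sources at once
        opt.step()
    with torch.no_grad():
        x_sem = manipulator(x, a)
        return (x_sem + eps * torch.tanh(generator(x_sem, z))).clamp(0, 1)

def spa_train_step(model, manipulator, generator, optimizer, x, y):
    """One adversarial-training step on jointly-perturbed samples."""
    x_adv = spa_attack(model, manipulator, generator, x, y)
    loss = F.cross_entropy(model(x_adv), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Optimizing through both sources simultaneously is what distinguishes the joint attack from applying a semantic attack and an $l_{\infty}$ attack in sequence.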
Related papers
- DAT: Improving Adversarial Robustness via Generative Amplitude Mix-up in Frequency Domain [23.678658814438855]
Adversarial training (AT) is developed to protect deep neural networks (DNNs) from adversarial attacks.
Recent studies show that adversarial attacks disproportionately impact the patterns within the phase of the sample's frequency spectrum.
We propose an optimized Adversarial Amplitude Generator (AAG) to achieve a better tradeoff between improving the model's robustness and retaining phase patterns.
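
A minimal sketch of the frequency-domain idea (not the paper's exact AAG): mix two images' Fourier amplitude spectra while keeping the first image's phase, so the phase patterns the attacks target are retained.

```python
import torch

def amplitude_mixup(x1, x2, lam=0.5):
    """Mix the amplitude spectra of x1 and x2; keep x1's phase. Inputs: (B, C, H, W)."""
    f1, f2 = torch.fft.fft2(x1), torch.fft.fft2(x2)
    amp = lam * f1.abs() + (1 - lam) * f2.abs()  # mixed amplitude spectrum
    phase = torch.angle(f1)                      # phase of x1, left untouched
    mixed = torch.fft.ifft2(torch.polar(amp, phase))
    return mixed.real.clamp(0, 1)
```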
arXiv Detail & Related papers (2024-10-16T07:18:36Z)
- Robust VAEs via Generating Process of Noise Augmented Data [9.366139389037489]
This paper introduces a novel framework that enhances robustness by regularizing the latent space divergence between original and noise-augmented data.
Our empirical evaluations demonstrate that this approach, termed Robust Augmented Variational Auto-ENcoder (RAVEN), yields superior performance in resisting adversarial inputs.
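
A minimal sketch of such a latent-divergence regularizer, assuming an `encode` function that returns the mean and log-variance of the approximate posterior; the symmetrized KL used here is an illustrative choice, not necessarily RAVEN's exact divergence.

```python
import torch

def latent_divergence(encode, x, sigma=0.1):
    """Symmetrized KL between q(z|x) and q(z|x + noise) for diagonal Gaussians."""
    mu1, logvar1 = encode(x)
    mu2, logvar2 = encode(x + sigma * torch.randn_like(x))
    var1, var2 = logvar1.exp(), logvar2.exp()
    kl_12 = 0.5 * (logvar2 - logvar1 + (var1 + (mu1 - mu2) ** 2) / var2 - 1)
    kl_21 = 0.5 * (logvar1 - logvar2 + (var2 + (mu2 - mu1) ** 2) / var1 - 1)
    return (kl_12 + kl_21).sum(dim=-1).mean()
```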
arXiv Detail & Related papers (2024-07-26T09:55:34Z)
- Extreme Miscalibration and the Illusion of Adversarial Robustness [66.29268991629085]
Adversarial Training is often used to increase model robustness.
We show that this observed gain in robustness is an illusion of robustness (IOR).
We urge the NLP community to incorporate test-time temperature scaling into their robustness evaluations.
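
A small sketch of test-time temperature scaling as suggested above: fit a scalar temperature on held-out data by minimizing NLL, then divide logits by it before any robustness evaluation; the fitting procedure here is a standard calibration recipe, not necessarily the paper's exact protocol.

```python
import torch
import torch.nn.functional as F

def fit_temperature(model, x_val, y_val):
    """Fit a scalar temperature on held-out data by minimizing NLL."""
    with torch.no_grad():
        logits = model(x_val)
    log_t = torch.zeros(1, device=logits.device, requires_grad=True)
    opt = torch.optim.LBFGS([log_t], lr=0.01, max_iter=100)
    def closure():
        opt.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), y_val)
        loss.backward()
        return loss
    opt.step(closure)
    return log_t.exp().item()

# At evaluation time, divide logits by the fitted temperature before scoring:
# t = fit_temperature(model, x_val, y_val); preds = model(x_test) / t
```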
arXiv Detail & Related papers (2024-02-27T13:49:12Z)
- Mitigating Feature Gap for Adversarial Robustness by Feature Disentanglement [61.048842737581865]
Adversarial fine-tuning methods aim to enhance adversarial robustness through fine-tuning the naturally pre-trained model in an adversarial training manner.
We propose a disentanglement-based approach to explicitly model and remove the latent features that cause the feature gap.
Empirical evaluations on three benchmark datasets demonstrate that our approach surpasses existing adversarial fine-tuning methods and adversarial training baselines.
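
One loose way to realize a disentanglement objective of this kind; the projections, reconstruction term, and weighting below are assumptions for illustration, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def disentangled_loss(backbone, proj_task, proj_gap, head, x_nat, x_adv, y, beta=1.0):
    """Classify from task features, align them across natural/adversarial inputs,
    and let a separate 'gap' component absorb the remaining adversarial features."""
    f_nat, f_adv = backbone(x_nat), backbone(x_adv)
    t_nat, t_adv = proj_task(f_nat), proj_task(f_adv)
    g_adv = proj_gap(f_adv)                             # latent features causing the gap
    task = F.cross_entropy(head(t_adv), y)              # predict from task features only
    align = F.mse_loss(t_adv, t_nat.detach())           # close the natural/adversarial gap
    recon = F.mse_loss(t_adv + g_adv, f_adv.detach())   # task + gap reconstruct the feature
    return task + beta * (align + recon)
```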
arXiv Detail & Related papers (2024-01-26T08:38:57Z)
- Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard framework in adversarial robustness defends against samples crafted by minimally perturbing a clean sample.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
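
A hedged sketch of the optimal-transport view: an entropic (Sinkhorn) OT cost between embeddings of clean samples and their perturbed counterparts, usable as a differentiable regularizer. The squared-distance cost and epsilon are illustrative, not the paper's exact formulation.

```python
import math
import torch

def sinkhorn_cost(a, b, eps=0.1, iters=50):
    """Entropic OT cost between embedding batches a: (n, d) and b: (m, d)."""
    C = torch.cdist(a, b) ** 2           # pairwise squared distances
    n, m = C.shape
    f = torch.zeros(n, device=C.device)  # dual potentials
    g = torch.zeros(m, device=C.device)
    for _ in range(iters):               # log-domain Sinkhorn iterations
        f = -eps * torch.logsumexp((g[None, :] - C) / eps - math.log(m), dim=1)
        g = -eps * torch.logsumexp((f[:, None] - C) / eps - math.log(n), dim=0)
    P = torch.exp((f[:, None] + g[None, :] - C) / eps) / (n * m)  # transport plan
    return (P * C).sum()
```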
arXiv Detail & Related papers (2022-11-04T13:54:02Z)
- Resisting Adversarial Attacks in Deep Neural Networks using Diverse Decision Boundaries [12.312877365123267]
Deep learning systems are vulnerable to crafted adversarial examples, which may be imperceptible to the human eye, but can lead the model to misclassify.
We develop a new ensemble-based solution that constructs defender models with diverse decision boundaries with respect to the original model.
We present extensive experimentations using standard image classification datasets, namely MNIST, CIFAR-10 and CIFAR-100 against state-of-the-art adversarial attacks.
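
A minimal sketch of such an ensemble defense at inference time; how the defender models are trained to diversify their decision boundaries is elided here.

```python
import torch

def ensemble_predict(models, x):
    """Average member softmax outputs and return the consensus class."""
    probs = torch.stack([m(x).softmax(dim=-1) for m in models]).mean(dim=0)
    return probs.argmax(dim=-1)
```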
arXiv Detail & Related papers (2022-08-18T08:19:26Z)
- Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
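
A small Monte Carlo sketch of a smoothed policy for discrete actions, the object such reward certificates are stated for; `policy` returning per-action scores, sigma, and the vote count are illustrative assumptions.

```python
import torch

def smoothed_action(policy, obs, sigma=0.2, n_samples=100):
    """Majority-vote action of `policy` over Gaussian noise around one observation."""
    noisy = obs.unsqueeze(0) + sigma * torch.randn(n_samples, *obs.shape)
    votes = policy(noisy).argmax(dim=-1)          # one discrete action per noisy copy
    return torch.bincount(votes).argmax().item()  # most frequent action wins
```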
arXiv Detail & Related papers (2021-06-21T21:42:08Z)
- Attribute-Guided Adversarial Training for Robustness to Natural Perturbations [64.35805267250682]
We propose an adversarial training approach which learns to generate new samples so as to maximize exposure of the classifier to the attributes-space.
Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations.
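
A toy sketch of maximizing classifier exposure over an attribute, with brightness as a stand-in attribute (the paper learns a semantic attribute space rather than using a fixed transform).

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def worst_case_brightness(model, x, y, values=torch.linspace(0.5, 1.5, 11)):
    """Return the brightness-scaled batch on which the model's loss is highest."""
    losses = torch.stack([F.cross_entropy(model((x * v).clamp(0, 1)), y) for v in values])
    return (x * values[losses.argmax()]).clamp(0, 1)
```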
arXiv Detail & Related papers (2020-12-03T10:17:30Z)
- Learning to Generate Noise for Multi-Attack Robustness [126.23656251512762]
Adversarial learning has emerged as one of the successful techniques to circumvent the susceptibility of existing methods against adversarial perturbations.
However, such defenses typically target a single type of perturbation, which is inadequate in safety-critical applications, as the attacker can adopt diverse adversaries to deceive the system.
We propose a novel meta-learning framework that explicitly learns to generate noise to improve the model's robustness against multiple types of attacks.
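
A first-order sketch of the idea, with the paper's meta-learning outer loop simplified to a simultaneous min-max between a bounded noise generator and the classifier; the bound and update order are assumptions.

```python
import torch
import torch.nn.functional as F

def noise_generator_step(model, generator, opt_model, opt_gen, x, y, eps=8/255):
    """Generator ascends the classifier's loss; classifier descends it on the noise."""
    delta = eps * torch.tanh(generator(x))  # bounded, input-dependent noise
    loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
    opt_gen.zero_grad()
    (-loss).backward(retain_graph=True)     # generator maximizes the loss
    opt_gen.step()
    opt_model.zero_grad()                   # clear the spill-over gradients on the model
    loss.backward()                         # classifier minimizes the same loss
    opt_model.step()
    return loss.item()
```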
arXiv Detail & Related papers (2020-06-22T10:44:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.