Mitigating Feature Gap for Adversarial Robustness by Feature Disentanglement
- URL: http://arxiv.org/abs/2401.14707v1
- Date: Fri, 26 Jan 2024 08:38:57 GMT
- Title: Mitigating Feature Gap for Adversarial Robustness by Feature Disentanglement
- Authors: Nuoyan Zhou, Dawei Zhou, Decheng Liu, Xinbo Gao, Nannan Wang
- Abstract summary: Adversarial fine-tuning methods aim to enhance adversarial robustness by fine-tuning a naturally pre-trained model in an adversarial training manner.
We propose a disentanglement-based approach to explicitly model and remove the latent features that cause the feature gap.
Empirical evaluations on three benchmark datasets demonstrate that our approach surpasses existing adversarial fine-tuning methods and adversarial training baselines.
- Score: 61.048842737581865
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks are vulnerable to adversarial samples. Adversarial
fine-tuning methods aim to enhance adversarial robustness by fine-tuning a
naturally pre-trained model in an adversarial training manner. However, we find
that some latent features of adversarial samples are confused by the adversarial
perturbation, which unexpectedly widens the gap between the last-hidden-layer
features of natural and adversarial samples. To address this issue, we propose a
disentanglement-based approach that explicitly models and removes the latent
features causing the feature gap. Specifically, we introduce a feature
disentangler that separates these confusing latent features from the features of
adversarial samples and eliminates them, thereby boosting robustness. In
addition, we align features from the pre-trained model with the features of
adversarial samples in the fine-tuned model, so that fine-tuning further
benefits from unconfused natural-sample features. Empirical evaluations on three
benchmark datasets demonstrate that our approach surpasses existing adversarial
fine-tuning methods and adversarial training baselines.
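To make the fine-tuning objective concrete, below is a minimal, hypothetical PyTorch sketch; it is not the authors' released implementation. The `FeatureDisentangler` architecture, the subtraction-style removal of confused features, the MSE alignment term, and feeding natural samples to the frozen pre-trained model are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureDisentangler(nn.Module):
    """Hypothetical disentangler: predicts the perturbation-confused latent
    component of an adversarial feature so it can be removed."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.confused_head = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, adv_feat: torch.Tensor):
        confused = self.confused_head(adv_feat)  # latent features causing the gap
        purified = adv_feat - confused           # remove them from the representation
        return purified, confused


def fine_tune_loss(pretrained, finetuned, disentangler, classifier,
                   x_nat, x_adv, y, alpha: float = 1.0):
    """One loss evaluation for adversarial fine-tuning under the assumed
    objective: classify purified adversarial features, and align them with
    the frozen pre-trained model's features."""
    with torch.no_grad():
        feat_pre = pretrained(x_nat)             # frozen pre-trained features

    adv_feat = finetuned(x_adv)                  # last-hidden-layer features
    purified, _ = disentangler(adv_feat)

    cls_loss = F.cross_entropy(classifier(purified), y)
    align_loss = F.mse_loss(purified, feat_pre)  # MSE distance is an assumption
    return cls_loss + alpha * align_loss
```

Under these assumptions, robustness comes from classifying the purified features, while the alignment term keeps them anchored to the unconfused representation of the pre-trained model.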
Related papers
- Enhancing Adversarial Robustness via Uncertainty-Aware Distributional Adversarial Training [43.766504246864045]
We propose a novel uncertainty-aware distributional adversarial training method.
Our approach achieves state-of-the-art adversarial robustness and maintains natural performance.
arXiv Detail & Related papers (2024-11-05T07:26:24Z)
- Improving Adversarial Robustness via Feature Pattern Consistency Constraint [42.50500608175905]
Convolutional Neural Networks (CNNs) are well-known for their vulnerability to adversarial attacks, posing significant security concerns.
Most existing methods either focus on learning from adversarial perturbations, leading to overfitting to the adversarial examples, or aim to eliminate such perturbations during inference.
We introduce a novel and effective Feature Pattern Consistency Constraint (FPCC) method to reinforce the latent feature's capacity to maintain the correct feature pattern.
arXiv Detail & Related papers (2024-06-13T05:38:30Z)
- The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning has the risk of skewing fine-tuning features and compromising the robustness of the model.
We introduce three robustness indicators and conduct experiments across diverse robust datasets.
Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z)
- Enhancing Robust Representation in Adversarial Training: Alignment and Exclusion Criteria [61.048842737581865]
We show that Adversarial Training (AT) fails to learn robust features, resulting in poor adversarial robustness.
We propose a generic AT framework that obtains robust representations via asymmetric negative contrast and reverse attention.
Empirical evaluations on three benchmark datasets show that our methods greatly advance the robustness of AT and achieve state-of-the-art performance.
arXiv Detail & Related papers (2023-10-05T07:29:29Z)
- Exploring Robust Features for Improving Adversarial Robustness [11.935612873688122]
We explore robust features, which are unaffected by adversarial perturbations, to improve the model's adversarial robustness.
Specifically, we propose a feature disentanglement model to segregate robust features from non-robust features and domain-specific features.
The trained domain discriminator identifies the domain-specific features of clean images and adversarial examples almost perfectly.
arXiv Detail & Related papers (2023-09-09T00:30:04Z)
- Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard approach to adversarial robustness assumes a framework that defends against samples crafted by minimally perturbing an input.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
arXiv Detail & Related papers (2022-11-04T13:54:02Z)
- Robust Transferable Feature Extractors: Learning to Defend Pre-Trained Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE).
arXiv Detail & Related papers (2022-09-14T21:09:34Z)
- Real-centric Consistency Learning for Deepfake Detection [8.313889744011933]
We tackle the deepfake detection problem through learning the invariant representations of both classes.
We propose a novel forgery-semantics-based pairing strategy to mine latent generation-related features.
At the feature level, based on the centers of natural faces at the representation space, we design a hard positive mining and synthesizing method to simulate the potential marginal features.
arXiv Detail & Related papers (2022-05-15T07:01:28Z)
- Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by the recently introduced notion of non-robust features.
We treat non-robust features as a common property of adversarial examples and deduce that a corresponding cluster can be found in representation space.
This leads us to estimate the probability distribution of adversarial representations within that cluster and to leverage the distribution for a likelihood-based adversarial detector (a minimal sketch of such a detector follows this list).
arXiv Detail & Related papers (2020-12-07T07:21:18Z)
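As an illustration of the likelihood-based detection idea in the last entry, here is a minimal sketch that models adversarial representations as a single Gaussian cluster. The class name, the sample mean/covariance fit, the `eps` regularizer, and the thresholding rule are illustrative assumptions, not the paper's exact procedure.

```python
import torch


class GaussianAdversarialDetector:
    """Hypothetical likelihood-based detector: fit one Gaussian cluster to
    representations of known adversarial examples, then flag inputs whose
    representation is sufficiently likely under that cluster."""
    def __init__(self, eps: float = 1e-4):
        self.eps = eps    # covariance regularizer for numerical stability
        self.dist = None

    def fit(self, adv_reprs: torch.Tensor) -> None:
        mean = adv_reprs.mean(dim=0)
        centered = adv_reprs - mean
        cov = centered.T @ centered / (adv_reprs.shape[0] - 1)
        cov = cov + self.eps * torch.eye(cov.shape[0])
        self.dist = torch.distributions.MultivariateNormal(
            mean, covariance_matrix=cov)

    def score(self, reprs: torch.Tensor) -> torch.Tensor:
        # Higher log-likelihood = closer to the adversarial cluster.
        return self.dist.log_prob(reprs)

    def is_adversarial(self, reprs: torch.Tensor, threshold: float) -> torch.Tensor:
        return self.score(reprs) > threshold
```

In practice the representations would come from the defended model's hidden layers, and the threshold would be tuned on held-out natural and adversarial examples.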