AFD: Mitigating Feature Gap for Adversarial Robustness by Feature Disentanglement
- URL: http://arxiv.org/abs/2401.14707v2
- Date: Tue, 10 Dec 2024 16:28:07 GMT
- Title: AFD: Mitigating Feature Gap for Adversarial Robustness by Feature Disentanglement
- Authors: Nuoyan Zhou, Dawei Zhou, Decheng Liu, Nannan Wang, Xinbo Gao
- Abstract summary: Adversarial fine-tuning methods enhance adversarial robustness via fine-tuning the pre-trained model in an adversarial training manner.
We propose a disentanglement-based approach to explicitly model and remove the specific latent features.
Our approach surpasses existing adversarial fine-tuning methods and adversarial training baselines.
- Abstract: Adversarial fine-tuning methods enhance adversarial robustness via fine-tuning the pre-trained model in an adversarial training manner. However, we identify that some specific latent features of adversarial samples are confused by adversarial perturbation and lead to an unexpectedly increasing gap between features in the last hidden layer of natural and adversarial samples. To address this issue, we propose a disentanglement-based approach to explicitly model and further remove the specific latent features. We introduce a feature disentangler to separate out the specific latent features from the features of the adversarial samples, thereby boosting robustness by eliminating the specific latent features. Besides, we align clean features in the pre-trained model with features of adversarial samples in the fine-tuned model, to benefit from the intrinsic features of natural samples. Empirical evaluations on three benchmark datasets demonstrate that our approach surpasses existing adversarial fine-tuning methods and adversarial training baselines.
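The abstract above describes two components: a disentangler that separates perturbation-specific latent features from adversarial features, and an alignment between clean features of the pre-trained model and adversarial features of the fine-tuned model. The following PyTorch-style sketch only illustrates that idea under assumed interfaces (a feature-extracting `model`, a frozen `pretrained` extractor, a linear `classifier`, and hypothetical loss weights); it is not the authors' AFD implementation.

```python
# Illustrative sketch only: module and loss names are hypothetical, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDisentangler(nn.Module):
    """Splits adversarial features into a natural-like part and a
    perturbation-specific part (two hypothetical linear heads)."""
    def __init__(self, dim):
        super().__init__()
        self.natural_head = nn.Linear(dim, dim)   # keeps intrinsic (natural) features
        self.specific_head = nn.Linear(dim, dim)  # models perturbation-specific features

    def forward(self, feat_adv):
        return self.natural_head(feat_adv), self.specific_head(feat_adv)

def afd_style_loss(model, pretrained, disentangler, classifier, x_nat, x_adv, y,
                   w_remove=1.0, w_align=1.0):
    """One possible fine-tuning objective in the spirit of the abstract:
    classify on the disentangled natural-like part, suppress the
    perturbation-specific part, and align adversarial features of the
    fine-tuned model with clean features of the frozen pre-trained model."""
    feat_adv = model(x_adv)                  # assumed to return last-hidden-layer features
    with torch.no_grad():
        feat_clean_pre = pretrained(x_nat)   # frozen pre-trained clean features
    feat_nat_like, feat_specific = disentangler(feat_adv)

    loss_cls = F.cross_entropy(classifier(feat_nat_like), y)  # robust classification
    loss_remove = feat_specific.pow(2).mean()                 # push specific features toward zero
    loss_align = F.mse_loss(feat_nat_like, feat_clean_pre)    # feature alignment
    return loss_cls + w_remove * loss_remove + w_align * loss_align
```

In practice one might generate `x_adv` with a PGD attack against the fine-tuned model at each step and back-propagate the combined loss; the exact losses and weights used by AFD may differ from this sketch.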
Related papers
- Enhancing Adversarial Robustness via Uncertainty-Aware Distributional Adversarial Training [43.766504246864045]
We propose a novel uncertainty-aware distributional adversarial training method.
Our approach achieves state-of-the-art adversarial robustness and maintains natural performance.
arXiv Detail & Related papers (2024-11-05T07:26:24Z)
- Improving Adversarial Robustness via Feature Pattern Consistency Constraint [42.50500608175905]
Convolutional Neural Networks (CNNs) are well-known for their vulnerability to adversarial attacks, posing significant security concerns.
Most existing methods either focus on learning from adversarial perturbations, leading to overfitting to the adversarial examples, or aim to eliminate such perturbations during inference.
We introduce a novel and effective Feature Pattern Consistency Constraint (FPCC) method to reinforce the latent feature's capacity to maintain the correct feature pattern.
arXiv Detail & Related papers (2024-06-13T05:38:30Z)
- The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning has the risk of skewing fine-tuning features and compromising the robustness of the model.
We introduce three robustness indicators and conduct experiments across diverse robust datasets.
Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z)
- Enhancing Robust Representation in Adversarial Training: Alignment and Exclusion Criteria [61.048842737581865]
We show that Adversarial Training (AT) fails to learn robust features, resulting in poor adversarial robustness.
We propose a generic AT framework that gains robust representations via asymmetric negative contrast and reverse attention.
Empirical evaluations on three benchmark datasets show our methods greatly advance the robustness of AT and achieve state-of-the-art performance.
arXiv Detail & Related papers (2023-10-05T07:29:29Z)
- Exploring Robust Features for Improving Adversarial Robustness [11.935612873688122]
We explore the robust features which are not affected by the adversarial perturbations to improve the model's adversarial robustness.
Specifically, we propose a feature disentanglement model to segregate the robust features from non-robust features and domain specific features.
The trained domain discriminator is able to identify the domain specific features from the clean images and adversarial examples almost perfectly.
arXiv Detail & Related papers (2023-09-09T00:30:04Z)
- Using Positive Matching Contrastive Loss with Facial Action Units to mitigate bias in Facial Expression Recognition [6.015556590955814]
We propose to mitigate bias by guiding the model's focus towards task-relevant features using domain knowledge.
We show that incorporating task-relevant features via our method can improve model fairness at minimal cost to classification performance.
arXiv Detail & Related papers (2023-03-08T21:28:02Z)
- Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard approach to adversarial robustness assumes a framework that defends against samples crafted by minimally perturbing natural samples.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
arXiv Detail & Related papers (2022-11-04T13:54:02Z)
- Robust Transferable Feature Extractors: Learning to Defend Pre-Trained Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE).
arXiv Detail & Related papers (2022-09-14T21:09:34Z)
- Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck [33.18197518590706]
We propose a way of explicitly distilling feature representation into the robust and non-robust features, using Information Bottleneck.
We demonstrate that the distilled features are highly correlated with adversarial prediction, and they have human-perceptible semantic information by themselves.
We present an attack mechanism that intensifies the gradient of non-robust features directly related to the model prediction, and validate its effectiveness in breaking model robustness.
arXiv Detail & Related papers (2022-04-06T11:22:46Z)
- Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.