Vanilla Feature Distillation for Improving the Accuracy-Robustness
Trade-Off in Adversarial Training
- URL: http://arxiv.org/abs/2206.02158v1
- Date: Sun, 5 Jun 2022 11:57:10 GMT
- Title: Vanilla Feature Distillation for Improving the Accuracy-Robustness
Trade-Off in Adversarial Training
- Authors: Guodong Cao, Zhibo Wang, Xiaowei Dong, Zhifei Zhang, Hengchang Guo,
Zhan Qin, Kui Ren
- Abstract summary: We propose Vanilla Feature Distillation Adversarial Training (VFD-Adv) to guide adversarial training towards higher accuracy.
A key advantage of our method is that it can be universally adapted to existing adversarial training methods and boost their performance.
- Score: 37.5115141623558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial training has been widely explored for mitigating attacks against
deep models. However, most existing works are still trapped in the dilemma
between higher accuracy and stronger robustness since they tend to fit a model
towards robust features (not easily tampered with by adversaries) while
ignoring those non-robust but highly predictive features. To achieve a better
robustness-accuracy trade-off, we propose the Vanilla Feature Distillation
Adversarial Training (VFD-Adv), which conducts knowledge distillation from a
pre-trained model (optimized towards high accuracy) to guide adversarial
training towards higher accuracy, i.e., preserving those non-robust but
predictive features. More specifically, both adversarial examples and their
clean counterparts are forced to be aligned in the feature space by distilling
predictive representations from the pre-trained/clean model, while previous
works barely utilize predictive features from clean models. Therefore, the
adversarial training model is updated towards maximally preserving accuracy
while gaining robustness. A key advantage of our method is that it can be
universally adapted to existing adversarial training methods and boost their
performance. Exhaustive experiments on
various datasets, classification models, and adversarial training algorithms
demonstrate the effectiveness of our proposed method.
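As a concrete illustration of the objective described above, here is a minimal PyTorch sketch: the usual adversarial classification loss is combined with a term that pulls both clean and adversarial features toward a frozen, accuracy-oriented teacher. The feature/classifier split, the MSE alignment loss, and the weighting are illustrative assumptions, not the authors' exact design.

import torch
import torch.nn.functional as F

def vfd_adv_loss(model, clean_teacher, x, x_adv, y, distill_weight=1.0):
    # Frozen teacher pre-trained for high clean accuracy supplies the
    # "vanilla" predictive features to be distilled.
    with torch.no_grad():
        target_feat = clean_teacher.features(x)

    feat_clean = model.features(x)
    feat_adv = model.features(x_adv)

    # Standard adversarial-training classification loss.
    ce_loss = F.cross_entropy(model.classify(feat_adv), y)

    # Align clean and adversarial features with the teacher so that
    # non-robust but highly predictive features are preserved.
    distill_loss = (F.mse_loss(feat_clean, target_feat)
                    + F.mse_loss(feat_adv, target_feat))

    return ce_loss + distill_weight * distill_loss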
Related papers
- Dynamic Label Adversarial Training for Deep Learning Robustness Against Adversarial Attacks [11.389689242531327]
Adversarial training is one of the most effective methods for enhancing model robustness.
Previous approaches primarily use static ground truth for adversarial training, but this often causes robust overfitting.
We propose a dynamic label adversarial training (DYNAT) algorithm that enables the target model to gain robustness from the guide model's decisions.
arXiv Detail & Related papers (2024-08-23T14:25:12Z)
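A minimal sketch of the dynamic-label idea, assuming the guide model's soft predictions replace static one-hot targets; the temperature and mixing weight are illustrative choices, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def dynamic_label_loss(target_model, guide_model, x_adv, y, alpha=0.7, T=2.0):
    logits = target_model(x_adv)
    with torch.no_grad():
        # Dynamic soft labels from the guide model evolve as training proceeds.
        soft_labels = F.softmax(guide_model(x_adv) / T, dim=1)
    kd = F.kl_div(F.log_softmax(logits / T, dim=1), soft_labels,
                  reduction="batchmean") * T * T
    ce = F.cross_entropy(logits, y)  # static ground truth kept as an anchor
    return alpha * kd + (1 - alpha) * ce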
- Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to a discrepancy between a model's predicted confidence and its actual performance.
We introduce Dynamic Regularization (DReg), which aims to learn what should be learned during training, thereby circumventing the confidence-adjustment trade-off.
arXiv Detail & Related papers (2024-02-13T11:25:20Z)
- Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness [52.9493817508055]
We propose Pre-trained Model Guided Adversarial Fine-Tuning (PMG-AFT) to enhance the model's zero-shot adversarial robustness.
Our approach consistently improves clean accuracy by an average of 8.72%.
arXiv Detail & Related papers (2024-01-09T04:33:03Z)
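A hedged sketch of the guided fine-tuning pattern: alongside the task loss on adversarial inputs, the tuned encoder's adversarial features are regularized toward the frozen pre-trained encoder's clean features. The encoder/head split and the weight lambda_g are assumptions for illustration.

import torch
import torch.nn.functional as F

def guided_adv_finetune_loss(encoder, frozen_encoder, head, x, x_adv, y,
                             lambda_g=1.0):
    z_adv = encoder(x_adv)
    task_loss = F.cross_entropy(head(z_adv), y)
    with torch.no_grad():
        # The frozen pre-trained encoder provides a generalization-preserving
        # reference representation.
        z_ref = frozen_encoder(x)
    guide_loss = F.mse_loss(z_adv, z_ref)
    return task_loss + lambda_g * guide_loss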
- Focus on Hiders: Exploring Hidden Threats for Enhancing Adversarial Training [20.1991376813843]
We propose a generalized adversarial training algorithm called Hider-Focused Adversarial Training (HFAT).
HFAT combines the optimization directions of standard adversarial training and the prevention of hiders.
We demonstrate the effectiveness of our method based on extensive experiments.
arXiv Detail & Related papers (2023-12-12T08:41:18Z)
- Learn from the Past: A Proxy Guided Adversarial Defense Framework with Self Distillation Regularization [53.04697800214848]
Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models.
AT methods, relying on direct iterative updates for the target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting.
We present a general proxy-guided defense framework, LAST (Learn from the Past).
arXiv Detail & Related papers (2023-10-19T13:13:41Z)
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
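One way to instantiate that consistency is a contrastive loss whose positive pairs couple an augmented view with an adversarially perturbed view of the same image; a minimal NT-Xent sketch under that assumption (temperature and pairing scheme are illustrative):

import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    # z1[i] and z2[i] embed two views of the same image, e.g. an augmented
    # view and an adversarial view produced by an attack on the encoder.
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)      # (2n, d)
    sim = z @ z.t() / temperature                           # pairwise logits
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))              # drop self-pairs
    # The positive for index i is index i + n (and vice versa).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)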
- Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning [134.15174177472807]
We introduce adversarial training into self-supervision to provide general-purpose robust pre-trained models for the first time.
We conduct extensive experiments to demonstrate that the proposed framework achieves large performance margins.
arXiv Detail & Related papers (2020-03-28T18:28:33Z)
- Adversarial Robustness on In- and Out-Distribution Improves Explainability [109.68938066821246]
RATIO is a training procedure for robustness via Adversarial Training on In- and Out-distribution.
RATIO achieves state-of-the-art $l_2$-adversarial robustness on CIFAR10 and maintains better clean accuracy.
arXiv Detail & Related papers (2020-03-20T18:57:52Z)
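A hedged sketch of joint in/out-distribution adversarial training in this spirit: adversarial cross-entropy on in-distribution data plus a term driving adversarially perturbed outliers toward uniform confidence. The uniform target and weighting are assumptions, not RATIO's exact objective.

import torch.nn.functional as F

def in_out_adv_loss(model, x_in_adv, y, x_out_adv, lambda_out=1.0):
    # Adversarial training on the in-distribution task.
    in_loss = F.cross_entropy(model(x_in_adv), y)
    # Cross-entropy to the uniform distribution (up to a constant) keeps
    # confidence low on adversarially perturbed out-distribution inputs.
    out_loss = -F.log_softmax(model(x_out_adv), dim=1).mean()
    return in_loss + lambda_out * out_loss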
- Revisiting Ensembles in an Adversarial Context: Improving Natural Accuracy [5.482532589225552]
There is still a significant gap in natural accuracy between robust and non-robust models.
We consider a number of ensemble methods designed to mitigate this performance difference.
We consider two schemes: one that combines predictions from several randomly initialized robust models, and another that fuses features from robust and standard models (both are sketched below).
arXiv Detail & Related papers (2020-02-26T15:45:58Z)
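Both schemes from the last entry admit compact sketches; module names, feature dimensions, and the fusion classifier below are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

def average_predictions(models, x):
    # Scheme 1: average the softmax outputs of several robust models.
    return torch.stack([F.softmax(m(x), dim=1) for m in models]).mean(dim=0)

class FeatureFusion(nn.Module):
    # Scheme 2: concatenate robust and standard features, then classify.
    def __init__(self, robust_encoder, standard_encoder, feat_dim, num_classes):
        super().__init__()
        self.robust_encoder = robust_encoder
        self.standard_encoder = standard_encoder
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, x):
        f = torch.cat([self.robust_encoder(x), self.standard_encoder(x)], dim=1)
        return self.classifier(f)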