Annealing Self-Distillation Rectification Improves Adversarial Training
- URL: http://arxiv.org/abs/2305.12118v2
- Date: Sat, 13 Apr 2024 15:01:14 GMT
- Title: Annealing Self-Distillation Rectification Improves Adversarial Training
- Authors: Yu-Yu Wu, Hung-Jui Wang, Shang-Tse Chen,
- Abstract summary: We analyze the characteristics of robust models and identify that robust models tend to produce smoother and well-calibrated outputs.
We propose Annealing Self-Distillation Rectification, which generates soft labels as a better guidance mechanism.
We demonstrate the efficacy of ADR through extensive experiments and strong performances across datasets.
- Score: 0.10241134756773226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In standard adversarial training, models are optimized to fit one-hot labels within allowable adversarial perturbation budgets. However, the ignorance of underlying distribution shifts brought by perturbations causes the problem of robust overfitting. To address this issue and enhance adversarial robustness, we analyze the characteristics of robust models and identify that robust models tend to produce smoother and well-calibrated outputs. Based on the observation, we propose a simple yet effective method, Annealing Self-Distillation Rectification (ADR), which generates soft labels as a better guidance mechanism that accurately reflects the distribution shift under attack during adversarial training. By utilizing ADR, we can obtain rectified distributions that significantly improve model robustness without the need for pre-trained models or extensive extra computation. Moreover, our method facilitates seamless plug-and-play integration with other adversarial training techniques by replacing the hard labels in their objectives. We demonstrate the efficacy of ADR through extensive experiments and strong performances across datasets.
Related papers
- Dynamic Guidance Adversarial Distillation with Enhanced Teacher Knowledge [17.382306203152943]
Dynamic Guidance Adversarial Distillation (DGAD) framework tackles the challenge of differential sample importance.
DGAD employs Misclassification-Aware Partitioning (MAP) to dynamically tailor the distillation focus.
Error-corrective Label Swapping (ELS) corrects misclassifications of the teacher on both clean and adversarially perturbed inputs.
arXiv Detail & Related papers (2024-09-03T05:52:37Z) - Dynamic Label Adversarial Training for Deep Learning Robustness Against Adversarial Attacks [11.389689242531327]
Adversarial training is one of the most effective methods for enhancing model robustness.
Previous approaches primarily use static ground truth for adversarial training, but this often causes robust overfitting.
We propose a dynamic label adversarial training (DYNAT) algorithm that enables the target model to gain robustness from the guide model's decisions.
arXiv Detail & Related papers (2024-08-23T14:25:12Z) - Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adrial robustness has been conventionally believed as a challenging property to encode for neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z) - Focus on Hiders: Exploring Hidden Threats for Enhancing Adversarial
Training [20.1991376813843]
We propose a generalized adversarial training algorithm called Hider-Focused Adversarial Training (HFAT)
HFAT combines the optimization directions of standard adversarial training and prevention hiders.
We demonstrate the effectiveness of our method based on extensive experiments.
arXiv Detail & Related papers (2023-12-12T08:41:18Z) - Enhancing Multiple Reliability Measures via Nuisance-extended
Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can be rather from some biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z) - Adversarial Fine-tune with Dynamically Regulated Adversary [27.034257769448914]
In many real-world applications such as health diagnosis and autonomous surgical robotics, the standard performance is more valued over model robustness against such extremely malicious attacks.
This work proposes a simple yet effective transfer learning-based adversarial training strategy that disentangles the negative effects of adversarial samples on model's standard performance.
In addition, we introduce a training-friendly adversarial attack algorithm, which facilitates the boost of adversarial robustness without introducing significant training complexity.
arXiv Detail & Related papers (2022-04-28T00:07:15Z) - Revisiting Adversarial Robustness Distillation: Robust Soft Labels Make
Student Better [66.69777970159558]
We propose a novel adversarial robustness distillation method called Robust Soft Label Adversarial Distillation (RSLAD)
RSLAD fully exploits the robust soft labels produced by a robust (adversarially-trained) large teacher model to guide the student's learning.
We empirically demonstrate the effectiveness of our RSLAD approach over existing adversarial training and distillation methods.
arXiv Detail & Related papers (2021-08-18T04:32:35Z) - Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv Detail & Related papers (2020-10-26T04:44:43Z) - DVERGE: Diversifying Vulnerabilities for Enhanced Robust Generation of
Ensembles [20.46399318111058]
Adversarial attacks can mislead CNN models with small perturbations, which can effectively transfer between different models trained on the same dataset.
We propose DVERGE, which isolates the adversarial vulnerability in each sub-model by distilling non-robust features.
The novel diversity metric and training procedure enables DVERGE to achieve higher robustness against transfer attacks.
arXiv Detail & Related papers (2020-09-30T14:57:35Z) - Adversarial Robustness on In- and Out-Distribution Improves
Explainability [109.68938066821246]
RATIO is a training procedure for robustness via Adversarial Training on In- and Out-distribution.
RATIO achieves state-of-the-art $l$-adrial on CIFAR10 and maintains better clean accuracy.
arXiv Detail & Related papers (2020-03-20T18:57:52Z) - Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to the unreliable robustness against other unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.