LTD: Low Temperature Distillation for Robust Adversarial Training
- URL: http://arxiv.org/abs/2111.02331v3
- Date: Fri, 30 Jun 2023 06:56:18 GMT
- Title: LTD: Low Temperature Distillation for Robust Adversarial Training
- Authors: Erh-Chung Chen, Che-Rung Lee
- Abstract summary: Adversarial training has been widely used to enhance the robustness of neural network models against adversarial attacks.
Despite the widespread use of adversarial training, a significant gap remains between the natural and robust accuracy of these models.
We propose a novel method called Low Temperature Distillation (LTD) that generates soft labels using a modified knowledge distillation framework.
- Score: 1.3300217947936062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial training has been widely used to enhance the robustness
of neural network models against adversarial attacks. Despite its widespread
use, a significant gap remains between the natural and robust accuracy of these
models. In this paper, we identify that one of the primary reasons for this gap
is the common use of one-hot vectors as labels, which hinders the learning
process for image recognition. Representing ambiguous images with one-hot
vectors is imprecise and may lead the model to suboptimal solutions. To
overcome this issue, we propose a novel method called Low Temperature
Distillation (LTD), which generates soft labels using a modified knowledge
distillation framework. Unlike previous approaches, LTD uses a relatively low
temperature for the teacher model and fixed but different temperatures for the
teacher and student models. This modification boosts the model's robustness
without encountering the gradient masking problem associated with defensive
distillation. The experimental results demonstrate the effectiveness of the
proposed LTD method combined with previous techniques, achieving robust
accuracy rates of 58.19%, 31.13%, and 42.08% on the CIFAR-10, CIFAR-100, and
ImageNet datasets, respectively, without additional unlabeled data.
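To make the distillation step described above concrete, here is a minimal
PyTorch sketch of a temperature-asymmetric soft-label loss in the spirit of
LTD: the teacher's logits are softened at a relatively low, fixed temperature,
while the student uses a different fixed temperature. The temperature values,
the weighting factor `alpha`, and the surrounding training loop are
illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a temperature-asymmetric soft-label loss (assumed PyTorch).
# Temperatures and the hard/soft weighting are placeholder values for
# illustration only.
import torch
import torch.nn.functional as F

T_TEACHER = 0.5   # assumed "relatively low" teacher temperature
T_STUDENT = 1.0   # assumed fixed, but different, student temperature

def ltd_style_loss(student_logits, teacher_logits, targets, alpha=0.5):
    # Soft labels from the (frozen) teacher at a low temperature.
    with torch.no_grad():
        soft_labels = F.softmax(teacher_logits / T_TEACHER, dim=1)
    # Student distribution at its own, different fixed temperature.
    log_student = F.log_softmax(student_logits / T_STUDENT, dim=1)
    soft_term = F.kl_div(log_student, soft_labels, reduction="batchmean")
    # Standard cross-entropy on the original hard labels.
    hard_term = F.cross_entropy(student_logits, targets)
    return alpha * soft_term + (1.0 - alpha) * hard_term
```

In adversarial training, this loss would typically be computed on adversarially
perturbed inputs (e.g., generated with PGD) fed to both the teacher and the
student; that inner attack step is omitted here for brevity.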
Related papers
- Optimizing YOLOv5s Object Detection through Knowledge Distillation algorithm [37.37311465537091]
This paper explores the application of knowledge distillation technology in target detection tasks.
Using YOLOv5l as the teacher network and the smaller YOLOv5s as the student network, we found that the student's detection accuracy gradually improved as the distillation temperature increased.
arXiv Detail & Related papers (2024-10-16T05:58:08Z)
- What is Left After Distillation? How Knowledge Transfer Impacts Fairness and Bias [1.03590082373586]
As many as 41% of the classes are statistically significantly affected by distillation when comparing class-wise accuracy.
This study highlights the uneven effects of Knowledge Distillation on certain classes and its potentially significant role in fairness.
arXiv Detail & Related papers (2024-10-10T22:43:00Z)
- One Step Diffusion-based Super-Resolution with Time-Aware Distillation [60.262651082672235]
Diffusion-based image super-resolution (SR) methods have shown promise in reconstructing high-resolution images with fine details from low-resolution counterparts.
Recent techniques have been devised to enhance the sampling efficiency of diffusion-based SR models via knowledge distillation.
We propose a time-aware diffusion distillation method, named TAD-SR, to accomplish effective and efficient image super-resolution.
arXiv Detail & Related papers (2024-08-14T11:47:22Z)
- Guided Score identity Distillation for Data-Free One-Step Text-to-Image Generation [62.30570286073223]
Diffusion-based text-to-image generation models have demonstrated the ability to produce images aligned with textual descriptions.
We introduce a data-free guided distillation method that enables the efficient distillation of pretrained Diffusion models without access to the real training data.
By exclusively training with synthetic images generated by its one-step generator, our data-free distillation method rapidly improves FID and CLIP scores, achieving state-of-the-art FID performance while maintaining a competitive CLIP score.
arXiv Detail & Related papers (2024-06-03T17:44:11Z)
- Reducing Spatial Fitting Error in Distillation of Denoising Diffusion Models [13.364271265023953]
Knowledge distillation is an effective method to shorten the sampling process of diffusion models.
We attribute the resulting quality degradation to the spatial fitting error that occurs when training both the teacher and student models.
SFERD utilizes attention guidance from the teacher model and a designed semantic gradient predictor to reduce the student's fitting error.
We achieve an FID of 5.31 on CIFAR-10 and 9.39 on ImageNet 64×64 with only one step, outperforming existing diffusion methods.
arXiv Detail & Related papers (2023-11-07T09:19:28Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT, which overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- (Certified!!) Adversarial Robustness for Free! [116.6052628829344]
We certify 71% accuracy on ImageNet under adversarial perturbations constrained to be within a 2-norm of 0.5.
We obtain these results using only pretrained diffusion models and image classifiers, without requiring any fine tuning or retraining of model parameters.
arXiv Detail & Related papers (2022-06-21T17:27:27Z)
- Churn Reduction via Distillation [54.5952282395487]
We show an equivalence between training with distillation using the base model as the teacher and training with an explicit constraint on the predictive churn.
We then show that distillation performs strongly for low-churn training compared with a number of recent baselines.
arXiv Detail & Related papers (2021-06-04T18:03:31Z)
- Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
- Enhancing Data-Free Adversarial Distillation with Activation Regularization and Virtual Interpolation [19.778192371420793]
A data-free adversarial distillation framework deploys a generative network to transfer the teacher model's knowledge to the student model.
We add an activation regularizer and a virtual adversarial method to improve the data generation efficiency.
Our model's accuracy is 13.8% higher than the state-of-the-art data-free method on CIFAR-100.
arXiv Detail & Related papers (2021-02-23T11:37:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.