LTD: Low Temperature Distillation for Robust Adversarial Training
- URL: http://arxiv.org/abs/2111.02331v3
- Date: Fri, 30 Jun 2023 06:56:18 GMT
- Title: LTD: Low Temperature Distillation for Robust Adversarial Training
- Authors: Erh-Chung Chen, Che-Rung Lee
- Abstract summary: Adversarial training has been widely used to enhance the robustness of neural network models against adversarial attacks.
Despite the popularity of neural network models, a significant gap exists between the natural and robust accuracy of these models.
We propose a novel method called Low Temperature Distillation (LTD) that generates soft labels using the modified knowledge distillation framework.
- Score: 1.3300217947936062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial training has been widely used to enhance the robustness of neural
network models against adversarial attacks. Despite the popularity of neural
network models, a significant gap exists between the natural and robust
accuracy of these models. In this paper, we identify that one of the primary
reasons for this gap is the common use of one-hot vectors as labels, which hinders the
learning process for image recognition. Representing ambiguous images with
one-hot vectors is imprecise and may lead the model to suboptimal solutions. To
overcome this issue, we propose a novel method called Low Temperature
Distillation (LTD) that generates soft labels using the modified knowledge
distillation framework. Unlike previous approaches, LTD uses a relatively low
temperature in the teacher model and fixed, but different temperatures for the
teacher and student models. This modification boosts the model's robustness
without encountering the gradient masking problem that has been addressed in
defensive distillation. The experimental results demonstrate the effectiveness
of the proposed LTD method combined with previous techniques, achieving robust
accuracy rates of 58.19%, 31.13%, and 42.08% on CIFAR-10, CIFAR-100, and
ImageNet data sets, respectively, without additional unlabeled data.
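As a minimal sketch of the idea described in the abstract: the teacher's logits are softened at a fixed, relatively low temperature to produce soft labels, and the student is trained against those fixed labels at a different temperature. The function names, temperature values, and logits below are illustrative assumptions for exposition, not the paper's exact implementation.

```python
import numpy as np

def softmax(logits, temperature):
    """Temperature-scaled softmax; a lower temperature yields a sharper distribution."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ltd_soft_labels(teacher_logits, teacher_temperature=0.5):
    """Generate fixed soft labels from the teacher at a relatively low temperature
    (the specific value 0.5 is an assumption, not taken from the paper)."""
    return softmax(teacher_logits, teacher_temperature)

def distillation_loss(student_logits, soft_labels, student_temperature=1.0):
    """Cross-entropy between the student's temperature-scaled predictions and the
    teacher's soft labels; teacher and student temperatures differ, as in LTD."""
    log_p = np.log(softmax(student_logits, student_temperature) + 1e-12)
    return -np.mean(np.sum(soft_labels * log_p, axis=-1))

# Example: an unambiguous image vs. an ambiguous one. A one-hot label would
# force the second sample into a single class; the soft label keeps the
# ambiguity between classes 0 and 1.
teacher_logits = np.array([[8.0, 1.0, 0.5],    # clearly class 0
                           [3.0, 2.8, 0.5]])   # ambiguous between classes 0 and 1
labels = ltd_soft_labels(teacher_logits, teacher_temperature=0.5)
student_logits = np.array([[6.0, 2.0, 1.0],
                           [2.5, 2.9, 0.4]])
loss = distillation_loss(student_logits, labels, student_temperature=1.0)
```

Unlike defensive distillation, the temperatures here are fixed and differ between teacher and student, which is the modification the abstract credits for avoiding gradient masking.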
Related papers
- Reducing Spatial Fitting Error in Distillation of Denoising Diffusion Models [13.364271265023953]
Knowledge distillation for diffusion models is an effective method to address this limitation with a shortened sampling process.
We attribute the degradation to the spatial fitting error occurring in the training of both the teacher and student model.
SFERD utilizes attention guidance from the teacher model and a designed semantic gradient predictor to reduce the student's fitting error.
We achieve an FID of 5.31 on CIFAR-10 and 9.39 on ImageNet 64×64 with only one step, outperforming existing diffusion methods.
arXiv Detail & Related papers (2023-11-07T09:19:28Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- Masked Images Are Counterfactual Samples for Robust Fine-tuning [77.82348472169335]
Fine-tuning deep learning models can lead to a trade-off between in-distribution (ID) performance and out-of-distribution (OOD) robustness.
We propose a novel fine-tuning method that uses masked images as counterfactual samples to improve the robustness of the fine-tuned model.
arXiv Detail & Related papers (2023-03-06T11:51:28Z)
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that, with regularization towards the flat trajectory, the weights trained on synthetic data are robust against accumulated-error perturbations.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)
- (Certified!!) Adversarial Robustness for Free! [116.6052628829344]
We certify 71% accuracy on ImageNet under adversarial perturbations constrained to be within a 2-norm of 0.5.
We obtain these results using only pretrained diffusion models and image classifiers, without requiring any fine tuning or retraining of model parameters.
arXiv Detail & Related papers (2022-06-21T17:27:27Z)
- CEKD: Cross Ensemble Knowledge Distillation for Augmented Fine-grained Data [7.012047150376948]
The proposed model can be trained in an end-to-end manner, and only requires image-level label supervision.
With the backbone of ResNet-101, CEKD obtains the accuracy of 89.59%, 95.96% and 94.56% in three datasets respectively.
arXiv Detail & Related papers (2022-03-13T02:57:25Z)
- Churn Reduction via Distillation [54.5952282395487]
We show an equivalence between training with distillation using the base model as the teacher and training with an explicit constraint on the predictive churn.
We then show that distillation performs strongly for low churn training against a number of recent baselines.
arXiv Detail & Related papers (2021-06-04T18:03:31Z)
- Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
- Enhancing Data-Free Adversarial Distillation with Activation Regularization and Virtual Interpolation [19.778192371420793]
A data-free adversarial distillation framework deploys a generative network to transfer the teacher model's knowledge to the student model.
We add an activation regularizer and a virtual adversarial method to improve the data generation efficiency.
Our model's accuracy is 13.8% higher than the state-of-the-art data-free method on CIFAR-100.
arXiv Detail & Related papers (2021-02-23T11:37:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.